Welcome to ConFit on the Hugging Face Hub

About Us

ConFit is an organisation dedicated to advancing speech, audio, and natural language processing (NLP). Our team develops technologies and tools for researchers and developers working in the audio and language domains. We provide a collection of audio datasets prepared for machine learning applications, suited to training models on tasks such as audio embedding and speech recognition. The datasets are compatible with popular frameworks and can be integrated into your projects.

Datasets

Audio classification:

| Dataset | Split Method | Classes | Task | # Clips | Average Duration (s) | Sampling Rate (Hz) |
|---|---|---|---|---|---|---|
| WMMS | train/test | 32 | Multi-class | 1697 | 10.42 | 16000 |
| MSWC (English) | train/validation/test | 271 | Multi-class | 33726 | 0.99 | 16000 |
| MSWC (Spanish) | train/validation/test | 146 | Multi-class | 11759 | 0.99 | 16000 |
| MSWC (Indian) | train/validation/test | 14 | Multi-class | 739 | 0.99 | 16000 |
| ESC50 | 5-fold | 50 | Multi-class | 2000 | 5.00 | 44100 |
| UrbanSound8K | 10-fold | 10 | Multi-class | 8732 | 3.60 | 8000 |
| AudioSet (balanced) | train/test | 527 | Multi-label | 39437 | 9.89 | 32000 |
| MagnaTagATune | train/validation/test | 50 | Multi-label | 21108 | 29.12 | 16000 |
| Medley-solos-DB | train/validation/test | 8 | Multi-class | 21571 | 2.97 | 44100 |
| Pianos | train/validation/test | 8 | Multi-class | 668 | 4.86 | 16000 |
| FSD-Kaggle-2019 (curated) | train/test | 80 | Multi-label | 9451 | 8.93 | 44100 |
| GTZAN | train/validation/test | 10 | Multi-class | 930 | 30.02 | 22050 |
| NSynth (instrument) | train/validation/test | 11 | Multi-class | 305979 | 4.00 | 16000 |
| NSynth (pitch) | train/validation/test | 112 | Multi-class | 305979 | 4.00 | 16000 |
| CREMA-D | train/validation/test | 6 | Multi-class | 7442 | 2.54 | 16000 |
| IEMOCAP | 5-fold | 4 | Multi-class | 5531 | 4.52 | 16000 |
| EmoDB | train/test | 7 | Multi-class | 535 | 2.77 | 16000 |
| EMOVO | 6-fold | 7 | Multi-class | 588 | 3.12 | 48000 |
| IRMAS | train/test | 11 | Multi-label | 9579 | 7.16 | 44100 |
| RAVDESS | 5-fold | 8 | Multi-class | 2880 | 3.70 | 48000 |
| DCASE2018-Task3 | train/test | 2 | Binary-class | 35690 | 10.01 | 44100 |
| TIMIT | train/validation/test | 630 | Multi-class | 6300 | 3.07 | 16000 |
| LibriSpeech | train/test | 2484 | Multi-class | 21933 | 3.75 | 16000 |
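
Several entries above (for example ESC50, UrbanSound8K, IEMOCAP) list a k-fold split method rather than fixed train/test splits. The sketch below shows one common way such folds are consumed: each fold is held out once for testing while the remaining folds form the training set. It assumes each clip carries an integer fold label; the `iter_folds` helper and the clip names are hypothetical and do not describe how the ConFit datasets are actually stored.

```python
from typing import Dict, Iterator, List, Tuple


def iter_folds(
    fold_of_clip: Dict[str, int],
) -> Iterator[Tuple[int, List[str], List[str]]]:
    """Yield (held_out_fold, train_clips, test_clips) once per fold."""
    for held_out in sorted(set(fold_of_clip.values())):
        # Clips in the held-out fold are used for testing, the rest for training.
        train = [clip for clip, fold in fold_of_clip.items() if fold != held_out]
        test = [clip for clip, fold in fold_of_clip.items() if fold == held_out]
        yield held_out, train, test


# Hypothetical clip-to-fold assignment (illustration only, not real data).
example_folds = {
    "clip_a.wav": 1,
    "clip_b.wav": 2,
    "clip_c.wav": 3,
    "clip_d.wav": 1,
    "clip_e.wav": 2,
}

splits = list(iter_folds(example_folds))
for held_out, train, test in splits:
    print(f"fold {held_out}: {len(train)} train clips, {len(test)} test clips")
```

Reported scores for k-fold datasets are usually averaged over all held-out folds, which is why every fold takes a turn as the test set.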

Automated audio captioning:

| Dataset | Split Method | # Clips | Average Duration (s) | Sampling Rate (Hz) |
|---|---|---|---|---|
| Music4All | train | 109269 | 29.99 | 48000 |
| Clotho (v1.0) | train/test | 3938 | 22.43 | 44100 |
| Clotho (v2.1) | train/validation/test | 8723 | 22.48 | 44100 |
| AudioCaps | train/validation/test | 41113 | 8.38 | 48000 |
| WavCaps (AudioSet-SL) | train | 85232 | 10.00 | 32000 |
| WavCaps (SoundBible) | train | 1232 | 13.12 | 32000 |
| WavCaps (BBC) | train | 31201 | 115.04 | 32000 |

Music, speech, and noise:

Dataset Split Method # Clips Average Duration Sampling Rate
MUSAN | train | 2016 | 195.16 | 16000
RIR-Noise | train | 61260 | 1.54 | 16000
ARCA23K | train | 17979 | 7.92 | 44100
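
The clip counts and average durations listed above give a quick estimate of how much audio each dataset contains, which can help when budgeting storage or training time. The following is a minimal sketch of that arithmetic, assuming the average durations are in seconds; the `total_hours` helper is hypothetical, and the figures are copied from the music, speech, and noise rows.

```python
def total_hours(num_clips: int, avg_duration_s: float) -> float:
    """Estimate total audio length in hours from clip count and mean duration."""
    return num_clips * avg_duration_s / 3600.0


# (number of clips, average duration in seconds), taken from the rows above.
rows = {
    "MUSAN": (2016, 195.16),
    "RIR-Noise": (61260, 1.54),
    "ARCA23K": (17979, 7.92),
}

hours = {name: total_hours(clips, avg) for name, (clips, avg) in rows.items()}
for name, value in hours.items():
    print(f"{name}: about {value:.1f} hours")
```

Because the inputs are rounded averages, the result is an estimate and not an exact total.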

Contact Us

If you have any questions or would like more information about our projects, please feel free to reach out to us.