---
title: README
emoji: 📈
colorFrom: purple
colorTo: red
sdk: static
pinned: false
---

# Welcome to ConFit on the Hugging Face Hub

## About Us

ConFit is an organisation working on speech, audio, and natural language processing (NLP). We develop technologies and tools for researchers and developers in the audio and language domains.

We provide a collection of audio datasets prepared for machine learning tasks such as audio embedding, audio classification, automated audio captioning, and speech recognition. The datasets are compatible with popular machine learning frameworks, so they can be integrated into existing projects. The tables below list each dataset with its split method, number of clips, average clip duration, and sampling rate.

## Datasets

Audio classification:

| Dataset | Split Method | Classes | Task | # Clips | Average Duration (s) | Sampling Rate (Hz) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| WMMS | train/test | 31 | Multi-class | 1695 | 10.42 | 16000 |
| MSWC (English) | train/validation/test | 271 | Multi-class | 33726 | 0.99 | 16000 |
| MSWC (Spanish) | train/validation/test | 146 | Multi-class | 11759 | 0.99 | 16000 |
| MSWC (Indian) | train/validation/test | 14 | Multi-class | 739 | 0.99 | 16000 |
| ESC50 | 5-fold | 50 | Multi-class | 2000 | 5.00 | 44100 |
| UrbanSound8K | 10-fold | 10 | Multi-class | 8732 | 3.60 | 8000 |
| AudioSet (balanced, 20k) | train/test | 527 | Multi-label | 39436 | 9.89 | 32000 |
| AudioSet (balanced, 500k) | train/test | 527 | Multi-label | 516868 |  | 32000 |
| AudioSet (unbalanced, 2m) | train/test | 527 | Multi-label | 1930910 | 9.91 | 32000 |
| MagnaTagATune | train/validation/test | 50 | Multi-label | 21108 | 29.12 | 16000 |
| Medley-solos-DB | train/validation/test | 8 | Multi-class | 21571 | 2.97 | 44100 |
| Pianos | train/validation/test | 8 | Multi-class | 668 | 4.86 | 16000 |
| FSD-Kaggle-2019 (curated) | train/test | 80 | Multi-label | 9451 | 8.93 | 44100 |
| GTZAN | train/validation/test | 10 | Multi-class | 930 | 30.02 | 22050 |
| Nsynth (instrument) | train/validation/test | 11 | Multi-class | 305979 | 4.00 | 16000 |
| Nsynth (pitch) | train/validation/test | 112 | Multi-class | 305979 | 4.00 | 16000 |
| CREMA-D | train/validation/test | 6 | Multi-class | 7442 | 2.54 | 16000 |
| IEMOCAP | 5-fold | 4 | Multi-class | 5531 | 4.52 | 16000 |
| EmoDB | train/test | 7 | Multi-class | 535 | 2.77 | 16000 |
| EMOVO | 6-fold | 7 | Multi-class | 588 | 3.12 | 48000 |
| IRMAS | train/test | 11 | Multi-label | 9579 | 7.16 | 44100 |
| RAVDESS | 5-fold | 8 | Multi-class | 2880 | 3.70 | 48000 |
| DCASE2018-Task3 | train/test | 2 | Binary-class | 35690 | 10.01 | 44100 |
| TIMIT | train/validation/test | 630 | Multi-class | 6300 | 3.07 | 16000 |
| LibriSpeech | train/test | 2484 | Multi-class | 21933 | 3.75 | 16000 |

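For the datasets listed with a k-fold split method (for example ESC50 or IEMOCAP), evaluation usually holds out one fold at a time and trains on the rest. The following is a minimal sketch of that loop, assuming each clip carries a predefined fold number. The `clip_folds` mapping and the clip names are hypothetical and are not taken from the datasets themselves.

```python
from collections import defaultdict


def fold_splits(clip_folds):
    """Yield (held_out_fold, train_ids, test_ids) for each predefined fold.

    clip_folds is a hypothetical mapping from clip id to fold number.
    """
    by_fold = defaultdict(list)
    for clip_id, fold in clip_folds.items():
        by_fold[fold].append(clip_id)

    for held_out in sorted(by_fold):
        # The held-out fold is the test set; all other folds form the training set.
        test_ids = sorted(by_fold[held_out])
        train_ids = sorted(
            clip_id
            for fold, ids in by_fold.items()
            if fold != held_out
            for clip_id in ids
        )
        yield held_out, train_ids, test_ids


# Toy example: six made-up clips spread over three folds.
toy = {"a.wav": 1, "b.wav": 1, "c.wav": 2, "d.wav": 2, "e.wav": 3, "f.wav": 3}
splits = list(fold_splits(toy))
for held_out, train_ids, test_ids in splits:
    print(held_out, len(train_ids), len(test_ids))
```

Reported scores for k-fold datasets are then typically averaged over the held-out folds.
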
Automated audio captioning:

| Dataset | Split Method | # Clips | Average Duration (s) | Sampling Rate (Hz) |
| :---: | :---: | :---: | :---: | :---: |
| Music4All | train | 109269 | 29.99 | 48000 |
| Clotho (v1.0) | train/test | 3938 | 22.43 | 44100 |
| Clotho (v2.1) | train/validation/test | 8723 | 22.48 | 44100 |
| AudioCaps | train/validation/test | 41113 | 8.38 | 48000 |
| WavCaps (AudioSet-SL) | train | 85232 | 10.00 | 32000 |
| WavCaps (SoundBible) | train | 1232 | 13.12 | 32000 |
| WavCaps (BBC) | train | 31201 | 115.04 | 32000 |

Music, speech, and noise:

| Dataset | Split Method | # Clips | Average Duration (s) | Sampling Rate (Hz) |
| :---: | :---: | :---: | :---: | :---: |
| MUSAN | train | 2016 | 195.16 | 16000 |
| RIR-Noise | train | 61260 | 1.54 | 16000 |
| ARCA23K | train | 17979 | 7.92 | 44100 |

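The clip counts and average durations in these tables can be combined to estimate how much audio each dataset contains. The sketch below does this for the music, speech, and noise table, treating the Average Duration column as seconds. The helper name `total_hours` is only illustrative.

```python
def total_hours(num_clips, avg_duration_s):
    """Approximate total audio length in hours from clip count and mean duration."""
    return num_clips * avg_duration_s / 3600.0


# Values copied from the table above: (# clips, average duration).
# Assumption: durations are given in seconds.
table = {
    "MUSAN": (2016, 195.16),
    "RIR-Noise": (61260, 1.54),
    "ARCA23K": (17979, 7.92),
}

hours = {name: total_hours(n, d) for name, (n, d) in table.items()}
for name, h in hours.items():
    print(f"{name}: {h:.1f} h")
```

Because the averages in the table are rounded, the result is an estimate rather than an exact total.
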
## Contact Us

If you have any questions or would like more information about our projects, please feel free to reach out to us.