TanelAlumae
committed on
Commit
•
1b1adee
1
Parent(s):
dfa22af
Update README.md
README.md
CHANGED
@@ -195,7 +195,7 @@ Since the model is trained on VoxLingua107, it has many limitations and biases,
 
 ## Training data
 
-The model is trained on [VoxLingua107](
+The model is trained on [VoxLingua107](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 VoxLingua107 is a speech dataset for training spoken language identification models.
 The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according the language of the video title and description, with some post-processing steps to filter out false positives.