Commit 0253049
Parent(s): 080588e
Update README.md (#11)
- Update README.md (1483d2d5c668d7921c757fc7cd014fa2cbe7dc5a)
Co-authored-by: Tanel Alumäe <[email protected]>
README.md CHANGED
@@ -133,7 +133,7 @@ widget:
 
 ## Model description
 
-This is a spoken language recognition model trained on the VoxLingua107 dataset using SpeechBrain.
+This is a spoken language recognition model trained on the [VoxLingua107 dataset](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/) using SpeechBrain.
 The model uses the ECAPA-TDNN architecture that has previously been used for speaker recognition. However, it uses
 more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training.
 We observed that this improved the performance of extracted utterance embeddings for downstream tasks.
@@ -259,7 +259,7 @@ The model has two uses:
 - use as an utterance-level feature (embedding) extractor, for creating a dedicated language ID model on your own data
 
 The model is trained on automatically collected YouTube data. For more
-information about the dataset, see [here](
+information about the dataset, see [here](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 
 #### How to use
@@ -330,7 +330,7 @@ Since the model is trained on VoxLingua107, it has many limitations and biases,
 
 ## Training data
 
-The model is trained on [VoxLingua107](
+The model is trained on [VoxLingua107](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 VoxLingua107 is a speech dataset for training spoken language identification models.
 The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according the language of the video title and description, with some post-processing steps to filter out false positives.
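The hunks above only touch dataset links, but the surrounding README text describes two uses for the model: direct spoken language identification and utterance-level embedding extraction for a downstream language-ID classifier. The following is a minimal sketch of both uses with SpeechBrain's pretrained-model interface; the Hub repository id and the audio file name are illustrative assumptions, not values taken from this commit.

# Minimal sketch, assuming the model is published on the Hub as
# "speechbrain/lang-id-voxlingua107-ecapa" (assumed id) and that
# "sample.wav" is a local speech clip.
from speechbrain.pretrained import EncoderClassifier

language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",  # assumed Hub repo id
    savedir="tmp_lang_id",
)

# Use 1: utterance-level spoken language identification.
signal = language_id.load_audio("sample.wav")   # loaded and resampled to the model's rate
prediction = language_id.classify_batch(signal)
print(prediction[3])                            # predicted language label(s)

# Use 2: extract the utterance embedding for a downstream language-ID classifier.
embedding = language_id.encode_batch(signal)
print(embedding.shape)                          # (batch, 1, embedding_dim)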