Added references for the ParlaSpeech-HR dataset
#1 opened by nljubesi
README.md
CHANGED
@@ -97,7 +97,7 @@ Full config can be found inside the `.nemo` files.
 
 ### Datasets
 
-All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
 
 ## Performance
 
@@ -130,4 +130,8 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 
 - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
 
-- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+
+- [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+
+- [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)