mzboito committed on
Commit
9c1b2c8
1 Parent(s): c3109e0

Update README.md

Files changed (1)
  1. README.md +32 -1
README.md CHANGED
@@ -125,6 +125,37 @@ language:
  ## mHuBERT-147 models

- Languages present not indexed by Huggingface: Asturian (ast), Basaa (bas), Cebuano (ceb), Central Kurdish/Sorani (ckb), Hakha Chin (cnh), Hawaiian (haw), Upper Sorbian (hsb) Kabyle (kab), Moksha (mdf), Meadow Mari (mhr), Hill Mari (mrj), Erzya (myv), Taiwanese Hokkien (nan-tw), Sursilvan (rm-sursilv), Vallader (rm-vallader), Sakha (sah), Santali (sat), Scots (sco), Saraiki (skr), Tigre (tig), Tok Pisin (tpi), Akwapen Twi (tw-akuapem), Asante Twi (tw-asante), Votic (vot), Waray (war), Cantonese (yue),
+ mHuBERT-147 is a family of multilingual, general-purpose HuBERT models trained on 90K hours of open-license data in 147 languages.

+ This repository contains:
+ * Fairseq checkpoint (original);
+ * Hugging Face checkpoint;
+ * Faiss index for continuous pre-training (OPQ16_64,IVF1000_HNSW32,PQ16x4fsr).

+
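The Faiss index string listed above is a comma-separated factory description. As a rough illustration only, the sketch below splits it into stages and attaches a plain-language role to each one. The role descriptions reflect our reading of Faiss naming conventions and are an assumption, not something the README states; `describe_factory` is a hypothetical helper, not part of Faiss or of this repository. It uses only the standard library, so it does not build or load an index.

```python
# Sketch: split a Faiss index-factory string into its stages.
# The role text is an assumption based on Faiss naming conventions;
# describe_factory is a hypothetical helper, not a Faiss API.

FACTORY = "OPQ16_64,IVF1000_HNSW32,PQ16x4fsr"


def describe_factory(factory: str) -> list[tuple[str, str]]:
    """Return (stage, role) pairs for a comma-separated factory string."""
    roles = []
    for stage in factory.split(","):
        if stage.startswith("OPQ"):
            role = "vector transform (optimized product quantization rotation)"
        elif stage.startswith("IVF"):
            role = "coarse quantizer (inverted file over clustered centroids)"
        elif stage.startswith("PQ"):
            role = "fine quantizer (product quantization codes)"
        else:
            role = "unknown"
        roles.append((stage, role))
    return roles


for stage, role in describe_factory(FACTORY):
    print(f"{stage}: {role}")
```

With Faiss installed, a string of this form is normally what one passes to `faiss.index_factory` together with the input dimension; the dimension itself is not given in this README.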
+ # Citing
+
+ ```
+ [PAPER GOES HERE]
+ ```
+
+ # Other information
+
+ **Languages present but not indexed by Hugging Face:** Asturian (ast), Basaa (bas), Cebuano (ceb), Central Kurdish/Sorani (ckb), Hakha Chin (cnh), Hawaiian (haw), Upper Sorbian (hsb), Kabyle (kab), Moksha (mdf), Meadow Mari (mhr), Hill Mari (mrj), Erzya (myv), Taiwanese Hokkien (nan-tw), Sursilvan (rm-sursilv), Vallader (rm-vallader), Sakha (sah), Santali (sat), Scots (sco), Saraiki (skr), Tigre (tig), Tok Pisin (tpi), Akuapem Twi (tw-akuapem), Asante Twi (tw-asante), Votic (vot), Waray (war), Cantonese (yue).
+
+ **Datasets:**
+ * Aishell
+ * BibleTTS
+ * ClovaCall
+ * CommonVoice v11
+ * Google TTS data
+ * IISc-MILE
+ * JVS
+ * Kokoro
+ * Kosp2e
+ * Media Speech
+ * Multilingual LibriSpeech
+ * Samrómur
+ * THCHS-30 and THUYG-20
+ * VoxLingua107
+ * VoxPopuli