Update README.md
README.md
CHANGED
@@ -196,13 +196,41 @@ language:
|
|
196 |
- zh
|
197 |
- zsm
|
198 |
- zu
|
199 |
-
|
200 |
-
|
201 |
-
|
202 |
tags:
|
203 |
- nllb
|
204 |
- translation
|
205 |
-
|
206 |
datasets:
|
207 |
- flores-200
|
208 |
metrics:
|
@@ -250,4 +278,73 @@ SentencePiece model is released along with NLLB-200.
|
|
250 |
• Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments.
|
251 |
|
252 |
## Carbon Footprint Details
|
253 |
-
• The carbon dioxide (CO2e) estimate is reported in Section 8.8.
|
196 |
- zh
|
197 |
- zsm
|
198 |
- zu
|
199 |
+
language_details: >-
|
200 |
+
ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab,
|
201 |
+
aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab,
|
202 |
+
asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl,
|
203 |
+
bam_Latn, ban_Latn, bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn,
|
204 |
+
bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn,
|
205 |
+
cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn,
|
206 |
+
dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn,
|
207 |
+
ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn,
|
208 |
+
fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr,
|
209 |
+
hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn,
|
210 |
+
hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn,
|
211 |
+
jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva,
|
212 |
+
kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr,
|
213 |
+
kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn,
|
214 |
+
lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn,
|
215 |
+
ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva,
|
216 |
+
mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn,
|
217 |
+
mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn,
|
218 |
+
nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn,
|
219 |
+
gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn,
|
220 |
+
prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn,
|
221 |
+
san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn,
|
222 |
+
smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn,
|
223 |
+
srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn,
|
224 |
+
tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi,
|
225 |
+
taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn,
|
226 |
+
tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab,
|
227 |
+
uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr,
|
228 |
+
yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn
|
229 |
tags:
|
230 |
- nllb
|
231 |
- translation
|
232 |
+
- nmt
|
233 |
+
license: cc-by-nc-4.0
|
234 |
datasets:
|
235 |
- flores-200
|
236 |
metrics:
|
278 |
• Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments.
|
279 |
|
280 |
## Carbon Footprint Details
|
281 |
+
• The carbon dioxide (CO2e) estimate is reported in Section 8.8.
|
282 |
+
|
283 |
+
## How to download this model using Python
|
284 |
+
- Install Python https://www.python.org/downloads/
|
285 |
+
- `cmd`
|
286 |
+
- `python --version`
|
287 |
+
- `python -m pip install huggingface_hub`
|
288 |
+
- `python`
|
289 |
+
```
|
290 |
+
import huggingface_hub
|
291 |
+
huggingface_hub.snapshot_download('entai/nllb-300-3.3B-ctranslate2', local_dir='nllb-200-3.3B-ctranslate2')
|
292 |
+
```
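After the download finishes, it can help to confirm that the local folder looks like a converted model before loading it. The sketch below is only an illustration: the helper name `missing_model_files` is hypothetical, and the assumption that a CTranslate2 model directory contains at least `model.bin` should be checked against the files actually listed in the repository.

```python
from pathlib import Path

# Assumption: a converted CTranslate2 model directory contains at least model.bin.
# Extend this tuple with other files you expect (for example tokenizer files).
REQUIRED_FILES = ("model.bin",)


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Directory name taken from the "How to run this model" example.
missing = missing_model_files("nllb-200-3.3B-ctranslate2")
print("missing files:", missing if missing else "none")
```

An empty result only means the expected file names are present; it does not verify that the download is complete or uncorrupted.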
|
293 |
+
|
294 |
+
## How to run this model
|
295 |
+
- https://opennmt.net/CTranslate2/guides/transformers.html#nllb
|
296 |
+
- `cmd`
|
297 |
+
- `python -m pip install ctranslate2 transformers`
|
298 |
+
- `python`
|
299 |
+
```
|
300 |
+
import ctranslate2
|
301 |
+
import transformers
|
302 |
+
|
303 |
+
src_lang = "eng_Latn"
|
304 |
+
tgt_lang = "fra_Latn"
|
305 |
+
|
306 |
+
translator = ctranslate2.Translator("nllb-200-3.3B-ctranslate2",device='cpu')
|
307 |
+
tokenizer = transformers.AutoTokenizer.from_pretrained("nllb-200-3.3B-ctranslate2", src_lang=src_lang, clean_up_tokenization_spaces=True)
|
308 |
+
|
309 |
+
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world!"))
|
310 |
+
target_prefix = [tgt_lang]
|
311 |
+
results = translator.translate_batch([source], target_prefix=[target_prefix])
|
312 |
+
target = results[0].hypotheses[0][1:]
|
313 |
+
|
314 |
+
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
|
315 |
+
```
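To translate several sentences at once, the same steps can be wrapped in a small function. This is a minimal sketch, not part of the original card: the name `translate_sentences` is hypothetical, and it assumes `translator` and `tokenizer` are the objects created in the example above. It uses only the calls already shown there.

```python
def translate_sentences(translator, tokenizer, sentences, tgt_lang):
    """Translate a list of strings with an already loaded translator and tokenizer."""
    # Tokenize every sentence the same way as the single-sentence example.
    sources = [
        tokenizer.convert_ids_to_tokens(tokenizer.encode(sentence))
        for sentence in sentences
    ]
    # translate_batch takes one target prefix per source sentence.
    prefixes = [[tgt_lang] for _ in sources]
    results = translator.translate_batch(sources, target_prefix=prefixes)

    outputs = []
    for result in results:
        # The first token of the best hypothesis is the target language code; drop it.
        tokens = result.hypotheses[0][1:]
        outputs.append(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
    return outputs
```

With the objects from the example above, a call would look like `translate_sentences(translator, tokenizer, ["Hello world!", "How are you?"], tgt_lang)`.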
|
316 |
+
|
317 |
+
## Available languages
|
318 |
+
- https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200
|
319 |
+
|
320 |
+
```
|
321 |
+
ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab,
|
322 |
+
aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab,
|
323 |
+
asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl,
|
324 |
+
bam_Latn, ban_Latn, bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn,
|
325 |
+
bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn,
|
326 |
+
cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn,
|
327 |
+
dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn,
|
328 |
+
ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn,
|
329 |
+
fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr,
|
330 |
+
hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn,
|
331 |
+
hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn,
|
332 |
+
jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva,
|
333 |
+
kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr,
|
334 |
+
kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn,
|
335 |
+
lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn,
|
336 |
+
ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva,
|
337 |
+
mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn,
|
338 |
+
mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn,
|
339 |
+
nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn,
|
340 |
+
gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn,
|
341 |
+
prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn,
|
342 |
+
san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn,
|
343 |
+
smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn,
|
344 |
+
srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn,
|
345 |
+
tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi,
|
346 |
+
taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn,
|
347 |
+
tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab,
|
348 |
+
uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr,
|
349 |
+
yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn
|
350 |
+
```
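Each code in the list pairs a three-letter language identifier with a four-letter script identifier, joined by an underscore (for example `eng_Latn`). The sketch below shows one way to parse such a list and validate a code before passing it as `src_lang` or `tgt_lang`. The helper names are hypothetical, and the length check is a simple shape test, not a lookup against the full list.

```python
def split_language_code(code):
    """Split a code such as 'eng_Latn' into (language, script)."""
    language, sep, script = code.partition("_")
    if not sep or len(language) != 3 or len(script) != 4:
        raise ValueError(f"not a valid language code: {code!r}")
    return language, script


def parse_code_list(text):
    """Parse a comma-separated list of codes, tolerating newlines and missing spaces."""
    return [item.strip() for item in text.replace("\n", ",").split(",") if item.strip()]


# A short excerpt of the list above, used as sample input.
sample = "ace_Arab, ace_Latn, eng_Latn, zho_Hans, zho_Hant"

# Group the available scripts by language.
scripts_by_language = {}
for code in parse_code_list(sample):
    language, script = split_language_code(code)
    scripts_by_language.setdefault(language, []).append(script)

print(scripts_by_language)
# {'ace': ['Arab', 'Latn'], 'eng': ['Latn'], 'zho': ['Hans', 'Hant']}
```

Grouping by language makes it easy to see which languages are offered in more than one script, such as `ace` and `zho` in the excerpt.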
|