metadata
pipeline_tag: translation
license: mit
language:
- multilingual
- af
- am
- ar
- ast
- az
- ba
- be
- bg
- bn
- br
- bs
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- fa
- ff
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- ilo
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- lb
- lg
- ln
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- ns
- oc
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- so
- sq
- sr
- ss
- su
- sv
- sw
- ta
- th
- tl
- tn
- tr
- uk
- ur
- uz
- vi
- wo
- xh
- yi
- yo
- zh
- zu
flores101_mm100_175M
https://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html
flores101_mm100_175M is a multilingual encoder-decoder (seq-to-seq) model trained for many-to-many multilingual translation. It was first released in this repository.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
chinese_text = "生活就像一盒巧克力。"
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
# translate Hindi to French
tokenizer.src_lang = "hi"
encoded_hi = tokenizer(hi_text, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
# => "La vie est comme une boîte de chocolat."
# translate Chinese to English
tokenizer.src_lang = "zh"
encoded_zh = tokenizer(chinese_text, return_tensors="pt")
generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
# => "Life is like a box of chocolate."
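
Both translations above follow the same steps: set `tokenizer.src_lang`, encode the text, then generate with `forced_bos_token_id` set to the target language id. The steps can be wrapped in a small helper, sketched below. The name `translate` and its parameters are hypothetical (not part of `transformers`), and `padding=True` is an assumption added here so that several sentences can be batched together.

```python
def translate(texts, src_lang, tgt_lang, model, tokenizer):
    """Translate a list of sentences from src_lang to tgt_lang.

    Hypothetical helper: `model` and `tokenizer` are expected to be the
    M2M100ForConditionalGeneration and M2M100Tokenizer objects loaded above.
    """
    # Tell the tokenizer which language the input sentences are written in.
    tokenizer.src_lang = src_lang
    # padding=True (an assumption for batching) pads sentences to equal length.
    encoded = tokenizer(texts, return_tensors="pt", padding=True)
    # Force the first generated token to be the target-language token.
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    # Decode token ids back to strings, dropping language and padding tokens.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

With the objects from the example above, `translate([hi_text], "hi", "fr", model, tokenizer)` should return a one-element list containing the French translation.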
Languages covered
Language | Language code |
---|---|
Afrikaans | af |
Amharic | am |
Arabic | ar |
Assamese | as |
Asturian | ast |
Aymara | ay |
Azerbaijani | az |
Bashkir | ba |
Belarusian | be |
Bulgarian | bg |
Bengali | bn |
Breton | br |
Bosnian | bs |
Catalan | ca |
Cebuano | ceb |
Chokwe | cjk |
Czech | cs |
Welsh | cy |
Danish | da |
German | de |
Dyula | dyu |
Greek | el |
English | en |
Spanish | es |
Estonian | et |
Persian | fa |
Fulah | ff |
Finnish | fi |
French | fr |
Western Frisian | fy |
Irish | ga |
Scottish Gaelic | gd |
Galician | gl |
Gujarati | gu |
Hausa | ha |
Hebrew | he |
Hindi | hi |
Croatian | hr |
Haitian Creole | ht |
Hungarian | hu |
Armenian | hy |
Indonesian | id |
Igbo | ig |
Iloko | ilo |
Icelandic | is |
Italian | it |
Japanese | ja |
Javanese | jv |
Georgian | ka |
Kachin | kac |
Kamba | kam |
Kabuverdianu | kea |
Kongo | kg |
Kazakh | kk |
Central Khmer | km |
Kimbundu | kmb |
Northern Kurdish | kmr |
Kannada | kn |
Korean | ko |
Kurdish | ku |
Kyrgyz | ky |
Luxembourgish | lb |
Ganda | lg |
Lingala | ln |
Lao | lo |
Lithuanian | lt |
Luo | luo |
Latvian | lv |
Malagasy | mg |
Maori | mi |
Macedonian | mk |
Malayalam | ml |
Mongolian | mn |
Marathi | mr |
Malay | ms |
Maltese | mt |
Burmese | my |
Nepali | ne |
Dutch | nl |
Norwegian | no |
Northern Sotho | ns |
Nyanja | ny |
Occitan | oc |
Oromo | om |
Oriya | or |
Punjabi | pa |
Polish | pl |
Pashto | ps |
Portuguese | pt |
Quechua | qu |
Romanian | ro |
Russian | ru |
Sindhi | sd |
Shan | shn |
Sinhala | si |
Slovak | sk |
Slovenian | sl |
Shona | sn |
Somali | so |
Albanian | sq |
Serbian | sr |
Swati | ss |
Sundanese | su |
Swedish | sv |
Swahili | sw |
Tamil | ta |
Telugu | te |
Tajik | tg |
Thai | th |
Tigrinya | ti |
Tagalog | tl |
Tswana | tn |
Turkish | tr |
Ukrainian | uk |
Umbundu | umb |
Urdu | ur |
Uzbek | uz |
Vietnamese | vi |
Wolof | wo |
Xhosa | xh |
Yiddish | yi |
Yoruba | yo |
Chinese | zh |
Zulu | zu |
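
Language codes are passed around as plain strings, so a typo is easy to make. A minimal sketch of validating a code before it reaches the tokenizer follows. `LANG_CODES` and `check_lang` are hypothetical names, and the dictionary holds only a hand-copied subset of the table above.

```python
# Hypothetical subset of the table above; extend with the remaining rows as needed.
LANG_CODES = {
    "af": "Afrikaans",
    "en": "English",
    "fr": "French",
    "hi": "Hindi",
    "zh": "Chinese",
}


def check_lang(code):
    """Return the language name for a code, or raise ValueError if it is unknown."""
    try:
        return LANG_CODES[code]
    except KeyError:
        # Raise a clearer error than a bare KeyError for unsupported codes.
        raise ValueError(f"Unsupported language code: {code!r}") from None
```

Calling `check_lang` on the source and target codes before translating surfaces a misspelled code early, with a readable error message.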