---
license: cc-by-4.0
---

# smugri3_14

The TartuNLP multilingual neural machine translation model for low-resource Finno-Ugric languages. The model translates between 27 languages, which gives 702 translation directions.

### Languages Supported

- **High and Mid-Resource Languages:** Estonian, English, Finnish, Hungarian, Latvian, Norwegian, Russian
- **Low-Resource Finno-Ugric Languages:** Komi, Komi Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Proper Karelian, Livvi Karelian, Ludian, Võro, Veps, Livonian, Northern Sami, Southern Sami, Inari Sami, Lule Sami, Skolt Sami, Mansi, Khanty

### Usage

To use this model for translation, you need [**Fairseq v0.12.2**](https://pypi.org/project/fairseq/0.12.2/).

Bash script example:

```
# Define source and target languages
src_lang="eng_Latn"
tgt_lang="kpv_Cyrl"

# Directories and paths
model_path=./smugri3_14-finno-ugric-nmt
checkpoint_path=${model_path}/smugri3_14.pt
sp_path=${model_path}/flores200_sacrebleu_tokenizer_spm.ext.model
dictionary_path=${model_path}/nllb_model_dict.ext.txt

# Language settings for fairseq
nllb_langs="eng_Latn,est_Latn,fin_Latn,hun_Latn,lvs_Latn,nob_Latn,rus_Cyrl"
new_langs="kca_Cyrl,koi_Cyrl,kpv_Cyrl,krl_Latn,liv_Latn,lud_Latn,mdf_Cyrl,mhr_Cyrl,mns_Cyrl,mrj_Cyrl,myv_Cyrl,olo_Latn,sma_Latn,sme_Latn,smj_Latn,smn_Latn,sms_Latn,udm_Cyrl,vep_Latn,vro_Latn"

# Start fairseq-interactive (reads source sentences from stdin, runs on CPU)
fairseq-interactive ${model_path} \
    -s ${src_lang} -t ${tgt_lang} \
    --path ${checkpoint_path} --max-tokens 20000 --buffer-size 1 \
    --beam 4 --lenpen 1.0 \
    --bpe sentencepiece \
    --remove-bpe \
    --lang-tok-style multilingual \
    --sentencepiece-model ${sp_path} \
    --fixed-dictionary ${dictionary_path} \
    --task translation_multi_simple_epoch \
    --decoder-langtok --encoder-langtok src \
    --lang-pairs ${src_lang}-${tgt_lang} \
    --langs "${nllb_langs},${new_langs}" \
    --cpu
```
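
The figure of 702 directions follows from the language lists passed to `--langs`: with 27 language codes, every ordered pair of distinct languages is one direction, so 27 × 26 = 702. The snippet below is a small sanity check, not part of the original script. It reuses the `nllb_langs` and `new_langs` values from the example above, while `all_langs`, `n`, and `directions` are names introduced here for illustration.

```shell
# Language lists copied from the usage example
nllb_langs="eng_Latn,est_Latn,fin_Latn,hun_Latn,lvs_Latn,nob_Latn,rus_Cyrl"
new_langs="kca_Cyrl,koi_Cyrl,kpv_Cyrl,krl_Latn,liv_Latn,lud_Latn,mdf_Cyrl,mhr_Cyrl,mns_Cyrl,mrj_Cyrl,myv_Cyrl,olo_Latn,sma_Latn,sme_Latn,smj_Latn,smn_Latn,sms_Latn,udm_Cyrl,vep_Latn,vro_Latn"

# Same concatenation that the example passes to --langs
all_langs="${nllb_langs},${new_langs}"

# Count the comma-separated codes (one code per line after tr)
n=$(( $(printf '%s\n' "${all_langs}" | tr ',' '\n' | wc -l) ))

# Every ordered pair of distinct languages is one translation direction
directions=$(( n * (n - 1) ))

echo "languages: ${n}"
echo "directions: ${directions}"
```

Any two distinct codes from these lists can be used as `src_lang` and `tgt_lang` in the usage example.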