---
license: cc-by-4.0
---
# smugri3_14
TartuNLP's multilingual neural machine translation model for low-resource Finno-Ugric languages. It supports 27 languages and can translate in 702 directions, i.e. between every ordered pair of distinct languages (27 × 26 = 702).
### Languages Supported
- **High and Mid-Resource Languages:** Estonian, English, Finnish, Hungarian, Latvian, Norwegian, Russian
- **Low-Resource Finno-Ugric Languages:** Komi, Komi Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Proper Karelian, Livvi Karelian, Ludian, Võro, Veps, Livonian, Northern Sami, Southern Sami, Inari Sami, Lule Sami, Skolt Sami, Mansi, Khanty
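The model itself is addressed by language codes rather than names. The Bash 4+ sketch below records the correspondence between the names above and the codes that appear in the `nllb_langs` and `new_langs` lists of the Usage script. The pairing of names to codes is inferred from the standard ISO 639-3 codes and is not stated explicitly in this card. The sketch also checks the 27 × 26 = 702 direction count. `LANG_CODES` and `is_supported` are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Language name -> code as used by the Fairseq example script
# (ISO 639-3 code + ISO 15924 script). Mapping inferred from the script's
# language lists; requires Bash 4+ for associative arrays.
declare -A LANG_CODES=(
  ["English"]="eng_Latn"        ["Estonian"]="est_Latn"
  ["Finnish"]="fin_Latn"        ["Hungarian"]="hun_Latn"
  ["Latvian"]="lvs_Latn"        ["Norwegian"]="nob_Latn"
  ["Russian"]="rus_Cyrl"        ["Komi"]="kpv_Cyrl"
  ["Komi Permyak"]="koi_Cyrl"   ["Udmurt"]="udm_Cyrl"
  ["Hill Mari"]="mrj_Cyrl"      ["Meadow Mari"]="mhr_Cyrl"
  ["Erzya"]="myv_Cyrl"          ["Moksha"]="mdf_Cyrl"
  ["Proper Karelian"]="krl_Latn" ["Livvi Karelian"]="olo_Latn"
  ["Ludian"]="lud_Latn"         ["Võro"]="vro_Latn"
  ["Veps"]="vep_Latn"           ["Livonian"]="liv_Latn"
  ["Northern Sami"]="sme_Latn"  ["Southern Sami"]="sma_Latn"
  ["Inari Sami"]="smn_Latn"     ["Lule Sami"]="smj_Latn"
  ["Skolt Sami"]="sms_Latn"     ["Mansi"]="mns_Cyrl"
  ["Khanty"]="kca_Cyrl"
)

# Each ordered pair of distinct languages is one direction: n * (n - 1).
n=${#LANG_CODES[@]}
directions=$(( n * (n - 1) ))
echo "languages=${n} directions=${directions}"

# Hypothetical helper: succeed only if $1 is one of the codes above,
# e.g. to validate src_lang/tgt_lang before calling fairseq-interactive.
is_supported() {
  local code
  for code in "${LANG_CODES[@]}"; do
    [[ "$code" == "$1" ]] && return 0
  done
  return 1
}
```

For example, `is_supported "$tgt_lang" || echo "unsupported: $tgt_lang"` would catch a mistyped code before the model is loaded.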
### Usage
The model can be tested in our [web demo](https://translate.ut.ee/).
To use this model for translation, you need [**Fairseq v0.12.2**](https://pypi.org/project/fairseq/0.12.2/).
Example Bash script:
```bash
# Source and target language codes
src_lang="eng_Latn"
tgt_lang="kpv_Cyrl"
# Directories and paths
model_path=./smugri3_14-finno-ugric-nmt
checkpoint_path=${model_path}/smugri3_14.pt
sp_path=${model_path}/flores200_sacrebleu_tokenizer_spm.ext.model
dictionary_path=${model_path}/nllb_model_dict.ext.txt
# Language codes passed to --langs: 7 high/mid-resource and 20 low-resource Finno-Ugric languages
nllb_langs="eng_Latn,est_Latn,fin_Latn,hun_Latn,lvs_Latn,nob_Latn,rus_Cyrl"
new_langs="kca_Cyrl,koi_Cyrl,kpv_Cyrl,krl_Latn,liv_Latn,lud_Latn,mdf_Cyrl,mhr_Cyrl,mns_Cyrl,mrj_Cyrl,myv_Cyrl,olo_Latn,sma_Latn,sme_Latn,smj_Latn,smn_Latn,sms_Latn,udm_Cyrl,vep_Latn,vro_Latn"
# Run fairseq-interactive; it reads source sentences from standard input, one per line
fairseq-interactive ${model_path} \
-s ${src_lang} -t ${tgt_lang} \
--path ${checkpoint_path} --max-tokens 20000 --buffer-size 1 \
--beam 4 --lenpen 1.0 \
--bpe sentencepiece \
--remove-bpe \
--lang-tok-style multilingual \
--sentencepiece-model ${sp_path} \
--fixed-dictionary ${dictionary_path} \
--task translation_multi_simple_epoch \
--decoder-langtok --encoder-langtok src \
--lang-pairs ${src_lang}-${tgt_lang} \
--langs "${nllb_langs},${new_langs}" \
--cpu
```
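If a file or pipe is fed to the command instead of a terminal, its standard output mixes log messages with several tagged lines per input sentence. These include `S-` (source), `H-` (hypothesis), `D-` (detokenized hypothesis) and `P-` (positional scores). The sketch below assumes the standard Fairseq 0.12 output format, in which a `D-` line has the form `D-<id><TAB><score><TAB><text>`. It keeps only the translation text. `extract_translations` is a hypothetical helper, not part of Fairseq.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read fairseq-interactive output on stdin and print
# only the detokenized translations (field 3 onward of each D- line).
# Log lines and S-/W-/H-/P- lines are discarded by the grep filter.
extract_translations() {
  grep '^D-' | cut -f3-
}
```

For example, appending `< input.txt | extract_translations > output.txt` to the `fairseq-interactive` command above would write one translation per input line. `input.txt` and `output.txt` are placeholder file names.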
### Scores
Average scores for translation into each target language:

| Target language | BLEU | chrF | chrF++ |
| --------------- | ----- | ----- | ------ |
| ru | 24.82 | 51.81 | 49.08 |
| en | 28.24 | 55.91 | 53.73 |
| et | 18.66 | 51.72 | 47.69 |
| fi | 15.45 | 50.04 | 45.38 |
| hun | 16.73 | 47.38 | 44.19 |
| lv | 18.15 | 49.04 | 45.54 |
| nob | 14.43 | 45.64 | 42.29 |
| kpv | 10.73 | 42.34 | 38.50 |
| liv | 5.16 | 29.95 | 27.28 |
| mdf | 5.27 | 37.66 | 32.99 |
| mhr | 8.51 | 43.42 | 38.76 |
| mns | 2.45 | 27.75 | 24.03 |
| mrj | 7.30 | 40.81 | 36.40 |
| myv | 4.72 | 38.74 | 33.80 |
| olo | 4.63 | 34.43 | 30.00 |
| udm | 7.50 | 40.07 | 35.72 |
| krl | 9.39 | 42.74 | 38.24 |
| vro | 8.64 | 39.89 | 35.97 |
| vep | 6.73 | 38.15 | 33.91 |
| lud | 3.11 | 31.50 | 27.30 |
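For reading the chrF and chrF++ columns: both are character n-gram F-scores. The definition below is the standard one as implemented, for example, in sacreBLEU. It uses β = 2, character n-grams up to order 6 and, for chrF++, word n-grams up to order 2. This card does not state which implementation or settings produced the scores above, so treat the following as an assumption:

$$
\mathrm{chrF}_\beta = (1+\beta^2)\,\frac{\mathrm{chrP}\cdot\mathrm{chrR}}{\beta^2\,\mathrm{chrP}+\mathrm{chrR}}, \qquad \beta = 2
$$

Here chrP and chrR are character n-gram precision and recall averaged over the n-gram orders. chrF++ additionally includes word unigram and bigram precision and recall in those averages. With β = 2, recall is weighted more heavily than precision. The table reports all scores on a 0 to 100 scale.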