Translation
Fairseq
AlexTheSun committed bed2078 (1 parent: a237630)

Update README.md with usage example

Files changed (1): README.md +37 -4
README.md CHANGED
@@ -1,12 +1,45 @@
  ---
  license: cc-by-4.0
  ---
- #### smugri3_14
+ # smugri3_14
  The TartuNLP Multilingual Neural Machine Translation model for low-resource Finno-Ugric languages. The model can translate in 702 directions, between 27 languages.
 
- #### Languages Supported
+ ### Languages Supported
  - **High and Mid-Resource Languages:** Estonian, English, Finnish, Hungarian, Latvian, Norwegian, Russian
  - **Low-Resource Finno-Ugric Languages:** Komi, Komi Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Proper Karelian, Livvi Karelian, Ludian, Võro, Veps, Livonian, Northern Sami, Southern Sami, Inari Sami, Lule Sami, Skolt Sami, Mansi, Khanty
 
- #### Usage
- To use this model for translation tasks, you will need to utilize the Fairseq v0.12.2.
+ ### Usage
+ To use this model for translation tasks, you will need to utilize the [**Fairseq v0.12.2**](https://pypi.org/project/fairseq/0.12.2/).
+
+ Bash script example:
+ ```
+ # Define target and source languages
+ src_lang="eng_Latn"
+ tgt_lang="kpv_Cyrl"
+
+ # Directories and paths
+ model_path=./smugri3_14-finno-ugric-nmt
+ checkpoint_path=${model_path}/smugri3_14.pt
+ sp_path=${model_path}/flores200_sacrebleu_tokenizer_spm.ext.model
+ dictionary_path=${model_path}/nllb_model_dict.ext.txt
+
+ # Language settings for fairseq
+ nllb_langs="eng_Latn,est_Latn,fin_Latn,hun_Latn,lvs_Latn,nob_Latn,rus_Cyrl"
+ new_langs="kca_Cyrl,koi_Cyrl,kpv_Cyrl,krl_Latn,liv_Latn,lud_Latn,mdf_Cyrl,mhr_Cyrl,mns_Cyrl,mrj_Cyrl,myv_Cyrl,olo_Latn,sma_Latn,sme_Latn,smj_Latn,smn_Latn,sms_Latn,udm_Cyrl,vep_Latn,vro_Latn"
+
+ # Start fairseq-interactive in interactive mode
+ fairseq-interactive ${model_path} \
+ -s ${src_lang} -t ${tgt_lang} \
+ --path ${checkpoint_path} --max-tokens 20000 --buffer-size 1 \
+ --beam 4 --lenpen 1.0 \
+ --bpe sentencepiece \
+ --remove-bpe \
+ --lang-tok-style multilingual \
+ --sentencepiece-model ${sp_path} \
+ --fixed-dictionary ${dictionary_path} \
+ --task translation_multi_simple_epoch \
+ --decoder-langtok --encoder-langtok src \
+ --lang-pairs ${src_lang}-${tgt_lang} \
+ --langs "${nllb_langs},${new_langs}" \
+ --cpu
+ ```
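The "702 directions, between 27 languages" figure can be checked against the two lists passed to `--langs` in the bash example. There are 7 + 20 = 27 language codes, and each ordered pair of distinct languages is one direction, giving 27 * 26 = 702. The sketch below is illustrative and assumes bash, since it uses arrays and `[[ ]]`. It does that count, then checks a source/target pair against the same list before `fairseq-interactive` is started. `is_supported_lang` is a hypothetical helper, not part of fairseq.

```shell
#!/usr/bin/env bash
# Language lists copied from the fairseq-interactive example in the README.
nllb_langs="eng_Latn,est_Latn,fin_Latn,hun_Latn,lvs_Latn,nob_Latn,rus_Cyrl"
new_langs="kca_Cyrl,koi_Cyrl,kpv_Cyrl,krl_Latn,liv_Latn,lud_Latn,mdf_Cyrl,mhr_Cyrl,mns_Cyrl,mrj_Cyrl,myv_Cyrl,olo_Latn,sma_Latn,sme_Latn,smj_Latn,smn_Latn,sms_Latn,udm_Cyrl,vep_Latn,vro_Latn"

# Split the same comma-separated string that is passed to --langs into an array.
IFS=',' read -r -a all_langs <<< "${nllb_langs},${new_langs}"

# Every ordered (source, target) pair with source != target is one direction: n * (n - 1).
num_langs=${#all_langs[@]}
num_directions=$(( num_langs * (num_langs - 1) ))
echo "languages: ${num_langs}, directions: ${num_directions}"

# Hypothetical helper (not a fairseq feature): succeeds only if $1 is one of the --langs codes.
is_supported_lang() {
  local code
  for code in "${all_langs[@]}"; do
    [[ "$code" == "$1" ]] && return 0
  done
  return 1
}

# Validate a pair before launching fairseq-interactive with -s/-t and --lang-pairs.
src_lang="eng_Latn"
tgt_lang="kpv_Cyrl"
if is_supported_lang "$src_lang" && is_supported_lang "$tgt_lang" && [[ "$src_lang" != "$tgt_lang" ]]; then
  echo "OK: ${src_lang}-${tgt_lang}"
else
  echo "Unsupported pair: ${src_lang}-${tgt_lang}" >&2
fi
```

For this input the script prints `languages: 27, directions: 702` followed by `OK: eng_Latn-kpv_Cyrl`. Checking the codes up front means that a typo such as `kpv_Latn` is reported before the model is loaded.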