Fairseq
Galician
Catalan
AudreyVM committed on
Commit
ab0c29d
1 Parent(s): b1cf35e

fixing typos

  1. README.md +2 -2
README.md CHANGED
@@ -26,7 +26,7 @@ license: apache-2.0
26
 
27
  ## Model description
28
 
29
- This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Galician-Catalan datasets totalling 10.017.995 sentence pairs. 4.267.995 sentence pairs were parallel data collected from the web while the remaining 5.750.000 sentence pairs were parallel synthetic data created using the GL-EU translator of [Proxecto Nós](https://huggingface.co/proxectonos/Nos_MT-OpenNMT-es-gl). The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
30
 
31
  ## Intended uses and limitations
32
 
@@ -123,7 +123,7 @@ Weights were saved every 1000 updates and reported results are the average of th
123
  We use the BLEU score for evaluation on test sets: [Flores-200](https://github.com/facebookresearch/flores/tree/main/flores200), [TaCon](https://elrc-share.eu/repository/browse/tacon-spanish-constitution-mt-test-set/84a96138b98611ec9c1a00155d02670628f3e6857b0f422abd82abc3795ec8c2/) and [NTREX](https://github.com/MicrosoftTranslator/NTREX)
124
  ### Evaluation results
125
  Below are the evaluation results on the machine translation from Galician to Catalan compared to [M2M100 1.2B](https://huggingface.co/facebook/m2m100_1.2B), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [ NLLB-200's distilled 1.3B variant](https://huggingface.co/facebook/nllb-200-distilled-1.3B):
126
- | Test set |M2M100 1.2B| NLLB 1.3B | NLLB 3.3 |mt-aina-eu-ca|
127
  |----------------------|-------|-----------|------------------|---------------|
128
  | Flores 200 devtest |32,6| 22,3 | **34,3** | 32,4 |
129
  | TaCON |56,5|32,2 | 54,1 | **58,2** |
 
26
 
27
  ## Model description
28
 
29
+ This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Galician-Catalan datasets totalling 10.017.995 sentence pairs. 4.267.995 sentence pairs were parallel data collected from the web while the remaining 5.750.000 sentence pairs were parallel synthetic data created using the GL-ES translator of [Proxecto Nós](https://huggingface.co/proxectonos/Nos_MT-OpenNMT-es-gl). The model was evaluated on the Flores, TaCon and NTREX evaluation datasets.
30
 
31
  ## Intended uses and limitations
32
 
 
123
  We use the BLEU score for evaluation on test sets: [Flores-200](https://github.com/facebookresearch/flores/tree/main/flores200), [TaCon](https://elrc-share.eu/repository/browse/tacon-spanish-constitution-mt-test-set/84a96138b98611ec9c1a00155d02670628f3e6857b0f422abd82abc3795ec8c2/) and [NTREX](https://github.com/MicrosoftTranslator/NTREX)
124
  ### Evaluation results
125
  Below are the evaluation results on the machine translation from Galician to Catalan compared to [M2M100 1.2B](https://huggingface.co/facebook/m2m100_1.2B), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [ NLLB-200's distilled 1.3B variant](https://huggingface.co/facebook/nllb-200-distilled-1.3B):
126
+ | Test set |M2M100 1.2B| NLLB 1.3B | NLLB 3.3 |mt-aina-gl-ca|
127
  |----------------------|-------|-----------|------------------|---------------|
128
  | Flores 200 devtest |32,6| 22,3 | **34,3** | 32,4 |
129
  | TaCON |56,5|32,2 | 54,1 | **58,2** |
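
The results table writes BLEU scores with decimal commas and bolds the best score in each row. As a sanity check on that bolding, the following minimal sketch parses the two rows visible in this diff and reports the top system per test set. It is not part of the README or the commit; the names `TABLE`, `split_row`, `parse_score` and `best_systems` are hypothetical helpers introduced only for illustration, and the sketch assumes every score cell holds a single comma-decimal number, optionally wrapped in `**`.

```python
# Rows copied from the table shown in the diff; scores use decimal commas.
TABLE = """\
| Test set |M2M100 1.2B| NLLB 1.3B | NLLB 3.3 |mt-aina-gl-ca|
|----------------------|-------|-----------|------------------|---------------|
| Flores 200 devtest |32,6| 22,3 | **34,3** | 32,4 |
| TaCON |56,5|32,2 | 54,1 | **58,2** |
"""


def split_row(line):
    """Split one Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_score(cell):
    """Convert a cell such as '**34,3**' or '32,6' to a float."""
    return float(cell.replace("*", "").replace(",", "."))


def best_systems(table):
    """Return {test set: (system, score)} for the highest BLEU in each row."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    systems = split_row(lines[0])[1:]  # header cells after "Test set"
    best = {}
    for line in lines[2:]:  # skip the header and the separator row
        cells = split_row(line)
        scores = [parse_score(c) for c in cells[1:]]
        top = max(range(len(scores)), key=scores.__getitem__)
        best[cells[0]] = (systems[top], scores[top])
    return best


if __name__ == "__main__":
    for test_set, (system, score) in best_systems(TABLE).items():
        print(f"{test_set}: {system} ({score})")
```

For the two rows shown, the highest parsed score falls on the same cell that the table bolds, so the bolding and the numbers agree.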