---
license: mit
language:
- en
- eu
metrics:
- BLEU
- TER
tags:
- text2text-generation
- open-nmt
- pytorch
---
# Itzune v1.9 EN -> EU machine translation argos model
This model was trained with the [argostrain](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel sentences extracted from datasets obtained directly from the [Opus project](https://opus.nlpl.eu/).
## Model description
- **Developed by:** argostranslate
- **Model type:** translation
- **Model version:** v1.9
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT
## Training Data
The English-Basque parallel sentences were collected from the following datasets:
| Dataset | Sentences before cleaning |
|----------------------|--------------------------:|
| CCMatrix v1 | 7,788,871 |
| OpenSubtitles v2018 | 805,780 |
| XLEnt v1.2 | 800,631 |
| GNOME v1 | 652,298 |
| HPLT v1.1 | 610,694 |
| EhuHac v1 | 585,210 |
| WikiMatrix v1 | 119,480 |
| KDE4 v2 | 100,160 |
| wikimedia v20230407 | 60,990 |
| bible-uedin v1 | 15,893 |
| Tatoeba v2023-04-12 | 2,070 |
| Wiktionary | 629 |
| **Total** | **11,542,706** |
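
As a quick sanity check, the total can be reproduced from the per-corpus counts. The sketch below uses only the figures from the table; the variable names are illustrative, and it also prints each corpus's share of the raw (pre-cleaning) data.

```python
# Sentence counts before cleaning, copied from the table above.
corpus_sizes = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

total = sum(corpus_sizes.values())

# Fraction of the raw data contributed by each corpus.
shares = {name: size / total for name, size in corpus_sizes.items()}

print(f"Total: {total:,}")
# List corpora from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22}{share:6.1%}")
```

By this count, CCMatrix alone contributes roughly two thirds of the raw sentences.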
### Evaluation results
The tables below compare this model's English-to-Basque translation quality with [Google Translate](https://translate.google.com/), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [mt-hitz-en-eu](https://huggingface.co/HiTZ/mt-hitz-en-eu). Bold marks the best score in each row (higher is better for BLEU, lower is better for TER); a dash means no score is reported.
#### BLEU scores
| Test set             | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | Itzune 1.9 |
|----------------------|------------------|---------------|---------------|------------|
| Flores 200 devtest | **20.5** | 13.3 | 19.2 | 17.0 |
| TaCON | **12.1** | 9.4 | 8.8 | - |
| NTREX | **15.7** | 8.0 | 14.5 | - |
| Average | **16.1** | 10.2 | 14.2 | - |
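
The Average row is consistent with an unweighted mean over the three test sets, which the following sketch recomputes from the BLEU table. This is an assumption about how the averages were derived, not a documented procedure. Itzune 1.9 is left out because only its Flores 200 devtest score is reported.

```python
from statistics import mean

# BLEU scores copied from the table above, in the order:
# Flores 200 devtest, TaCON, NTREX.
bleu = {
    "Google Translate": [20.5, 12.1, 15.7],
    "NLLB 200 3.3B": [13.3, 9.4, 8.0],
    "mt-hitz-en-eu": [19.2, 8.8, 14.5],
}

# Assumption: "Average" is the plain mean over test sets, rounded to one decimal.
averages = {system: round(mean(scores), 1) for system, scores in bleu.items()}

for system, avg in averages.items():
    print(f"{system:<18}{avg:5.1f}")
```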
#### TER scores
| Test set             | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | Itzune 1.9 |
|----------------------|------------------|---------------|---------------|------------|
| Flores 200 devtest |**59.5** | 70.4 | 65.0 | 70.1 |
| TaCON |**69.5** | 75.3 | 76.8 | - |
| NTREX |**65.8** | 81.6 | 66.7 | - |
| Average |**64.9** | 75.8 | 68.2 | - |