---

license: mit
language:
- en
- eu
metrics:
- BLEU
- TER
tags:
- text2text-generation
- open-nmt
- pytorch
---


# Itzune v1.9 EN -> EU Argos machine translation model

This model was trained with the [argos-train](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel sentences extracted from datasets obtained from the [OPUS project](https://opus.nlpl.eu/).

## Model description


- **Developed by:** argostranslate
- **Model type:** translation
- **Model version:** v1.9
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT
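
Because the model is trained with argos-train, it should be loadable through the `argostranslate` Python library once installed as an Argos Translate package. A minimal usage sketch, assuming the model is distributed as a standard `.argosmodel` file (the file name below is a placeholder):

```python
import argostranslate.package
import argostranslate.translate

# Install the downloaded model package. The file name is a placeholder;
# point this at the actual .argosmodel file from this repository.
argostranslate.package.install_from_path("translate-en_eu-1_9.argosmodel")

# Translate English -> Basque.
print(argostranslate.translate.translate("The weather is nice today.", "en", "eu"))
```

Once the package is installed, the model should also be usable from the Argos Translate CLI and GUI.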

## Training Data

The English-Basque parallel sentences were collected from the following datasets:

| Dataset              | Sentences before cleaning |
|----------------------|--------------------------:|
| CCMatrix v1          |                 7,788,871 |
| OpenSubtitles v2018  |                   805,780 |
| XLEnt v1.2           |                   800,631 |
| GNOME v1             |                   652,298 |
| HPLT v1.1            |                   610,694 |
| EhuHac v1            |                   585,210 |
| WikiMatrix v1        |                   119,480 |
| KDE4 v2              |                   100,160 |
| wikimedia v20230407  |                    60,990 |
| bible-uedin v1       |                    15,893 |
| Tatoeba v2023-04-12  |                     2,070 |
| Wiktionary           |                       629 |
| **Total**            |            **11,542,706** |
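
The counts above are before cleaning, and the exact filtering used by the argos-train pipeline is not documented here. As a loose illustration only, parallel-corpus cleaning typically deduplicates sentence pairs and drops pairs with extreme lengths or length ratios; a minimal sketch, with purely illustrative thresholds:

```python
def clean_parallel(pairs, max_len=200, max_ratio=2.0):
    """Deduplicate and length-filter (source, target) sentence pairs.

    Thresholds are illustrative assumptions; the actual argos-train
    pipeline may apply different filters.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        # Drop empty sides and exact duplicates.
        if not key[0] or not key[1] or key in seen:
            continue
        n_src, n_tgt = len(key[0].split()), len(key[1].split())
        # Drop overly long pairs and pairs with a skewed length ratio.
        if max(n_src, n_tgt) > max_len:
            continue
        if max(n_src, n_tgt) / max(min(n_src, n_tgt), 1) > max_ratio:
            continue
        seen.add(key)
        kept.append(key)
    return kept
```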

## Evaluation results

Below are the evaluation results for machine translation from English to Basque, compared to [Google Translate](https://translate.google.com/), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [mt-hitz-en-eu](https://huggingface.co/HiTZ/mt-hitz-en-eu):

#### BLEU scores (higher is better)

| Test set           | Google Translate | NLLB 3.3B | mt-hitz-en-eu | itzune 1.9 |
|--------------------|------------------|-----------|---------------|------------|
| Flores 200 devtest | **20.5**         | 13.3      | 19.2          | 17.0       |
| TaCON              | **12.1**         | 9.4       | 8.8           | -          |
| NTREX              | **15.7**         | 8.0       | 14.5          | -          |
| Average            | **16.1**         | 10.2      | 14.2          | -          |

#### TER scores (lower is better)

| Test set           | Google Translate | NLLB 3.3B | mt-hitz-en-eu | itzune 1.9 |
|--------------------|------------------|-----------|---------------|------------|
| Flores 200 devtest | **59.5**         | 70.4      | 65.0          | 70.1       |
| TaCON              | **69.5**         | 75.3      | 76.8          | -          |
| NTREX              | **65.8**         | 81.6      | 66.7          | -          |
| Average            | **64.9**         | 75.8      | 69.5          | -          |
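
BLEU and TER scores like those above are commonly computed with [sacrebleu](https://github.com/mjpost/sacrebleu); how these particular figures were produced is not specified, so the following is only a sketch, assuming one-sentence-per-line hypothesis and reference files (the file names are placeholders):

```python
import sacrebleu

# Placeholder file names: hypotheses aligned line-by-line with
# references for a given test set (e.g. Flores 200 devtest).
with open("hypotheses.eu", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("references.eu", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])  # higher is better
ter = sacrebleu.corpus_ter(hyps, [refs])    # lower is better
print(f"BLEU = {bleu.score:.1f}  TER = {ter.score:.1f}")
```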