carlosep93 committed on
Commit 51076bc
Parent(s): 3d4bce7
Update README.md
README.md CHANGED
@@ -87,7 +87,7 @@ The was trained on a combination of the following datasets:
 
 #### Tokenization
 
-All data is tokenized using sentencepiece,
+All data is tokenized using sentencepiece, with 50 thousand token sentencepiece model learned from the combination of all filtered training data. This model is included.
 
 #### Hyperparameters
 
@@ -130,7 +130,7 @@ Below are the evaluation results on the machine translation from Catalan to Chin
 | Test set | SoftCatalà | Google Translate | mt-aina-ca-es |
 |----------------------|------------|------------------|---------------|
 | Spanish Constitution | 66,2 | **77,1** | 75,5 |
-| United Nations | 72
+| United Nations | 72,0 | 84,3 | **86,3** |
 | aina_aapp | 78,1 | 80,8 | **81,8** |
 | Flores 101 dev | 23,8 | 24 | **24,1** |
 | Flores 101 devtest | 23,9 | 24,2 | **24,4** |
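The updated results table writes scores with decimal commas (e.g. `72,0`) and bolds the best system for each test set. The following is a minimal Python sketch, assuming only those two conventions, that parses the rows and checks whether the bolded cell is the row maximum. It makes no assumption about which metric the scores measure. The names `TABLE`, `parse_cells`, `parse_score` and `best_systems` are hypothetical helpers for illustration and are not part of the repository. The sketch also rejects any row whose cell count differs from the header's, which is the kind of defect the truncated `| United Nations | 72` line on the old side of the diff shows.

```python
# Hypothetical helper, not part of the repository: parse the updated results
# table and report the best-scoring system per test set. Assumes scores use
# decimal commas ("72,0") and that the best score is wrapped in **bold**.

TABLE = """\
| Test set | SoftCatalà | Google Translate | mt-aina-ca-es |
|----------------------|------------|------------------|---------------|
| Spanish Constitution | 66,2 | **77,1** | 75,5 |
| United Nations | 72,0 | 84,3 | **86,3** |
| aina_aapp | 78,1 | 80,8 | **81,8** |
| Flores 101 dev | 23,8 | 24 | **24,1** |
| Flores 101 devtest | 23,9 | 24,2 | **24,4** |
"""


def parse_cells(line: str) -> list[str]:
    """Split a Markdown table row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_score(cell: str) -> float:
    """Convert a cell such as '**86,3**' or '24' into a float."""
    return float(cell.strip("*").replace(",", "."))


def best_systems(table: str) -> dict[str, tuple[str, bool]]:
    """Map each test set to (highest-scoring system, whether that cell is bold)."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = parse_cells(lines[0])
    systems = header[1:]
    result = {}
    for line in lines[2:]:  # skip the header and the |---| separator row
        cells = parse_cells(line)
        if len(cells) != len(header):
            raise ValueError(f"malformed row: {line!r}")
        scores = [parse_score(c) for c in cells[1:]]
        best = max(range(len(scores)), key=scores.__getitem__)
        result[cells[0]] = (systems[best], cells[1 + best].startswith("**"))
    return result


if __name__ == "__main__":
    for test_set, (system, bolded) in best_systems(TABLE).items():
        print(f"{test_set}: {system} (bold: {bolded})")
```

Every row of the updated table reports its bolded cell as the maximum. Feeding the function a row with a missing cell raises `ValueError` instead of silently misreading the table.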