Commit 07187fd
1 Parent(s): da2d3dd
Update README.md
README.md CHANGED
@@ -62,30 +62,30 @@ print(tokenizer.detokenize(translated[0][0]['tokens']))
 
 The was trained on a combination of the following datasets:
 
-| Dataset | Sentences |
-
-| Global Voices | 21.342 |
-| Memories Lluires | 1.173.055 |
-| Wikimatrix | 1.205.908 |
-| TED Talks | 50.979 |
-| Tatoeba | 5.500 |
-| CoVost 2 ca-en | 79.633 |
-| CoVost 2 en-ca | 263.891 |
-| Europarl | 1.965.734 |
-| jw300 | 97.081 |
-| Crawled Generalitat| 38.595 |
-| Opus Books | 4.580 |
-| CC Aligned | 5.787.682 |
-| COVID_Wikipedia | 1.531 |
-| EuroBooks | 3.746 |
-| Gnome | 2.183 |
-| KDE 4 | 144.153 |
-| OpenSubtitles | 427.913 |
-| QED | 69.823 |
-| Ubuntu | 6.781 |
-| Wikimedia | 208.073 |
-
-| **Total** | **11.558.183** |
+| Dataset | Sentences |
+|--------------------|----------------|
+| Global Voices | 21.342 |
+| Memories Lluires | 1.173.055 |
+| Wikimatrix | 1.205.908 |
+| TED Talks | 50.979 |
+| Tatoeba | 5.500 |
+| CoVost 2 ca-en | 79.633 |
+| CoVost 2 en-ca | 263.891 |
+| Europarl | 1.965.734 |
+| jw300 | 97.081 |
+| Crawled Generalitat| 38.595 |
+| Opus Books | 4.580 |
+| CC Aligned | 5.787.682 |
+| COVID_Wikipedia | 1.531 |
+| EuroBooks | 3.746 |
+| Gnome | 2.183 |
+| KDE 4 | 144.153 |
+| OpenSubtitles | 427.913 |
+| QED | 69.823 |
+| Ubuntu | 6.781 |
+| Wikimedia | 208.073 |
+|--------------------|----------------|
+| **Total** | **11.558.183** |
 
 ### Training procedure
 
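
The **Total** row in the table can be cross-checked against the per-dataset counts. The following is a minimal sketch, not part of the commit: it assumes the periods in the Sentences column are thousands separators, and the names `TABLE` and `parse_table` are hypothetical, chosen only for this illustration.

```python
# Table text as it reads after this commit (column padding is not significant).
TABLE = """\
| Dataset | Sentences |
|--------------------|----------------|
| Global Voices | 21.342 |
| Memories Lluires | 1.173.055 |
| Wikimatrix | 1.205.908 |
| TED Talks | 50.979 |
| Tatoeba | 5.500 |
| CoVost 2 ca-en | 79.633 |
| CoVost 2 en-ca | 263.891 |
| Europarl | 1.965.734 |
| jw300 | 97.081 |
| Crawled Generalitat| 38.595 |
| Opus Books | 4.580 |
| CC Aligned | 5.787.682 |
| COVID_Wikipedia | 1.531 |
| EuroBooks | 3.746 |
| Gnome | 2.183 |
| KDE 4 | 144.153 |
| OpenSubtitles | 427.913 |
| QED | 69.823 |
| Ubuntu | 6.781 |
| Wikimedia | 208.073 |
|--------------------|----------------|
| **Total** | **11.558.183** |
"""


def parse_table(markdown):
    """Map each row name to its sentence count, skipping header and separator rows."""
    counts = {}
    for line in markdown.splitlines():
        # Drop the outer pipes, split on the inner one, and strip bold markers.
        cells = [c.strip().strip("*") for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, value = cells
        digits = value.replace(".", "")  # assumption: "." is a thousands separator
        if digits.isdigit():
            counts[name] = int(digits)
    return counts


counts = parse_table(TABLE)
reported_total = counts.pop("Total")
computed_total = sum(counts.values())
print(computed_total == reported_total, computed_total)  # True 11558183
```

Under that assumption, the 20 dataset rows sum to the reported total of 11.558.183 sentences.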