Fairseq
English
Catalan
carlosep93 committed
Commit 07187fd · 1 Parent(s): da2d3dd

Update README.md

Files changed (1)
  1. README.md +24 -24
README.md CHANGED
@@ -62,30 +62,30 @@ print(tokenizer.detokenize(translated[0][0]['tokens']))
 
 The was trained on a combination of the following datasets:
 
-| Dataset | Sentences | Tokens |
-|--------------------|----------------|-------------------|
-| Global Voices | 21.342 | 438.032 |
-| Memories Lluires | 1.173.055 | 9.452.382 |
-| Wikimatrix | 1.205.908 | 28.111.517 |
-| TED Talks | 50.979 | 770.774 |
-| Tatoeba | 5.500 | 34.872 |
-| CoVost 2 ca-en | 79.633 | 809.660 |
-| CoVost 2 en-ca | 263.891 | 2.953.096 |
-| Europarl | 1.965.734 | 50.417.289 |
-| jw300 | 97.081 | 1.809.252 |
-| Crawled Generalitat| 38.595 | 858.385 |
-| Opus Books | 4.580 | 73.416 |
-| CC Aligned | 5.787.682 | 89.606.874 |
-| COVID_Wikipedia | 1.531 | 34.836 |
-| EuroBooks | 3.746 | 82.067 |
-| Gnome | 2.183 | 30.228 |
-| KDE 4 | 144.153 | 1.450.631 |
-| OpenSubtitles | 427.913 | 2.796.350 |
-| QED | 69.823 | 1.058.003 |
-| Ubuntu | 6.781 | 33.321 |
-| Wikimedia | 208.073 | 5.761.409 |
-|--------------------|----------------|-------------------|
-| **Total** | **11.558.183** | **196.582.394** |
+| Dataset | Sentences |
+|--------------------|----------------|
+| Global Voices | 21.342 |
+| Memories Lluires | 1.173.055 |
+| Wikimatrix | 1.205.908 |
+| TED Talks | 50.979 |
+| Tatoeba | 5.500 |
+| CoVost 2 ca-en | 79.633 |
+| CoVost 2 en-ca | 263.891 |
+| Europarl | 1.965.734 |
+| jw300 | 97.081 |
+| Crawled Generalitat| 38.595 |
+| Opus Books | 4.580 |
+| CC Aligned | 5.787.682 |
+| COVID_Wikipedia | 1.531 |
+| EuroBooks | 3.746 |
+| Gnome | 2.183 |
+| KDE 4 | 144.153 |
+| OpenSubtitles | 427.913 |
+| QED | 69.823 |
+| Ubuntu | 6.781 |
+| Wikimedia | 208.073 |
+|--------------------|----------------|
+| **Total** | **11.558.183** |
 
 ### Training procedure
 