Fairseq
Catalan
Spanish
Commit 3d4bce7 by carlosep93
1 Parent(s): 3af3027

Update README.md

Files changed (1)
  1. README.md +2 -10
README.md CHANGED
@@ -26,7 +26,7 @@ license: cc-by-4.0

  ## Model description

- This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Catalan-Spanish datasets, up to 83 million sentences. Additionally, the model is evaluated on several public datasecomprising 5 different domains (general, adminstrative, technology, biomedical, and news).
+ This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Catalan-Spanish datasets, up to 92 million sentences. Additionally, the model is evaluated on several public datasecomprising 5 different domains (general, adminstrative, technology, biomedical, and news).

  ## Intended uses and limitations

@@ -74,7 +74,6 @@ The was trained on a combination of the following datasets:
  | CCMatrix v1 | 56.103.820 | 1.064.182.320 |
  | MultiCCAligned v1 | 2.433.418 | 48.294.144 |
  | ParaCrawl | 15.327.808 | 334.199.408 |
- |-------------------|----------------|-------------------|
  | **Total** | **92.578.683** | **1.875.910.305** |

  ### Training procedure
@@ -122,7 +121,7 @@ The model was trained using shards of 10 million sentences, for a total of 13.00

  ### Variable and metrics

- We use the BLEU score for evaluation on test sets: [Flores-101](https://github.com/facebookresearch/flores), [Spanish Constitution](), [United Nations](), [Cybersecurity](), [wmt19 biomedical test set](), [wmt13 news test set](), [AAPP]()
+ We use the BLEU score for evaluation on test sets: [Flores-101](https://github.com/facebookresearch/flores), [TaCon](https://elrc-share.eu/repository/browse/tacon-spanish-constitution-mt-test-set/84a96138b98611ec9c1a00155d02670628f3e6857b0f422abd82abc3795ec8c2/), [United Nations](https://zenodo.org/record/3888414#.Y33-_tLMIW0), [Cybersecurity](https://elrc-share.eu/repository/browse/cyber-mt-test-set/2bd93faab98c11ec9c1a00155d026706b96a490ed3e140f0a29a80a08c46e91e/), [wmt19 biomedical test set](), [wmt13 news test set](https://elrc-share.eu/repository/browse/catalan-wmt2013-machine-translation-shared-task-test-set/84a96139b98611ec9c1a00155d0267061a0aa1b62e2248e89aab4952f3c230fc/), [aina aapp]()

  ### Evaluation results

@@ -138,16 +137,9 @@ Below are the evaluation results on the machine translation from Catalan to Chin
  | Cybersecurity | 73,5 | **76,9** | 75,1 |
  | wmt 19 biomedical | 60,0 | 62,7 | **63,0** |
  | wmt 13 news | 22,7 | 23,1 | **23,4** |
- |----------------------|------------|------------------|---------------|
  | Average | 52,5 | 56,6 | **56,7** |


-
- - [Author](#author)
- - [Licensing information](#licensing-information)
- - [Funding](#funding)
- - [Disclaimer](#disclaimer)
-
  ## Additional information

  ### Author
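The README names BLEU as the evaluation metric but does not say which scorer produced the numbers in the table. The following is a minimal sketch of corpus-level BLEU under stated assumptions: whitespace tokenization, one reference per hypothesis, uniform 1- to 4-gram weights, and no smoothing. The function names `ngram_counts` and `corpus_bleu` are hypothetical and not part of Fairseq. Standard tools such as sacreBLEU apply their own tokenization, so scores from this sketch are not comparable to the reported figures.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Sketch of corpus BLEU on 0-100 scale.

    Assumptions: whitespace tokenization, a single reference per hypothesis,
    uniform n-gram weights, no smoothing (any zero n-gram match gives 0.0).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_toks, n)
            ref_ngrams = ngram_counts(ref_toks, n)
            # Clipped counts: a hypothesis n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu(["el gato está sobre la alfombra"], ["el gato está sobre la alfombra"])` returns `100.0`. Dropping the last word of the hypothesis keeps every n-gram precision at 1 but triggers the brevity penalty, which lowers the score to about 81.9.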