carlosep93 committed
Commit 3d4bce7 · Parent(s): 3af3027
Update README.md

README.md CHANGED
@@ -26,7 +26,7 @@ license: cc-by-4.0

## Model description

-This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Catalan-Spanish datasets, up to
+This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Catalan-Spanish datasets, up to 92 million sentences. Additionally, the model is evaluated on several public datasets comprising 5 different domains (general, administrative, technology, biomedical, and news).

## Intended uses and limitations

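As context for the updated description, the sketch below shows one way a Fairseq translation checkpoint like this can be queried through the toolkit's Python hub interface. The directory layout, checkpoint file name, and SentencePiece settings are assumptions made for illustration; they are not taken from this model card.

```python
# Minimal sketch (not from the model card): load a Fairseq translation
# checkpoint via the Python hub interface and translate one sentence.
# Every path and file name below is an assumption for illustration.
from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained(
    "checkpoints/",                        # assumed: directory with the trained model
    checkpoint_file="checkpoint_best.pt",  # assumed checkpoint name
    data_name_or_path="data-bin/",         # assumed: binarized data dir with the dictionaries
    bpe="sentencepiece",                   # assumed subword scheme
    sentencepiece_model="spm.model",       # assumed SentencePiece model path
)
model.eval()  # disable dropout for inference

print(model.translate("El dia és assolellat."))
```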
@@ -74,7 +74,6 @@ The model was trained on a combination of the following datasets:
| CCMatrix v1 | 56.103.820 | 1.064.182.320 |
| MultiCCAligned v1 | 2.433.418 | 48.294.144 |
| ParaCrawl | 15.327.808 | 334.199.408 |
-|-------------------|----------------|-------------------|
| **Total** | **92.578.683** | **1.875.910.305** |

### Training procedure
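The table's columns appear to be sentence pairs and token counts per corpus (the total of 92.578.683 matches the "92 million sentences" figure in the updated description). Purely as an illustration of how such counts can be produced when combining parallel corpora, here is a small sketch; the corpus file names and the .ca/.es file layout are assumptions, not part of the card.

```python
# Illustrative only: count sentence pairs and whitespace tokens for a set of
# parallel corpora before concatenating them for training. File names and the
# .ca/.es layout are assumptions, not taken from the model card.
from pathlib import Path

corpora = ["ccmatrix", "multiccaligned", "paracrawl"]  # assumed file stems

total_pairs, total_tokens = 0, 0
for name in corpora:
    src = Path(f"{name}.ca").read_text(encoding="utf-8").splitlines()
    tgt = Path(f"{name}.es").read_text(encoding="utf-8").splitlines()
    assert len(src) == len(tgt), f"{name}: source and target are not parallel"
    pairs = len(src)
    tokens = sum(len(line.split()) for line in src + tgt)
    print(f"{name}: {pairs:,} sentence pairs, {tokens:,} tokens")
    total_pairs += pairs
    total_tokens += tokens

print(f"Total: {total_pairs:,} sentence pairs, {total_tokens:,} tokens")
```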
@@ -122,7 +121,7 @@ The model was trained using shards of 10 million sentences, for a total of 13.00

### Variables and metrics

-We use the BLEU score for evaluation on test sets: [Flores-101](https://github.com/facebookresearch/flores), [
+We use the BLEU score for evaluation on test sets: [Flores-101](https://github.com/facebookresearch/flores), [TaCon](https://elrc-share.eu/repository/browse/tacon-spanish-constitution-mt-test-set/84a96138b98611ec9c1a00155d02670628f3e6857b0f422abd82abc3795ec8c2/), [United Nations](https://zenodo.org/record/3888414#.Y33-_tLMIW0), [Cybersecurity](https://elrc-share.eu/repository/browse/cyber-mt-test-set/2bd93faab98c11ec9c1a00155d026706b96a490ed3e140f0a29a80a08c46e91e/), [wmt19 biomedical test set](), [wmt13 news test set](https://elrc-share.eu/repository/browse/catalan-wmt2013-machine-translation-shared-task-test-set/84a96139b98611ec9c1a00155d0267061a0aa1b62e2248e89aab4952f3c230fc/), [aina aapp]()

### Evaluation results

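Since evaluation is corpus-level BLEU on the test sets listed in the added line, a minimal scoring sketch with sacreBLEU is shown below. The hypothesis and reference file names are invented for illustration; they are not files shipped with the model.

```python
# Minimal BLEU-scoring sketch with sacreBLEU. The hypothesis/reference file
# names are assumptions for illustration, not part of the model card.
import sacrebleu

with open("flores101.hyp", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("flores101.ref", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])  # single reference set
print(f"BLEU = {bleu.score:.1f}")
```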
@@ -138,16 +137,9 @@ Below are the evaluation results on the machine translation from Catalan to Chin
| Cybersecurity | 73,5 | **76,9** | 75,1 |
| wmt 19 biomedical | 60,0 | 62,7 | **63,0** |
| wmt 13 news | 22,7 | 23,1 | **23,4** |
-|----------------------|------------|------------------|---------------|
| Average | 52,5 | 56,6 | **56,7** |

-
-- [Author](#author)
-- [Licensing information](#licensing-information)
-- [Funding](#funding)
-- [Disclaimer](#disclaimer)
-
## Additional information

### Author