Our final models were trained on a different number of steps and sequence lengths.

<figure>

<caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128).</caption>

| Dataset | Metric   | RoBERTa-b  | RoBERTa-l | BETO   | mBERT  | BERTIN     |
|---------|----------|------------|-----------|--------|--------|------------|
| UD-POS  | F1       | **0.9907** | 0.9901    | 0.9900 | 0.9886 | **0.9904** |
| …       | …        | …          | …         | …      | …      | …          |
| PAWS-X  | F1       | 0.9035     | 0.9000    | 0.8915 | 0.9020 | 0.8820     |
| XNLI    | Accuracy | 0.8016     | WIP       | 0.8130 | 0.7876 | WIP        |

</figure>

All of our models attained good accuracy values in the range of 0.65, as can be seen in Table 2:

<figure>

<caption>Table 2. Accuracy for the different language models.</caption>

| Model                                              | Accuracy   |
|----------------------------------------------------|------------|
| bertin-project/bertin-roberta-base-spanish         | 0.6547     |
| …                                                  | …          |
| bertin-project/bertin-base-random-exp-512seqlen    | 0.5907     |
| bertin-project/bertin-base-gaussian-exp-512seqlen  | **0.6873** |

</figure>

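A quick way to poke at any of these checkpoints is the `transformers` fill-mask pipeline. A minimal sketch (the example sentence is our own; any model name from Table 2 can be substituted):

```python
from transformers import pipeline

# Any of the checkpoints in Table 2 can be plugged in here; all of
# them are hosted on the Hugging Face Hub under bertin-project/.
fill_mask = pipeline(
    "fill-mask",
    model="bertin-project/bertin-roberta-base-spanish",
)

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for pred in fill_mask("Madrid es la <mask> de España."):
    print(f"{pred['token_str']}\t{pred['score']:.4f}")
```

Each prediction is a dict with the filled-in token and its probability, which makes it easy to eyeball how well a checkpoint models Spanish before committing to a full fine-tuning run.
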
We are currently in the process of applying our language models to downstream tasks.

All models were trained with max length 512 and batch size 8, using the CoNLL 2002 dataset.

<figure>

<caption>Table 3. Results for POS.</caption>

| Model                                              | F1         | Accuracy   |
|----------------------------------------------------|------------|------------|
| bert-base-multilingual-cased                       | 0.9629     | 0.9687     |
| …                                                  | …          | …          |
| bertin-project/bertin-base-random-exp-512seqlen    | 0.9660     | 0.9707     |
| bertin-project/bertin-base-gaussian-exp-512seqlen  | **0.9662** | **0.9714** |

</figure>

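For reference, a minimal sketch of this kind of token-classification fine-tuning with `transformers` and `datasets`. Max length 512 and batch size 8 match what is stated above; the choice of checkpoint, output directory, and every other hyperparameter are illustrative assumptions, not the project's actual settings:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# CoNLL 2002 Spanish; use "ner_tags" instead of "pos_tags" to get
# the NER setup of Table 4.
dataset = load_dataset("conll2002", "es")
labels = dataset["train"].features["pos_tags"].feature.names

model_name = "bertin-project/bertin-base-gaussian-exp-512seqlen"
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels)
)

def tokenize_and_align(batch):
    # Tokenize pre-split words and copy each word's tag to its first
    # sub-token; later sub-tokens get -100, which the loss ignores.
    enc = tokenizer(
        batch["tokens"],
        truncation=True,
        max_length=512,
        is_split_into_words=True,
    )
    enc["labels"] = []
    for i, tags in enumerate(batch["pos_tags"]):
        previous = None
        row = []
        for word_id in enc.word_ids(batch_index=i):
            row.append(-100 if word_id in (None, previous) else tags[word_id])
            previous = word_id
        enc["labels"].append(row)
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bertin-pos",
        per_device_train_batch_size=8,  # batch size 8, as stated above
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

The -100 labels are the standard trick here: the Trainer's cross-entropy loss skips them, so each word is scored exactly once, on its first sub-token.
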
<figure>

<caption>Table 4. Results for NER.</caption>

| Model                                              | F1         | Accuracy   |
|----------------------------------------------------|------------|------------|
| bert-base-multilingual-cased                       | 0.8539     | 0.9779     |
| …                                                  | …          | …          |
| bertin-project/bertin-base-random-exp-512seqlen    | 0.8616     | 0.9803     |
| bertin-project/bertin-base-gaussian-exp-512seqlen  | **0.8764** | **0.9819** |

</figure>

All models were trained with max length 512 and batch size 8. These numbers are surprising…

<figure>

<caption>Table 5. Results for PAWS-X.</caption>

| Model                                              | Accuracy   |
|----------------------------------------------------|------------|
| bert-base-multilingual-cased                       | 0.5765     |
| …                                                  | …          |
| bertin-project/bertin-base-random-exp-512seqlen    | 0.6735     |
| bertin-project/bertin-base-gaussian-exp-512seqlen  | **0.8965** |

</figure>

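Both PAWS-X and XNLI are sentence-pair classification tasks, so the setup is the standard sequence-classification recipe rather than the token-level one above. A hedged sketch (batch size 8 and max length 512 as stated above for PAWS-X; the checkpoint and remaining hyperparameters are assumptions):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# PAWS-X has a Spanish configuration; for XNLI the analogous call is
# load_dataset("xnli", "es"), with premise/hypothesis fields and 3 labels.
dataset = load_dataset("paws-x", "es")

model_name = "bertin-project/bertin-base-gaussian-exp-512seqlen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2  # 3 for XNLI
)

def encode(batch):
    # Sentence pairs are packed into one sequence with separator tokens.
    return tokenizer(
        batch["sentence1"],
        batch["sentence2"],
        truncation=True,
        max_length=512,
    )

tokenized = dataset.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bertin-pawsx",
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```

Reproducing the accuracies in Tables 5 through 7 would additionally require a `compute_metrics` function; the sketch only runs the training loop.
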
<figure>

<caption>Table 6. Results for XNLI with sequence length 256 and batch size 32.</caption>

| Model                                              | Accuracy |
|----------------------------------------------------|----------|
| bert-base-multilingual-cased                       | 0.7852   |
| …                                                  | …        |
| bertin-project/bertin-base-random-exp-512seqlen    | 0.7723   |
| bertin-project/bertin-base-gaussian-exp-512seqlen  | 0.7878   |

</figure>

<figure>

<caption>Table 7. Results for XNLI with sequence length 512 and batch size 16.</caption>

| Model                                              | Accuracy |
|----------------------------------------------------|----------|
| bert-base-multilingual-cased                       | WIP      |
| …                                                  | …        |
| bertin-project/bertin-base-gaussian-exp-512seqlen  | 0.7843   |

</figure>

# Conclusions
With roughly 10 days' worth of access to 3xTPUv3-8, we have achieved remarkable results, surpassing the previous state of the art on a few tasks and even improving document classification over models trained on massive supercomputers with very large, private, and highly curated datasets.