gonzalez-agirre committed
Commit d3a80d2
Parent(s): ea92527
Update README.md
README.md CHANGED
@@ -196,7 +196,7 @@ The dataset has the following language distribution:
 
 ## Training procedure
 
-The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,257 tokens. Once the model had been successfully initialized, we continued its pre-training in the three target languages: Catalan, Spanish, and English. We kept a small amount of English in order to avoid catastrophic forgetting. The training lasted a total of
+The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,257 tokens. Once the model had been successfully initialized, we continued its pre-training in the three target languages: Catalan, Spanish, and English. We kept a small amount of English in order to avoid catastrophic forgetting. The training lasted a total of 320 hours on 8 NVIDIA H100 GPUs with 80 GB of RAM each.
 
 
 ### Training hyperparameters
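The updated paragraph describes byte-level BPE tokenization with a 50,257-token vocabulary, the same scheme used by GPT-2. The snippet below is a minimal sketch (not part of this commit) of how that tokenization can be inspected with the Hugging Face `transformers` library; the `gpt2` checkpoint is used only as a stand-in because it ships the same byte-level BPE with a 50,257-token vocabulary, and the model's own tokenizer would normally be loaded from its repository instead.

```python
# Sketch: inspect a byte-level BPE tokenizer like the one described above.
# "gpt2" is a stand-in checkpoint with the same 50,257-token byte-level BPE vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

print(tokenizer.vocab_size)  # 50257
# Byte-level BPE splits text into subword units; 'Ġ' marks a leading space.
print(tokenizer.tokenize("El corpus d'entrenament ha estat tokenitzat"))
```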