gonzalez-agirre committed
Commit a94e933
1 Parent(s): 5110c34
Update README.md
README.md
CHANGED
@@ -174,7 +174,7 @@ The dataset has the following language distribution:
 
 ## Training procedure
 
-The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,
+The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2) used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,257 tokens. Once the model had been successfully initialized, we continued its pre-training on the three target languages: Catalan, Spanish, and English. We kept a small amount of English in order to avoid catastrophic forgetting. The training lasted a total of 96 hours on 8 NVIDIA H100 GPUs with 80 GB of RAM each.
 
 
 ### Training hyperparameters
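As a hedged illustration of the tokenization described in the added paragraph (not part of the commit itself): a byte-level BPE vocabulary of 50,257 tokens matches the GPT-2 tokenizer available through the Hugging Face `transformers` library. The checkpoint id below is an assumption for demonstration only, since the hunk does not name the base model repository:

```python
from transformers import AutoTokenizer

# Load a byte-level BPE tokenizer of the GPT-2/RoBERTa family.
# "gpt2" is a placeholder checkpoint id, not the model card's actual base model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

print(tokenizer.vocab_size)  # 50257 byte-level BPE tokens

# Tokenize a sample sentence from one of the target languages (Catalan here).
ids = tokenizer("El corpus d'entrenament es tokenitza amb BPE a nivell de byte.")["input_ids"]
print(ids)
```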