Update README.md
@@ -140,8 +140,11 @@ Some of the statistics of the corpus:
The pretraining objective used for this architecture is next token prediction.

The configuration of the **GPT2-base-bne** model is as follows:

- gpt2-base: 12-layer, 768-hidden, 12-heads, 117M parameters.
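
If the released checkpoint is published on the Hugging Face Hub, these figures can be checked directly from the model's configuration. A minimal sketch with `transformers`, assuming the hub id `PlanTL-GOB-ES/gpt2-base-bne` (the id is not stated in this excerpt):

```python
# Minimal sketch: read the architecture hyperparameters from the published
# configuration. The hub id below is an assumption, not taken from this README excerpt.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
print(config.n_layer, config.n_embd, config.n_head)  # expected: 12 768 12

model = AutoModelForCausalLM.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
total = sum(p.numel() for p in model.parameters())
# The total includes the embedding matrices, so it can come out somewhat higher
# than the 117M figure quoted above.
print(f"~{total / 1e6:.0f}M parameters")
```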

The training corpus has been tokenized using a byte-level version of the Byte-Pair Encoding (BPE) used in the original [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model, with a vocabulary size of 50,262 tokens.
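
As a quick illustration (not part of the original card), the byte-level BPE tokenizer can be loaded and inspected the same way, again assuming the hub id `PlanTL-GOB-ES/gpt2-base-bne`:

```python
# Sketch: load the byte-level BPE tokenizer and check the vocabulary size.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
print(len(tokenizer))  # expected: 50262, per the description above

ids = tokenizer("El modelo fue preentrenado con corpus en español.")["input_ids"]
# Byte-level BPE pieces; "Ġ" marks a token that starts with a space.
print(tokenizer.convert_ids_to_tokens(ids))
```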

The GPT2-base-bne pre-training consists of autoregressive language model training that follows the approach of GPT-2.
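
To make the objective concrete, the next-token-prediction loss that an autoregressive (causal) language model optimizes can be written down in a few lines. This is an illustrative sketch, not the project's training code, and it again assumes the hub id `PlanTL-GOB-ES/gpt2-base-bne`:

```python
# Sketch of the causal LM (next token prediction) objective used in this kind
# of pre-training. Illustrative only; not the actual training script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
model = AutoModelForCausalLM.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")

batch = tokenizer("Una frase de ejemplo para el modelo.", return_tensors="pt")

# Passing the inputs as labels makes `transformers` compute the standard causal
# LM loss: logits are shifted internally so each position predicts the next token.
loss_builtin = model(**batch, labels=batch["input_ids"]).loss

# The same objective written out explicitly.
logits = model(**batch).logits[:, :-1, :]   # predictions for positions 0..n-2
targets = batch["input_ids"][:, 1:]         # the "next" token at each position
loss_manual = torch.nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
)
print(loss_builtin.item(), loss_manual.item())  # should match closely
```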

The training lasted a total of 3 days on 16 computing nodes, each with 4 NVIDIA V100 GPUs with 16GB of VRAM.

## Additional information