mmarimon committed
Commit 4181836 · Parent: 8323e27

Update README.md

Files changed (1)
1. README.md +3 -0
README.md CHANGED

@@ -140,8 +140,11 @@ Some of the statistics of the corpus:
 The pretraining objective used for this architecture is next token prediction.
 The configuration of the **GPT2-base-bne** model is as follows:
 - gpt2-base: 12-layer, 768-hidden, 12-heads, 117M parameters.
+
 The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model with a vocabulary size of 50,262 tokens.
+
 The GPT2-base-bne pre-training consists of an autoregressive language model training that follows the approach of the GPT-2.
+
 The training lasted a total of 3 days with 16 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
 
 ## Additional information
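For reference, the context lines above fully specify a standard GPT-2 small configuration and its training objective. The following is a minimal, non-authoritative sketch (not part of this commit) of how those quoted numbers map onto a Hugging Face `transformers` `GPT2Config`, using a randomly initialised model and a dummy batch purely for illustration:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Hyperparameters quoted in the diff context: 12 layers, 768 hidden size,
# 12 attention heads, and a byte-level BPE vocabulary of 50,262 tokens.
config = GPT2Config(vocab_size=50262, n_layer=12, n_embd=768, n_head=12)
model = GPT2LMHeadModel(config)  # randomly initialised GPT-2 of the "117M" class quoted above

# Next-token prediction objective: passing labels equal to the inputs makes the
# model shift them internally and return the autoregressive cross-entropy loss.
input_ids = torch.randint(0, config.vocab_size, (1, 32))  # dummy token ids
loss = model(input_ids=input_ids, labels=input_ids).loss
print(f"params: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M, loss: {loss.item():.2f}")
```

To use the released GPT2-base-bne weights and tokenizer rather than a random initialisation, one would instead call `GPT2LMHeadModel.from_pretrained` and `GPT2TokenizerFast.from_pretrained` with the model's Hub id, which this diff does not show.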