Update README.md
@@ -140,8 +140,11 @@ Some of the statistics of the corpus:
The pretraining objective used for this architecture is next token prediction.

The configuration of the **GPT2-base-bne** model is as follows:

- gpt2-base: 12-layer, 768-hidden, 12-heads, 117M parameters.
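
If the released checkpoint is published on the Hugging Face Hub, these figures can be checked directly from the model's configuration. A minimal sketch with `transformers`, assuming the hub id `PlanTL-GOB-ES/gpt2-base-bne` (the id is not stated in this excerpt):

```python
# Minimal sketch: read the architecture hyperparameters from the published
# configuration. The hub id below is an assumption, not taken from this README excerpt.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
print(config.n_layer, config.n_embd, config.n_head)  # expected: 12 768 12

model = AutoModelForCausalLM.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
total = sum(p.numel() for p in model.parameters())
# The total includes the embedding matrices, so it can come out somewhat higher
# than the 117M figure quoted above.
print(f"~{total / 1e6:.0f}M parameters")
```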

The training corpus has been tokenized using a byte-level version of the Byte-Pair Encoding (BPE) used in the original [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model, with a vocabulary size of 50,262 tokens.
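
As a quick illustration (not part of the original card), the byte-level BPE tokenizer can be loaded and inspected the same way, again assuming the hub id `PlanTL-GOB-ES/gpt2-base-bne`:

```python
# Sketch: load the byte-level BPE tokenizer and check the vocabulary size.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
print(len(tokenizer))  # expected: 50262, per the description above

ids = tokenizer("El modelo fue preentrenado con corpus en español.")["input_ids"]
# Byte-level BPE pieces; "Ġ" marks a token that starts with a space.
print(tokenizer.convert_ids_to_tokens(ids))
```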

The GPT2-base-bne pre-training consists of autoregressive language model training that follows the approach of GPT-2.
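
To make the objective concrete, the next-token-prediction loss that an autoregressive (causal) language model optimizes can be written down in a few lines. This is an illustrative sketch, not the project's training code, and it again assumes the hub id `PlanTL-GOB-ES/gpt2-base-bne`:

```python
# Sketch of the causal LM (next token prediction) objective used in this kind
# of pre-training. Illustrative only; not the actual training script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
model = AutoModelForCausalLM.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")

batch = tokenizer("Una frase de ejemplo para el modelo.", return_tensors="pt")

# Passing the inputs as labels makes `transformers` compute the standard causal
# LM loss: logits are shifted internally so each position predicts the next token.
loss_builtin = model(**batch, labels=batch["input_ids"]).loss

# The same objective written out explicitly.
logits = model(**batch).logits[:, :-1, :]   # predictions for positions 0..n-2
targets = batch["input_ids"][:, 1:]         # the "next" token at each position
loss_manual = torch.nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
)
print(loss_builtin.item(), loss_manual.item())  # should match closely
```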

The training lasted a total of 3 days on 16 computing nodes, each with 4 NVIDIA V100 GPUs with 16GB of VRAM.

## Additional information