
EuroGPT2

NOTE: THIS IS THE ORIGINAL MEGATRON-DEEPSPEED CHECKPOINT INCLUDING OPTIMIZER STATES

A GPT-2 language model for European languages (EU-24 plus Ukrainian). The model follows the same architecture as OpenAI's GPT-2, except that it uses rotary instead of learned positional embeddings.
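Rotary embeddings encode position by rotating pairs of query/key channels before the attention dot product, rather than adding a learned vector to the token embedding. Below is a minimal sketch of the operation, not the checkpoint's exact implementation; the interleaved channel pairing and the base of 10000 are common defaults assumed here:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position encoding to x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    # One rotation frequency per channel pair: theta_i = base^(-2i / head_dim).
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]    # interleaved channel pairs (an assumed convention)
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # rotate each pair by its
    out[:, 1::2] = x1 * sin + x2 * cos  # position-dependent angle
    return out

# Applied per attention head to queries and keys (head_dim = 768 / 12 = 64).
q = torch.randn(1024, 64)
q_rot = apply_rope(q)
```

Because the rotation angle grows linearly with position, the query-key dot product ends up depending only on the relative offset between tokens, which is the main motivation for this choice over learned absolute positions.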

Model settings

  • parameters: 124M
  • number of layers: 12
  • hidden size: 768
  • number of heads: 12
  • sequence length: 1024
  • batch size: 168
  • test perplexity after training: 23.6 (after 436,940 steps)
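For reference, the same settings as a Python mapping, plus two quantities derived from them. The key names are illustrative, not the exact Megatron-DeepSpeed argument names, and the token count assumes every step processed a full batch of full-length sequences:

```python
import math

# Model settings (illustrative key names, not Megatron-DeepSpeed arguments).
config = {
    "num_layers": 12,
    "hidden_size": 768,
    "num_attention_heads": 12,
    "seq_length": 1024,
    "batch_size": 168,
}

# Perplexity is exp(cross-entropy), so a test PPL of 23.6 corresponds to
# a per-token loss of about ln(23.6) = 3.16 nats.
print(f"test loss = {math.log(23.6):.2f} nats/token")

# Assuming full batches of full-length sequences at every step, training
# covered roughly 436,940 * 168 * 1024 tokens, i.e. about 75B.
tokens = 436_940 * config["batch_size"] * config["seq_length"]
print(f"tokens seen = {tokens / 1e9:.0f}B")
```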

Training data

Languages

Included languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Croatian, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish, and Ukrainian.

Language   Ratio
bg          5.92%
cs          4.77%
da          2.19%
de          7.36%
el          8.60%
en         10.11%
es          6.57%
et          1.67%
fi          2.70%
fr          7.18%
ga          0.25%
hr          1.09%
hu          6.38%
it          5.80%
lt          2.01%
lv          1.76%
mt          1.49%
nl          5.20%
pl          4.82%
pt          4.64%
ro          2.93%
sk          2.03%
sl          1.54%
sv          3.00%
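As a hypothetical illustration (the card does not describe how the corpus was mixed), these ratios could serve as sampling weights when drawing training documents from per-language corpora:

```python
import random

# Language ratios from the table above, in percent. Ukrainian (uk) is
# listed among the languages but the card gives no ratio for it; the
# values below sum to ~100% up to rounding.
ratios = {
    "bg": 5.92, "cs": 4.77, "da": 2.19, "de": 7.36, "el": 8.60,
    "en": 10.11, "es": 6.57, "et": 1.67, "fi": 2.70, "fr": 7.18,
    "ga": 0.25, "hr": 1.09, "hu": 6.38, "it": 5.80, "lt": 2.01,
    "lv": 1.76, "mt": 1.49, "nl": 5.20, "pl": 4.82, "pt": 4.64,
    "ro": 2.93, "sk": 2.03, "sl": 1.54, "sv": 3.00,
}
assert abs(sum(ratios.values()) - 100.0) < 0.1

# Hypothetical: pick the language of the next training document in
# proportion to its corpus share.
langs, weights = zip(*ratios.items())
next_lang = random.choices(langs, weights=weights, k=1)[0]
print(next_lang)
```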

License

MIT
