NB-RoBERTa Training Code

This is the current training code for the planned nb-roberta models.

We are currently planning to run the following experiments:

| Name | Corpus | Pod size | Batch size | Learning rate | Number of steps |
|---|---|---|---|---|---|
| nb-roberta-base-old (C) | NbAiLab/nb_bert | v4-64 | 62\*4\*8 = 1984 (~2k) | 3e-4 (RoBERTa paper uses 6e-4 at bs=8k) | 250k |
| nb-roberta-base-ext (B) | NbAiLab/nbailab_extended | v4-64 | 62\*4\*8 = 1984 (~2k) | 3e-4 (RoBERTa paper uses 6e-4 at bs=8k) | 250k |
| nb-roberta-large-ext | NbAiLab/nbailab_extended | v4-64 | 32\*4\*8 = 1024 (~1k) | 2e-4 (RoBERTa paper uses 4e-4 at bs=8k) | 500k |
| nb-roberta-base-scandi | NbAiLab/scandinavian | v4-64 | 62\*4\*8 = 1984 (~2k) | 3e-4 (RoBERTa paper uses 6e-4 at bs=8k) | 250k |
| nb-roberta-large-scandi | NbAiLab/scandinavian | v4-64 | 32\*4\*8 = 1024 (~1k) | 2e-4 (RoBERTa paper uses 4e-4 at bs=8k) | 500k |
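The batch sizes above are products of a per-device batch size and the data-parallel factors of the pod. A minimal sketch of that arithmetic (the function name and the 4\*8 device split are illustrative assumptions, not part of the training code):

```python
def effective_batch_size(per_device: int, devices: int, grad_accum: int = 1) -> int:
    """Effective global batch = per-device batch x device count x gradient accumulation."""
    return per_device * devices * grad_accum

# Base models: 62 per device across 4*8 = 32 devices
base = effective_batch_size(62, 4 * 8)   # 1984, i.e. ~2k
# Large models: 32 per device across the same 32 devices
large = effective_batch_size(32, 4 * 8)  # 1024, i.e. ~1k
print(base, large)
```

Note that this confirms the large-model batch is 1024 (~1k), not 2024 as a typo might suggest.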

Calculations

Some basic calculations that we used when estimating the number of training steps:

  • The Scandinavian Corpus is 85GB
  • The Scandinavian Corpus contains 13B words
  • With a conversion factor of 2.3, this is estimated to be around 30B tokens
  • 30B tokens / (512 seq length * 3000 batch size) ≈ 20,000 steps
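The estimate above can be reproduced with a few lines of arithmetic (the variable names are illustrative; the 2.3 words-to-tokens factor, sequence length, and batch size are the values stated above):

```python
words = 13e9                 # word count of the Scandinavian Corpus
tokens = words * 2.3         # ~30B tokens with a 2.3 conversion factor
seq_len = 512                # sequence length
batch_size = 3000            # batch size used in the estimate
steps = tokens / (seq_len * batch_size)
print(f"{steps:,.0f} steps")  # on the order of 20,000 steps
```

One full pass over the corpus at this batch size is therefore roughly 20k steps, so 250k steps corresponds to on the order of a dozen epochs.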