# NB-ROBERTA Training Code
This is the current training code for the planned nb-roberta models.
We are currently planning to run the following experiments (a configuration sketch follows the table):

| Name | Corpus | Pod size | Batch size | Learning rate | Number of steps |
|---|---|---|---|---|---|
| nb-roberta-base-old (C) | NbAiLab/nb_bert | v4-64 | 62 × 4 × 8 = 1,984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-base-ext (B) | NbAiLab/nbailab_extended | v4-64 | 62 × 4 × 8 = 1,984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-ext | NbAiLab/nbailab_extended | v4-64 | 32 × 4 × 8 = 1,024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
| nb-roberta-base-scandi | NbAiLab/scandinavian | v4-64 | 62 × 4 × 8 = 1,984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-scandi | NbAiLab/scandinavian | v4-64 | 32 × 4 × 8 = 1,024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
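As a reference, here is a minimal sketch of these runs expressed as Python configuration objects. The values are copied from the table above; the `ExperimentConfig` dataclass itself is a hypothetical helper for illustration, not part of the actual training code.

```python
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    """Hypothetical container for one planned pretraining run (illustrative only)."""
    name: str            # model name
    corpus: str          # Hugging Face dataset ID
    pod_size: str        # TPU pod slice
    batch_size: int      # effective global batch size
    learning_rate: float
    num_steps: int


# Values taken from the experiments table above.
EXPERIMENTS = [
    ExperimentConfig("nb-roberta-base-old", "NbAiLab/nb_bert", "v4-64", 62 * 4 * 8, 3e-4, 250_000),
    ExperimentConfig("nb-roberta-base-ext", "NbAiLab/nbailab_extended", "v4-64", 62 * 4 * 8, 3e-4, 250_000),
    ExperimentConfig("nb-roberta-large-ext", "NbAiLab/nbailab_extended", "v4-64", 32 * 4 * 8, 2e-4, 500_000),
    ExperimentConfig("nb-roberta-base-scandi", "NbAiLab/scandinavian", "v4-64", 62 * 4 * 8, 3e-4, 250_000),
    ExperimentConfig("nb-roberta-large-scandi", "NbAiLab/scandinavian", "v4-64", 32 * 4 * 8, 2e-4, 500_000),
]
```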
## Calculations
Some basic calculations we used when estimating the number of training steps (see the sketch after this list):
- The Scandinavian corpus is 85GB.
- The Scandinavian corpus contains 13B words.
- With a conversion factor of 2.3 tokens per word, this is estimated at around 30B tokens.
- 30B tokens / (512 sequence length × 3,000 batch size) ≈ 20,000 steps.
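As a sanity check, here is the same arithmetic as a minimal Python sketch. All numbers come from the list above; the 2.3 words-to-tokens conversion factor and the 3,000 batch size are the assumptions stated there.

```python
# Back-of-the-envelope step estimate, using the numbers from the list above.
corpus_words = 13e9       # ~13B words in the Scandinavian corpus
tokens_per_word = 2.3     # assumed words-to-tokens conversion factor
seq_length = 512          # maximum sequence length
batch_size = 3_000        # batch size used in the original estimate

total_tokens = corpus_words * tokens_per_word             # ~30B tokens
steps = total_tokens / (seq_length * batch_size)          # steps for one pass over the corpus
print(f"~{total_tokens / 1e9:.0f}B tokens -> ~{steps:,.0f} steps")
# Prints: ~30B tokens -> ~19,466 steps (rounded to ~20,000 in the list above)
```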