---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---

# GPT-sl-base

This model is a Slovene GPT model, based on the [bigscience workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-LM. GPT-sl-base was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.

## Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.

## Training

The model was trained for about 20 epochs, a total of 390k steps, with 102B tokens seen during training.

| Step   | Validation Perplexity |
|:------:|:---------------------:|
| 50000  | 26.801                |
| 100000 | 25.574                |
| 150000 | 24.773                |
| 200000 | 24.099                |
| 250000 | 23.336                |
| 300000 | 22.607                |
| 350000 | 22.329                |
| 390000 | 22.293                |
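## Usage

A minimal sketch of loading the model as a causal LM with the `transformers` library. It assumes the checkpoint has been converted to the Hugging Face format; the repository ID `cjvt/gpt-sl-base` is a placeholder and may differ from the actual published name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; replace with the actual Hub path of the checkpoint.
model_id = "cjvt/gpt-sl-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation for a Slovene prompt.
prompt = "Ljubljana je"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,   # context length is limited to 1024 tokens
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```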