---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---

# GPT-sl-base

This model is a Slovene GPT model, based on the [bigscience workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-LM. GPT-sl-base was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.

## Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.

## Training

The model was trained for about 20 epochs, a total of 390k steps, with 102B tokens seen during training.

| Step   | Validation Perplexity |
|:------:|:---------------------:|
| 50000  | 26.801                |
| 100000 | 25.574                |
| 150000 | 24.773                |
| 200000 | 24.099                |
| 250000 | 23.336                |
| 300000 | 22.607                |
| 350000 | 22.329                |
| 390000 | 22.293                |
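## Usage

A minimal sketch of loading the model as a causal LM with the `transformers` library. It assumes the checkpoint has been converted to the Hugging Face format; the repository ID `cjvt/gpt-sl-base` is a placeholder and may differ from the actual published name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; replace with the actual Hub path of the checkpoint.
model_id = "cjvt/gpt-sl-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation for a Slovene prompt.
prompt = "Ljubljana je"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,   # context length is limited to 1024 tokens
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```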