matejulcar committed
Commit d397d1a
1 parent: 9b9a502

Update README.md

Files changed (1): README.md +29 -0
README.md CHANGED
@@ -1,3 +1,32 @@
  ---
+ tags:
+ - pytorch
+ - causal-lm
+ metrics:
+ - accuracy
+ language:
+ - sl
  license: apache-2.0
  ---
+
+ # GPT-sl-base
+
+ This model is a Slovene GPT model, based on the [bigscience workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-LM. GPT-sl-base was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
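+
+ A minimal usage sketch with the Hugging Face `transformers` pipeline API; the repository id `cjvt/gpt-sl-base` and the prompt are assumptions for illustration, so substitute this repository's actual id:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model as a causal-LM text generator.
+ # NOTE: the repo id below is an assumption; use this repository's actual id.
+ generator = pipeline("text-generation", model="cjvt/gpt-sl-base")
+
+ # Generate a short Slovene continuation ("Ljubljana je" = "Ljubljana is").
+ print(generator("Ljubljana je", max_new_tokens=30)[0]["generated_text"])
+ ```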
+
+ ## Model architecture
+ GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens.
+ The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
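+
+ For reference, the stated shape corresponds to a GPT-2-style configuration along these lines; this is an illustrative sketch, not the repository's actual config.json:
+
+ ```python
+ from transformers import GPT2Config
+
+ # Illustrative sketch of the stated architecture;
+ # not the repository's actual config.json.
+ config = GPT2Config(
+     vocab_size=60_000,  # 60k-token tokenizer
+     n_positions=1024,   # maximum sequence length
+     n_embd=768,         # hidden dimension
+     n_layer=12,         # transformer layers
+     n_head=16,          # attention heads
+ )
+ ```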
+
+ ## Training
+ The model was trained for about 20 epochs, a total of 390k steps, during which it saw 102B tokens.
+
+ | Step   | Validation Perplexity |
+ |:------:|:---------------------:|
+ | 50000  | 26.801                |
+ | 100000 | 25.574                |
+ | 150000 | 24.773                |
+ | 200000 | 24.099                |
+ | 250000 | 23.336                |
+ | 300000 | 22.607                |
+ | 350000 | 22.329                |
+ | 390000 | 22.293                |
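+
+ Validation perplexity is the exponential of the mean cross-entropy loss. A minimal sketch of computing it for this model (repo id assumed as above; a single-sentence value is only illustrative, while the table reports corpus-level numbers):
+
+ ```python
+ import math
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Repo id is an assumption, as above.
+ tok = AutoTokenizer.from_pretrained("cjvt/gpt-sl-base")
+ model = AutoModelForCausalLM.from_pretrained("cjvt/gpt-sl-base")
+ model.eval()
+
+ enc = tok("Slovenija je država v srednji Evropi.", return_tensors="pt")
+
+ with torch.no_grad():
+     # Passing labels=input_ids makes the model return the
+     # mean cross-entropy loss over the (shifted) tokens.
+     loss = model(**enc, labels=enc["input_ids"]).loss
+
+ print(f"perplexity = {math.exp(loss.item()):.3f}")
+ ```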