Update README.md
README.md CHANGED
@@ -37,10 +37,13 @@ model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")

The model is trained using the Danish part of the [oscar dataset](https://huggingface.co/datasets/oscar) ('unshuffled_deduplicated_da') and a context length of 1024 tokens.

-The model
+The model weights are initialized from the English [GPT-2 small model](https://huggingface.co/gpt2) with new word token embeddings created for Danish using [WECHSEL](https://github.com/CPJKU/wechsel).

Initially, only the word token embeddings are trained using 50.000 samples. Finally, the whole model is trained using 1.000.000 samples.

+For reference, the model achieves a perplexity of 33.5 on 5.000 random validation samples.
+
+
Model training is carried out on an 8 GB GPU.

# Notes
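
The added line about embedding initialization refers to the WECHSEL procedure. Below is a minimal sketch of that step, adapted from the usage example in the CPJKU/wechsel repository; it is not the training script used for this model, and details such as the `danish` bilingual dictionary name and the new tokenizer's vocabulary size are assumptions.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from wechsel import WECHSEL, load_embeddings

# Start from the English GPT-2 small checkpoint.
source_tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Train a new byte-level BPE tokenizer on the Danish OSCAR split
# (same vocabulary size as the English tokenizer, assumed here).
dataset = load_dataset("oscar", "unshuffled_deduplicated_da", split="train")
target_tokenizer = source_tokenizer.train_new_from_iterator(
    dataset["text"], vocab_size=len(source_tokenizer)
)

# Map the English word token embeddings into the Danish vocabulary
# using aligned fastText embeddings and a bilingual dictionary.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("da"),
    bilingual_dictionary="danish",  # assumed to be available in wechsel
)
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)
model.tie_weights()  # GPT-2 shares input and output embeddings
```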
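The two-stage schedule described in the card (word token embeddings first, then the full model) can be expressed roughly as follows. This is an illustrative sketch under those assumptions, not the author's training code; the actual data loading and `Trainer` setup are omitted.

```python
from transformers import AutoModelForCausalLM

# In practice `model` would already carry the WECHSEL-initialized Danish
# embeddings from the previous sketch; a plain GPT-2 is loaded here only
# to keep the example self-contained.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stage 1: freeze everything except the word token embeddings (wte).
# With GPT-2's tied weights, the LM head is updated through the same matrix.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

# ... train on roughly 50.000 samples here ...

# Stage 2: unfreeze the whole model and continue training.
for param in model.parameters():
    param.requires_grad = True

# ... train on roughly 1.000.000 samples ...
```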
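The reported perplexity of 33.5 corresponds to the exponential of the average validation cross-entropy. A rough sketch of that evaluation, assuming a hypothetical `validation_texts` iterable in place of the unpublished 5.000 validation samples:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-small-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")
model.eval()

losses = []
with torch.no_grad():
    for text in validation_texts:  # assumed: iterable of held-out Danish texts
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
        losses.append(model(**enc, labels=enc["input_ids"]).loss.item())

# exp of the mean cross-entropy; averaged per sample rather than per token,
# so this only approximates the figure reported in the card
perplexity = math.exp(sum(losses) / len(losses))
print(f"perplexity: {perplexity:.1f}")
```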