Update README.md
README.md CHANGED
@@ -37,10 +37,13 @@ model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")

The model is trained using the Danish part of the [oscar dataset](https://huggingface.co/datasets/oscar) ('unshuffled_deduplicated_da') and a context length of 1024 tokens.

-The model
+The model weights are initialized from the English [GPT-2 small model](https://huggingface.co/gpt2) with new word token embeddings created for Danish using [WECHSEL](https://github.com/CPJKU/wechsel).

Initially, only the word token embeddings are trained using 50.000 samples. Finally, the whole model is trained using 1.000.000 samples.

+For reference, the model achieves a perplexity of 33.5 on 5.000 random validation samples.
+
+
Model training is carried out on an 8 GB GPU.

# Notes
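
The added line about embedding initialization refers to the WECHSEL procedure. Below is a minimal sketch of that step, adapted from the usage example in the CPJKU/wechsel repository; it is not the training script used for this model, and details such as the `danish` bilingual dictionary name and the new tokenizer's vocabulary size are assumptions.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from wechsel import WECHSEL, load_embeddings

# Start from the English GPT-2 small checkpoint.
source_tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Train a new byte-level BPE tokenizer on the Danish OSCAR split
# (same vocabulary size as the English tokenizer, assumed here).
dataset = load_dataset("oscar", "unshuffled_deduplicated_da", split="train")
target_tokenizer = source_tokenizer.train_new_from_iterator(
    dataset["text"], vocab_size=len(source_tokenizer)
)

# Map the English word token embeddings into the Danish vocabulary
# using aligned fastText embeddings and a bilingual dictionary.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("da"),
    bilingual_dictionary="danish",  # assumed to be available in wechsel
)
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)
model.tie_weights()  # GPT-2 shares input and output embeddings
```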
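The two-stage schedule described in the card (word token embeddings first, then the full model) can be expressed roughly as follows. This is an illustrative sketch under those assumptions, not the author's training code; the actual data loading and `Trainer` setup are omitted.

```python
from transformers import AutoModelForCausalLM

# In practice `model` would already carry the WECHSEL-initialized Danish
# embeddings from the previous sketch; a plain GPT-2 is loaded here only
# to keep the example self-contained.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stage 1: freeze everything except the word token embeddings (wte).
# With GPT-2's tied weights, the LM head is updated through the same matrix.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

# ... train on roughly 50.000 samples here ...

# Stage 2: unfreeze the whole model and continue training.
for param in model.parameters():
    param.requires_grad = True

# ... train on roughly 1.000.000 samples ...
```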
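The reported perplexity of 33.5 corresponds to the exponential of the average validation cross-entropy. A rough sketch of that evaluation, assuming a hypothetical `validation_texts` iterable in place of the unpublished 5.000 validation samples:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-small-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")
model.eval()

losses = []
with torch.no_grad():
    for text in validation_texts:  # assumed: iterable of held-out Danish texts
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
        losses.append(model(**enc, labels=enc["input_ids"]).loss.item())

# exp of the mean cross-entropy; averaged per sample rather than per token,
# so this only approximates the figure reported in the card
perplexity = math.exp(sum(losses) / len(losses))
print(f"perplexity: {perplexity:.1f}")
```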