What is this?

This is a Danish text generation model based on the EleutherAI/gpt-neo-1.3B model with 1.3B parameters. The model was not pre-trained from scratch; it was adapted from the English model using the CLP-Transfer method.

How to use

Test the model using the pipeline from the 🤗 Transformers library:

from transformers import pipeline

generator = pipeline("text-generation", model="KennethTM/gpt-neo-1.3B-danish")
text = generator("Der var engang ")

print(text[0]["generated_text"])

Or load it using the Auto* classes:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt-neo-1.3B-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt-neo-1.3B-danish")
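
With the model and tokenizer loaded, text can be generated directly. The generation settings below (max_new_tokens, do_sample, temperature) are illustrative defaults, not values recommended for this model:

prompt = "Der var engang "
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; generation parameters are illustrative only.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))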

Model training

The training data is the Danish part of the OSCAR dataset ('unshuffled_deduplicated_da'), which is split randomly into training (95%) and validation (5%) sets.
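
A minimal sketch of how such a split can be produced with the 🤗 Datasets library (the 5% fraction follows the description above; the seed is an arbitrary choice, not the one used for this model):

from datasets import load_dataset

# Danish subset of OSCAR.
dataset = load_dataset("oscar", "unshuffled_deduplicated_da", split="train")

# Random 95%/5% split; the seed is arbitrary and only for reproducibility.
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_data, validation_data = splits["train"], splits["test"]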

The model weights are initialized from the English gpt-neo-1.3B model ('source model') with new word token embeddings created from the Danish GPT-2 small model ('helper model') using the CLP-Transfer method.
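
A simplified sketch of the CLP-Transfer idea follows; it is not the implementation used for this model. Tokens shared by the source and Danish tokenizers keep their source embedding, while embeddings for new tokens are built as similarity-weighted combinations of the overlapping tokens' source embeddings, with similarities measured in the helper model's embedding space. The helper checkpoint name below is a placeholder, and the normalisation of the weights is an illustrative choice.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

source_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
source_tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
helper_model = AutoModelForCausalLM.from_pretrained("danish-gpt2-small")      # placeholder name
helper_tokenizer = AutoTokenizer.from_pretrained("danish-gpt2-small")         # placeholder name

source_embeddings = source_model.get_input_embeddings().weight.detach()  # (V_source, d_source)
helper_embeddings = helper_model.get_input_embeddings().weight.detach()  # (V_danish, d_helper)

source_vocab = source_tokenizer.get_vocab()
target_vocab = helper_tokenizer.get_vocab()

# Tokens present in both vocabularies keep their source embedding.
overlap = [t for t in target_vocab if t in source_vocab]
overlap_source_ids = torch.tensor([source_vocab[t] for t in overlap])
overlap_target_ids = torch.tensor([target_vocab[t] for t in overlap])

new_embeddings = torch.zeros(len(target_vocab), source_embeddings.size(1))
new_embeddings[overlap_target_ids] = source_embeddings[overlap_source_ids]

# New tokens: combine the overlapping tokens' source embeddings, weighted by
# similarity to the new token in the helper model's embedding space.
helper_overlap = helper_embeddings[overlap_target_ids]
for token, idx in target_vocab.items():
    if token in source_vocab:
        continue
    similarity = torch.nn.functional.cosine_similarity(
        helper_embeddings[idx].unsqueeze(0), helper_overlap, dim=-1
    )
    weights = torch.softmax(similarity, dim=0)  # normalisation choice is illustrative
    new_embeddings[idx] = weights @ source_embeddings[overlap_source_ids]

# Swap the new embeddings into the source model (GPT-Neo ties them to the LM head).
source_model.resize_token_embeddings(len(target_vocab))
source_model.get_input_embeddings().weight.data.copy_(new_embeddings)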

Training is done using a context window of 1024 tokens and mixed precision (bf16). First, only the word token embeddings are trained on 0.5M samples; then all weights are trained on approximately 2M samples (1 epoch).
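
A minimal sketch of this two-stage schedule, assuming a standard 🤗 Trainer-style setup rather than the author's actual training script:

# Stage 1: freeze everything except the word token embeddings.
for param in model.parameters():
    param.requires_grad = False
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True
# ... train on ~0.5M samples ...

# Stage 2: unfreeze all weights and train for ~1 epoch (~2M samples),
# with a context window of 1024 tokens and bf16 mixed precision,
# e.g. TrainingArguments(bf16=True, ...).
for param in model.parameters():
    param.requires_grad = True
# ... train on ~2M samples ...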

The model achieves a perplexity of 16.75 on approximately 0.1M validation samples.
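
For reference, perplexity here is the exponential of the mean cross-entropy loss on the validation samples, where eval_loss below stands for that mean loss (in nats):

import math

# eval_loss is the mean validation cross-entropy reported by the evaluation loop.
perplexity = math.exp(eval_loss)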

The model is trained on a 24 GB GPU.

Notes

This is a pre-trained model; for optimal performance, it should be fine-tuned for new downstream tasks.
