What is this?

A pre-trained BERT model (base version, ~110M parameters) for Danish NLP. The model was not pre-trained from scratch but adapted from the English bert-base-uncased model, using a new tokenizer trained on Danish text.

How to use

Test the model using the pipeline from the 🤗 Transformers library:

from transformers import pipeline

pipe = pipeline("fill-mask", model="KennethTM/bert-base-uncased-danish")

pipe("Der var engang en [MASK]")

Or load it using the Auto* classes:

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/bert-base-uncased-danish")
model = AutoModelForMaskedLM.from_pretrained("KennethTM/bert-base-uncased-danish")
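
With the tokenizer and model loaded this way, mask filling can also be run manually. A minimal sketch using standard 🤗 Transformers and PyTorch calls (not part of the original card):

import torch

# Tokenize a Danish sentence containing the mask token
inputs = tokenizer("Der var engang en [MASK]", return_tensors="pt")

# Forward pass without gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and decode the highest-scoring token
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))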

Model training

The model is trained on multiple Danish datasets with a context length of 512 tokens.

The model weights are initialized from the English bert-base-uncased model, with new word token embeddings for Danish created using WECHSEL.
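
The card does not include the training script, so the following is only a sketch of the WECHSEL step, following the usage pattern published in the WECHSEL project's README; the function names (WECHSEL, load_embeddings), their arguments, and the danish_texts placeholder are assumptions, not the actual code used for this model:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
from wechsel import WECHSEL, load_embeddings

# Start from the English model and tokenizer
source_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder: in practice this would be an iterator over the full Danish corpus
danish_texts = ["Der var engang en lille by.", "Det er en flot dag."]

# Train a new tokenizer of the same vocabulary size on Danish text
target_tokenizer = source_tokenizer.train_new_from_iterator(danish_texts, vocab_size=len(source_tokenizer))

# Initialize Danish word token embeddings from the English ones via WECHSEL
wechsel = WECHSEL(load_embeddings("en"), load_embeddings("da"), bilingual_dictionary="danish")
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)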

Initially, only the new word token embeddings are trained, using 1,000,000 samples. Finally, the whole model is trained for 8 epochs.
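
A common way to implement this two-stage schedule is to freeze everything except the word token embeddings during the first stage. A hedged PyTorch sketch of that idea (continuing from the model object above, not the actual training script):

# Stage 1: train only the new word token embeddings, freeze the rest
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

# ... run masked-language-model training on ~1,000,000 samples ...

# Stage 2: unfreeze everything and train the whole model for 8 epochs
for param in model.parameters():
    param.requires_grad = True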

Evaluation

The performance of the pre-trained model was evaluated using ScandEval.

| Task | Dataset | Score (±SE) |
|------|---------|-------------|
| sentiment-classification | swerec | mcc = 63.02 (±2.16), macro_f1 = 62.2 (±3.61) |
| sentiment-classification | angry-tweets | mcc = 47.21 (±0.53), macro_f1 = 64.21 (±0.53) |
| sentiment-classification | norec | mcc = 42.23 (±8.69), macro_f1 = 57.24 (±7.67) |
| named-entity-recognition | suc3 | micro_f1 = 50.03 (±4.16), micro_f1_no_misc = 53.55 (±4.57) |
| named-entity-recognition | dane | micro_f1 = 76.44 (±1.36), micro_f1_no_misc = 80.61 (±1.11) |
| named-entity-recognition | norne-nb | micro_f1 = 68.38 (±1.72), micro_f1_no_misc = 73.08 (±1.66) |
| named-entity-recognition | norne-nn | micro_f1 = 60.45 (±1.71), micro_f1_no_misc = 64.39 (±1.8) |
| linguistic-acceptability | scala-sv | mcc = 5.01 (±5.41), macro_f1 = 49.46 (±3.67) |
| linguistic-acceptability | scala-da | mcc = 54.74 (±12.22), macro_f1 = 76.25 (±6.09) |
| linguistic-acceptability | scala-nb | mcc = 19.18 (±14.01), macro_f1 = 55.3 (±8.85) |
| linguistic-acceptability | scala-nn | mcc = 5.72 (±5.91), macro_f1 = 49.56 (±3.73) |
| question-answering | scandiqa-da | em = 26.36 (±1.17), f1 = 32.41 (±1.1) |
| question-answering | scandiqa-no | em = 26.14 (±1.59), f1 = 32.02 (±1.59) |
| question-answering | scandiqa-sv | em = 26.38 (±1.1), f1 = 32.33 (±1.05) |
| speed | speed | speed = 4.55 (±0.0) |
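
The scores above can in principle be reproduced with the ScandEval benchmark. A hedged sketch of its Python entry point (the Benchmarker class is from ScandEval's documentation, but the exact constructor options and call signature differ between ScandEval versions, so treat this as an assumption rather than the command used for this card):

from scandeval import Benchmarker

# Run the Scandinavian benchmark suite on this model; the call signature
# below follows the pattern in ScandEval's README and may vary by version.
benchmark = Benchmarker()
benchmark("KennethTM/bert-base-uncased-danish")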