library_name: transformers
license: cc-by-nc-4.0
language:
  - de
  - frr
pipeline_tag: translation
base_model: facebook/nllb-200-distilled-600M

Model Card for nllb-deu-moo-v2

This is an NLLB-200 model (the distilled 600M variant, facebook/nllb-200-distilled-600M) fine-tuned to translate between German and Mooring, a dialect of Northern Frisian. The fine-tuning follows the approach described in this great blogpost.

Model Details

Model Description

  • Language(s) (NLP): Northern Frisian, German
  • License: Creative Commons Attribution Non Commercial 4.0 (CC BY-NC 4.0)
  • Finetuned from model: facebook/nllb-200-distilled-600M

How to Get Started with the Model

How to use the model:

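!pip install "transformers>=4.38"

import torch
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

tokenizer = NllbTokenizer.from_pretrained("CmdCody/nllb-deu-moo-v2")
model = AutoModelForSeq2SeqLM.from_pretrained("CmdCody/nllb-deu-moo-v2")
# Use the GPU when one is available; otherwise the model stays on the CPU.
if torch.cuda.is_available():
    model.cuda()

def translate(text, tokenizer, model, src_lang='frr_Latn', tgt_lang='deu_Latn', a=32, b=3, max_input_length=1024, num_beams=4, **kwargs):
    # Set the language codes the tokenizer uses for the source and target text.
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        # Force the decoder to start with the target language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        # Output length cap: a fixed allowance plus b tokens per input token.
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        num_beams=num_beams,
        **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)

# Mooring -> German is the default direction.
translate("Ik boog önj Naibel.", tokenizer=tokenizer, model=model)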
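The `max_new_tokens` argument in `translate` caps the output length at `a + b * n`, where `n` is the padded input length in tokens. The sketch below isolates only that arithmetic. The helper name `max_new_tokens_budget` is hypothetical and is not part of the model or of transformers.

```python
def max_new_tokens_budget(n_input_tokens: int, a: int = 32, b: int = 3) -> int:
    """Mirror the length cap used in translate(): a fixed allowance plus b tokens per input token."""
    return int(a + b * n_input_tokens)

# An input padded to 10 tokens may generate up to 32 + 3 * 10 new tokens.
budget = max_new_tokens_budget(10)
print(budget)
```

With the defaults, a 10-token input allows up to 62 new tokens. To translate in the other direction, pass `src_lang='deu_Latn'` and `tgt_lang='frr_Latn'` to `translate`.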

Training Details

Training Data

The training data consists of "Rüm Hart" published by the Nordfriisk Instituut. It was split and cleaned up, partially manually, resulting in 5178 example sentences.

Training Procedure

The training loop was implemented as described in this article. The model was trained for 5 epochs of 1000 steps each with a batch size of 16, on a GPU in a Google Colab notebook. Each epoch took roughly 30 minutes.
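For a sense of scale, the training figures can be combined with the data set size from the Training Data section. This is a back-of-the-envelope sketch. It assumes every step consumes exactly one batch of 16 sentence pairs, which the description suggests but does not state explicitly.

```python
epochs = 5
steps_per_epoch = 1000
batch_size = 16
num_sentences = 5178     # size of the cleaned training set
minutes_per_epoch = 30   # "roughly 30 minutes" per epoch

total_steps = epochs * steps_per_epoch
examples_seen = total_steps * batch_size            # assumption: one full batch per step
passes_over_data = examples_seen / num_sentences
total_hours = epochs * minutes_per_epoch / 60

print(total_steps, examples_seen, passes_over_data, total_hours)
```

Under that assumption, training covers 5000 steps and 80,000 sentence pairs, or roughly 15 passes over the 5178 sentences, in about 2.5 hours. An "epoch" here is therefore a block of 1000 steps, not a single pass over the data.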

The BLEU score was calculated on a set of 177 sentences taken from other sources.

Metrics

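| Epoch | Steps | BLEU score frr -> de | BLEU score de -> frr |
|-------|-------|----------------------|----------------------|
| 1     | 1000  | 35.86                | 35.68                |
| 2     | 2000  | 40.76                | 42.25                |
| 3     | 3000  | 42.18                | 46.48                |
| 4     | 4000  | 41.01                | 45.15                |
| 5     | 5000  | 44.74                | 47.48                |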
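
To pick a checkpoint from these numbers, one simple option is to take the epoch with the highest score in each direction. A small sketch using the values from the table (the variable names are illustrative):

```python
# (epoch, steps, BLEU frr -> de, BLEU de -> frr), copied from the metrics table
results = [
    (1, 1000, 35.86, 35.68),
    (2, 2000, 40.76, 42.25),
    (3, 3000, 42.18, 46.48),
    (4, 4000, 41.01, 45.15),
    (5, 5000, 44.74, 47.48),
]

best_frr_de = max(results, key=lambda row: row[2])
best_de_frr = max(results, key=lambda row: row[3])

print("best frr -> de:", best_frr_de[0], best_frr_de[2])
print("best de -> frr:", best_de_frr[0], best_de_frr[3])
```

Both directions peak at epoch 5 (44.74 and 47.48), despite a small dip at epoch 4.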