metadata
metrics:
- sacrebleu
language:
- en
- th
NLLB 600M TH-EN finetuned
This model is finetuned from facebook/nllb-200-distilled-600M on the SCB-1M and OPUS datasets for Thai-to-English translation.
The finetuning script is on GitHub.
View full finetuning logs on wandb.
Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
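MODEL_NAME = "wtarit/nllb-600M-th-en"
# Load the finetuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Use the first GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else "cpu"
# NLLB language codes: Thai (Thai script) as source, English (Latin script) as target
translation_pipeline = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="tha_Thai",
    tgt_lang="eng_Latn",
    max_length=400,  # maximum length of the generated translation, in tokens
    device=device,
)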
# Run translation pipeline
result = translation_pipeline("สวัสดี เราคือโมเดลแปลภาษา")
print(result[0]['translation_text'])
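To translate several sentences at once, a small wrapper can be used. The following is a minimal sketch, not part of the original card: `translate_batch` is a hypothetical helper name, and it assumes the translator behaves like a `transformers` translation pipeline, that is, it accepts a list of strings plus a `batch_size` keyword and returns one dict with a `translation_text` key per input.

```python
from typing import Callable, Iterable, List


def translate_batch(
    translator: Callable, texts: Iterable[str], batch_size: int = 8
) -> List[str]:
    """Translate many sentences and return only the translated strings.

    `translator` is assumed to behave like a transformers translation
    pipeline: called with a list of strings, it returns a list of dicts,
    each holding the output under the "translation_text" key.
    """
    # Drop empty or whitespace-only inputs so they are not sent to the model
    cleaned = [t.strip() for t in texts if t and t.strip()]
    if not cleaned:
        return []
    outputs = translator(cleaned, batch_size=batch_size)
    return [out["translation_text"] for out in outputs]
```

With the pipeline built as shown earlier, a call could look like `translate_batch(translation_pipeline, ["สวัสดี", "เราคือโมเดลแปลภาษา"])`. A suitable `batch_size` depends on the available memory, so the default of 8 is only a starting point.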
Score
BLEU score (computed with sacrebleu): 27.37 on IWSLT 2015
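A corpus-level BLEU score of this kind can be computed with the `sacrebleu` package. The sketch below is illustrative only: `corpus_bleu_score` is a hypothetical helper, and the card does not state which IWSLT 2015 split, tokenization, or sacrebleu settings produced the reported number, so this is not the author's evaluation script and the library defaults are an assumption.

```python
from typing import List


def corpus_bleu_score(hypotheses: List[str], references: List[str]) -> float:
    """Return corpus-level BLEU for one reference per hypothesis.

    Uses sacrebleu's default settings (an assumption, since the
    evaluation settings are not documented in the model card).
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    # Imported lazily so the helper can be defined without sacrebleu installed;
    # running it requires `pip install sacrebleu`.
    import sacrebleu

    # sacrebleu expects a list of reference streams, hence the extra list
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

Here `hypotheses` would hold the model's English outputs for the test set and `references` the corresponding human translations, in the same order.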