---
license: apache-2.0
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language:
- en
- de
- fr
- multilingual
datasets:
- may-ohta/iwslt14
metrics:
- bleu
---
JoeyNMT: iwslt14 de-en-fr multilingual
This is a JoeyNMT model for multilingual machine translation with language tags, built for demonstration purposes. It was trained on the IWSLT14 de-en and en-fr parallel data using DistributedDataParallel (DDP).
Install JoeyNMT v2.3:
```shell
$ pip install git+https://github.com/joeynmt/joeynmt.git
```
Translation
Torch hub interface:
```python
import torch

iwslt14 = torch.hub.load("joeynmt/joeynmt", "iwslt14_prompt")
translation = iwslt14.translate(
    src=["Hello world!"],  # source sentences
    src_prompt=["<en>"],   # source language tags
    trg_prompt=["<de>"],   # target language tags
    beam_size=1,
)
print(translation)  # ["Hallo Welt!"]
```
(See the Jupyter notebook for details.)
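The `translate` call above takes index-aligned lists (`src`, `src_prompt`, `trg_prompt`) with one entry per sentence. The sketch below shows a hypothetical helper, `build_request`, that assembles those lists from plain language codes. It rests on two assumptions. First, tags always take the `<xx>` form seen in the example. Second, one call may mix directions, which the single-sentence example does not confirm. The helper also rejects directions outside the four evaluated in this card.

```python
# Directions evaluated in this model card. Other pairs (e.g. de->fr) are not
# reported, so this sketch rejects them instead of guessing at zero-shot quality.
EVALUATED_DIRECTIONS = {("en", "de"), ("de", "en"), ("en", "fr"), ("fr", "en")}


def build_request(triples, beam_size=1):
    """Turn (sentence, src_lang, trg_lang) triples into keyword arguments
    for iwslt14.translate(). Hypothetical helper, not part of JoeyNMT."""
    src, src_prompt, trg_prompt = [], [], []
    for sentence, src_lang, trg_lang in triples:
        if (src_lang, trg_lang) not in EVALUATED_DIRECTIONS:
            raise ValueError(f"direction {src_lang}->{trg_lang} is not evaluated in this card")
        src.append(sentence)
        # Assumed tag format: the language code wrapped in angle brackets, e.g. "<de>".
        src_prompt.append(f"<{src_lang}>")
        trg_prompt.append(f"<{trg_lang}>")
    return {"src": src, "src_prompt": src_prompt, "trg_prompt": trg_prompt, "beam_size": beam_size}


kwargs = build_request([("Hello world!", "en", "de"), ("Hello world!", "en", "fr")])
# With the hub model loaded as above:
# translations = iwslt14.translate(**kwargs)
```

Keeping the three lists in one dict keeps them index-aligned, so a sentence can never be paired with the wrong tag.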
Training
```shell
$ python -m joeynmt train iwslt14_prompt/config.yaml --use-ddp --skip-test
```
(See train.log for details.)
Evaluation
```shell
$ git clone https://huggingface.co/may-ohta/iwslt14_prompt
$ python -m joeynmt test iwslt14_prompt/config.yaml --output-path iwslt14_prompt/hyp
```
| direction | BLEU |
|---|---|
| en->de | 28.88 |
| de->en | 35.28 |
| en->fr | 38.86 |
| fr->en | 40.35 |
Decoding and scoring settings:

- beam_size: 5
- beam_alpha: 1.0
- sacrebleu signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0`

(See test.log for details.)
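The test data mixes four directions, so per-direction scores like those in the table require splitting hypotheses and references by their (`src_prompt`, `trg_prompt`) pair first. Below is a minimal sketch of that split using a hypothetical helper, `split_by_direction`, which is not part of JoeyNMT. It assumes rows are dicts keyed by the column names from the data format section, and that hypotheses align line-by-line with those rows.

```python
from collections import defaultdict


def split_by_direction(rows, hyps):
    """Group hypotheses and references by translation direction.

    rows: dicts with the columns src_prompt, src, trg_prompt, trg.
    hyps: model outputs, assumed to be aligned line-by-line with rows.
    Hypothetical helper, not part of JoeyNMT.
    """
    if len(rows) != len(hyps):
        raise ValueError("expected one hypothesis per test row")
    groups = defaultdict(lambda: {"hyps": [], "refs": []})
    for row, hyp in zip(rows, hyps):
        # "<en>" and "<de>" become the key "en->de", matching the table above.
        key = f"{row['src_prompt'].strip('<>')}->{row['trg_prompt'].strip('<>')}"
        groups[key]["hyps"].append(hyp)
        groups[key]["refs"].append(row["trg"])
    return dict(groups)


test_rows = [
    {"src_prompt": "<en>", "src": "Hello.", "trg_prompt": "<de>", "trg": "Hallo."},
    {"src_prompt": "<de>", "src": "Vielen Dank!", "trg_prompt": "<en>", "trg": "Thank you!"},
]
by_direction = split_by_direction(test_rows, ["Hallo.", "Thanks a lot!"])
```

Each group could then be scored on its own. One option is sacrebleu's `corpus_bleu(hyps, [refs], lowercase=True)`, which lowercases to match the `case:lc` field of the signature above.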
Data Format
We downloaded IWSLT14 de-en and en-fr from https://wit3.fbk.eu/2014-01 and created `{train|dev|test}.tsv` files in the following format:
| src_prompt | src | trg_prompt | trg |
|---|---|---|---|
| `<en>` | Hello. | `<de>` | Hallo. |
| `<de>` | Vielen Dank! | `<en>` | Thank you! |
(See `test.ref.de-en.tsv`.)
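Below is a minimal sketch of writing and reading rows in this four-column layout with plain Python. It assumes tab-separated fields, one sentence pair per line, and a header line naming the columns. Whether the actual files include that header is an assumption, so check `test.ref.de-en.tsv` before relying on it.

```python
import io

COLUMNS = ("src_prompt", "src", "trg_prompt", "trg")


def write_tsv(rows, fh):
    """Write a header line, then one tab-separated line per row."""
    fh.write("\t".join(COLUMNS) + "\n")
    for row in rows:
        fields = [row[col] for col in COLUMNS]
        # A tab or newline inside a sentence would corrupt the column layout.
        if any("\t" in f or "\n" in f for f in fields):
            raise ValueError(f"field contains a tab or newline: {fields!r}")
        fh.write("\t".join(fields) + "\n")


def read_tsv(fh):
    """Read rows back as dicts keyed by the header's column names."""
    header = fh.readline().rstrip("\n").split("\t")
    return [dict(zip(header, line.rstrip("\n").split("\t"))) for line in fh if line.strip()]


# Round-trip the two example rows through an in-memory buffer. For real files,
# pass a handle from open(path, "w" or "r", encoding="utf-8") instead.
example_rows = [
    {"src_prompt": "<en>", "src": "Hello.", "trg_prompt": "<de>", "trg": "Hallo."},
    {"src_prompt": "<de>", "src": "Vielen Dank!", "trg_prompt": "<en>", "trg": "Thank you!"},
]
buf = io.StringIO()
write_tsv(example_rows, buf)
buf.seek(0)
round_trip = read_tsv(buf)
```

Splitting on tabs by hand, rather than using the `csv` module, avoids any quote handling that could alter sentences containing `"` characters.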