---
license: apache-2.0
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language:
- en
- de
- fr
- multilingual
datasets:
- may-ohta/iwslt14
metrics:
- bleu
---

# JoeyNMT: iwslt14 de-en-fr multilingual

This is a JoeyNMT model for multilingual machine translation (MT) with language tags, built for demonstration purposes.

The model was trained on the IWSLT14 de-en and en-fr parallel data using PyTorch DistributedDataParallel (DDP).

Install [JoeyNMT](https://github.com/joeynmt/joeynmt) v2.3 (the command below installs the latest version from the GitHub repository):

```
$ pip install git+https://github.com/joeynmt/joeynmt.git
```

## Translation

Torch hub interface:

```python
import torch

iwslt14 = torch.hub.load("joeynmt/joeynmt", "iwslt14_prompt")
translation = iwslt14.translate(
    src=["Hello world!"],  # src sentence
    src_prompt=["<en>"],   # src language code
    trg_prompt=["<de>"],   # trg language code
    beam_size=1,
)
print(translation)  # ["Hallo Welt!"]
```

(See the [Jupyter notebook](https://github.com/joeynmt/joeynmt/blob/main/notebooks/torchhub.ipynb) for details.)
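
Because `translate` takes lists (as in the example above), several sentences and directions can presumably be sent in one batched call. The sketch below builds those lists from `(sentence, src_lang, trg_lang)` triples. `build_translate_kwargs` and `TRAINED_DIRECTIONS` are hypothetical helpers, not part of JoeyNMT. The sketch assumes that language tags follow the `<xx>` pattern shown above. The model was trained only on de-en and en-fr data, so the check conservatively rejects other pairs such as de->fr.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical: directions covered by the de-en and en-fr training data.
TRAINED_DIRECTIONS = {("en", "de"), ("de", "en"), ("en", "fr"), ("fr", "en")}


def build_translate_kwargs(
    requests: Sequence[Tuple[str, str, str]], beam_size: int = 5
) -> Dict[str, Union[List[str], int]]:
    """Turn (sentence, src_lang, trg_lang) triples into keyword arguments
    for a batched call such as iwslt14.translate(**kwargs)."""
    src: List[str] = []
    src_prompt: List[str] = []
    trg_prompt: List[str] = []
    for sentence, src_lang, trg_lang in requests:
        if (src_lang, trg_lang) not in TRAINED_DIRECTIONS:
            raise ValueError(f"direction {src_lang}->{trg_lang} is not covered by the training data")
        src.append(sentence)
        src_prompt.append(f"<{src_lang}>")  # source language tag, e.g. "<en>"
        trg_prompt.append(f"<{trg_lang}>")  # target language tag, e.g. "<de>"
    return {"src": src, "src_prompt": src_prompt, "trg_prompt": trg_prompt, "beam_size": beam_size}


# Usage with the model loaded via torch.hub as above (not run here):
# kwargs = build_translate_kwargs([("Hello world!", "en", "de"), ("Thank you!", "en", "fr")])
# translations = iwslt14.translate(**kwargs)
```

The default `beam_size=5` mirrors the evaluation setting reported in this card. Pass `beam_size=1` for greedy decoding, as in the example above.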

## Training

```
$ python -m joeynmt train iwslt14_prompt/config.yaml --use-ddp --skip-test
```

(See `train.log` for details.)

## Evaluation

```
$ git clone https://huggingface.co/may-ohta/iwslt14_prompt
$ python -m joeynmt test iwslt14_prompt/config.yaml --output-path iwslt14_prompt/hyp
```

direction | BLEU
--------- | :----
en->de | 28.88
de->en | 35.28
en->fr | 38.86
fr->en | 40.35

- beam_size: 5
- beam_alpha: 1.0
- sacrebleu signature (lowercased BLEU): `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0`

(See `test.log` for details.)
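
The test data mixes several directions, and BLEU is reported separately for each one. Below is a minimal sketch of how hypotheses and references could be grouped by direction before scoring. It assumes the hypotheses are available as a list aligned row by row with the prompts and references of the test TSV; this card does not describe the file layout written by `--output-path`. `split_by_direction` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_by_direction(
    src_prompts: Sequence[str],
    trg_prompts: Sequence[str],
    hyps: Sequence[str],
    refs: Sequence[str],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Group aligned hypotheses and references by translation direction.

    The key is built from the language tags, e.g. "<en>" + "<de>" -> "en->de".
    """
    if not (len(src_prompts) == len(trg_prompts) == len(hyps) == len(refs)):
        raise ValueError("all inputs must have the same length")
    groups: Dict[str, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for src_tag, trg_tag, hyp, ref in zip(src_prompts, trg_prompts, hyps, refs):
        key = f"{src_tag.strip('<>')}->{trg_tag.strip('<>')}"
        groups[key][0].append(hyp)  # hypotheses for this direction
        groups[key][1].append(ref)  # references for this direction
    return dict(groups)
```

Each group could then be scored with sacrebleu using settings that match the signature above, for example `sacrebleu.corpus_bleu(hyps, [refs], lowercase=True)`. Its defaults are `tokenize="13a"` and `smooth_method="exp"`.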

## Data Format

We downloaded IWSLT14 de-en and en-fr from [https://wit3.fbk.eu/2014-01](https://wit3.fbk.eu/2014-01) and created `{train|dev|test}.tsv` files in the following format:

|src_prompt|src|trg_prompt|trg|
|:---------|:--|:---------|:--|
|`<en>`|Hello.|`<de>`|Hallo.|
|`<de>`|Vielen Dank!|`<en>`|Thank you!|

(See `test.ref.de-en.tsv`.)
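
Below is a minimal sketch of reading a file in this layout with Python's standard `csv` module. It assumes the first row is a header naming the four columns, as in the table above. If the actual files have no header row, pass `fieldnames=FIELDS` to `csv.DictReader` instead. `read_prompt_tsv` is a hypothetical helper, not part of JoeyNMT.

```python
import csv
from typing import Dict, Iterable, List

# Column order shown in the table above; assumed to appear as a header row.
FIELDS = ["src_prompt", "src", "trg_prompt", "trg"]


def read_prompt_tsv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated rows into dicts keyed by FIELDS."""
    # QUOTE_NONE keeps quotation marks inside sentences as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # Extra columns land under the key None; missing columns get the value None.
        if None in row or any(value is None for value in row.values()):
            raise ValueError(f"malformed row at line {reader.line_num}")
        rows.append(dict(row))
    return rows


# Usage (file name from this card; header row assumed):
# with open("test.ref.de-en.tsv", encoding="utf-8", newline="") as f:
#     pairs = read_prompt_tsv(f)
```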