---
license: apache-2.0
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language:
- en
- de
- fr
- multilingual
datasets:
- may-ohta/iwslt14
metrics:
- bleu
---
# JoeyNMT: iwslt14 de-en-fr multilingual
This is a JoeyNMT model for multilingual machine translation with language tags, built for demonstration purposes.
The model is trained on the IWSLT14 de-en and en-fr parallel data using distributed data parallel (DDP) training.
Install [JoeyNMT](https://github.com/joeynmt/joeynmt) v2.3:
```
$ pip install git+https://github.com/joeynmt/joeynmt.git
```
## Translation
Torch hub interface:
```python
import torch
iwslt14 = torch.hub.load("joeynmt/joeynmt", "iwslt14_prompt")
translation = iwslt14.translate(
    src=["Hello world!"],  # source sentences
    src_prompt=["<en>"],   # source language tag
    trg_prompt=["<de>"],   # target language tag
beam_size=1,
)
print(translation) # ["Hallo Welt!"]
```
(See the [Jupyter notebook](https://github.com/joeynmt/joeynmt/blob/main/notebooks/torchhub.ipynb) for details.)
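The `translate()` call above takes three parallel lists (`src`, `src_prompt`, `trg_prompt`), so several sentences and directions can presumably be batched by giving each sentence its own pair of tags. The helper below is a hypothetical sketch, not part of JoeyNMT: it builds those lists from `(sentence, source language, target language)` tuples, assumes the `<xx>` tag format shown above, and only accepts `en`, `de` and `fr`. Note that the card reports scores only for en↔de and en↔fr; de↔fr does not appear in the training data listed here.

```python
from typing import Dict, List, Tuple

# Hypothetical request format: (sentence, source language, target language),
# e.g. ("Hello world!", "en", "de"). Not a JoeyNMT type.
Request = Tuple[str, str, str]

# Languages covered by this model card (assumption: tags are "<en>", "<de>", "<fr>").
SUPPORTED_LANGS = {"en", "de", "fr"}


def build_translate_kwargs(requests: List[Request]) -> Dict[str, List[str]]:
    """Turn (sentence, src_lang, trg_lang) tuples into the parallel lists
    `src`, `src_prompt` and `trg_prompt` used in the translate() example."""
    src: List[str] = []
    src_prompt: List[str] = []
    trg_prompt: List[str] = []
    for sentence, src_lang, trg_lang in requests:
        for lang in (src_lang, trg_lang):
            if lang not in SUPPORTED_LANGS:
                raise ValueError(f"unsupported language code: {lang!r}")
        src.append(sentence)
        src_prompt.append(f"<{src_lang}>")  # source tag, e.g. "<en>"
        trg_prompt.append(f"<{trg_lang}>")  # target tag, e.g. "<de>"
    return {"src": src, "src_prompt": src_prompt, "trg_prompt": trg_prompt}


kwargs = build_translate_kwargs([
    ("Hello world!", "en", "de"),
    ("Hello world!", "en", "fr"),
])
print(kwargs)
```

With the hub model loaded as `iwslt14`, the result could then be passed as `iwslt14.translate(**kwargs, beam_size=1)`, assuming `translate()` accepts a different tag per sentence, as its list-valued arguments suggest.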
## Training
```
$ python -m joeynmt train iwslt14_prompt/config.yaml --use-ddp --skip-test
```
(See `train.log` for details)
## Evaluation
```
$ git clone https://huggingface.co/may-ohta/iwslt14_prompt
$ python -m joeynmt test iwslt14_prompt/config.yaml --output-path iwslt14_prompt/hyp
```
Direction | BLEU
--------- | :----
en->de | 28.88
de->en | 35.28
en->fr | 38.86
fr->en | 40.35
- beam_size: 5
- beam_alpha: 1.0
- SacreBLEU signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0`
(See `test.log` for details)
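The SacreBLEU signature records how the scores were computed, as `key:value` fields separated by `|`. The sketch below splits it into its fields and glosses each one; the helper name `parse_signature` is hypothetical. The key point for readers is `case:lc`: the hypotheses and references were lowercased before scoring, so these BLEU scores are case-insensitive and not directly comparable to cased BLEU.

```python
from typing import Dict

SIGNATURE = "nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0"

# Short glosses for the fields that appear in this model card's signature.
FIELD_MEANINGS = {
    "nrefs": "number of references per segment",
    "case": "casing (lc = lowercased before scoring)",
    "eff": "effective n-gram order (no = standard corpus BLEU)",
    "tok": "tokenizer applied by SacreBLEU (13a = mteval-v13a style)",
    "smooth": "smoothing method",
    "version": "SacreBLEU version",
}


def parse_signature(signature: str) -> Dict[str, str]:
    """Split a 'key:value|key:value' signature into a dict."""
    fields: Dict[str, str] = {}
    for part in signature.split("|"):
        key, _, value = part.partition(":")
        fields[key] = value
    return fields


for key, value in parse_signature(SIGNATURE).items():
    print(f"{key:8} {value:8} {FIELD_MEANINGS.get(key, '')}")
```

In SacreBLEU 2.x, a matching configuration should correspond to `sacrebleu.metrics.BLEU(lowercase=True)` with its default `13a` tokenizer and `exp` smoothing; comparing `get_signature()` output is the safest check.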
## Data Format
We downloaded the IWSLT14 de-en and en-fr data from [https://wit3.fbk.eu/2014-01](https://wit3.fbk.eu/2014-01) and created `{train|dev|test}.tsv` files in the following format:
|src_prompt|src|trg_prompt|trg|
|:---------|:--|:---------|:--|
|`<en>`|Hello.|`<de>`|Hallo.|
|`<de>`|Vielen Dank!|`<en>`|Thank you!|
(See `test.ref.de-en.tsv` for an example.)
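Each row of the format above is one tab-separated line with four columns. The sketch below writes and reads such rows with plain string splitting. Two points are assumptions: whether the real files start with a header row (the reader skips one only if it is present), and that sentences never contain tabs or newlines. Plain `split("\t")` is used instead of the `csv` module because `csv`'s default quoting rules would treat `"` inside sentences specially.

```python
import io
from typing import Dict, Iterable, List, TextIO

# Column order taken from the table above.
FIELDS = ["src_prompt", "src", "trg_prompt", "trg"]


def write_tsv(rows: Iterable[Dict[str, str]], fh: TextIO, header: bool = True) -> None:
    """Write rows as tab-separated lines in FIELDS order."""
    if header:
        fh.write("\t".join(FIELDS) + "\n")
    for row in rows:
        cols = [row[name] for name in FIELDS]
        if any("\t" in c or "\n" in c for c in cols):
            raise ValueError(f"field contains a tab or newline: {cols!r}")
        fh.write("\t".join(cols) + "\n")


def read_tsv(fh: TextIO) -> List[Dict[str, str]]:
    """Read tab-separated lines back into dicts, skipping a header if present."""
    rows: List[Dict[str, str]] = []
    for lineno, line in enumerate(fh, start=1):
        cols = line.rstrip("\n").split("\t")
        if lineno == 1 and cols == FIELDS:
            continue  # header row
        if len(cols) != len(FIELDS):
            raise ValueError(
                f"line {lineno}: expected {len(FIELDS)} columns, got {len(cols)}"
            )
        rows.append(dict(zip(FIELDS, cols)))
    return rows


examples = [
    {"src_prompt": "<en>", "src": "Hello.", "trg_prompt": "<de>", "trg": "Hallo."},
    {"src_prompt": "<de>", "src": "Vielen Dank!", "trg_prompt": "<en>", "trg": "Thank you!"},
]

buf = io.StringIO()
write_tsv(examples, buf)
print(buf.getvalue())
buf.seek(0)
rows = read_tsv(buf)
```

The column names double as dictionary keys, so a row can be handed to the hub interface field by field (`src`, `src_prompt`, `trg_prompt`), with `trg` kept as the reference.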