---
license: apache-2.0
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language:
- en
- de
- fr
- multilingual
datasets:
- may-ohta/iwslt14
metrics:
- bleu
---
# JoeyNMT: iwslt14 de-en-fr multilingual

This is a JoeyNMT model for multilingual machine translation with language tags, built for demonstration purposes.
The model was trained on IWSLT14 de-en and en-fr parallel data using DistributedDataParallel (DDP).


Install [JoeyNMT](https://github.com/joeynmt/joeynmt) v2.3:
```
$ pip install git+https://github.com/joeynmt/joeynmt.git
```


## Translation

Torch hub interface:
```python
import torch

iwslt14 = torch.hub.load("joeynmt/joeynmt", "iwslt14_prompt")
translation = iwslt14.translate(
    src=["Hello world!"],  # src sentence
    src_prompt=["<en>"],   # src language code
    trg_prompt=["<de>"],   # trg language code
    beam_size=1,
)
print(translation)  # ["Hallo Welt!"]
```
(See the [Jupyter notebook](https://github.com/joeynmt/joeynmt/blob/main/notebooks/torchhub.ipynb) for details)


## Training
```
$ python -m joeynmt train iwslt14_prompt/config.yaml --use-ddp --skip-test
```
(See `train.log` for details)


## Evaluation
```
$ git clone https://huggingface.co/may-ohta/iwslt14_prompt
$ python -m joeynmt test iwslt14_prompt/config.yaml --output-path iwslt14_prompt/hyp
```

direction | bleu 
--------- | :----
en->de    | 28.88
de->en    | 35.28
en->fr    | 38.86
fr->en    | 40.35

- beam_size: 5
- beam_alpha: 1.0
- sacrebleu signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0`

(See `test.log` for details)


## Data Format
We downloaded IWSLT14 de-en and en-fr from [https://wit3.fbk.eu/2014-01](https://wit3.fbk.eu/2014-01) and created `{train|dev|test}.tsv` files in the following format:

|src_prompt|src|trg_prompt|trg|
|:---------|:--|:---------|:--|
|`<en>`|Hello.|`<de>`|Hallo.|
|`<de>`|Vielen Dank!|`<en>`|Thank you!|

(See `test.ref.de-en.tsv`)
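
The four-column layout above can be sketched in plain Python. This is a minimal illustration, not the script used to build the released files: the helper names (`make_rows`, `write_tsv`, `read_tsv`) are hypothetical, and it assumes a header row with the column names and that each sentence pair is emitted in both directions.

```python
import io

# Column order taken from the table above.
COLUMNS = ["src_prompt", "src", "trg_prompt", "trg"]


def make_rows(pairs, src_lang, trg_lang):
    """Turn (src, trg) sentence pairs into tagged rows.

    Assumption: every pair is used in both directions.
    """
    rows = []
    for src, trg in pairs:
        rows.append((f"<{src_lang}>", src, f"<{trg_lang}>", trg))
        rows.append((f"<{trg_lang}>", trg, f"<{src_lang}>", src))
    return rows


def write_tsv(rows, handle):
    """Write a header line (assumed) followed by one tab-separated row per line."""
    handle.write("\t".join(COLUMNS) + "\n")
    for row in rows:
        for field in row:
            # Tabs or newlines inside a field would break the column layout.
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        handle.write("\t".join(row) + "\n")


def read_tsv(handle):
    """Read the rows back as dictionaries keyed by the header names."""
    header = handle.readline().rstrip("\n").split("\t")
    return [
        dict(zip(header, line.rstrip("\n").split("\t")))
        for line in handle
        if line.strip()
    ]


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_tsv(make_rows([("Hello.", "Hallo.")], "en", "de"), buffer)
buffer.seek(0)
records = read_tsv(buffer)
print(records[0])
```

Writing to a real file only requires replacing the `io.StringIO()` buffer with `open("train.tsv", "w", encoding="utf-8")`; compare the result against `test.ref.de-en.tsv` before relying on the assumed header row.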