|
# Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019) |
|
|
|
This page includes instructions for reproducing results from the paper [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](https://arxiv.org/abs/1902.07816). |
|
|
|
## Download data |
|
|
|
First, follow the [instructions to download and preprocess the WMT'17 En-De dataset](../translation#prepare-wmt14en2desh). |
|
Make sure to learn a joint vocabulary by passing the `--joined-dictionary` option to `fairseq-preprocess`. |
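
For reference, the preprocessing call with a joint vocabulary might look like the sketch below (the `TEXT` path assumes the prepare script's default output directory, `examples/translation/wmt17_en_de`; adjust to your setup):

```bash
# Sketch only: binarize the prepared WMT'17 En-De data with a joint
# source/target dictionary. The TEXT path and worker count are assumptions.
TEXT=examples/translation/wmt17_en_de
fairseq-preprocess --source-lang en --target-lang de \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/wmt17_en_de \
    --joined-dictionary \
    --workers 16
```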
|
|
|
## Train a model |
|
|
|
Then we can train a mixture of experts model using the `translation_moe` task. |
|
Use the `--method` flag to choose the MoE variant; we support hard mixtures with a learned or uniform prior (`--method hMoElp` and `hMoEup`, respectively) and soft mixtures (`--method sMoElp` and `sMoEup`).
|
The model is trained with online responsibility assignment and shared parameterization. |
|
|
|
The following command will train a `hMoElp` model with `3` experts: |
|
```bash
fairseq-train --ddp-backend='legacy_ddp' \
    data-bin/wmt17_en_de \
    --max-update 100000 \
    --task translation_moe --user-dir examples/translation_moe/translation_moe_src \
    --method hMoElp --mean-pool-gating-network \
    --num-experts 3 \
    --arch transformer_wmt_en_de --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0007 \
    --dropout 0.1 --weight-decay 0.0 --criterion cross_entropy \
    --max-tokens 3584
```
|
|
|
## Translate |
|
|
|
Once a model is trained, we can generate translations from different experts using the `--gen-expert` option. |
|
For example, to generate from expert 0: |
|
```bash
fairseq-generate data-bin/wmt17_en_de \
    --path checkpoints/checkpoint_best.pt \
    --beam 1 --remove-bpe \
    --task translation_moe --user-dir examples/translation_moe/translation_moe_src \
    --method hMoElp --mean-pool-gating-network \
    --num-experts 3 \
    --gen-expert 0
```
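
To get outputs from every expert, you can loop over the expert indices (a sketch; the per-expert output filenames are just illustrative):

```bash
# Sketch: run generation once per expert and keep one output file per expert.
for EXPERT in 0 1 2; do
    fairseq-generate data-bin/wmt17_en_de \
        --path checkpoints/checkpoint_best.pt \
        --beam 1 --remove-bpe \
        --task translation_moe --user-dir examples/translation_moe/translation_moe_src \
        --method hMoElp --mean-pool-gating-network \
        --num-experts 3 \
        --gen-expert $EXPERT > gen.expert${EXPERT}.out
done
```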
|
|
|
## Evaluate |
|
|
|
First download a tokenized version of the WMT'14 En-De test set with multiple references: |
|
```bash
wget dl.fbaipublicfiles.com/fairseq/data/wmt14-en-de.extra_refs.tok
```
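
The generation loop below only consumes the source side of this file (tab-separated lines prefixed with `S`); a quick sanity check on the download, as a sketch:

```bash
# Count the source lines (prefixed "S") that will be fed to fairseq-interactive.
grep -c '^S' wmt14-en-de.extra_refs.tok
```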
|
|
|
Next apply BPE on the fly and run generation for each expert: |
|
```bash
BPE_CODE=examples/translation/wmt17_en_de/code
for EXPERT in $(seq 0 2); do \
    cat wmt14-en-de.extra_refs.tok \
    | grep ^S | cut -f 2 \
    | fairseq-interactive data-bin/wmt17_en_de \
        --path checkpoints/checkpoint_best.pt \
        --beam 1 \
        --bpe subword_nmt --bpe-codes $BPE_CODE \
        --buffer-size 500 --max-tokens 6000 \
        --task translation_moe --user-dir examples/translation_moe/translation_moe_src \
        --method hMoElp --mean-pool-gating-network \
        --num-experts 3 \
        --gen-expert $EXPERT ; \
done > wmt14-en-de.extra_refs.tok.gen.3experts
```
|
|
|
Finally use `score.py` to compute pairwise BLEU and multi-reference BLEU:
|
```bash
python examples/translation_moe/score.py --sys wmt14-en-de.extra_refs.tok.gen.3experts --ref wmt14-en-de.extra_refs.tok
# pairwise BLEU: 48.26
# #refs covered: 2.11
# multi-reference BLEU (leave-one-out): 59.46
```
|
This matches row 3 from Table 7 in the paper. |
|
|
|
## Citation |
|
|
|
```bibtex
@article{shen2019mixture,
  title = {Mixture Models for Diverse Machine Translation: Tricks of the Trade},
  author = {Tianxiao Shen and Myle Ott and Michael Auli and Marc'Aurelio Ranzato},
  journal = {International Conference on Machine Learning},
  year = 2019,
}
```
|
|