# Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)

## Example usage

First download and preprocess the data following the main [language modeling README](README.md).

Then train a convolutional LM using the `fconv_lm_dauphin_wikitext103` architecture:

```bash
fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/fconv_wikitext-103 \
    --arch fconv_lm_dauphin_wikitext103 \
    --adaptive-softmax-cutoff 10000,20000,200000 \
    --dropout 0.2 \
    --criterion adaptive_loss \
    --optimizer nag --clip-norm 0.1 --weight-decay 5e-06 \
    --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 \
    --max-tokens 1024 --tokens-per-sample 1024 \
    --ddp-backend legacy_ddp \
    --max-epoch 35
```
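
To make the learning-rate flags concrete: with `--lr-scheduler reduce_lr_on_plateau`, the learning rate is multiplied by `--lr-shrink` when the validation loss stops improving. The sketch below is not part of fairseq and does not run any training; it only prints the rate after a given number of shrinks, assuming the starting rate of 1.0 and the factor of 0.5 used in the training command. The names `START_LR`, `SHRINK`, and `lr_after_shrinks` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative only: mirrors --lr 1.0 and --lr-shrink 0.5 from the training command.
# START_LR, SHRINK, and lr_after_shrinks are hypothetical names, not fairseq options.
START_LR=1.0
SHRINK=0.5

# Print the learning rate after $1 plateau-triggered shrinks: START_LR * SHRINK^n.
lr_after_shrinks() {
    awk -v lr="$START_LR" -v s="$SHRINK" -v n="$1" \
        'BEGIN { printf "%.4f\n", lr * s ^ n }'
}

# Show the first few values of the sequence.
for n in 0 1 2 3; do
    echo "after ${n} shrink(s): lr = $(lr_after_shrinks "$n")"
done
```
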

And evaluate with:

```bash
fairseq-eval-lm data-bin/wikitext-103 --path checkpoints/fconv_wikitext-103/checkpoint_best.pt
```
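
Evaluation has to read from the directory that training wrote to via `--save-dir`. One way to keep the two in sync is to define the paths once and reuse them. The following is a minimal sketch: the variable names (`DATA_DIR`, `SAVE_DIR`, `CHECKPOINT`, `EVAL_CMD`) are illustrative assumptions rather than anything fairseq defines, and the script only prints the evaluation command so it can be checked before running.

```bash
#!/usr/bin/env bash
# Illustrative variable names; fairseq itself does not read these.
DATA_DIR="data-bin/wikitext-103"
SAVE_DIR="checkpoints/fconv_wikitext-103"      # same value as --save-dir in training
CHECKPOINT="${SAVE_DIR}/checkpoint_best.pt"    # best checkpoint under SAVE_DIR

# Build the evaluation command from the shared variables and print it for inspection.
EVAL_CMD="fairseq-eval-lm ${DATA_DIR} --path ${CHECKPOINT}"
echo "${EVAL_CMD}"
```
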

## Citation

```bibtex
@inproceedings{dauphin2017language,
  title={Language Modeling with Gated Convolutional Networks},
  author={Dauphin, Yann N and Fan, Angela and Auli, Michael and Grangier, David},
  booktitle={Proceedings of the 34th International Conference on Machine Learning-Volume 70},
  pages={933--941},
  year={2017},
  organization={JMLR}
}
```