|
# (Vectorized) Lexically constrained decoding with dynamic beam allocation |
|
|
|
This page explains how to use lexically constrained decoding in Fairseq.
|
Fairseq implements the code described in the following papers: |
|
|
|
* [Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation](https://www.aclweb.org/anthology/N18-1119/) (Post & Vilar, 2018)
|
* [Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting](https://www.aclweb.org/anthology/N19-1090/) (Hu et al., 2019) |
|
|
|
## Quick start |
|
|
|
Constrained search is enabled by adding the command-line argument `--constraints` to `fairseq-interactive`. |
|
Constraints are appended to each line of input, separated by tabs. Each constraint (one or more tokens) |
|
is a separate field. |
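
Schematically, where `<TAB>` stands for a literal tab character:

```
source sentence<TAB>constraint 1<TAB>constraint 2<TAB>...
```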
|
|
|
The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/main/examples/wmt19/README.md), |
|
translates the sentence *Die maschinelle Übersetzung ist schwer zu kontrollieren.* with the constraints |
|
"hard" and "to influence". |
|
|
|
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\ttoinfluence" \ |
|
| normalize.py | tok.py \ |
|
| fairseq-interactive /path/to/model \ |
|
--path /path/to/model/model1.pt \ |
|
--bpe fastbpe \ |
|
--bpe-codes /path/to/model/bpecodes \ |
|
--constraints \ |
|
-s de -t en \ |
|
--beam 10 |
|
|
|
(`tok.py` and `normalize.py` can be found in the same directory as this README; they are just wrappers around Fairseq's WMT19 preprocessing.)
|
This will generate the following output: |
|
|
|
    [snip]
    S-0 Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
    W-0 1.844 seconds
    C-0 hard
    C-0 to influence
    H-0 -1.5333266258239746 Mach@@ ine trans@@ lation is hard to influence .
    D-0 -1.5333266258239746 Machine translation is hard to influence .
    P-0 -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
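
Here `S-0` is the BPE-encoded source and `W-0` the decoding time; the `C-0` lines echo the constraints for sentence 0; `H-0`/`D-0` are the tokenized and detokenized hypotheses with their model scores; and `P-0` lists the per-token log-probabilities. Note the very low score (-11.7727) at the position of *influence*: the constraint forces a token that the unconstrained model would be unlikely to pick.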
|
|
|
By default, constraints are generated in the order supplied, with any number (zero or more) of tokens generated |
|
between constraints. If you instead want to let the decoder choose the order in which the constraints
are satisfied, use `--constraints unordered`. Note that you may want a larger beam in this mode, since
the search has to keep hypotheses covering many more constraint orderings alive.
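
For example, here is the quick-start command rerun with unordered constraints (same model paths and preprocessing as above; the beam size of 20 is just an illustrative choice):

```bash
echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tto influence" \
| normalize.py | tok.py \
| fairseq-interactive /path/to/model \
  --path /path/to/model/model1.pt \
  --bpe fastbpe \
  --bpe-codes /path/to/model/bpecodes \
  --constraints unordered \
  -s de -t en \
  --beam 20
```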
|
|
|
## Implementation details |
|
|
|
The heart of the implementation is in `fairseq/search.py`, which adds a `LexicallyConstrainedBeamSearch` instance.
This instance of beam search tracks the progress of each hypothesis in the beam through the set of constraints
provided for its input sentence. It does this using one of two classes, both found in `fairseq/token_generation_constraints.py` (a simplified sketch of the ordered case follows the list):
|
|
|
* `OrderedConstraintState`: assumes the `C` input constraints will be generated in the provided order

* `UnorderedConstraintState`: tries to apply the `C` (phrasal) constraints in all `C!` orders
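
To make the ordered case concrete, here is a minimal, self-contained sketch of the idea; it is an illustration under simplified assumptions, not Fairseq's actual class. The state is a pointer into the list of constraint phrases: a generated token either advances the pointer, is ignored between phrases, or resets partial progress through the current phrase (gaps are allowed between constraints, but not inside a multi-token constraint).

```python
from typing import List, Tuple

# Hypothetical, simplified model of an ordered constraint state:
# constraints are a list of phrases, each phrase a list of token IDs,
# and the state is (index of current phrase, offset within that phrase).
State = Tuple[int, int]


def advance(state: State, token: int, constraints: List[List[int]]) -> State:
    """Advance the constraint state by one generated token.

    A token matching the next unmet constraint token moves the pointer
    forward; any other token is only allowed between phrases, and it
    resets partial progress through the current phrase.
    """
    phrase, offset = state
    if phrase == len(constraints):              # all constraints already met
        return state
    if token == constraints[phrase][offset]:
        offset += 1
        if offset == len(constraints[phrase]):  # this phrase is complete
            return (phrase + 1, 0)
        return (phrase, offset)
    return (phrase, 0)                          # mismatch: restart the phrase


def finished(state: State, constraints: List[List[int]]) -> bool:
    return state[0] == len(constraints)


# Toy run with constraints [["hard"], ["to", "influence"]] as made-up IDs:
constraints = [[7], [3, 9]]
state = (0, 0)
for tok in [5, 2, 7, 3, 9, 4]:  # "... hard to influence ."
    state = advance(state, tok, constraints)
print(finished(state, constraints))  # True
```

On top of these per-hypothesis states, `LexicallyConstrainedBeamSearch` performs the dynamic beam allocation of Post & Vilar (2018): each step's beam is divided into "banks" by the number of constraint tokens met so far, so that partially constrained hypotheses are not crowded out by unconstrained ones.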
|
|
|
## Differences from Sockeye |
|
|
|
There are a number of [differences from Sockeye's implementation](https://awslabs.github.io/sockeye/inference.html#lexical-constraints). |
|
|
|
* Generating constraints in the order supplied (the default option here) is not available in Sockeye. |
|
* Due to an improved beam allocation method, there is no need to prune the beam. |
|
* Again due to better allocation, beam sizes as low as 10 or even 5 are often sufficient. |
|
* [The vector extensions described in Hu et al.](https://github.com/edwardjhu/sockeye/tree/trie_constraints) (NAACL 2019) were never merged |
|
into the main Sockeye branch. |
|
|
|
## Citation |
|
|
|
The first paper describing lexical constraints for seq2seq decoding is:
|
|
|
```bibtex |
|
@inproceedings{hokamp-liu-2017-lexically, |
|
title = "Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search", |
|
author = "Hokamp, Chris and |
|
Liu, Qun", |
|
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", |
|
month = jul, |
|
year = "2017", |
|
address = "Vancouver, Canada", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://www.aclweb.org/anthology/P17-1141", |
|
doi = "10.18653/v1/P17-1141", |
|
pages = "1535--1546", |
|
} |
|
``` |
|
|
|
The Fairseq implementation uses the extensions described in
|
|
|
```bibtex |
|
@inproceedings{post-vilar-2018-fast, |
|
title = "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation", |
|
author = "Post, Matt and |
|
Vilar, David", |
|
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)", |
|
month = jun, |
|
year = "2018", |
|
address = "New Orleans, Louisiana", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://www.aclweb.org/anthology/N18-1119", |
|
doi = "10.18653/v1/N18-1119", |
|
pages = "1314--1324", |
|
} |
|
``` |
|
|
|
and |
|
|
|
```bibtex |
|
@inproceedings{hu-etal-2019-improved, |
|
title = "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting", |
|
author = "Hu, J. Edward and |
|
Khayrallah, Huda and |
|
Culkin, Ryan and |
|
Xia, Patrick and |
|
Chen, Tongfei and |
|
Post, Matt and |
|
Van Durme, Benjamin", |
|
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)", |
|
month = jun, |
|
year = "2019", |
|
address = "Minneapolis, Minnesota", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://www.aclweb.org/anthology/N19-1090", |
|
doi = "10.18653/v1/N19-1090", |
|
pages = "839--850", |
|
} |
|
``` |
|
|