---
title: NIST_MT
emoji: 🤗
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- machine-translation
description:
  DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score.
---
# Metric Card for NIST's MT metric

## Metric Description
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU
score. The official script used by NIST to compute the BLEU and NIST scores is
mteval-14.pl. The main differences are:

- BLEU uses the geometric mean of the n-gram overlaps, while NIST uses the arithmetic mean.
- NIST uses a different brevity penalty.
- The NIST score from mteval-14.pl has a self-contained tokenizer; the Hugging Face implementation instead relies on NLTK's
  implementation of the NIST-specific tokenizer (illustrated in the sketch below).
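To illustrate the last point, here is a minimal sketch of the NLTK tokenizer this implementation relies on. It needs the `perluniprops` NLTK data package, which must be available before the tokenizer module is imported:

```python
import nltk

# The NIST tokenizer reads Unicode property tables from this data package
# at import time, so download it before importing the tokenizer module.
nltk.download("perluniprops")

from nltk.tokenize.nist import NISTTokenizer

tokenizer = NISTTokenizer()
# Punctuation is split off into separate tokens; lowercasing is optional.
print(tokenizer.tokenize("It is a guide to action.", lowercase=True))
# ['it', 'is', 'a', 'guide', 'to', 'action', '.']
```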
## Intended Uses

NIST was developed for machine translation evaluation.
## How to Use

```python
import evaluate

nist_mt = evaluate.load("nist_mt")
hypothesis1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference1 = "It is a guide to action that ensures that the military will forever heed Party commands"
reference2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party"
nist_mt.compute(predictions=[hypothesis1], references=[[reference1, reference2]])
# {'nist_mt': 3.3709935957649324}
```
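The same call scales to corpus-level scoring by passing one list of references per prediction. A short sketch continuing the example above (`hypothesis2` and `reference3` are made-up illustrative sentences):

```python
hypothesis2 = "he read the book because he was interested in world history"
reference3 = "he was interested in world history because he read the book"

# One reference list per prediction; the two lists are aligned by index.
results = nist_mt.compute(
    predictions=[hypothesis1, hypothesis2],
    references=[[reference1, reference2], [reference3]],
)
print(results["nist_mt"])
```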
### Inputs

- **predictions**: tokenized predictions to score. For sentence-level NIST, a list of tokens (str);
  for corpus-level NIST, a list (sentences) of lists of tokens (str)
- **references**: potentially multiple tokenized references for each prediction. For sentence-level NIST, a
  list (multiple potential references) of lists of tokens (str); for corpus-level NIST, a list (corpus) of lists
  (multiple potential references) of lists of tokens (str)
- **n**: highest n-gram order
- **tokenize_kwargs**: arguments passed to the tokenizer (see: https://github.com/nltk/nltk/blob/90fa546ea600194f2799ee51eaf1b729c128711e/nltk/tokenize/nist.py#L139); see the sketch after this list
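A hedged sketch of how these parameters might be passed: `n` is documented above, while forwarding tokenizer options such as `lowercase` directly through `compute` is an assumption about how the `tokenize_kwargs` are collected:

```python
# `n` caps the highest n-gram order used in the score.
# `lowercase` is assumed to be forwarded to NLTK's NISTTokenizer
# (see the tokenizer source linked above).
nist_mt.compute(
    predictions=[hypothesis1],
    references=[[reference1, reference2]],
    n=4,
    lowercase=True,
)
```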
### Output Values

- **nist_mt** (`float`): NIST score

Output Example:

```python
{'nist_mt': 3.3709935957649324}
```
## Citation

```bibtex
@inproceedings{10.5555/1289189.1289273,
    author = {Doddington, George},
    title = {Automatic Evaluation of Machine Translation Quality Using N-Gram Co-Occurrence Statistics},
    year = {2002},
    publisher = {Morgan Kaufmann Publishers Inc.},
    address = {San Francisco, CA, USA},
    booktitle = {Proceedings of the Second International Conference on Human Language Technology Research},
    pages = {138--145},
    numpages = {8},
    location = {San Diego, California},
    series = {HLT '02}
}
```
## Further References

This Hugging Face implementation uses [the NLTK implementation](https://github.com/nltk/nltk/blob/develop/nltk/translate/nist_score.py) of the NIST score.
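For reference, that underlying NLTK scorer can also be called directly on pre-tokenized text. A minimal sketch, with whitespace splitting standing in for the NIST tokenizer (note that NLTK expects references first, hypothesis second):

```python
from nltk.translate.nist_score import sentence_nist

hypothesis = "It is a guide to action which ensures that the military always obeys the commands of the party".split()
reference = "It is a guide to action that ensures that the military will forever heed Party commands".split()

# One hypothesis (list of tokens) scored against a list of references
# (each itself a list of tokens), up to 5-grams.
print(sentence_nist([reference], hypothesis, n=5))
```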