---
title: NIST_MT
emoji: 🤗
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- machine-translation
description:
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score.
---
# Metric Card for NIST's MT metric
## Metric Description
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU
score. The official script used by NIST to compute the BLEU and NIST scores is
mteval-14.pl. The main differences between the two metrics are:
- BLEU uses the geometric mean of the n-gram overlaps, whereas NIST uses the arithmetic mean.
- NIST uses a different brevity penalty.
- The NIST score from mteval-14.pl has a self-contained tokenizer (the Hugging Face implementation relies on NLTK's
implementation of the NIST-specific tokenizer).
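The first two differences can be illustrated with a small, self-contained sketch. It is not the mteval-14.pl or NLTK code: the function names and the toy precision values are made up for illustration, and the real NIST score additionally weights each n-gram by how informative it is, which this sketch leaves out. The NIST-style penalty below assumes the commonly cited form in which a hypothesis two thirds the length of the reference is scored 0.5.

```python
import math


def geometric_mean(values):
    """Geometric mean; collapses to 0.0 as soon as any value is 0."""
    if any(v == 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """Plain average; a zero in one n-gram order does not wipe out the others."""
    return sum(values) / len(values)


def bleu_brevity_penalty(ref_len, hyp_len):
    """BLEU-style penalty: exp(1 - r/c) when the hypothesis is shorter (lengths > 0)."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / hyp_len)


def nist_brevity_penalty(ref_len, hyp_len):
    """NIST-style penalty: exp(beta * ln(ratio)^2), with beta set so ratio 2/3 gives 0.5."""
    ratio = hyp_len / ref_len
    if ratio >= 1:
        return 1.0
    beta = math.log(0.5) / math.log(1.5) ** 2
    return math.exp(beta * math.log(ratio) ** 2)


# Hypothetical 1- to 4-gram precisions for a short hypothesis with no 3- or 4-gram matches
toy_precisions = [0.8, 0.5, 0.0, 0.0]
print(geometric_mean(toy_precisions))   # BLEU-like aggregation
print(arithmetic_mean(toy_precisions))  # NIST-like aggregation

# Hypothesis of 20 tokens against a reference of 30 tokens
print(bleu_brevity_penalty(30, 20))
print(nist_brevity_penalty(30, 20))
```

With the geometric mean, a single n-gram order without matches drives the aggregate to zero, while the arithmetic mean still rewards the lower-order matches. This is one reason the two scores can rank short segments differently.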
## Intended Uses
NIST was developed for machine translation evaluation.
## How to Use
```python
import evaluate

nist_mt = evaluate.load("nist_mt")
hypothesis1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference1 = "It is a guide to action that ensures that the military will forever heed Party commands"
reference2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party"
# compute() accepts keyword arguments only: one prediction paired with its list of references
nist_mt.compute(predictions=[hypothesis1], references=[[reference1, reference2]])
# {'nist_mt': 3.3709935957649324}
```
### Inputs
- **predictions**: tokenized predictions to score. For sentence-level NIST, a list of tokens (str);
for corpus-level NIST, a list (sentences) of lists of tokens (str)
- **references**: potentially multiple tokenized references for each prediction. For sentence-level NIST, a
list (multiple potential references) of lists of tokens (str); for corpus-level NIST, a list (corpus) of lists
(multiple potential references) of lists of tokens (str)
- **n**: highest n-gram order
- **tokenize_kwargs**: arguments passed to the tokenizer (see: https://github.com/nltk/nltk/blob/90fa546ea600194f2799ee51eaf1b729c128711e/nltk/tokenize/nist.py#L139)
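The nesting described for `predictions` and `references` can be sketched in plain Python. This is only an illustration of the shapes: `toy_tokenize` is a hypothetical whitespace splitter standing in for the NIST tokenizer, and the sentences and variable names are made up.

```python
def toy_tokenize(text):
    """Placeholder tokenizer: splits on whitespace (not the NIST tokenizer)."""
    return text.split()


hypothesis = "the cat sat on the mat"
references = ["the cat is on the mat", "there is a cat on the mat"]

# Sentence level: one list of tokens, scored against a list of token lists
sentence_prediction = toy_tokenize(hypothesis)
sentence_references = [toy_tokenize(ref) for ref in references]

# Corpus level: one more level of nesting, with one entry per sentence
corpus_predictions = [sentence_prediction]
corpus_references = [sentence_references]

print(sentence_prediction)
print(sentence_references)
print(len(corpus_predictions), len(corpus_references))
```

At either level the references carry exactly one more level of nesting than the predictions, because each prediction may be compared against several references.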
### Output Values
- **nist_mt** (`float`): NIST score
Output Example:
```python
{'nist_mt': 3.3709935957649324}
```
## Citation
```bibtex
@inproceedings{10.5555/1289189.1289273,
author = {Doddington, George},
title = {Automatic Evaluation of Machine Translation Quality Using N-Gram Co-Occurrence Statistics},
year = {2002},
publisher = {Morgan Kaufmann Publishers Inc.},
address = {San Francisco, CA, USA},
booktitle = {Proceedings of the Second International Conference on Human Language Technology Research},
pages = {138–145},
numpages = {8},
location = {San Diego, California},
series = {HLT '02}
}
```
## Further References
This Hugging Face implementation uses [the NLTK implementation](https://github.com/nltk/nltk/blob/develop/nltk/translate/nist_score.py).