title: NIST_MT
emoji: 🤗
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
  - machine-translation
description: >-
  DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU
  score.

Metric Card for NIST's MT metric

Metric Description

DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score. The official script used by NIST to compute both the BLEU and NIST scores is mteval-14.pl. The main differences between the two metrics are:

  • BLEU uses the geometric mean of the n-gram overlaps; NIST uses the arithmetic mean.
  • NIST uses a different brevity penalty.
  • The NIST score from mteval-14.pl has a self-contained tokenizer (the Hugging Face implementation relies on NLTK's implementation of the NIST-specific tokenizer).
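The first two differences can be illustrated with a small, self-contained sketch. This is not the mteval-14.pl or NLTK code: every helper name below is hypothetical, tokens come from a plain whitespace split, and the information weighting that NIST applies to n-grams is left out so that only the averaging and the brevity penalty are compared. The NIST penalty is written in the form given by Doddington (2002), with the constant chosen so that the penalty equals 0.5 when the hypothesis is two thirds of the reference length.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total


def geometric_mean(values):
    """BLEU-style averaging: a single zero drives the result to zero."""
    if any(v == 0 for v in values):
        return 0.0
    return exp(sum(log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """NIST-style averaging: a zero term only lowers the result."""
    return sum(values) / len(values)


def bleu_brevity_penalty(hyp_len, ref_len):
    """BLEU penalty: exp(1 - r/c) when the hypothesis is shorter."""
    if hyp_len >= ref_len:
        return 1.0
    return exp(1 - ref_len / hyp_len)


def nist_brevity_penalty(hyp_len, ref_len):
    """NIST penalty: exp(beta * log(ratio)^2), equal to 0.5 at ratio 2/3."""
    ratio = hyp_len / ref_len
    if ratio >= 1:
        return 1.0
    beta = log(0.5) / log(1.5) ** 2
    return exp(beta * log(ratio) ** 2)


# Assumption: whitespace tokenization stands in for the NIST tokenizer.
hyp = "the cat is on the mat".split()
ref = "the cat sat on the mat".split()

# Precisions for n = 1..4; the hypothesis shares no 4-gram with the reference.
precisions = [ngram_precision(hyp, ref, n) for n in range(1, 5)]

print(geometric_mean(precisions))   # collapses to zero
print(arithmetic_mean(precisions))  # stays positive
print(bleu_brevity_penalty(2, 3), nist_brevity_penalty(2, 3))
```

Because the hypothesis has no matching 4-gram, the geometric mean drops to zero while the arithmetic mean still reflects the lower-order matches. That is why unsmoothed BLEU is fragile on single sentences.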

Intended Uses

NIST was developed for machine translation evaluation.

How to Use

import evaluate
nist_mt = evaluate.load("nist_mt")
hypothesis1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference1 = "It is a guide to action that ensures that the military will forever heed Party commands"
reference2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party"
nist_mt.compute(predictions=[hypothesis1], references=[[reference1, reference2]])
# {'nist_mt': 3.3709935957649324}

Inputs

  • predictions: tokenized predictions to score. For sentence-level NIST, a list of tokens (str); for corpus-level NIST, a list (sentences) of lists of tokens (str).
  • references: potentially multiple tokenized references for each prediction. For sentence-level NIST, a list (multiple potential references) of lists of tokens (str); for corpus-level NIST, a list (corpus) of lists (multiple potential references) of lists of tokens (str).
  • n: highest n-gram order
  • tokenize_kwargs: arguments passed to the tokenizer (see: https://github.com/nltk/nltk/blob/90fa546ea600194f2799ee51eaf1b729c128711e/nltk/tokenize/nist.py#L139)
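The nesting described for predictions and references is easy to get wrong, so here is a minimal sketch of the two shapes. It only builds the data structures and does not call the metric. The tokenize helper is a hypothetical stand-in that splits on whitespace, not NLTK's NIST tokenizer, and the sentences are shortened for illustration.

```python
def tokenize(sentence):
    # Stand-in tokenizer (assumption): whitespace split, not the NIST tokenizer.
    return sentence.split()


hypothesis = "It is a guide to action"
reference_a = "It is a guide to action that ensures"
reference_b = "It is the guiding principle"

# Sentence level: one prediction is a list of tokens,
# and its references are a list of token lists.
sentence_prediction = tokenize(hypothesis)
sentence_references = [tokenize(reference_a), tokenize(reference_b)]

# Corpus level: one extra list layer around both,
# with one entry per sentence in the corpus.
corpus_predictions = [sentence_prediction]
corpus_references = [sentence_references]

print(sentence_prediction)
print(corpus_references)
```

At both levels the references carry one more level of nesting than the predictions, and at corpus level the two outer lists have the same length.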

Output Values

  • nist_mt (float): NIST score

Output Example:

{'nist_mt': 3.3709935957649324}

Citation

@inproceedings{10.5555/1289189.1289273,
    author = {Doddington, George},
    title = {Automatic Evaluation of Machine Translation Quality Using N-Gram Co-Occurrence Statistics},
    year = {2002},
    publisher = {Morgan Kaufmann Publishers Inc.},
    address = {San Francisco, CA, USA},
    booktitle = {Proceedings of the Second International Conference on Human Language Technology Research},
    pages = {138–145},
    numpages = {8},
    location = {San Diego, California},
    series = {HLT '02}
}

Further References

This Hugging Face implementation uses the NLTK implementation (nltk.translate.nist_score).