Space: evaluate-metric/mauve

evaluate-bot committed on Nov 2, 2023

Commit 55bcb26

1 Parent(s): 5cc19e4

Update Space (evaluate main: 18932858)

Files changed (2)

mauve.py CHANGED

@@ -27,20 +27,26 @@ import evaluate
 _CITATION = """\
 @inproceedings{pillutla-etal:mauve:neurips2021,
-  title={MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers},
   author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
   booktitle = {NeurIPS},
   year      = {2021}
 }
 """
 _DESCRIPTION = """\
-MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE measure.
-MAUVE summarizes both Type I and Type II errors measured softly using Kullback–Leibler (KL) divergences.
-For details, see the MAUVE paper: https://arxiv.org/abs/2102.01454 (Neurips, 2021).
 This metrics is a wrapper around the official implementation of MAUVE:
 https://github.com/krishnap25/mauve

 _CITATION = """\
 @inproceedings{pillutla-etal:mauve:neurips2021,
+  title={{MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers}},
   author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
   booktitle = {NeurIPS},
   year      = {2021}
 }
+@article{pillutla-etal:mauve:arxiv2022,
+  title={{MAUVE Scores for Generative Models: Theory and Practice}},
+  author={Pillutla, Krishna and Liu, Lang and Thickstun, John and Welleck, Sean and Swayamdipta, Swabha and Zellers, Rowan and Oh, Sewoong and Choi, Yejin and Harchaoui, Zaid},
+  journal={arXiv Preprint},
+  year={2022}
+}
 """
 _DESCRIPTION = """\
+MAUVE is a measure of the statistical gap between two text distributions, e.g., how far the text written by a model is the distribution of human text, using samples from both distributions.
+MAUVE is obtained by computing Kullback–Leibler (KL) divergences between the two distributions in a quantized embedding space of a large language model.
+It can quantify differences in the quality of generated text based on the size of the model, the decoding algorithm, and the length of the generated text.
+MAUVE was found to correlate the strongest with human evaluations over baseline metrics for open-ended text generation.
 This metrics is a wrapper around the official implementation of MAUVE:
 https://github.com/krishnap25/mauve

requirements.txt CHANGED

@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/evaluate@344b8b45be3a5eb927bef6d897da876ba9b2f228
 faiss-cpu
 scikit-learn
 mauve-text

+git+https://github.com/huggingface/evaluate@18932858570b9fa97ac478e1e6e709438e4d093b
 faiss-cpu
 scikit-learn
 mauve-text
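
Note on the new _DESCRIPTION: the text says MAUVE is obtained from KL divergences between two distributions in a quantized embedding space. The sketch below illustrates only that last step, under several assumptions: the embedding and quantization stages are skipped, the inputs are already histograms over clusters, and the names `kl_divergence` and `frontier_area`, the scaling constant of 5, and the 25 mixture points are illustrative choices rather than the API of the `mauve-text` package. It is a simplified reading of the divergence-frontier idea, not the official implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def frontier_area(p, q, scaling=5.0, num_points=25):
    """Area under a divergence frontier between histograms p and q.

    Hypothetical helper: `scaling` and `num_points` are assumed values.
    """
    # Trace the curve from (1, 0) to (0, 1) so x only decreases along it.
    curve = [(1.0, 0.0)]
    for i in range(1, num_points):
        lam = i / num_points
        # Mixture of the two histograms; strictly positive wherever p or q is.
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        x = math.exp(-scaling * kl_divergence(q, r))
        y = math.exp(-scaling * kl_divergence(p, r))
        curve.append((x, y))
    curve.append((0.0, 1.0))

    # Trapezoidal rule over consecutive points of the curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x0 - x1) * (y0 + y1) / 2
    return area


# Identical histograms should score close to 1; disjoint ones close to 0.
same = frontier_area([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
disjoint = frontier_area([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

The two calls at the end show the intended behaviour: matching distributions give an area near 1, while distributions with no shared mass give an area near 0.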