evaluate-bot committed on
Commit 55bcb26, 1 parent: 5cc19e4

Update Space (evaluate main: 18932858)

Files changed (2)
  1. mauve.py +11 -5
  2. requirements.txt +1 -1
mauve.py CHANGED
@@ -27,20 +27,26 @@ import evaluate
 
 _CITATION = """\
 @inproceedings{pillutla-etal:mauve:neurips2021,
-title={MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers},
+title={{MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers}},
 author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
 booktitle = {NeurIPS},
 year = {2021}
 }
 
+@article{pillutla-etal:mauve:arxiv2022,
+title={{MAUVE Scores for Generative Models: Theory and Practice}},
+author={Pillutla, Krishna and Liu, Lang and Thickstun, John and Welleck, Sean and Swayamdipta, Swabha and Zellers, Rowan and Oh, Sewoong and Choi, Yejin and Harchaoui, Zaid},
+journal={arXiv Preprint},
+year={2022}
+}
 """
 
 _DESCRIPTION = """\
-MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE measure.
-
-MAUVE summarizes both Type I and Type II errors measured softly using Kullback–Leibler (KL) divergences.
+MAUVE is a measure of the statistical gap between two text distributions, e.g., how far the text written by a model is the distribution of human text, using samples from both distributions.
 
-For details, see the MAUVE paper: https://arxiv.org/abs/2102.01454 (Neurips, 2021).
+MAUVE is obtained by computing Kullback–Leibler (KL) divergences between the two distributions in a quantized embedding space of a large language model.
+It can quantify differences in the quality of generated text based on the size of the model, the decoding algorithm, and the length of the generated text.
+MAUVE was found to correlate the strongest with human evaluations over baseline metrics for open-ended text generation.
 
 This metrics is a wrapper around the official implementation of MAUVE:
 https://github.com/krishnap25/mauve
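
Note on the updated description: the phrase "computing Kullback–Leibler (KL) divergences between the two distributions in a quantized embedding space" can be illustrated with a small, simplified sketch. It is not the official implementation (that lives in the mauve-text package linked above) and it skips the embedding and quantization steps entirely, assuming the two histograms over clusters are already given. The names kl_divergence and mauve_sketch, the number of mixture weights, and the scaling constant 5.0 are choices made for this illustration only.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mauve_sketch(p, q, num_weights=99, scaling=5.0):
    """Area under a divergence curve between histograms p and q (illustrative)."""
    # Anchor the curve at (1, 0) so the area is well defined.
    curve = [(1.0, 0.0)]
    for i in range(1, num_weights + 1):
        w = i / (num_weights + 1)  # mixture weight, strictly between 0 and 1
        r = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]  # mixture of p and q
        x = math.exp(-scaling * kl_divergence(q, r))
        y = math.exp(-scaling * kl_divergence(p, r))
        curve.append((x, y))
    # Anchor the other end of the curve at (0, 1).
    curve.append((0.0, 1.0))
    # x does not increase as w grows, so integrate y over x with trapezoids.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x0 - x1) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    human = [0.5, 0.3, 0.2]    # made-up cluster histogram for human text
    model = [0.2, 0.3, 0.5]    # made-up cluster histogram for model text
    print(mauve_sketch(human, human))  # identical histograms score close to 1
    print(mauve_sketch(human, model))  # a larger gap gives a lower score
```

Under these assumptions, identical histograms give a score close to 1 and histograms with no shared mass give a score close to 0, which matches the reading of MAUVE as a measure of the gap between two distributions.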
requirements.txt CHANGED
@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/evaluate@344b8b45be3a5eb927bef6d897da876ba9b2f228
+git+https://github.com/huggingface/evaluate@18932858570b9fa97ac478e1e6e709438e4d093b
 faiss-cpu
 scikit-learn
 mauve-text