Commit 55bcb26 by evaluate-bot
1 parent: 5cc19e4
Update Space (evaluate main: 18932858)
- mauve.py +11 -5
- requirements.txt +1 -1
mauve.py
CHANGED
@@ -27,20 +27,26 @@ import evaluate
 
 _CITATION = """\
 @inproceedings{pillutla-etal:mauve:neurips2021,
-title={MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers},
+title={{MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers}},
 author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
 booktitle = {NeurIPS},
 year = {2021}
 }
 
+@article{pillutla-etal:mauve:arxiv2022,
+title={{MAUVE Scores for Generative Models: Theory and Practice}},
+author={Pillutla, Krishna and Liu, Lang and Thickstun, John and Welleck, Sean and Swayamdipta, Swabha and Zellers, Rowan and Oh, Sewoong and Choi, Yejin and Harchaoui, Zaid},
+journal={arXiv Preprint},
+year={2022}
+}
 """
 
 _DESCRIPTION = """\
-MAUVE is a
-
-MAUVE summarizes both Type I and Type II errors measured softly using Kullback–Leibler (KL) divergences.
+MAUVE is a measure of the statistical gap between two text distributions, e.g., how far the text written by a model is the distribution of human text, using samples from both distributions.
 
-
+MAUVE is obtained by computing Kullback–Leibler (KL) divergences between the two distributions in a quantized embedding space of a large language model.
+It can quantify differences in the quality of generated text based on the size of the model, the decoding algorithm, and the length of the generated text.
+MAUVE was found to correlate the strongest with human evaluations over baseline metrics for open-ended text generation.
 
 This metrics is a wrapper around the official implementation of MAUVE:
 https://github.com/krishnap25/mauve
requirements.txt
CHANGED
@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/evaluate@
+git+https://github.com/huggingface/evaluate@18932858570b9fa97ac478e1e6e709438e4d093b
 faiss-cpu
 scikit-learn
 mauve-text
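The updated _DESCRIPTION in mauve.py says MAUVE is obtained from KL divergences between two distributions in a quantized embedding space. The following is a minimal sketch of that idea, not the code in mauve-text or in this Space. It assumes both text samples have already been embedded and quantized into histograms `p` and `q` over the same clusters. The function names, the number of mixture weights, and the scaling constant are illustrative assumptions.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0, which holds for the mixtures below.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_frontier(p, q, num_points=25, scale=5.0):
    """Points (exp(-scale*KL(q||r)), exp(-scale*KL(p||r))) for mixtures r.

    r = lam * p + (1 - lam) * q, with lam strictly inside (0, 1).
    Points are returned in order of increasing lam.
    `scale` is an assumed constant for this sketch.
    """
    points = []
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-scale * kl(q, r)), math.exp(-scale * kl(p, r))))
    return points


def area_under_frontier(points):
    """Trapezoidal area under the frontier, with corners (0, 1) and (1, 0) added.

    The x coordinate shrinks as lam grows, so the points are reversed
    to integrate from left to right.
    """
    pts = [(0.0, 1.0)] + points[::-1] + [(1.0, 0.0)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def toy_mauve(p, q):
    """Area-based summary in [0, 1]; higher means the histograms are closer."""
    return area_under_frontier(divergence_frontier(p, q))


if __name__ == "__main__":
    human = [0.5, 0.3, 0.2]   # hypothetical cluster histogram for human text
    model = [0.4, 0.4, 0.2]   # hypothetical cluster histogram for model text
    print(toy_mauve(human, model))
```

In this sketch identical histograms give a score of 1, and histograms with disjoint support give a score near 0. The real metric additionally needs an embedding model and a clustering step, which is why requirements.txt lists faiss-cpu and scikit-learn alongside mauve-text.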