Update README.md
README.md CHANGED
@@ -100,11 +100,60 @@ base_model:
- FacebookAI/xlm-roberta-large
---

# PreCOMET-cons [paper](https://arxiv.org/abs/2501.18251)

This is a source-only COMET model used for efficient evaluation subset selection.

Specifically, this model predicts the `consistency` of the system ordering induced by a single segment with the system ordering on the whole test set.

The higher the score, the more useful the segment is for evaluation, because fewer samples are then needed to arrive at the same system ordering.
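
To make the prediction target concrete, here is a minimal sketch of what such a consistency score measures, assuming pairwise agreement between the single-segment ranking and the full test-set ranking; the function and scores below are illustrative, not part of the PreCOMET API:

```python
import itertools

def pairwise_consistency(segment_scores, testset_scores):
    """Fraction of system pairs that a single segment orders
    the same way as the whole test set."""
    agree, total = 0, 0
    for a, b in itertools.combinations(segment_scores, 2):
        on_segment = segment_scores[a] - segment_scores[b]
        on_testset = testset_scores[a] - testset_scores[b]
        total += 1
        agree += (on_segment * on_testset) > 0
    return agree / total

# Hypothetical scores for three MT systems on one segment vs. the test-set average:
segment = {"sysA": 0.80, "sysB": 0.55, "sysC": 0.60}
testset = {"sysA": 0.72, "sysB": 0.69, "sysC": 0.64}
print(pairwise_consistency(segment, testset))  # ~0.67: the sysB/sysC pair flips
```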

It is not compatible with the original Unbabel COMET; to run it, you have to install [github.com/zouharvi/PreCOMET](https://github.com/zouharvi/PreCOMET):

```bash
pip install git+https://github.com/zouharvi/PreCOMET.git
```

You can then use it in Python:

```python
import precomet

# Download the model from the Hugging Face hub and load it.
model = precomet.load_from_checkpoint(precomet.download_model("zouharvi/PreCOMET-cons"))

# The model scores source segments only; no translations or references are needed.
model.predict([
    {"src": "This is an easy source sentence."},
    {"src": "this is a much more complicated source sen-tence that will pro·bably lead to loww scores 🤪"}
])["scores"]
> [0.1797918677330017, 0.32624873518943787]
```

The primary use of this model is through the [subset2evaluate](https://github.com/zouharvi/subset2evaluate) package:

```python
import subset2evaluate

# Load WMT23 English-Czech data.
data_full = subset2evaluate.utils.load_data("wmt23/en-cs")

# Baseline: pick evaluation segments at random.
data_random = subset2evaluate.select_subset.basic(data_full, method="random")
subset2evaluate.evaluate.eval_subset_clusters(data_random[:100])
> 1
subset2evaluate.evaluate.eval_subset_correlation(data_random[:100], data_full)
> 0.71
```

Random selection gives us only one cluster (a group of systems that cannot be statistically distinguished from each other) and a system-level Spearman correlation of 0.71 when we have a budget of only 100 segments. However, by using this model:

```python
# Select segments ranked by this model's predicted consistency instead.
data_precomet = subset2evaluate.select_subset.basic(data_full, method="precomet_cons")
subset2evaluate.evaluate.eval_subset_clusters(data_precomet[:100])
> 1
subset2evaluate.evaluate.eval_subset_correlation(data_precomet[:100], data_full)
> 0.81
```

we get a higher correlation (0.81 instead of 0.71) at the same budget.
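
For intuition, the selection behind `method="precomet_cons"` can be approximated by hand: score every source segment with the model and keep the segments with the highest predicted consistency. The sketch below rests on that assumption rather than the package's internals, reuses `data_full` from above, and assumes each item carries a `"src"` field:

```python
import precomet

model = precomet.load_from_checkpoint(precomet.download_model("zouharvi/PreCOMET-cons"))

# Predict a consistency score for each source segment.
scores = model.predict([{"src": item["src"]} for item in data_full])["scores"]

# Keep the 100 segments predicted to be most consistent with the full test set.
ranked = sorted(zip(scores, range(len(data_full))), reverse=True)
data_manual = [data_full[i] for _, i in ranked[:100]]
```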

This work is described in [How to Select Datapoints for Efficient Human Evaluation of NLG Models?](https://arxiv.org/abs/2501.18251).
Cite as:

```bibtex
@misc{zouhar2025selectdatapointsefficienthuman,
    title={How to Select Datapoints for Efficient Human Evaluation of NLG Models?},
    author={Vilém Zouhar and Peng Cui and Mrinmaya Sachan},
    year={2025},
    eprint={2501.18251},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2501.18251},
}
```