distinct
- README.md +10 -0
- distinct.py +0 -10
README.md
CHANGED
@@ -17,6 +17,16 @@ pinned: false
|
|
17 |
***Module Card Instructions:***
|
18 |
|
19 |
## Measurement Description
|
20 |
This metric measures the diversity of a group of sentences. It can evaluate either the diversity of generated responses over the test set (i.e., corpus-level diversity) or the diversity of a group of sampled responses for a single context (i.e., utterance-level diversity). The [original paper](https://aclanthology.org/N16-1014) (Li et al., 2016) used it at the corpus level, while some use it at the utterance level. However, we do not recommend calculating Distinct on a small group, as it is sensitive to sentence length and to the number of sentences.
|
21 |
|
22 |
## How to Use
|
|
|
17 |
***Module Card Instructions:***
|
18 |
|
19 |
## Measurement Description
|
20 |
+
The Distinct metric measures the diversity of language. We provide two versions of the distinct score. Expectation-Adjusted-Distinct (EAD) is the default; it removes the bias of the original distinct score against lengthier sentences (see the figure below). Distinct is the original version.
|
21 |
+
|
22 |
+
<p align="center">
|
23 |
+
<img src="https://huggingface.co/spaces/lsy641/distinct/resolve/main/distinct_compare_pic.jpg" alt="drawing" width="350" style="float: center;"/>
|
24 |
+
</p>
|
25 |
+
|
26 |
+
Expectation-Adjusted-Distinct requires vocab_size.
|
27 |
+
|
28 |
+
See the ACL paper https://aclanthology.org/2022.acl-short.86 for the motivation, and follow the rules of thumb in https://github.com/lsy641/Expectation-Adjusted-Distinct/blob/main/EAD.ipynb to determine vocab_size.
|
29 |
+
|
30 |
This metric measures the diversity of a group of sentences. It can evaluate either the diversity of generated responses over the test set (i.e., corpus-level diversity) or the diversity of a group of sampled responses for a single context (i.e., utterance-level diversity). The [original paper](https://aclanthology.org/N16-1014) (Li et al., 2016) used it at the corpus level, while some use it at the utterance level. However, we do not recommend calculating Distinct on a small group, as it is sensitive to sentence length and to the number of sentences.
|
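As a rough illustration of the two scores described above, the sketch below computes the original Distinct-n and a unigram Expectation-Adjusted-Distinct. It is not this module's implementation: the function names `distinct_n` and `expectation_adjusted_distinct` are hypothetical, whitespace tokenization is an assumption, and the EAD formula (distinct tokens divided by the expected number of distinct tokens, `V * (1 - ((V - 1) / V) ** C)` for vocabulary size `V` and `C` total tokens) is written as we understand it from the cited ACL 2022 paper, so please check it against the paper before relying on it.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(sentences, n=1):
    """Original Distinct-n: unique n-grams divided by total n-grams.

    Assumption: sentences are tokenized by whitespace.
    """
    grams = [g for s in sentences for g in ngrams(s.split(), n)]
    return len(set(grams)) / len(grams) if grams else 0.0


def expectation_adjusted_distinct(sentences, vocab_size):
    """Unigram EAD: observed distinct tokens over their expected count.

    Assumption: the expected number of distinct tokens after drawing
    `total` tokens from a vocabulary of size V is V * (1 - ((V-1)/V)**total).
    """
    tokens = [t for s in sentences for t in s.split()]
    if not tokens:
        return 0.0
    total = len(tokens)
    expected = vocab_size * (1 - ((vocab_size - 1) / vocab_size) ** total)
    return len(set(tokens)) / expected


responses = ["a b c", "a b d"]
print(distinct_n(responses, n=1))
print(distinct_n(responses, n=2))
print(expectation_adjusted_distinct(responses, vocab_size=4))
```

Under this formula the denominator grows more slowly than the raw token count, which is how the adjusted score avoids penalizing longer outputs; with a very large `vocab_size` it approaches the original unigram score.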
31 |
|
32 |
## How to Use
|
distinct.py
CHANGED
@@ -54,16 +54,6 @@ _DESCRIPTION = """\
|
|
54 |
The Distinct metric calculates the corpus-level diversity of language. We provide two versions of the distinct score. Expectation-Adjusted-Distinct (EAD) is the default one, which removes
|
55 |
the biases of the original distinct score on lengthier sentences (see Figure below). Distinct is the original version.
|
56 |
|
57 |
-
|
58 |
-
For the use of Expectation-Adjusted-Distinct, vocab_size is required.
|
59 |
-
|
60 |
-
Please follow ACL paper https://aclanthology.org/2022.acl-short.86 for motivation and follow the rules of thumb provided by https://github.com/lsy641/Expectation-Adjusted-Distinct/blob/main/EAD.ipynb to determine the vocab_size
|
61 |
-
|
62 |
-
<p align="center">
|
63 |
-
<img src="https://huggingface.co/spaces/lsy641/distinct/resolve/main/distinct_compare_pic.jpg" alt="drawing" width="350" style="float: center;"/>
|
64 |
-
</p>
|
65 |
-
|
66 |
-
|
67 |
"""
|
68 |
|
69 |
|
|
|
54 |
The Distinct metric calculates the corpus-level diversity of language. We provide two versions of the distinct score. Expectation-Adjusted-Distinct (EAD) is the default one, which removes
|
55 |
the biases of the original distinct score on lengthier sentences (see Figure below). Distinct is the original version.
|
56 |
|
57 |
"""
|
58 |
|
59 |
|