davebulaval committed on
Commit 46b176e · 1 Parent(s): c5be21b

Update README.md

  1. README.md +58 -1
README.md CHANGED
@@ -29,4 +29,61 @@ tags:
  - text-simplification
  - meaning
  - assess
- ---
+ ---
+
+ # Here is MeaningBERT
+
+ MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. It was proposed in our
+ article [MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full).
+ Its goal is to assess meaning preservation between two sentences in a way that correlates highly with human judgments and passes sanity checks. For more details, refer to our publicly available article.
+
+ ## Sanity Check
+
+ Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric.
+ However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires
+ a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning
+ preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which
+ should be 0% preserving). In these tests, the meaning preservation target value is not subjective and does not require
+ human annotation to measure. They represent a trivial and minimal threshold that a good automatic meaning preservation
+ metric should be able to achieve. Namely, a metric should at least return a perfect score (i.e., 100%) when two
+ identical sentences are compared and a null score (i.e., 0%) when two sentences are completely unrelated.
+
+ ### Identical sentences
+
+ The first test evaluates meaning preservation between identical sentences. To analyze a metric's ability to pass
+ this test, we count the number of times its rating is greater than or equal to a threshold value X∈[95, 99] and divide
+ it by the number of sentences, giving the ratio of cases in which the metric returns the expected rating. To account
+ for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of
+ 100%.
+
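The counting procedure for the identical-sentences test can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code from the article: the function name `identical_pass_ratio` and the sample ratings are hypothetical, ratings are assumed to be expressed in percent, and Python's built-in `round` (which rounds exact .5 ties to the nearest even integer) stands in for "round to the nearest integer".

```python
def identical_pass_ratio(ratings, threshold):
    """Return the share of ratings that reach `threshold` once rounded.

    `ratings` are metric outputs in percent for pairs of identical sentences
    (hypothetical input format, assumed for this sketch).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    # Round before comparing so floating-point noise (e.g. 98.9999) is not a miss.
    passed = sum(1 for rating in ratings if round(rating) >= threshold)
    return passed / len(ratings)


# Hypothetical ratings for four identical-sentence pairs.
ratings = [99.7, 98.2, 94.6, 100.0]

# Sweep the thresholds X in {95, ..., 99}; 100 is deliberately left out.
for threshold in range(95, 100):
    print(threshold, identical_pass_ratio(ratings, threshold))
```

A ratio of 1.0 at a given threshold means the metric returned the expected rating for every identical pair at that level of strictness.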
+ ### Unrelated sentences
+
+ Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large
+ language model. The idea is to verify that the metric finds a meaning preservation rating of 0 when given a completely
+ irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is
+ 0, we check that the metric rating is less than or equal to a threshold value X∈[1, 5].
+ Again, to account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use
+ a threshold value of 0%.
+
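The unrelated-sentences test mirrors the previous one with the comparison reversed. The sketch below computes the pass ratio for every threshold at once; `unrelated_pass_ratios` and the sample ratings are hypothetical names and values chosen for illustration, ratings are again assumed to be in percent, and the built-in `round` is an assumed stand-in for the rounding step.

```python
def unrelated_pass_ratios(ratings, thresholds=range(1, 6)):
    """Map each threshold to the share of rounded ratings at or below it.

    `ratings` are metric outputs in percent for (source, unrelated sentence)
    pairs; the default thresholds cover X in {1, ..., 5}, leaving out 0.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    rounded = [round(rating) for rating in ratings]
    return {
        threshold: sum(1 for rating in rounded if rating <= threshold) / len(rounded)
        for threshold in thresholds
    }


# Hypothetical ratings for five unrelated-sentence pairs.
ratios = unrelated_pass_ratios([0.3, 4.4, 1.2, 7.8, 2.6])
for threshold, ratio in sorted(ratios.items(), reverse=True):
    print(threshold, ratio)
```

Here the rating of 7.8 fails at every threshold, so no ratio reaches 1.0; a metric that passes the test would sit at or near 1.0 even for the strictest threshold.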
+ ## Cite
+
+ Use the following citation to cite MeaningBERT:
+
+ ```
+ @ARTICLE{10.3389/frai.2023.1223924,
+ AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
+ TITLE={MeaningBERT: assessing meaning preservation between sentences},
+ JOURNAL={Frontiers in Artificial Intelligence},
+ VOLUME={6},
+ YEAR={2023},
+ URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
+ DOI={10.3389/frai.2023.1223924},
+ ISSN={2624-8212},
+ }
+ ```
+
+ ## License
+
+ MeaningBERT is MIT licensed, as found in
+ the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).