---
title: MeaningBERT
emoji: 🦀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 4.2.0
app_file: app.py
pinned: false
---
# Here is MeaningBERT

MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. MeaningBERT was proposed in our article [MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full). Its goal is to assess meaning preservation between two sentences in a way that correlates highly with human judgments and passes simple sanity checks. For more details, refer to our publicly available article.
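The snippet below is a minimal usage sketch; it assumes the released checkpoint is published on the Hugging Face Hub as `davebulaval/MeaningBERT` and exposes a single regression head that rates a sentence pair on a roughly 0 to 100 scale (check the model card for the exact identifiers and usage).

```python
# Minimal sketch of scoring meaning preservation between sentence pairs.
# Assumption: the checkpoint name "davebulaval/MeaningBERT" and the 0-100
# regression output are illustrative; see the model card for exact usage.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
scorer = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
scorer.eval()

sources = ["He wanted to make them pay."]
simplifications = ["He wanted to make them pay."]

with torch.no_grad():
    inputs = tokenizer(sources, simplifications, return_tensors="pt",
                       padding=True, truncation=True)
    scores = scorer(**inputs).logits.squeeze(-1)  # one rating per pair

print(scores.tolist())
```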
> This public version of our model is the single best model we trained (whereas our article reports performance averaged over 10 models), trained for a more extended period (1,000 epochs instead of 250). We later observed that the model can further reduce the dev loss and increase performance with longer training.

## Sanity Check
Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric. However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which should be 0% preserving). In these tests, the meaning preservation target value is not subjective and does not require human annotation to measure. They represent a trivial and minimal threshold that a good automatic meaning preservation metric should be able to achieve. Namely, a metric should at least return a perfect score (i.e., 100%) when two identical sentences are compared and a null score (i.e., 0%) when the two sentences are completely unrelated.

### Identical sentences

The first test evaluates meaning preservation between identical sentences. To analyze a metric's ability to pass this test, we count the number of times its rating is greater than or equal to a threshold value X ∈ [95, 99] and divide that count by the number of sentences, giving the ratio of times the metric returns the expected rating. To account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 100%.
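In code, this check reduces to a small ratio computation; the sketch below assumes `scores` already holds MeaningBERT ratings obtained by comparing each sentence with itself.

```python
def identical_sentence_ratio(scores, threshold=95):
    """Share of identical-sentence pairs whose rounded rating reaches the threshold.

    `scores` are MeaningBERT ratings for sentences compared with themselves;
    `threshold` is the X in [95, 99] described above.
    """
    hits = sum(1 for score in scores if round(score) >= threshold)
    return hits / len(scores)
```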
### Unrelated sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large language model. The idea is to verify that the metric returns a meaning preservation rating of 0 when given a completely irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is 0, we check that the metric rating is lower than or equal to a threshold value X ∈ [1, 5]. Again, to account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 0%.
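The mirror-image check is equally small; the sketch below assumes `scores` holds MeaningBERT ratings between source sentences and their unrelated, word-soup counterparts.

```python
def unrelated_sentence_ratio(scores, threshold=5):
    """Share of unrelated-sentence pairs whose rounded rating stays at or below the threshold.

    `scores` are MeaningBERT ratings between source sentences and unrelated
    (word soup) sentences; `threshold` is the X in [1, 5] described above.
    """
    hits = sum(1 for score in scores if round(score) <= threshold)
    return hits / len(scores)
```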
## Cite

Use the following citation to cite MeaningBERT:
```
@ARTICLE{10.3389/frai.2023.1223924,
  AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
  TITLE={MeaningBERT: assessing meaning preservation between sentences},
  JOURNAL={Frontiers in Artificial Intelligence},
  VOLUME={6},
  YEAR={2023},
  URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
  DOI={10.3389/frai.2023.1223924},
  ISSN={2624-8212},
}
```
## License

MeaningBERT is MIT licensed, as found in the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).