---
title: MeaningBERT
emoji: 🦀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 4.2.0
app_file: app.py
pinned: false
---

# Here is MeaningBERT

MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. MeaningBERT was
proposed in our
article [MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full).
Its goal is to rate meaning preservation between two sentences in a way that correlates highly with human judgments
and passes sanity checks. For more details, refer to our publicly available article.

> This public version of our model uses the best model, trained for a more extended period (500 epochs instead of 250),
> whereas our article reports the average performance of 10 models. We observed afterward that the
> model can further reduce the dev loss and increase performance. We have also replaced the data augmentation technique
> used in the article with a more robust one that also includes the commutative property of the meaning function,
> namely Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a).

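The commutative property mentioned in the note can be illustrated with a small sketch. The helper name
`augment_with_swapped_pairs`, the `(sentence_a, sentence_b, rating)` tuple layout, and the sample rating are
assumptions made for illustration only; the augmentation pipeline actually used to train the public model may differ.

```python
from typing import List, Tuple

# Hypothetical layout: (sentence_a, sentence_b, meaning preservation rating).
Example = Tuple[str, str, float]


def augment_with_swapped_pairs(examples: List[Example]) -> List[Example]:
    """Add the mirrored pair (b, a) with the same rating for every (a, b)."""
    augmented = list(examples)
    seen = {(sent_a, sent_b) for sent_a, sent_b, _ in examples}
    for sent_a, sent_b, rating in examples:
        # Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a), so the rating is reused.
        if (sent_b, sent_a) not in seen:
            augmented.append((sent_b, sent_a, rating))
            seen.add((sent_b, sent_a))
    return augmented


# Illustrative pair with a made-up rating.
pairs = [("He wants to eat.", "He is hungry.", 80.0)]
print(augment_with_swapped_pairs(pairs))
```

Pairs whose mirror is already present (including identical sentence pairs) are not duplicated.
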
- [HuggingFace Model Card](https://huggingface.co/davebulaval/MeaningBERT)
- [HuggingFace Metric Card](https://huggingface.co/spaces/davebulaval/meaningbert)

## Sanity Check

Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric.
However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires
a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning
preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which
should be 0% preserving). In these tests, the meaning preservation target value is not subjective and does not require
human annotation to be measured. They represent a trivial and minimal threshold that a good automatic meaning
preservation metric should be able to achieve. Namely, a metric should at least return a perfect score (i.e., 100%)
when two identical sentences are compared and a null score (i.e., 0%) when two sentences are completely unrelated.

### Identical Sentences

The first test evaluates meaning preservation between identical sentences. To analyze a metric's ability to pass
this test, we count the number of times its rating is greater than or equal to a threshold value X∈[95, 99] and divide
that count by the number of sentences, which gives the ratio of times the metric returns the expected rating. To account
for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of
100%.

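The ratio described for the identical sentences test can be computed as in the sketch below. The function name
`identical_pass_ratio` and the sample ratings are hypothetical, and ratings are assumed to be on a 0 to 100 scale.

```python
from typing import Sequence


def identical_pass_ratio(ratings: Sequence[float], threshold: int) -> float:
    """Share of ratings that, once rounded, are greater than or equal to threshold."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    # Note: Python's round() sends exact halves to the nearest even integer.
    passed = sum(1 for rating in ratings if round(rating) >= threshold)
    return passed / len(ratings)


# Made-up ratings for four identical sentence pairs.
ratings = [99.6, 98.2, 94.7, 100.0]
for threshold in range(95, 100):  # X in [95, 99]
    print(threshold, identical_pass_ratio(ratings, threshold))
```
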
### Unrelated Sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large
language model. The idea is to verify that the metric finds a meaning preservation rating of 0 when given a completely
irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is
0, we check that the metric rating is lower than or equal to a threshold value X∈[1, 5].
Again, to account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use
a threshold value of 0%.

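For the unrelated sentences test, the comparison is reversed: a rating passes when it falls at or below the threshold.
The sketch below sweeps all thresholds at once; the function name `unrelated_pass_ratios` and the sample ratings are
hypothetical, and ratings are again assumed to be on a 0 to 100 scale.

```python
from typing import Dict, Sequence


def unrelated_pass_ratios(
    ratings: Sequence[float], thresholds: Sequence[int] = (5, 4, 3, 2, 1)
) -> Dict[int, float]:
    """Map each threshold X to the share of rounded ratings that are <= X."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    rounded = [round(rating) for rating in ratings]
    return {
        x: sum(1 for rating in rounded if rating <= x) / len(rounded)
        for x in thresholds
    }


# Made-up ratings for four unrelated sentence pairs.
print(unrelated_pass_ratios([0.4, 2.6, 7.9, 1.2]))
```
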
## Use MeaningBERT

You can use MeaningBERT as a [model](https://huggingface.co/davebulaval/MeaningBERT) that you can retrain or use for
inference with HuggingFace as follows:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
model = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
```

or you can use MeaningBERT as a metric for evaluation (no retraining) with HuggingFace as follows:

```python
import torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
scorer = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
scorer.eval()

documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
                   "Whatever, whenever, this is a sentence."]

# We tokenize the text as a pair and return PyTorch tensors
tokenize_text = tokenizer(documents, simplifications, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    # We process the text
    scores = scorer(**tokenize_text)

print(scores.logits.tolist())
```

or use our HuggingFace Metric module:

```python
import evaluate

documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
                   "Whatever, whenever, this is a sentence."]

meaning_bert = evaluate.load("davebulaval/meaningbert")

print(meaning_bert.compute(references=documents, predictions=simplifications))
```

------------------

## Cite

Use the following citation to cite MeaningBERT:

```
@ARTICLE{10.3389/frai.2023.1223924,
  AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
  TITLE={MeaningBERT: assessing meaning preservation between sentences},
  JOURNAL={Frontiers in Artificial Intelligence},
  VOLUME={6},
  YEAR={2023},
  URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
  DOI={10.3389/frai.2023.1223924},
  ISSN={2624-8212},
}
```

------------------

## Contributing to MeaningBERT

We welcome user input, whether it concerns bugs found in the library or feature proposals! Make sure to have a
look at our [contributing guidelines](https://github.com/GRAAL-Research/MeaningBERT/blob/main/.github/CONTRIBUTING.md)
for more details on this matter.

## License

MeaningBERT is MIT licensed, as found in
the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).

------------------