---
title: MeaningBERT
emoji: 🦀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 4.2.0
app_file: app.py
pinned: false
---

# Here is MeaningBERT

MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. MeaningBERT was
proposed in our article [MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full).
Its goal is to assess meaning preservation between two sentences in a way that correlates highly with human judgments and
sanity checks. For more details, refer to our publicly available article.

> This public version of our model is the best-performing model (whereas our article reports the average performance
> of 10 models), trained for a longer period (500 epochs instead of 250). We later observed that, with longer training,
> the model can further reduce dev loss and increase performance. We have also replaced the data augmentation technique
> used in the article with a more robust one that also includes the commutative property of the meaning function, namely, Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a).
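
The commutative property can be illustrated with a minimal sketch of symmetric pair augmentation: every labeled pair is also added in swapped order with the same rating. This is not the augmentation code used to train MeaningBERT; the function name `augment_with_commutativity`, the `(sentence_a, sentence_b, rating)` tuple layout, and the placeholder sentences and rating are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical training example layout: (sentence_a, sentence_b, rating).
Example = Tuple[str, str, float]


def augment_with_commutativity(examples: List[Example]) -> List[Example]:
    """Return the examples plus each pair swapped, keeping the same rating.

    Encodes Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a). Pairs whose two
    sentences are identical are not duplicated, since swapping them is a no-op.
    """
    augmented = list(examples)
    for sent_a, sent_b, rating in examples:
        if sent_a != sent_b:
            augmented.append((sent_b, sent_a, rating))
    return augmented


# Usage with placeholder sentences and placeholder ratings.
toy = [("Sentence A.", "Sentence B.", 50.0), ("Sentence C.", "Sentence C.", 100.0)]
for example in augment_with_commutativity(toy):
    print(example)
```

Skipping identical pairs avoids adding exact duplicates to the training set.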

- [HuggingFace Model Card](https://huggingface.co/davebulaval/MeaningBERT)
- [HuggingFace Metric Card](https://huggingface.co/spaces/davebulaval/meaningbert)

## Sanity Check

Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric.
However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires
a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning
preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which should
be 0% preserving). In these tests, the meaning preservation target value is not subjective and does not require human
annotation to be measured. They represent a trivial and minimal threshold that a good automatic meaning preservation
metric should be able to achieve. Namely, a metric should at minimum be able to return a perfect score (i.e., 100%) if
two identical sentences are compared and a null score (i.e., 0%) if two sentences are completely unrelated.
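
As a rough sketch, the two test sets can be built as lists of sentence pairs. The helper names `identical_pairs` and `word_soup_pairs` are hypothetical, and the random "word soup" below is only a stand-in: the article generates unrelated sentences with a large language model, whereas this sketch draws words uniformly at random from a caller-supplied vocabulary.

```python
import random
from typing import List, Sequence, Tuple


def identical_pairs(sentences: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every sentence with itself; the expected rating is 100."""
    return [(sentence, sentence) for sentence in sentences]


def word_soup_pairs(
    sentences: Sequence[str],
    vocabulary: Sequence[str],
    length: int = 8,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pair every sentence with a random "word soup"; the expected rating is 0.

    Stand-in for LLM-generated unrelated sentences: words are drawn uniformly
    at random (with replacement) from the given vocabulary.
    """
    rng = random.Random(seed)  # seeded so the test set is reproducible
    pairs = []
    for sentence in sentences:
        soup = " ".join(rng.choice(vocabulary) for _ in range(length))
        pairs.append((sentence, soup.capitalize() + "."))
    return pairs


sources = ["He wanted to make them pay.", "This sandwich looks delicious."]
vocab = ["whatever", "whenever", "purple", "engine", "river", "quietly"]
print(identical_pairs(sources))
print(word_soup_pairs(sources, vocab, length=5))
```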

### Identical Sentences

The first test evaluates meaning preservation between identical sentences. To analyze the metrics' capabilities to pass
this test, we count the number of times a metric rating was greater than or equal to a threshold value X ∈ [95, 99] and
divide it by the number of sentences to create a ratio of the number of times the metric gives the expected rating. To
account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold
value of 100%.
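
The ratio described above can be sketched as follows. The function name `identical_pass_ratio` and the ratings in the usage example are placeholders, not results from the article, and the sketch assumes ratings on a 0 to 100 scale.

```python
from typing import Iterable


def identical_pass_ratio(ratings: Iterable[float], threshold: int) -> float:
    """Fraction of identical-sentence ratings that are >= threshold.

    Ratings are assumed to be on a 0-100 scale and are rounded to the nearest
    integer first to absorb floating-point noise (note: Python's round() sends
    exact halves to the even integer).
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("ratings must not be empty")
    passed = sum(1 for rating in ratings if round(rating) >= threshold)
    return passed / len(ratings)


# Placeholder ratings for illustration only.
ratings = [99.7, 100.2, 94.4, 98.9]
for threshold in range(95, 100):  # X in [95, 99]
    print(threshold, identical_pass_ratio(ratings, threshold))
```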

### Unrelated Sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large
language model. The idea is to verify that the metric finds a meaning preservation rating of 0 when given a completely
irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is
0, we check that the metric rating is lower than or equal to a threshold value X ∈ [1, 5].
Again, to account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use
a threshold value of 0%.
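
The mirror-image check for unrelated sentences can be sketched the same way. Again, `unrelated_pass_ratio` is a hypothetical name, the ratings are placeholders rather than reported results, and a 0 to 100 scale is assumed.

```python
from typing import Iterable


def unrelated_pass_ratio(ratings: Iterable[float], threshold: int) -> float:
    """Fraction of unrelated-sentence ratings that are <= threshold.

    Ratings are assumed to be on a 0-100 scale and are rounded to the nearest
    integer first to absorb floating-point noise.
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("ratings must not be empty")
    passed = sum(1 for rating in ratings if round(rating) <= threshold)
    return passed / len(ratings)


# Placeholder ratings for illustration only.
ratings = [0.3, 4.6, 1.2, 7.0]
for threshold in range(1, 6):  # X in [1, 5]
    print(threshold, unrelated_pass_ratio(ratings, threshold))
```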

## Use MeaningBERT

You can use MeaningBERT as a [model](https://huggingface.co/davebulaval/MeaningBERT) that you can retrain or use for
inference with HuggingFace using the following code:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
model = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
```

or you can use MeaningBERT as a metric for evaluation (no retraining) with HuggingFace using the following code:

```python
import torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
scorer = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
scorer.eval()

documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
                   "Whatever, whenever, this is a sentence."]

# We tokenize the texts as sentence pairs and return PyTorch tensors
tokenize_text = tokenizer(documents, simplifications, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    # We process the text
    scores = scorer(**tokenize_text)

print(scores.logits.tolist())
```

or you can use our HuggingFace Metric module:

```python
import evaluate

documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
                   "Whatever, whenever, this is a sentence."]

meaning_bert = evaluate.load("davebulaval/meaningbert")

print(meaning_bert.compute(references=documents, predictions=simplifications))
```


------------------

## Cite

Use the following citation to cite MeaningBERT:

```
@ARTICLE{10.3389/frai.2023.1223924,
AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},    
TITLE={MeaningBERT: assessing meaning preservation between sentences},      
JOURNAL={Frontiers in Artificial Intelligence},      
VOLUME={6},           
YEAR={2023},      
URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},       
DOI={10.3389/frai.2023.1223924},      
ISSN={2624-8212},   
}
```

------------------

## Contributing to MeaningBERT

We welcome user input, whether it regards bugs found in the library or feature propositions! Make sure to have a
look at our [contributing guidelines](https://github.com/GRAAL-Research/MeaningBERT/blob/main/.github/CONTRIBUTING.md)
for more details on this matter.

## License

MeaningBERT is MIT licensed, as found in
the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).

------------------