Update README.md
README.md
CHANGED
@@ -21,11 +21,39 @@ It achieves the following results on the evaluation set:
## Model description

This is a ModernBERT model with a regression head, designed to predict the Content score of a summary.
Before fine-tuning, the model was pretrained on a very large synthetic dataset.

The input should be the summary, followed by the tokenizer's separator token (`[SEP]`), followed by the source.

```
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("wesleymorris/modernbert-content", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("wesleymorris/modernbert-content")
model.eval()  # inference mode: disables dropout

def get_score(summary: str, source: str) -> float:
    """Return the predicted Content score for a summary of the given source."""
    # Expected input format: summary + separator token + source
    text = summary + tokenizer.sep_token + source
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # gradients are not needed for scoring
        logits = model(**inputs).logits
    return float(logits[0])
```

### Corpus
The model was trained on a corpus of 4,233 summaries of 101 sources compiled by Botarleanu et al. (2022).
The summaries were graded by expert raters on six criteria: Details, Main Point, Cohesion, Paraphrasing, Objective Language, and Language Beyond the Text.
A principal component analysis was used to reduce the dimensionality of the outcome variables to two.

The Content component comprises Details, Main Point, Paraphrasing, and Cohesion.
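
To illustrate the dimensionality reduction described above, here is a minimal, dependency-free sketch of a principal component analysis that reduces six criteria to two components. It is not the authors' analysis: the ratings are synthetic toy data, the function names are hypothetical, and power iteration with deflation stands in for whatever PCA routine was actually used. The toy data are generated so that four criteria share one latent factor, mirroring the grouping stated above; the behaviour of the remaining two criteria is an assumption made only so the example has a second component.

```python
import math
import random

CRITERIA = [
    "Details", "Main Point", "Cohesion", "Paraphrasing",
    "Objective Language", "Language Beyond the Text",
]

def make_toy_ratings(n=200, seed=0):
    """Synthetic ratings (NOT the real corpus): two latent factors plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Hypothetical structure: the first four criteria follow f1, the last two f2.
        rows.append([f1 + 0.5 * rng.gauss(0, 1) for _ in range(4)]
                    + [f2 + 0.5 * rng.gauss(0, 1) for _ in range(2)])
    return rows

def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def top_components(rows, k=2, iters=500, seed=0):
    """Leading k principal axes via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    # Covariance of standardized data, i.e. the correlation matrix.
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(x * y for x, y in zip(v, matvec(cov, v)))
        axes.append(v)
        # Deflate: remove the variance explained by this axis.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return axes

ratings = standardize(make_toy_ratings())
axes = top_components(ratings)
# Project each six-dimensional rating onto the two principal axes.
scores = [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
          for row in ratings]
```

Each row of `scores` holds two component scores for one toy summary. The loadings in `axes[0]` show which criteria dominate the first component.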

### Contact
This model was developed by LEAR Lab at Vanderbilt University. For questions or comments about this model, please contact [email protected].

## Intended uses & limitations

This model can be used to predict the Content scores that human raters would assign to a summary.
The scores are normalized so that 0 is the mean of the training data and 1 is one standard deviation above the mean.
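
As a rough aid to reading these normalized scores, the sketch below converts a score to an approximate percentile. It assumes the training scores are roughly normally distributed, which this model card does not state. `approx_percentile` and `to_raw_scale` are hypothetical helper names, and the mean and standard deviation passed to `to_raw_scale` are placeholders supplied by the caller, not the actual training statistics.

```python
from statistics import NormalDist

def approx_percentile(z: float) -> float:
    """Approximate percentile of a normalized score.

    Assumption (not stated in the model card): scores are roughly normal,
    so the standard normal CDF approximates the share of lower scores.
    """
    return NormalDist().cdf(z) * 100.0

def to_raw_scale(z: float, mean: float, sd: float) -> float:
    """Undo the normalization, given a training mean and SD from the caller."""
    return mean + z * sd
```

For example, `approx_percentile(0.0)` is 50, since a score of 0 sits at the training mean.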

## Training and evaluation data