wesleymorris committed on
Commit
48c3ac3
·
verified ·
1 Parent(s): c906102

Update README.md

Files changed (1)
  1. README.md +30 -2
README.md CHANGED
@@ -21,11 +21,39 @@ It achieves the following results on the evaluation set:
21
 
22
  ## Model description
23
 
24
- More information needed
25
 
26
  ## Intended uses & limitations
27
 
28
- More information needed
 
29
 
30
  ## Training and evaluation data
31
 
 
21
 
22
  ## Model description
23
 
24
+ This is a ModernBERT model with a regression head that predicts the Content score of a summary.
25
+ Before fine-tuning, the model was pretrained on a very large synthetic dataset.
26
+
27
+ The input should be the summary, followed by the tokenizer's separator token (`tokenizer.sep_token`), followed by the source text.
28
+
29
+ ```python
30
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
31
+
32
+ model = AutoModelForSequenceClassification.from_pretrained("wesleymorris/modernbert-content", num_labels=1)
33
+ tokenizer = AutoTokenizer.from_pretrained("wesleymorris/modernbert-content")
34
+
35
+ def get_score(summary: str,
36
+               source: str):
37
+     text = summary + tokenizer.sep_token + source
38
+     inputs = tokenizer(text, return_tensors='pt')
39
+     return float(model(**inputs).logits[0])
40
+ ```
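The input format described above (summary, then separator token, then source) can be isolated in a small helper so it is easy to check before anything reaches the tokenizer. This is a minimal sketch rather than part of the model card: `build_input` is a hypothetical name, and the default separator `"[SEP]"` is an assumption; in practice `tokenizer.sep_token` should be passed in.

```python
def build_input(summary: str, source: str, sep_token: str = "[SEP]") -> str:
    """Join a summary and its source in the order the model card describes.

    `sep_token` defaults to "[SEP]" as an assumption for illustration;
    pass `tokenizer.sep_token` when a real tokenizer is available.
    """
    # Summary first, then the separator token, then the source text.
    return summary + sep_token + source


# Illustrative strings only; they are not taken from the training corpus.
text = build_input("Cats sleep a lot.", "Cats sleep for most of the day.")
print(text)
```

The resulting string is what `get_score` passes to the tokenizer as `text`.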
41
+
42
+
43
+ ### Corpus
44
+ The model was trained on a corpus of 4,233 summaries of 101 sources compiled by Botarleanu et al. (2022).
45
+ The summaries were graded by expert raters on 6 criteria: Details, Main Point, Cohesion, Paraphrasing, Objective Language, and Language Beyond the Text.
46
+ A principal component analysis was used to reduce the dimensionality of the outcome variables to two.
47
+
48
+ The Content component comprises Details, Main Point, Paraphrasing, and Cohesion.
49
+
50
+ ### Contact
51
+ This model was developed by the LEAR Lab at Vanderbilt University. For questions or comments about this model, please contact [email protected].
52
 
53
  ## Intended uses & limitations
54
 
55
+ This model predicts the Content score that human raters would assign to a summary.
56
+ The scores are normalized so that 0 is the mean of the training data and 1 is one standard deviation above the mean.
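The normalization described here is a standard z-score, so a prediction can be mapped back to the un-normalized scale if the training mean and standard deviation are known. The sketch below shows both directions. The function names are hypothetical, and the statistics `EXAMPLE_MEAN` and `EXAMPLE_STD` are made-up values for illustration, since the model card does not report the real ones.

```python
def normalize(raw: float, train_mean: float, train_std: float) -> float:
    """Convert an un-normalized score to a z-score: z = (raw - mean) / std."""
    return (raw - train_mean) / train_std


def denormalize(z: float, train_mean: float, train_std: float) -> float:
    """Invert the z-score: raw = mean + z * std."""
    return train_mean + z * train_std


# HYPOTHETICAL statistics, for illustration only; not reported in the model card.
EXAMPLE_MEAN = 3.0
EXAMPLE_STD = 0.5

print(denormalize(0.0, EXAMPLE_MEAN, EXAMPLE_STD))   # 3.0 (the training mean)
print(denormalize(1.0, EXAMPLE_MEAN, EXAMPLE_STD))   # 3.5 (one SD above the mean)
print(denormalize(-1.5, EXAMPLE_MEAN, EXAMPLE_STD))  # 2.25 (1.5 SD below the mean)
```

Read this way, a predicted score of 0.8 means the summary's content is estimated to be 0.8 standard deviations above the average summary in the training data.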
57
 
58
  ## Training and evaluation data
59