Model Card for NegBLEURT
This model is a negation-aware version of the BLEURT metric for evaluating generated text.
Direct Use
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load the NegBLEURT tokenizer and regression model from the Hugging Face Hub
model_name = "tum-nlp/NegBLEURT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# each candidate is scored against its reference sentence
references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]

# tokenize the reference/candidate pairs and read the score from the regression head
tokenized = tokenizer(references, candidates, return_tensors='pt', padding=True)
print(model(**tokenized).logits)
# returns scores 0.8409 and 0.2601 for the two candidates
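For repeated use, the calls above can be wrapped in a small helper that returns plain floats. This is a minimal sketch; the helper name negbleurt_score, the truncation flag, and the use of torch.no_grad() are our own additions, not part of the model card:

import torch

def negbleurt_score(references, candidates):
    # hypothetical convenience wrapper around the model call shown above
    tokenized = tokenizer(references, candidates, return_tensors='pt',
                          padding=True, truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**tokenized).logits
    # the regression head outputs one score per pair, shape (batch, 1)
    return logits.squeeze(-1).tolist()

print(negbleurt_score(references, candidates))  # approx. [0.8409, 0.2601]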
Use with pipeline
from transformers import pipeline

# set function_to_apply="none" so the pipeline returns the raw regression score
# instead of passing it through a softmax/sigmoid
pipe = pipeline("text-classification", model="tum-nlp/NegBLEURT", function_to_apply="none")

# each inner [reference, candidate] pair is wrapped in its own list
pairwise_input = [
    [["Ray Charles is legendary.", "Ray Charles is a legend."]],
    [["Ray Charles is legendary.", "Ray Charles isn’t legendary."]]
]
print(pipe(pairwise_input))
# returns [{'label': 'NegBLEURT_score', 'score': 0.8408917784690857}, {'label': 'NegBLEURT_score', 'score': 0.26007288694381714}]
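The text-classification pipeline also accepts sentence pairs as dictionaries with "text" and "text_pair" keys, which can read more clearly than nested lists. A sketch of the equivalent call (the scores should match the output above):

print(pipe([
    {"text": "Ray Charles is legendary.", "text_pair": "Ray Charles is a legend."},
    {"text": "Ray Charles is legendary.", "text_pair": "Ray Charles isn’t legendary."},
]))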
Training Details
The model is a fine-tuned version of the bleurt-tiny checkpoint from the official BLEURT repository. It was fine-tuned on the train split of the CANNOT dataset for 500 steps using the fine-tuning script provided by BLEURT.
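To inspect the training data, the CANNOT dataset can be loaded with the datasets library. A minimal sketch, assuming the dataset is published as tum-nlp/cannot-dataset on the Hugging Face Hub:

from datasets import load_dataset

# assumption: the CANNOT dataset lives at "tum-nlp/cannot-dataset" on the Hub
cannot_train = load_dataset("tum-nlp/cannot-dataset", split="train")
print(cannot_train)  # inspect the columns and number of examples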
Citation
Please cite our INLG 2023 paper if you use our model. BibTeX:
@misc{anschütz2023correct,
  title={This is not correct! Negation-aware Evaluation of Language Generation Systems},
  author={Miriam Anschütz and Diego Miguel Lozano and Georg Groh},
  year={2023},
  eprint={2307.13989},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}