title: Rquge
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 3.34.0
app_file: app.py
pinned: false
Metric Card for RQUGE Score
Metric Description
RQUGE is an evaluation metric designed for assessing the quality of generated questions. RQUGE evaluates the quality of a candidate question without the need to compare it to a reference question. It operates by taking into account the relevant context and answer span and employs a general question-answering module followed by a span scoring mechanism to determine an acceptability score.
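The two-stage flow described above (answer the candidate question, then score the predicted span against the given answer) can be pictured with a minimal conceptual sketch. Everything in it is an assumption made for illustration: `rquge_like_score`, `toy_qa`, and `toy_span_scorer` are hypothetical names, and the toy stand-ins are simple string heuristics, not the QA and span scorer models that RQUGE actually uses.

```python
from typing import Callable


def rquge_like_score(
    question: str,
    context: str,
    answer: str,
    qa_fn: Callable[[str, str], str],
    scorer_fn: Callable[[str, str], float],
) -> float:
    """Answer the candidate question from the context, then score the
    predicted span against the reference answer."""
    predicted = qa_fn(question, context)
    return scorer_fn(predicted, answer)


def toy_qa(question: str, context: str) -> str:
    # Toy stand-in (assumption): ignore the question and return the
    # last word of the context as the "predicted answer".
    return context.split()[-1]


def toy_span_scorer(predicted: str, reference: str) -> float:
    # Toy stand-in (assumption): 5.0 on a case-insensitive exact match,
    # 1.0 otherwise. A real span scorer is a learned model.
    return 5.0 if predicted.strip().lower() == reference.strip().lower() else 1.0


score = rquge_like_score(
    "how is the weather?", "the weather is sunny", "sunny", toy_qa, toy_span_scorer
)
```

The point of the sketch is only the shape of the pipeline: no reference question appears anywhere, and the score depends on whether the candidate question leads back to the given answer.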
How to Use
RQUGE score takes three main inputs: "generated_questions" (a list of generated questions), "contexts" (a list of related contexts), and "answers" (a list of reference answers). Additionally, "qa_model" and "sp_model" are used to provide the paths to the QA and span scorer modules. "device" is also an optional input.
from evaluate import load
rqugescore = load("alirezamsh/rquge")
generated_questions = ["how is the weather?"]
contexts = ["the weather is sunny"]
answers = ["sunny"]
results = rqugescore.compute(generated_questions=generated_questions, contexts=contexts, answers=answers)
print(results["mean_score"])
>>> [5.05]
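Because the three inputs are parallel lists, it can help to check that they line up before calling the metric. The helper below is a minimal sketch, not part of the metric itself: `build_rquge_inputs` is a hypothetical name, and the equal-length requirement is an assumption based on each question being paired with one context and one answer.

```python
from typing import Dict, List, Sequence


def build_rquge_inputs(
    generated_questions: Sequence[str],
    contexts: Sequence[str],
    answers: Sequence[str],
) -> Dict[str, List[str]]:
    """Check that the three lists are aligned and return them as keyword
    arguments named after the metric's documented inputs."""
    n = len(generated_questions)
    if len(contexts) != n or len(answers) != n:
        raise ValueError(
            "generated_questions, contexts, and answers must have the same "
            f"length, got {n}, {len(contexts)}, and {len(answers)}"
        )
    return {
        "generated_questions": list(generated_questions),
        "contexts": list(contexts),
        "answers": list(answers),
    }


inputs = build_rquge_inputs(
    ["how is the weather?"], ["the weather is sunny"], ["sunny"]
)
# With rqugescore loaded as in the example above:
# results = rqugescore.compute(**inputs)
```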
Output Values
RQUGE score outputs a dictionary with the following values:
mean_score: The average RQUGE score over the input texts, ranging from 1 to 5.
instance_score: The individual RQUGE score of each instance in the input, ranging from 1 to 5.
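The relationship between the two values can be sketched as follows. The scores are made-up placeholders for illustration, not real RQUGE outputs, and the 3.0 cutoff used to flag weak questions is an arbitrary assumption rather than a recommended threshold.

```python
from statistics import fmean

# Placeholder per-instance scores (illustrative only, not real metric output).
instance_score = [4.5, 2.0, 3.5]

# The mean score is the average of the per-instance scores.
mean_score = fmean(instance_score)

# Hypothetical cutoff: collect the indices of questions scoring below 3.0.
low_quality = [i for i, s in enumerate(instance_score) if s < 3.0]
```

Inspecting the per-instance scores in this way shows which generated questions pull the average down, which the mean alone does not reveal.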
Citation
@misc{mohammadshahi2022rquge,
title={RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question},
author={Alireza Mohammadshahi and Thomas Scialom and Majid Yazdani and Pouya Yanki and Angela Fan and James Henderson and Marzieh Saeidi},
year={2022},
eprint={2211.01482},
archivePrefix={arXiv},
primaryClass={cs.CL}
}