title: RougeRaw
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
description: >-
  ROUGE RAW is a language-agnostic variant of ROUGE without a stemmer, stop
  words, or synonyms. This is a wrapper around the original
  http://hdl.handle.net/11234/1-2615 script.

Metric Card for RougeRaw

Metric Description

ROUGE RAW is a language-agnostic variant of ROUGE without a stemmer, stop words, or synonyms. This is a wrapper around the original http://hdl.handle.net/11234/1-2615 script.
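To make "without a stemmer, stop words, or synonyms" concrete, here is a minimal sketch of the usual clipped n-gram overlap behind ROUGE-N, computed directly on whitespace tokens. The function names `ngrams` and `rouge_n_sketch` are hypothetical and this is not the wrapped script; the original implementation may differ in details.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(prediction, reference, n=1):
    """Illustrative ROUGE-N on raw whitespace tokens (assumption: not the original script).

    Tokens are compared as-is: no stemming, no stop-word removal, no synonyms.
    """
    pred = ngrams(prediction.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram (clipping).
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


scores = rouge_n_sketch("the cat sat on the mat", "the cat is on the mat")
print(scores)
```

Because tokens are matched verbatim, "sat" and "is" simply count as a mismatch here; a stemmed or synonym-aware ROUGE could score such pairs differently.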

How to Use

import evaluate
rougeraw = evaluate.load('CZLC/rouge_raw')
predictions = ["the cat is on the mat", "hello there"]
references = ["the cat is on the mat", "hello there"]
results = rougeraw.compute(predictions=predictions, references=references)
print(results)
{'1_low_precision': 1.0, '1_low_recall': 1.0, '1_low_fmeasure': 1.0, '1_mid_precision': 1.0, '1_mid_recall': 1.0, '1_mid_fmeasure': 1.0, '1_high_precision': 1.0, '1_high_recall': 1.0, '1_high_fmeasure': 1.0, '2_low_precision': 1.0, '2_low_recall': 1.0, '2_low_fmeasure': 1.0, '2_mid_precision': 1.0, '2_mid_recall': 1.0, '2_mid_fmeasure': 1.0, '2_high_precision': 1.0, '2_high_recall': 1.0, '2_high_fmeasure': 1.0, 'L_low_precision': 1.0, 'L_low_recall': 1.0, 'L_low_fmeasure': 1.0, 'L_mid_precision': 1.0, 'L_mid_recall': 1.0, 'L_mid_fmeasure': 1.0, 'L_high_precision': 1.0, 'L_high_recall': 1.0, 'L_high_fmeasure': 1.0}

Inputs

predictions: list of predictions to evaluate. Each prediction should be a string with tokens separated by spaces.
references: list of references, one for each prediction. Each reference should be a string with tokens separated by spaces.
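Since both inputs are expected to be space-separated tokens, raw text may need a tokenization pass first. The following is a minimal sketch using a simple regular expression; `space_tokenize` is a hypothetical helper, and the metric does not prescribe this or any other tokenizer.

```python
import re


def space_tokenize(text):
    """Hypothetical helper: split words and punctuation, then join with single spaces."""
    # \w+ matches runs of word characters (Unicode-aware in Python 3),
    # [^\w\s] matches a single punctuation or symbol character.
    return " ".join(re.findall(r"\w+|[^\w\s]", text))


raw_predictions = ["Hello, world!", "The cat is on the mat."]
predictions = [space_tokenize(p) for p in raw_predictions]
print(predictions)
```

Whichever tokenizer is chosen, applying the same one to predictions and references keeps the comparison consistent.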

Output Values

This metric outputs a dictionary containing the scores.

It reports precision, recall, and F1 (fmeasure) values for rougeraw-1, rougeraw-2, and rougeraw-l. By default, bootstrapped confidence intervals are calculated, so each score comes as low, mid, and high values that specify the confidence interval.
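As an illustration of what a bootstrapped confidence interval means, here is a minimal sketch that resamples per-example scores with replacement and reads low, mid, and high off the sorted resampled means. The function name, the number of resamples, the 95% level, and the use of the median for mid are all assumptions made for this sketch, not settings confirmed for this metric.

```python
import random


def bootstrap_interval(values, n_samples=1000, confidence=0.95, seed=0):
    """Illustrative percentile bootstrap over per-example scores (assumed parameters)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Resample the scores with replacement, keeping the original sample size.
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()
    alpha = (1.0 - confidence) / 2.0
    last = n_samples - 1
    return {
        "low": means[int(alpha * last)],
        "mid": means[int(0.5 * last)],
        "high": means[int((1.0 - alpha) * last)],
    }


per_example_f1 = [1.0, 0.5, 0.75, 0.8, 0.6]  # made-up scores for illustration
interval = bootstrap_interval(per_example_f1)
print(interval)
```

When every example scores the same, as in the usage example where all values are 1.0, the interval collapses to a single point.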

Key format:

{1|2|L}_{low|mid|high}_{precision|recall|fmeasure}
e.g.: 1_low_precision

If aggregate is False, the format is:

{1|2|L}_{precision|recall|fmeasure}
e.g.: 1_precision
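The flat keys can be awkward to work with when only one variant is needed. The sketch below regroups a result dictionary by splitting each key on underscores; `nest_scores` is a hypothetical helper that is not part of the metric, and it assumes keys follow one of the two formats described above.

```python
def nest_scores(results):
    """Hypothetical helper: turn flat keys such as '1_low_precision' into nested dicts."""
    nested = {}
    for key, value in results.items():
        parts = key.split("_")
        variant, measure = parts[0], parts[-1]
        if len(parts) == 3:
            # Aggregated format: {1|2|L}_{low|mid|high}_{precision|recall|fmeasure}
            nested.setdefault(variant, {}).setdefault(parts[1], {})[measure] = value
        else:
            # Non-aggregated format: {1|2|L}_{precision|recall|fmeasure}
            nested.setdefault(variant, {})[measure] = value
    return nested


flat = {"1_low_precision": 0.9, "1_mid_precision": 0.95, "L_high_fmeasure": 1.0}
nested = nest_scores(flat)
print(nested["1"]["mid"]["precision"])
```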

Citation(s)

@inproceedings{straka-etal-2018-sumeczech,
    title = "{S}ume{C}zech: Large {C}zech News-Based Summarization Dataset",
    author = "Straka, Milan  and
      Mediankin, Nikita  and
      Kocmi, Tom  and
      {\v{Z}}abokrtsk{\'y}, Zden{\v{e}}k  and
      Hude{\v{c}}ek, Vojt{\v{e}}ch  and
      Haji{\v{c}}, Jan",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Hasida, Koiti  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios  and
      Tokunaga, Takenobu",
    booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
    month = may,
    year = "2018",
    address = "Miyazaki, Japan",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://aclanthology.org/L18-1551",
}