Martin Dočekal committed
Commit d8248d0 · 1 Parent(s): 6aed907

description update

Files changed (2)
  1. README.md +15 -15
  2. rouge_raw.py +13 -11
README.md CHANGED
@@ -34,7 +34,7 @@ predictions = ["the cat is on the mat", "hello there"]
  references = ["the cat is on the mat", "hello there"]
  results = rougeraw.compute(predictions=predictions, references=references)
  print(results)
- {'rougeraw1_precision': 1.0, 'rougeraw1_recall': 1.0, 'rougeraw1_fmeasure': 1.0, 'rougeraw2_precision': 1.0, 'rougeraw2_recall': 1.0, 'rougeraw2_fmeasure': 1.0, 'rougerawl_precision': 1.0, 'rougerawl_recall': 1.0, 'rougerawl_fmeasure': 1.0}
+ {'1_precision': 1.0, '1_recall': 1.0, '1_fmeasure': 1.0, '2_precision': 1.0, '2_recall': 1.0, '2_fmeasure': 1.0, 'l_precision': 1.0, 'l_recall': 1.0, 'l_fmeasure': 1.0}
  ```


@@ -43,22 +43,22 @@ predictions: list of predictions to evaluate. Each prediction should be a string
  references: list of reference for each prediction. Each reference should be a string with tokens separated by space

  ### Output Values
- - rougeraw1_precision
- - rougeraw1_recall
- - rougeraw1_fmeasure
- - rougeraw2_precision
- - rougeraw2_recall
- - rougeraw2_fmeasure
- - rougerawl_precision
- - rougerawl_recall
- - rougerawl_fmeasure
-
- Output Example(s):
- ```python
- {'rougeraw1_precision': 1.0, 'rougeraw1_recall': 1.0, 'rougeraw1_fmeasure': 1.0, 'rougeraw2_precision': 1.0, 'rougeraw2_recall': 1.0, 'rougeraw2_fmeasure': 1.0, 'rougerawl_precision': 1.0, 'rougerawl_recall': 1.0, 'rougerawl_fmeasure': 1.0}
+ This metric outputs a dictionary, containing the scores.
+
+ There are precision, recall, F1 values for rougeraw-1, rougeraw-2 and rougeraw-l. By default the bootstrapped confidence intervals are calculated, meaning that for each metric there are low, mid, high values specifying the confidence interval.
+
+
+ Key format:
+ ```
+ {1|2|l}_{low|mid|high}_{precision|recall|fmeasure}
+ e.g.: 1_low_precision
  ```

- This metric outputs a dictionary, containing the scores.
+ If aggregate is False the format is:
+ ```
+ {1|2|l}_{precision|recall|fmeasure}
+ e.g.: 1_precision
+ ```

  ## Citation(s)
  ```bibtex
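
As context for the README change, here is a minimal usage sketch of the two key formats it documents. It assumes the metric is loaded via `evaluate.load('CZLC/rouge_raw')` as in the existing README example, and that `compute` accepts an `aggregate` keyword as the updated text implies; the printed keys are illustrative only.

```python
import evaluate

# Load the metric exactly as the README example does.
rougeraw = evaluate.load("CZLC/rouge_raw")

predictions = ["the cat is on the mat", "hello there"]
references = ["the cat is on the mat", "hello there"]

# Default call: per the updated docs, bootstrapped confidence intervals are
# computed, so keys follow {1|2|l}_{low|mid|high}_{precision|recall|fmeasure}.
results = rougeraw.compute(predictions=predictions, references=references)
print(sorted(results))                # e.g. '1_low_fmeasure', '1_mid_fmeasure', ...
print(results.get("1_mid_fmeasure"))  # mid value of the ROUGE-1 F-measure interval

# With aggregation turned off (keyword name assumed from the docstring),
# keys drop the low/mid/high part: {1|2|l}_{precision|recall|fmeasure}.
raw = rougeraw.compute(predictions=predictions, references=references, aggregate=False)
print(raw.get("1_fmeasure"))
```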
rouge_raw.py CHANGED
@@ -324,18 +324,20 @@ Args:
  select: (Optional) string. The name of the metric to return. One of: 'rougeraw1_precision', 'rougeraw1_recall', 'rougeraw1_fmeasure', 'rougeraw2_precision', 'rougeraw2_recall', 'rougeraw2_fmeasure', 'rougerawl_precision', 'rougerawl_recall', 'rougerawl_fmeasure'.
  If None, all metrics are returned as a dictionary.
  Returns:
- 1_precision
- 1_recall
- 1_fmeasure
- 2_precision
- 2_recall
- 2_fmeasure
- l_precision
- l_recall
- l_fmeasure
+ This metric outputs a dictionary, containing the scores.
+ There are precision, recall, F1 values for rougeraw-1, rougeraw-2 and rougeraw-l. By default the bootstrapped confidence intervals are calculated, meaning that for each metric there are low, mid, high values specifying the confidence interval.

- if aggregate is True there are also low, mid and high values for each metric. Thus, e.g.:
- 1_low_precision
+ Key format:
+ ```
+ {1|2|l}_{low|mid|high}_{precision|recall|fmeasure}
+ e.g.: 1_low_precision
+ ```
+
+ If aggregate is False the format is:
+ ```
+ {1|2|l}_{precision|recall|fmeasure}
+ e.g.: 1_precision
+ ```
  Examples:
  >>> rougeraw = evaluate.load('CZLC/rouge_raw')
  >>> predictions = ["the cat is on the mat", "hello there"]
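
As a small follow-up to the docstring change, here is a hypothetical helper showing how the documented key strings could be split back into their parts. The key formats come from the docstring above; the `parse_key` function and `RougeRawKey` type are purely illustrative and are not part of rouge_raw.py.

```python
from typing import NamedTuple, Optional

class RougeRawKey(NamedTuple):
    variant: str            # "1", "2", or "l"
    bound: Optional[str]    # "low", "mid", "high", or None when aggregate is False
    measure: str            # "precision", "recall", or "fmeasure"

def parse_key(key: str) -> RougeRawKey:
    """Split a result key such as '1_low_precision' or 'l_fmeasure'."""
    parts = key.split("_")
    if len(parts) == 3:      # aggregated: {1|2|l}_{low|mid|high}_{measure}
        variant, bound, measure = parts
        return RougeRawKey(variant, bound, measure)
    if len(parts) == 2:      # non-aggregated: {1|2|l}_{measure}
        variant, measure = parts
        return RougeRawKey(variant, None, measure)
    raise ValueError(f"unexpected key format: {key!r}")

print(parse_key("1_low_precision"))  # RougeRawKey(variant='1', bound='low', measure='precision')
print(parse_key("l_fmeasure"))       # RougeRawKey(variant='l', bound=None, measure='fmeasure')
```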