Martin Dočekal committed · Commit d8248d0 · 1 Parent(s): 6aed907
description update

Changed files:
- README.md (+15 -15)
- rouge_raw.py (+13 -11)
README.md CHANGED

@@ -34,7 +34,7 @@ predictions = ["the cat is on the mat", "hello there"]
 references = ["the cat is on the mat", "hello there"]
 results = rougeraw.compute(predictions=predictions, references=references)
 print(results)
-{'rougeraw1_precision': 1.0, 'rougeraw1_recall': 1.0, 'rougeraw1_fmeasure': 1.0, 'rougeraw2_precision': 1.0, 'rougeraw2_recall': 1.0, 'rougeraw2_fmeasure': 1.0, 'rougerawl_precision': 1.0, 'rougerawl_recall': 1.0, 'rougerawl_fmeasure': 1.0}
+{'1_precision': 1.0, '1_recall': 1.0, '1_fmeasure': 1.0, '2_precision': 1.0, '2_recall': 1.0, '2_fmeasure': 1.0, 'l_precision': 1.0, 'l_recall': 1.0, 'l_fmeasure': 1.0}
 ```


@@ -43,22 +43,22 @@ predictions: list of predictions to evaluate. Each prediction should be a string
 references: list of reference for each prediction. Each reference should be a string with tokens separated by space

 ### Output Values
-
-
-
-
-
-
-
-
-
-
-Output Example(s):
-```python
-{'rougeraw1_precision': 1.0, 'rougeraw1_recall': 1.0, 'rougeraw1_fmeasure': 1.0, 'rougeraw2_precision': 1.0, 'rougeraw2_recall': 1.0, 'rougeraw2_fmeasure': 1.0, 'rougerawl_precision': 1.0, 'rougerawl_recall': 1.0, 'rougerawl_fmeasure': 1.0}
+This metric outputs a dictionary, containing the scores.
+
+There are precision, recall, F1 values for rougeraw-1, rougeraw-2 and rougeraw-l. By default the bootstrapped confidence intervals are calculated, meaning that for each metric there are low, mid , high values specifying the confidence interval.
+
+
+Key format:
+```
+{1|2|l}_{low|mid|high}_{precision|recall|fmeasure}
+e.g.: 1_low_precision
 ```

-
+If aggregate is False the format is:
+```
+{1|2|l}_{precision|recall|fmeasure}
+e.g.: 1_precision
+```

 ## Citation(s)
 ```bibtex
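For orientation, a minimal sketch of how the two key formats described in the updated Output Values section might be consumed. It assumes the metric loads via `evaluate.load('CZLC/rouge_raw')` as in the docstring example further below, and that bootstrapped aggregation can be requested with an `aggregate=True` keyword; that keyword and the exact default behaviour are assumptions, not something this diff confirms.

```python
import evaluate

# Load the metric as in the docstring example below.
rougeraw = evaluate.load("CZLC/rouge_raw")

predictions = ["the cat is on the mat", "hello there"]
references = ["the cat is on the mat", "hello there"]

# Per the README usage example, a plain call yields keys of the form
# {1|2|l}_{precision|recall|fmeasure}, e.g. '1_precision', 'l_fmeasure'.
results = rougeraw.compute(predictions=predictions, references=references)
print(sorted(results.keys()))

# Per the Output Values text, bootstrapped aggregation yields keys of the form
# {1|2|l}_{low|mid|high}_{precision|recall|fmeasure}, e.g. '1_low_precision'.
# Passing aggregate=True explicitly is an assumption about compute()'s signature.
aggregated = rougeraw.compute(
    predictions=predictions, references=references, aggregate=True
)
print(sorted(aggregated.keys()))
```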
rouge_raw.py CHANGED

@@ -324,18 +324,20 @@ Args:
 select: (Optional) string. The name of the metric to return. One of: 'rougeraw1_precision', 'rougeraw1_recall', 'rougeraw1_fmeasure', 'rougeraw2_precision', 'rougeraw2_recall', 'rougeraw2_fmeasure', 'rougerawl_precision', 'rougerawl_recall', 'rougerawl_fmeasure'.
 If None, all metrics are returned as a dictionary.
 Returns:
-1_precision
-1_recall
-1_fmeasure
-2_precision
-2_recall
-2_fmeasure
-l_precision
-l_recall
-l_fmeasure
+This metric outputs a dictionary, containing the scores.
+There are precision, recall, F1 values for rougeraw-1, rougeraw-2 and rougeraw-l. By default the bootstrapped confidence intervals are calculated, meaning that for each metric there are low, mid , high values specifying the confidence interval.

-
-
+Key format:
+```
+{1|2|l}_{low|mid|high}_{precision|recall|fmeasure}
+e.g.: 1_low_precision
+```
+
+If aggregate is False the format is:
+```
+{1|2|l}_{precision|recall|fmeasure}
+e.g.: 1_precision
+```
 Examples:
 >>> rougeraw = evaluate.load('CZLC/rouge_raw')
 >>> predictions = ["the cat is on the mat", "hello there"]