Spaces: hpi-dhc/FairEval


illorca committed on Dec 12, 2022

Commit 68b945f, 1 Parent(s): 2f1260e

Include error counts in Popular Paper section

Files changed (1)

README.md +85 -39


@@ -82,53 +82,59 @@ Considering the following input annotated sentences:
 The output for different modes and error_formats is:
 ```python
 >>> faireval.compute(predictions=y_pred, references=y_true, mode='fair', error_format='count')
-{'PER': {'precision': 1.0, 'recall': 0.5, 'f1': 0.6666,
-         "trad_prec": 0.5, "trad_rec": 0.5, "trad_f1": 0.5,
-         'TP': 1, 'FP': 0, 'FN': 1, 'LE': 0, 'BE': 0, 'LBE': 0},
- 'INT': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0,
-         "trad_prec": 0.0, "trad_rec": 0.0, "trad_f1": 0.0,
-         'TP': 0, 'FP': 0, 'FN': 0, 'LE': 0, 'BE': 1, 'LBE': 1},
- 'OUT': {'precision': 0.6666, 'recall': 0.6666, 'f1': 0.6666,
-         "trad_prec": 0.5, "trad_rec": 0.5, "trad_f1": 0.5,
-         'TP': 1, 'FP': 0, 'FN': 0, 'LE': 1, 'BE': 0, 'LBE': 0},
- 'overall_precision': 0.5714,
- 'overall_recall': 0.4444444444444444,
- 'overall_f1': 0.5,
- 'trad_prec': 0.5,
- 'trad_rec': 0.5,
- 'trad_f1': 0.5,
- 'TP': 2,
- 'FP': 0,
- 'FN': 1,
- 'LE': 1,
- 'BE': 1,
- 'LBE': 1}
 ```
 ```python
 >>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='count')
-{'PER': {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'TP': 1, 'FP': 1, 'FN': 1},
- 'INT': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'TP': 0, 'FP': 1, 'FN': 2},
- 'OUT': {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'TP': 1, 'FP': 1, 'FN': 1},
- 'overall_precision': 0.4,
- 'overall_recall': 0.3333,
- 'overall_f1': 0.3636,
- 'TP': 2,
- 'FP': 3,
- 'FN': 4}
 ```
 ```python
 >>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='error_ratio')
-{'PER': {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'TP': 1, 'FP': 0.1428, 'FN': 0.1428},
- 'INT': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'TP': 0, 'FP': 0.1428, 'FN': 0.2857},
- 'OUT': {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'TP': 1, 'FP': 0.1428, 'FN': 0.1428},
- 'overall_precision': 0.4,
- 'overall_recall': 0.3333,
- 'overall_f1': 0.3636,
- 'TP': 2,
- 'FP': 0.4285,
- 'FN': 0.5714}
 ```
 #### Values from Popular Papers
@@ -143,6 +149,46 @@ A basic [DistilBERT model](https://huggingface.co/docs/transformers/model_doc/di
 | seqeval strict  | 0.2222  | 0.3425   | 0.0413 | 0.3598 | 0.0           | 0.0408      | 0.0     |
 | seqeval relaxed | 0.2803  | 0.4124   | 0.0412 | 0.4105 | 0.0           | 0.1985      | 0.0     |
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
 label inputs (odd for Beginning, even for Inside and zero for Outside).

 The output for different modes and error_formats is:
 ```python
 >>> faireval.compute(predictions=y_pred, references=y_true, mode='fair', error_format='count')
+{"PER": {"precision": 1.0,"recall": 0.5,"f1": 0.6666,
+         "trad_prec": 0.5,"trad_rec": 0.5,"trad_f1": 0.5,
+         "TP": 1,"FP": 0.0,"FN": 1.0,"LE": 0.0,"BE": 0.0,"LBE": 0.0},
+ "INT": {"precision": 0.0,"recall": 0.0,"f1": 0.0,
+         "trad_prec": 0.0,"trad_rec": 0.0,"trad_f1": 0.0,
+         "TP": 0,"FP": 0.0,"FN": 0.0,"LE": 0.0,"BE": 1.0,"LBE": 1.0},
+ "OUT": {"precision": 0.6666,"recall": 0.6666,"f1": 0.6666,
+         "trad_prec": 0.5,"trad_rec": 0.5,"trad_f1": 0.5,
+         "TP": 1,"FP": 0.0,"FN": 0.0,"LE": 1.0,"BE": 0.0,"LBE": 0.0},
+ "overall_precision": 0.5714,
+ "overall_recall": 0.4444,
+ "overall_f1": 0.5,
+ "overall_trad_prec": 0.4,
+ "overall_trad_rec": 0.3333,
+ "overall_trad_f1": 0.3636,
+ "TP": 2,
+ "FP": 0.0,
+ "FN": 1.0,
+ "LE": 1.0,
+ "BE": 1.0,
+ "LBE": 1.0}
 ```
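The fair scores above can be reproduced from the error counts. The following is a minimal sketch, not the library's implementation: `fair_scores` is a hypothetical helper, and the weight of 0.5 per LE, BE and LBE is an assumption inferred from the numbers in this example rather than something stated in this README.

```python
def fair_scores(tp, fp, fn, le, be, lbe, weight=0.5):
    """Precision, recall and F1 where each LE/BE/LBE counts as a partial FP and a partial FN.

    The default weight of 0.5 is an assumption that matches the example output.
    """
    partial = weight * (le + be + lbe)
    prec_den = tp + fp + partial
    rec_den = tp + fn + partial
    precision = tp / prec_den if prec_den else 0.0
    recall = tp / rec_den if rec_den else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Overall counts from the example: TP=2, FP=0, FN=1, LE=1, BE=1, LBE=1
overall = fair_scores(tp=2, fp=0, fn=1, le=1, be=1, lbe=1)
print([round(v, 4) for v in overall])  # [0.5714, 0.4444, 0.5]
```

With these weights the per-label values also match, for example `OUT` (TP=1, LE=1) gives a precision and recall of 2/3.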
 ```python
 >>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='count')
+{"PER": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+         "TP": 1,"FP": 1.0,"FN": 1.0},
+ "INT": {"precision": 0.0,"recall": 0.0,"f1": 0.0,
+         "TP": 0,"FP": 1.0,"FN": 2.0},
+ "OUT": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+         "TP": 1,"FP": 1.0,"FN": 1.0},
+ "overall_precision": 0.4,
+ "overall_recall": 0.3333,
+ "overall_f1": 0.3636,
+ "TP": 2,
+ "FP": 3.0,
+ "FN": 4.0}
 ```
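For comparison, the traditional scores follow from the standard precision, recall and F1 definitions applied to the TP, FP and FN counts. A short sketch, where `traditional_scores` is a hypothetical helper name and not part of the metric's API:

```python
def traditional_scores(tp, fp, fn):
    """Standard precision, recall and F1 from raw counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Overall counts from the example: TP=2, FP=3, FN=4
print([round(v, 4) for v in traditional_scores(2, 3, 4)])  # [0.4, 0.3333, 0.3636]
```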
 ```python
 >>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='error_ratio')
+{"PER": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+         "TP": 1,"FP": 0.1428,"FN": 0.1428},
+ "INT": {"precision": 0.0,"recall": 0.0,"f1": 0.0,
+         "TP": 0,"FP": 0.1428,"FN": 0.2857},
+ "OUT": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+         "TP": 1,"FP": 0.1428,"FN": 0.1428},
+ "overall_precision": 0.4,
+ "overall_recall": 0.3333,
+ "overall_f1": 0.3636,
+ "TP": 2,
+ "FP": 0.4285,
+ "FN": 0.5714}
 ```
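The `error_ratio` values appear to be each error count divided by the total number of errors across all labels (here 3 FP + 4 FN = 7). A minimal sketch under that assumption, with `error_ratios` as a hypothetical helper:

```python
def error_ratios(error_counts):
    """Divide each error count by the sum of all error counts (TP is not an error and is excluded)."""
    total = sum(error_counts.values())
    if total == 0:
        return {name: 0.0 for name in error_counts}
    return {name: count / total for name, count in error_counts.items()}


# Overall traditional error counts from the example: FP=3, FN=4
ratios = error_ratios({"FP": 3, "FN": 4})
print(ratios)  # FP is 3/7 and FN is 4/7; the README output appears to truncate these to four decimals
```

The per-label entries use the same denominator, so a single false positive for `PER` corresponds to 1/7.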
 #### Values from Popular Papers
 | seqeval strict  | 0.2222  | 0.3425   | 0.0413 | 0.3598 | 0.0           | 0.0408      | 0.0     |
 | seqeval relaxed | 0.2803  | 0.4124   | 0.0412 | 0.4105 | 0.0           | 0.1985      | 0.0     |
+The traditional count of evaluation parameters would be:
+|    | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|----|---------|----------|-------|--------|---------------|-------------|---------|
+| TP |     211 |       53 |     4 |    140 |             0 |          14 |       0 |
+| FP |     353 |       42 |    42 |    174 |             1 |          70 |       0 |
+| FN |     730 |      144 |   144 |    228 |           116 |          43 |     114 |
+The fair evaluation parameter count (`error_format='count'`) is:
+|     | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|-----|---------|----------|-------|--------|---------------|-------------|---------|
+| TP  | 211     | 53       | 4     | 140    | 0             | 0           | 0       |
+| FP  | 125     | 9        | 21    | 62     | 1             | 32          | 0       |
+| FN  | 544     | 59       | 115   | 153    | 95            | 34          | 88      |
+| BE  | 105     | 11       | 4     | 87     | 0             | 3           | 0       |
+| LE  | 66      | 7        | 20    | 12     | 7             | 6           | 14      |
+| LBE | 57      | 10       | 6     | 9      | 15            | 2           | 15      |
+Thus, the ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
+|     | Overall | Location | Group  | Person | Creative Work | Corporation | Product |
+|-----|---------|----------|--------|--------|---------------|-------------|---------|
+| FP  |  13.94% |    1.00% |  2.34% |  6.91% |         0.11% |       3.57% |   0.00% |
+| FN  |  60.65% |    6.58% | 12.82% | 17.06% |        10.59% |       3.79% |   9.81% |
+| BE  |  11.71% |    1.23% |  0.45% |  9.70% |         0.00% |       0.33% |   0.00% |
+| LE  |   7.36% |    0.78% |  2.23% |  1.34% |         0.78% |       0.67% |   1.56% |
+| LBE |   6.35% |    1.11% |  0.67% |  1.00% |         1.67% |       0.22% |   1.67% |
+And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
+|     | Overall | Location | Group  | Person | Creative Work | Corporation | Product |
+|-----|---------|----------|--------|--------|---------------|-------------|---------|
+| TP  |  19.04% |    4.78% |  0.36% | 12.64% |         0.00% |       0.00% |   0.00% |
+| FP  |  11.28% |    0.81% |  1.90% |  5.60% |         0.09% |       2.89% |   0.00% |
+| FN  |  49.10% |    5.32% | 10.38% | 13.81% |         8.57% |       3.07% |   7.94% |
+| BE  |   9.48% |    0.99% |  0.36% |  7.85% |         0.00% |       0.27% |   0.00% |
+| LE  |   5.96% |    0.63% |  1.81% |  1.08% |         0.63% |       0.54% |   1.26% |
+| LBE |   5.14% |    0.90% |  0.54% |  0.81% |         1.35% |       0.18% |   1.35% |
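The Overall column of the entity-ratio table can be checked against the fair counts. This sketch assumes the denominator is the sum of all six fair parameters (TP, FP, FN, BE, LE, LBE), which reproduces the reported percentages; `entity_ratios` is a hypothetical helper, not a function of the metric.

```python
def entity_ratios(counts):
    """Divide every fair parameter, TP included, by the sum of all parameters."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Overall fair counts reported in the table above
overall_counts = {"TP": 211, "FP": 125, "FN": 544, "BE": 105, "LE": 66, "LBE": 57}
for name, ratio in entity_ratios(overall_counts).items():
    print(f"{name}: {ratio:.2%}")
# TP: 19.04%
# FP: 11.28%
# FN: 49.10%
# BE: 9.48%
# LE: 5.96%
# LBE: 5.14%
```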
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
 label inputs (odd for Beginning, even for Inside and zero for Outside).