Spaces: hpi-dhc/FairEval


illorca committed on Dec 18, 2022

Commit 931a43a (parent: 3b9499b)

Include CoNLL results


Files changed (1)

README.md +50 -40

README.md CHANGED

@@ -120,57 +120,67 @@ The output for different modes and error_formats is:
  "TP": 2, "FP": 0.4285, "FN": 0.5714}
 ```
-#### Values from Popular Papers
 Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
 run on the test split of [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1-Scores:
 |                 | overall | location | group  | person | creative work | corporation | product |
 |-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
-| traditional     |  0.3471 |   0.5254 | 0.0213 | 0.5489 |           0.0 |      0.0238 |     0.0 |
-| fair            |  0.3717 |   0.5826 | 0.0235 | 0.5835 |           0.0 |      0.0289 |     0.0 |
-| seqeval strict  |  0.3471 |   0.5254 | 0.0213 | 0.5489 |           0.0 |      0.0238 |     0.0 |
-| seqeval relaxed |  0.3383 |   0.4944 | 0.0203 | 0.5462 |           0.0 |      0.0238 |     0.0 |
-The traditional count of evaluation parameters would be:
-|    | overall | location | group | person | creative work | corporation | product |
 |----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
-| TP |     255 |       67 |     2 |    185 |             0 |           1 |       0 |
-| FP |     135 |       38 |    20 |     60 |             0 |          17 |       0 |
-| FN |     824 |       83 |   163 |    244 |           142 |          65 |     127 |
-While the fair evaluation parameter count is:
-|     | overall | location | group | person | creative work | corporation | product |
 |-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
-| TP  | 255     | 67       | 2     | 185    | 0             | 1           | 0       |
-| FP  | 31      | 10       | 3     | 16     | 0             | 2           | 0       |
-| FN  | 725     | 71       | 135   | 233    | 120           | 54          | 112     |
-| LE  | 47      | 4        | 18    | 2      | 6             | 7           | 10      |
-| BE  | 30      | 10       | 4     | 13     | 0             | 3           | 0       |
-| LBE | 29      | 1        | 6     | 0      | 16            | 1           | 5       |
-Thus, the ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
-|     | overall | location | group  | person | creative work | corporation | product |
-|-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
-| TP  | 255     | 67       | 2      | 185    | 0             | 1           | 0       |
-| FP  | 3,60%   | 1,16%    | 0,35%  | 1,86%  | 0,00%         | 0,23%       | 0,00%   |
-| FN  | 84,11%  | 8,24%    | 15,66% | 27,03% | 13,92%        | 6,26%       | 12,99%  |
-| LE  | 5,45%   | 0,46%    | 2,09%  | 0,23%  | 0,70%         | 0,81%       | 1,16%   |
-| BE  | 3,48%   | 1,16%    | 0,46%  | 1,51%  | 0,00%         | 0,35%       | 0,00%   |
-| LBE | 3,36%   | 0,12%    | 0,70%  | 0,00%  | 1,86%         | 0,12%       | 0,58%   |
-And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
-|     | overall | location | group  | person | creative work | corporation | product |
-|-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
-| TP  | 22,83%  | 6,00%    | 0,18%  | 16,56% | 0,00%         | 0,09%       | 0,00%   |
-| FP  | 2,78%   | 0,90%    | 0,27%  | 1,43%  | 0,00%         | 0,18%       | 0,00%   |
-| FN  | 64,91%  | 6,36%    | 12,09% | 20,86% | 10,74%        | 4,83%       | 10,03%  |
-| LE  | 4,21%   | 0,36%    | 1,61%  | 0,18%  | 0,54%         | 0,63%       | 0,90%   |
-| BE  | 2,69%   | 0,90%    | 0,36%  | 1,16%  | 0,00%         | 0,27%       | 0,00%   |
-| LBE | 2,60%   | 0,09%    | 0,54%  | 0,00%  | 1,43%         | 0,09%       | 0,45%   |
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
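The percentages in the removed ratio tables can be reproduced from the raw counts. The sketch below is an assumption inferred from the numbers themselves, not FairEval's actual code: `error_ratio` divides each error count by the sum of FP, FN, LE, BE and LBE (862 for WNUT-17), and the old `entity_ratio` divides each count by the sum of all six parameters, TP included (1117). The helper `ratios` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def ratios(counts: Dict[str, int], error_keys: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical helper (not FairEval's API) reproducing both ratio formats."""
    error_keys = list(error_keys)
    total_errors = sum(counts[k] for k in error_keys)  # FP + FN + LE + BE + LBE
    total_all = sum(counts.values())                   # every parameter, TP included
    error_ratio = {k: counts[k] / total_errors for k in error_keys}
    entity_ratio = {k: v / total_all for k, v in counts.items()}
    return error_ratio, entity_ratio


# Overall fair counts for WNUT-17, copied from the removed table.
wnut_fair = {"TP": 255, "FP": 31, "FN": 725, "LE": 47, "BE": 30, "LBE": 29}
err, ent = ratios(wnut_fair, ["FP", "FN", "LE", "BE", "LBE"])

for key, value in wnut_fair.items():
    # TP is not an error, so it has no error ratio.
    e = f"{err[key]:.2%}" if key in err else "-"
    print(f"{key:>3}: count={value:<4} error_ratio={e:>7} entity_ratio={ent[key]:.2%}")
```

Under these assumptions the FN row comes out as 84.11% and 64.91%, matching the overall column of the removed tables (which use decimal commas).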

  "TP": 2, "FP": 0.4285, "FN": 0.5714}
 ```
+### Values from Popular Papers
+#### CoNLL2003
+Computing the evaluation metrics on the results from [this model](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english)
+run on the test split of [CoNLL2003 dataset](https://huggingface.co/datasets/conll2003), we obtain the following F1-Scores:
+| F1 Score        | overall | location | miscellaneous | organization | person |
+|-----------------|---------:|----------:|--------------:|--------------:|--------:|
+| traditional     | 0,90    | 0,92     | 0,79          | 0,87         | 0,96   |
+| fair            | 0,94    | 0,96     | 0,85          | 0,92         | 0,97   |
+| seqeval strict  | 0,90    | 0,92     | 0,79          | 0,87         | 0,96   |
+| seqeval relaxed | 0,89    | 0,92     | 0,78          | 0,86         | 0,96   |
+The traditional counts are:
+|    | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
+|----|-----------------------------------------:|----------:|--------------:|--------------:|--------:|
+| TP | 5104 ( - \| 90,36%)                     | 1545     | 561          | 1452         | 1546   |
+| FP | 534 (49,53% \| 9,45%)                   | 128      | 154          | 208          | 44     |
+| FN | 544 (50,46% \| 9,63%)                   | 123      | 141          | 209          | 71     |
+The fair counts are:
+|     | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
+|-----|---------------------------------------:|---------:|--------------:|-------------:|-------:|
+| TP  | 5104 ( - \| 90,36%)                   | 1545     | 561           | 1452         | 1546   |
+| FP  | 126 (18,47% \| 2,23%)                 | 20       | 48            | 47           | 11     |
+| FN  | 124 (18,18% \| 2,19%)                 | 13       | 47            | 47           | 17     |
+| LE  | 219 (32,11% \| 3,87%)                 | 62       | 41            | 73           | 43     |
+| BE  | 126 (18,47% \| 2,23%)                 | 16       | 46            | 53           | 11     |
+| LBE | 87 (12,75% \| 1,54%)                  | 32       | 13            | 41           | 1      |
+#### WNUT-17
 Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
 run on the test split of [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1-Scores:
 |                 | overall | location | group  | person | creative work | corporation | product |
 |-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
+| traditional     |  0,34 |   0,52 | 0,02 | 0,54 |           0,0 |      0,02 |     0,0 |
+| fair            |  0,37 |   0,58 | 0,02 | 0,58 |           0,0 |      0,02 |     0,0 |
+| seqeval strict  |  0,34 |   0,52 | 0,02 | 0,54 |           0,0 |      0,02 |     0,0 |
+| seqeval relaxed |  0,33 |   0,49 | 0,02 | 0,54 |           0,0 |      0,02 |     0,0 |
+The traditional counts are:
+|    | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
 |----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
+| TP |     255 ( - \| 23,63%)     |       67 |     2 |    185 |             0 |           1 |       0 |
+| FP |     135 (14,07% \| 12,51%) |       38 |    20 |     60 |             0 |          17 |       0 |
+| FN |     824 (85,92% \| 76,36%) |       83 |   163 |    244 |           142 |          65 |     127 |
+The fair counts are:
+|     | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
 |-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
+| TP           | 255 ( - \| 23,63%)                    | 67       | 2     | 185    | 0             | 1           | 0       |
+| FP           | 31 (3,6% \| 2,87%)                    | 10       | 3     | 16     | 0             | 2           | 0       |
+| FN           | 725 (84,11% \| 67,19%)                | 71       | 135   | 233    | 120           | 54          | 112     |
+| LE           | 47 (5,45% \| 4,35%)                   | 4        | 18    | 2      | 6             | 7           | 10      |
+| BE           | 30 (3,48% \| 2,78%)                   | 10       | 4     | 13     | 0             | 3           | 0       |
+| LBE          | 29 (3,36% \| 2,68%)                   | 1        | 6     | 0      | 16            | 1           | 5       |
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
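As a sanity check on the traditional rows of the added F1 tables, strict precision, recall and F1 can be recomputed from the overall TP, FP and FN counts. This is a minimal sketch of the standard span-level formulas; `prf` is a hypothetical helper, and it does not model how FairEval weights LE, BE and LBE when computing the fair scores.

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Traditional span-level precision, recall and F1 (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        # No correct spans at all, e.g. the WNUT-17 "creative work" class.
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Overall traditional counts (TP, FP, FN) copied from the added tables.
traditional_counts = {
    "CoNLL2003": (5104, 534, 544),
    "WNUT-17": (255, 135, 824),
}

for dataset, (tp, fp, fn) in traditional_counts.items():
    p, r, f1 = prf(tp, fp, fn)
    print(f"{dataset}: P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

This gives F1 of about 0.9045 for CoNLL2003 and 0.3472 for WNUT-17, consistent with the `0,90` and `0,34` in the traditional rows.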