Spaces: hpi-dhc/FairEval

illorca committed on Dec 13, 2022

Commit 3b9499b (parent: 6a56d7d)

Using different model for Values from Pop papers

Files changed (1)

README.md +38 -38

@@ -121,56 +121,56 @@ The output for different modes and error_formats is:
 ```
 #### Values from Popular Papers
-A basic [DistilBERT model](https://huggingface.co/docs/transformers/model_doc/distilbert) fine-tuned on the
-[WNUT-17](https://huggingface.co/datasets/wnut_17) dataset yields the following F1 scores. Seqeval scores are shown for comparison.
-|                 | Overall | Location | Group  | Person | Creative Work | Corporation | Product |
-|-----------------|---------|----------|--------|--------|---------------|-------------|---------|
-| Traditional     | 0.2803  | 0.4124   | 0.0412 | 0.4105 | 0.0           | 0.1985      | 0.0     |
-| Fair            | 0.3199  | 0.5247   | 0.0459 | 0.4643 | 0.0           | 0.2666      | 0.0     |
-| Weighted        | 0.3842  | 0.5638   | 0.0681 | 0.5676 | 0.0           | 0.2910      | 0.0     |
-| seqeval strict  | 0.2222  | 0.3425   | 0.0413 | 0.3598 | 0.0           | 0.0408      | 0.0     |
-| seqeval relaxed | 0.2803  | 0.4124   | 0.0412 | 0.4105 | 0.0           | 0.1985      | 0.0     |
 The traditional count of evaluation parameters would be:
-|    | Overall | Location | Group | Person | Creative Work | Corporation | Product |
-|----|---------|----------|-------|--------|---------------|-------------|---------|
-| TP |     211 |       53 |     4 |    140 |             0 |          14 |       0 |
-| FP |     353 |       42 |    42 |    174 |             1 |          70 |       0 |
-| FN |     730 |      144 |   144 |    228 |           116 |          43 |     114 |
-While the fair evaluation parameter count (`error_format='count'`) is:
-|     | Overall | Location | Group | Person | Creative Work | Corporation | Product |
-|-----|---------|----------|-------|--------|---------------|-------------|---------|
-| TP  | 211     | 53       | 4     | 140    | 0             | 0           | 0       |
-| FP  | 125     | 9        | 21    | 62     | 1             | 32          | 0       |
-| FN  | 544     | 59       | 115   | 153    | 95            | 34          | 88      |
-| BE  | 105     | 11       | 4     | 87     | 0             | 3           | 0       |
-| LE  | 66      | 7        | 20    | 12     | 7             | 6           | 14      |
-| LBE | 57      | 10       | 6     | 9      | 15            | 2           | 15      |
 Thus, the ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
-|     | Overall | Location | Group  | Person | Creative Work | Corporation | Product |
-|-----|---------|----------|--------|--------|---------------|-------------|---------|
-| FP  |  13.94% |    1.00% |  2.34% |  6.91% |         0.11% |       3.57% |   0.00% |
-| FN  |  60.65% |    6.58% | 12.82% | 17.06% |        10.59% |       3.79% |   9.81% |
-| BE  |  11.71% |    1.23% |  0.45% |  9.70% |         0.00% |       0.33% |   0.00% |
-| LE  |   7.36% |    0.78% |  2.23% |  1.34% |         0.78% |       0.67% |   1.56% |
-| LBE |   6.35% |    1.11% |  0.67% |  1.00% |         1.67% |       0.22% |   1.67% |
 And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
-|     | Overall | Location | Group  | Person | Creative Work | Corporation | Product |
-|-----|---------|----------|--------|--------|---------------|-------------|---------|
-| TP  |  19.04% |    4.78% |  0.36% | 12.64% |         0.00% |       0.00% |   0.00% |
-| FP  |  11.28% |    0.81% |  1.90% |  5.60% |         0.09% |       2.89% |   0.00% |
-| FN  |  49.10% |    5.32% | 10.38% | 13.81% |         8.57% |       3.07% |   7.94% |
-| BE  |   9.48% |    0.99% |  0.36% |  7.85% |         0.00% |       0.27% |   0.00% |
-| LE  |   5.96% |    0.63% |  1.81% |  1.08% |         0.63% |       0.54% |   1.26% |
-| LBE |   5.14% |    0.90% |  0.54% |  0.81% |         1.35% |       0.18% |   1.35% |
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical

 ```
 #### Values from Popular Papers
+Computing the evaluation metrics on the predictions of [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
+on the test split of the [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1 scores:
+|                 | overall | location | group  | person | creative work | corporation | product |
+|-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
+| traditional     |  0.3471 |   0.5254 | 0.0213 | 0.5489 |           0.0 |      0.0238 |     0.0 |
+| fair            |  0.3717 |   0.5826 | 0.0235 | 0.5835 |           0.0 |      0.0289 |     0.0 |
+| seqeval strict  |  0.3471 |   0.5254 | 0.0213 | 0.5489 |           0.0 |      0.0238 |     0.0 |
+| seqeval relaxed |  0.3383 |   0.4944 | 0.0203 | 0.5462 |           0.0 |      0.0238 |     0.0 |
 The traditional count of evaluation parameters would be:
+|    | overall | location | group | person | creative work | corporation | product |
+|----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
+| TP |     255 |       67 |     2 |    185 |             0 |           1 |       0 |
+| FP |     135 |       38 |    20 |     60 |             0 |          17 |       0 |
+| FN |     824 |       83 |   163 |    244 |           142 |          65 |     127 |
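As a reader's check (not part of the commit), the traditional F1 scores in the first table can be recomputed from these counts with the standard formula F1 = 2·TP / (2·TP + FP + FN). The sketch below is a minimal illustration; the helper name `f1_from_counts` is hypothetical and is not part of FairEval or seqeval. The results agree with the table to about three decimal places; last-digit differences presumably come from rounding or truncation in the reported values.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from raw counts; returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# (TP, FP, FN) per label, copied from the traditional count table above.
traditional_counts = {
    "overall": (255, 135, 824),
    "location": (67, 38, 83),
    "group": (2, 20, 163),
    "person": (185, 60, 244),
    "creative work": (0, 0, 142),
    "corporation": (1, 17, 65),
    "product": (0, 0, 127),
}

traditional_f1 = {
    label: f1_from_counts(tp, fp, fn)
    for label, (tp, fp, fn) in traditional_counts.items()
}

for label, score in traditional_f1.items():
    print(f"{label}: {score:.4f}")
```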
+While the fair evaluation parameter count (`error_format='count'`) is:
+|     | overall | location | group | person | creative work | corporation | product |
+|-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
+| TP  | 255     | 67       | 2     | 185    | 0             | 1           | 0       |
+| FP  | 31      | 10       | 3     | 16     | 0             | 2           | 0       |
+| FN  | 725     | 71       | 135   | 233    | 120           | 54          | 112     |
+| LE  | 47      | 4        | 18    | 2      | 6             | 7           | 10      |
+| BE  | 30      | 10       | 4     | 13     | 0             | 3           | 0       |
+| LBE | 29      | 1        | 6     | 0      | 16            | 1           | 5       |
 Thus, the ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
+|     | overall | location | group  | person | creative work | corporation | product |
+|-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
+| FP  | 3.60%   | 1.16%    | 0.35%  | 1.86%  | 0.00%         | 0.23%       | 0.00%   |
+| FN  | 84.11%  | 8.24%    | 15.66% | 27.03% | 13.92%        | 6.26%       | 12.99%  |
+| LE  | 5.45%   | 0.46%    | 2.09%  | 0.23%  | 0.70%         | 0.81%       | 1.16%   |
+| BE  | 3.48%   | 1.16%    | 0.46%  | 1.51%  | 0.00%         | 0.35%       | 0.00%   |
+| LBE | 3.36%   | 0.12%    | 0.70%  | 0.00%  | 1.86%         | 0.12%       | 0.58%   |
 And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
+|     | overall | location | group  | person | creative work | corporation | product |
+|-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
+| TP  | 22.83%  | 6.00%    | 0.18%  | 16.56% | 0.00%         | 0.09%       | 0.00%   |
+| FP  | 2.78%   | 0.90%    | 0.27%  | 1.43%  | 0.00%         | 0.18%       | 0.00%   |
+| FN  | 64.91%  | 6.36%    | 12.09% | 20.86% | 10.74%        | 4.83%       | 10.03%  |
+| LE  | 4.21%   | 0.36%    | 1.61%  | 0.18%  | 0.54%         | 0.63%       | 0.90%   |
+| BE  | 2.69%   | 0.90%    | 0.36%  | 1.16%  | 0.00%         | 0.27%       | 0.00%   |
+| LBE | 2.60%   | 0.09%    | 0.54%  | 0.00%  | 1.43%         | 0.09%       | 0.45%   |
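The two ratio tables can be reproduced from the overall fair counts, which may help when checking the figures. The sketch below is an illustration only, not FairEval's implementation, and the helper name `ratios` is hypothetical. It assumes that the error ratio divides by the sum of all error types (FP, FN, LE, BE, LBE) and that the entity ratio divides by that sum plus TP. The per-label percentages in the tables appear to use the same overall denominators (for example, location FP is 10 of 862 errors).

```python
# Overall fair counts, copied from the fair evaluation parameter table above.
fair_counts = {"TP": 255, "FP": 31, "FN": 725, "LE": 47, "BE": 30, "LBE": 29}


def ratios(counts: dict, include_tp: bool) -> dict:
    """Share of each parameter in the total, with or without true positives."""
    keys = [k for k in counts if include_tp or k != "TP"]
    total = sum(counts[k] for k in keys)
    return {k: counts[k] / total for k in keys}


# Assumed to correspond to error_format='error_ratio': errors only.
error_ratio = ratios(fair_counts, include_tp=False)
# Assumed to correspond to error_format='entity_ratio': errors plus TP.
entity_ratio = ratios(fair_counts, include_tp=True)

for name, table in (("error_ratio", error_ratio), ("entity_ratio", entity_ratio)):
    print(name, {k: f"{100 * v:.2f}%" for k, v in table.items()})
```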
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical