illorca committed on
Commit 931a43a · 1 Parent(s): 3b9499b

Include CoNLL results

Files changed (1)
  1. README.md +50 -40
README.md CHANGED
@@ -120,57 +120,67 @@ The output for different modes and error_formats is:
  "TP": 2, "FP": 0.4285, "FN": 0.5714}
  ```
 
- #### Values from Popular Papers
  Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
  run on the test split of the [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1 scores:
 
  | | overall | location | group | person | creative work | corporation | product |
  |-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
- | traditional | 0.3471 | 0.5254 | 0.0213 | 0.5489 | 0.0 | 0.0238 | 0.0 |
- | fair | 0.3717 | 0.5826 | 0.0235 | 0.5835 | 0.0 | 0.0289 | 0.0 |
- | seqeval strict | 0.3471 | 0.5254 | 0.0213 | 0.5489 | 0.0 | 0.0238 | 0.0 |
- | seqeval relaxed | 0.3383 | 0.4944 | 0.0203 | 0.5462 | 0.0 | 0.0238 | 0.0 |
 
- The traditional count of evaluation parameters would be:
 
- | | overall | location | group | person | creative work | corporation | product |
  |----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
- | TP | 255 | 67 | 2 | 185 | 0 | 1 | 0 |
- | FP | 135 | 38 | 20 | 60 | 0 | 17 | 0 |
- | FN | 824 | 83 | 163 | 244 | 142 | 65 | 127 |
 
- The fair evaluation parameter count is:
 
- | | overall | location | group | person | creative work | corporation | product |
  |-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
- | TP | 255 | 67 | 2 | 185 | 0 | 1 | 0 |
- | FP | 31 | 10 | 3 | 16 | 0 | 2 | 0 |
- | FN | 725 | 71 | 135 | 233 | 120 | 54 | 112 |
- | LE | 47 | 4 | 18 | 2 | 6 | 7 | 10 |
- | BE | 30 | 10 | 4 | 13 | 0 | 3 | 0 |
- | LBE | 29 | 1 | 6 | 0 | 16 | 1 | 5 |
-
- Thus, the ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
-
- | | overall | location | group | person | creative work | corporation | product |
- |-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
- | TP | 255 | 67 | 2 | 185 | 0 | 1 | 0 |
- | FP | 3,60% | 1,16% | 0,35% | 1,86% | 0,00% | 0,23% | 0,00% |
- | FN | 84,11% | 8,24% | 15,66% | 27,03% | 13,92% | 6,26% | 12,99% |
- | LE | 5,45% | 0,46% | 2,09% | 0,23% | 0,70% | 0,81% | 1,16% |
- | BE | 3,48% | 1,16% | 0,46% | 1,51% | 0,00% | 0,35% | 0,00% |
- | LBE | 3,36% | 0,12% | 0,70% | 0,00% | 1,86% | 0,12% | 0,58% |
-
- And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
-
- | | overall | location | group | person | creative work | corporation | product |
- |-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
- | TP | 22,83% | 6,00% | 0,18% | 16,56% | 0,00% | 0,09% | 0,00% |
- | FP | 2,78% | 0,90% | 0,27% | 1,43% | 0,00% | 0,18% | 0,00% |
- | FN | 64,91% | 6,36% | 12,09% | 20,86% | 10,74% | 4,83% | 10,03% |
- | LE | 4,21% | 0,36% | 1,61% | 0,18% | 0,54% | 0,63% | 0,90% |
- | BE | 2,69% | 0,90% | 0,36% | 1,16% | 0,00% | 0,27% | 0,00% |
- | LBE | 2,60% | 0,09% | 0,54% | 0,00% | 1,86% | 0,09% | 0,45% |
 
  ## Limitations and Bias
  The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
 
  "TP": 2, "FP": 0.4285, "FN": 0.5714}
  ```
 
+ ### Values from Popular Papers
+
+ #### CoNLL2003
+ Computing the evaluation metrics on the results from [this model](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english)
+ run on the test split of the [CoNLL2003 dataset](https://huggingface.co/datasets/conll2003), we obtain the following F1 scores:
+
+ | F1 Scores | overall | location | miscellaneous | organization | person |
+ |-----------------|---------:|----------:|--------------:|--------------:|--------:|
+ | traditional | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
+ | fair | 0,94 | 0,96 | 0,85 | 0,92 | 0,97 |
+ | seqeval strict | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
+ | seqeval relaxed | 0,89 | 0,92 | 0,78 | 0,86 | 0,96 |
+
+ The traditional error count is:
+
+ | | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
+ |----|-----------------------------------------:|----------:|--------------:|--------------:|--------:|
+ | TP | 5104 ( - \| 90,36%) | 1545 | 561 | 1452 | 1546 |
+ | FP | 534 (49,53% \| 9,45%) | 128 | 154 | 208 | 44 |
+ | FN | 544 (50,46% \| 9,63%) | 123 | 141 | 209 | 71 |
+
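
The first (traditional) row of the CoNLL2003 F1 table can be reproduced from these counts with the usual exact-match formula F1 = 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall. This is a minimal sketch for checking the numbers under that assumption, not the metric's own implementation; the counts are copied from the traditional table.

```python
# Sketch: per-type traditional F1 for CoNLL2003 from the (TP, FP, FN) counts above.
# F1 = 2*TP / (2*TP + FP + FN) is the harmonic mean of precision and recall.

counts = {
    "overall":       (5104, 534, 544),
    "location":      (1545, 128, 123),
    "miscellaneous": (561, 154, 141),
    "organization":  (1452, 208, 209),
    "person":        (1546, 44, 71),
}


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    # Undefined when there are no predictions and no gold entities; 0.0 is chosen here.
    return 2 * tp / denom if denom else 0.0


scores = {name: f1(*c) for name, c in counts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

The results (about 0,904, 0,925, 0,792, 0,874 and 0,964) agree with the traditional row to two decimals.
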
+ The fair count is:
+
+ | | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
+ |-----|-----------------------------------------:|----------:|--------------:|--------------:|--------:|
+ | TP | 5104 ( - \| 90,36%) | 1545 | 561 | 1452 | 1546 |
+ | FP | 126 (18,47% \| 2,23%) | 20 | 48 | 47 | 11 |
+ | FN | 124 (18,18% \| 2,19%) | 13 | 47 | 47 | 17 |
+ | LE | 219 (32,11% \| 3,87%) | 62 | 41 | 73 | 43 |
+ | BE | 126 (18,47% \| 2,23%) | 16 | 46 | 53 | 11 |
+ | LBE | 87 (12,75% \| 1,54%) | 32 | 13 | 41 | 1 |
+
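
The percentages in parentheses can be checked against the raw counts. Judging only from the reported numbers (an inference, not taken from the metric's source), the error ratio divides each count by the sum of the five error types, and the entity ratio divides by the number of gold entities, i.e. TP + FN from the traditional count (5104 + 544 = 5648). The sketch below recomputes both under that assumption.

```python
# Sketch: recompute the ratios in the CoNLL2003 fair-count table.
# Assumption (inferred from the reported figures, not from the metric's code):
#   error ratio  = count / (FP + FN + LE + BE + LBE)
#   entity ratio = count / gold entities = TP + FN of the traditional count

fair = {"TP": 5104, "FP": 126, "FN": 124, "LE": 219, "BE": 126, "LBE": 87}
error_types = ("FP", "FN", "LE", "BE", "LBE")

n_errors = sum(fair[k] for k in error_types)  # 682 fair errors in total
n_gold = 5104 + 544                            # traditional TP + FN = 5648

for name, value in fair.items():
    err = f"{value / n_errors:.2%}" if name in error_types else "-"
    print(f"{name}: {value} ({err} | {value / n_gold:.2%})")
```

Python's percentage formatting rounds, whereas the README figures appear to be truncated, so a few entries differ in the last digit (for example, LE: 219 / 5648 gives 3.88% here against the reported 3,87%).
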
+ #### WNUT-17
  Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
  run on the test split of the [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1 scores:
 
  | | overall | location | group | person | creative work | corporation | product |
  |-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
+ | traditional | 0,34 | 0,52 | 0,02 | 0,54 | 0,0 | 0,02 | 0,0 |
+ | fair | 0,37 | 0,58 | 0,02 | 0,58 | 0,0 | 0,02 | 0,0 |
+ | seqeval strict | 0,34 | 0,52 | 0,02 | 0,54 | 0,0 | 0,02 | 0,0 |
+ | seqeval relaxed | 0,33 | 0,49 | 0,02 | 0,54 | 0,0 | 0,02 | 0,0 |
 
+ The traditional error count is:
 
+ | | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
  |----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
+ | TP | 255 ( - \| 23,63%)| 67 | 2 | 185 | 0 | 1 | 0 |
+ | FP | 135 ( 14,07% \| 12,51%)| 38 | 20 | 60 | 0 | 17 | 0 |
+ | FN | 824 ( 85,92% \| 76,36%)| 83 | 163 | 244 | 142 | 65 | 127 |
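
The percentages in parentheses can be recomputed from the raw counts. Judging only from the reported figures (an inference, not the metric's code), the error ratio is each error count over FP + FN, and the entity ratio is each count over the number of gold entities, TP + FN = 1079. A minimal sketch under that assumption:

```python
# Sketch: recompute the WNUT-17 traditional ratios from the raw counts.
# Assumption (inferred from the reported figures, not from the metric's code):
#   error ratio  = count / (FP + FN)
#   entity ratio = count / gold entities (TP + FN)

trad = {"TP": 255, "FP": 135, "FN": 824}

n_errors = trad["FP"] + trad["FN"]  # 959 errors
n_gold = trad["TP"] + trad["FN"]    # 1079 gold entities

error_ratio = {k: trad[k] / n_errors for k in ("FP", "FN")}
entity_ratio = {k: v / n_gold for k, v in trad.items()}

for k, v in trad.items():
    err = f"{error_ratio[k]:.2%}" if k in error_ratio else "-"
    print(f"{k}: {v} ({err} | {entity_ratio[k]:.2%})")
```

As with the other tables, the README values appear to be truncated rather than rounded, so the last digit can differ by 0.01.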
 
+ The fair count is:
 
+ | | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
  |-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
+ | TP | 255 ( - \| 23,63%) | 67 | 2 | 185 | 0 | 1 | 0 |
+ | FP | 31 (3,60% \| 2,87%) | 10 | 3 | 16 | 0 | 2 | 0 |
+ | FN | 725 (84,11% \| 67,19%) | 71 | 135 | 233 | 120 | 54 | 112 |
+ | LE | 47 (5,45% \| 4,35%) | 4 | 18 | 2 | 6 | 7 | 10 |
+ | BE | 30 (3,48% \| 2,78%) | 10 | 4 | 13 | 0 | 3 | 0 |
+ | LBE | 29 (3,36% \| 2,68%) | 1 | 6 | 0 | 16 | 1 | 5 |
 
  ## Limitations and Bias
  The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical