Include CoNLL results
README.md
CHANGED
@@ -120,57 +120,67 @@ The output for different modes and error_formats is:
|
|
120 |
"TP": 2, "FP": 0.4285, "FN": 0.5714}
|
121 |
```
|
122 |
|
123 |
-
|
124 |
Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
|
125 |
run on the test split of [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1-Scores:
|
126 |
|
127 |
| | overall | location | group | person | creative work | corporation | product |
|
128 |
|-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
|
129 |
-
| traditional | 0
|
130 |
-
| fair | 0
|
131 |
-
| seqeval strict | 0
|
132 |
-
| seqeval relaxed | 0
|
133 |
|
134 |
-
The traditional count of
|
135 |
|
136 |
-
| | overall | location | group | person | creative work | corporation | product |
|
137 |
|----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
|
138 |
-
| TP | 255 | 67 | 2 | 185 | 0 | 1 | 0 |
|
139 |
-
| FP | 135 | 38 | 20 | 60 | 0 | 17 | 0 |
|
140 |
-
| FN | 824 | 83 | 163 | 244 | 142 | 65 | 127 |
|
141 |
|
142 |
-
While the fair
|
143 |
|
144 |
-
| | overall | location | group | person | creative work | corporation | product |
|
145 |
|-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
|
146 |
-
| TP
|
147 |
-
| FP
|
148 |
-
| FN
|
149 |
-
| LE
|
150 |
-
| BE
|
151 |
-
| LBE
|
152 |
-
|
153 |
-
Thus, ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
|
154 |
-
|
155 |
-
| | overall | location | group | person | creative work | corporation | product |
|
156 |
-
|-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
|
157 |
-
| TP | 255 | 67 | 2 | 185 | 0 | 1 | 0 |
|
158 |
-
| FP | 3,60% | 1,16% | 0,35% | 1,86% | 0,00% | 0,23% | 0,00% |
|
159 |
-
| FN | 84,11% | 8,24% | 15,66% | 27,03% | 13,92% | 6,26% | 12,99% |
|
160 |
-
| LE | 5,45% | 0,46% | 2,09% | 0,23% | 0,70% | 0,81% | 1,16% |
|
161 |
-
| BE | 3,48% | 1,16% | 0,46% | 1,51% | 0,00% | 0,35% | 0,00% |
|
162 |
-
| LBE | 3,36% | 0,12% | 0,70% | 0,00% | 1,86% | 0,12% | 0,58% |
|
163 |
-
|
164 |
-
And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
|
165 |
-
|
166 |
-
| | overall | location | group | person | creative work | corporation | product |
|
167 |
-
|-----|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
|
168 |
-
| TP | 22,83% | 6,00% | 0,18% | 16,56% | 0,00% | 0,09% | 0,00% |
|
169 |
-
| FP | 2,78% | 0,90% | 0,27% | 1,43% | 0,00% | 0,18% | 0,00% |
|
170 |
-
| FN | 64,91% | 6,36% | 12,09% | 20,86% | 10,74% | 4,83% | 10,03% |
|
171 |
-
| LE | 4,21% | 0,36% | 1,61% | 0,18% | 0,54% | 0,63% | 0,90% |
|
172 |
-
| BE | 2,69% | 0,90% | 0,36% | 1,16% | 0,00% | 0,27% | 0,00% |
|
173 |
-
| LBE | 2,60% | 0,09% | 0,54% | 0,00% | 1,43% | 0,09% | 0,45% |
|
174 |
|
175 |
## Limitations and Bias
|
176 |
The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
|
|
|
120 |
"TP": 2, "FP": 0.4285, "FN": 0.5714}
|
121 |
```
|
122 |
|
123 |
+
### Values from Popular Papers
|
124 |
+
|
125 |
+
#### CoNLL2003
|
126 |
+
Computing the evaluation metrics on the results from [this model](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english)
|
127 |
+
run on the test split of [CoNLL2003 dataset](https://huggingface.co/datasets/conll2003), we obtain the following F1-Scores:
|
128 |
+
|
129 |
+
| F1 Scores | overall | location | miscellaneous | organization | person |
|
130 |
+
|-----------------|---------:|----------:|--------------:|--------------:|--------:|
|
131 |
+
| traditional | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
|
132 |
+
| fair | 0,94 | 0,96 | 0,85 | 0,92 | 0,97 |
|
133 |
+
| seqeval strict | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
|
134 |
+
| seqeval relaxed | 0,89 | 0,92 | 0,78 | 0,86 | 0,96 |
|
135 |
+
|
136 |
+
The traditional error count is:
|
137 |
+
|
138 |
+
| | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
|
139 |
+
|----|-----------------------------------------:|----------:|--------------:|--------------:|--------:|
|
140 |
+
| TP | 5104 ( - \| 90,36%) | 1545 | 561 | 1452 | 1546 |
|
141 |
+
| FP | 534 (49,53% \| 9,45%) | 128 | 154 | 208 | 44 |
|
142 |
+
| FN | 544 (50,46% \| 9,63%) | 123 | 141 | 209 | 71 |
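As a sanity check, the overall traditional F1 score can be recomputed from the TP, FP and FN counts in the table above. The following is a minimal sketch using the standard precision/recall definitions; the helper name `precision_recall_f1` is hypothetical and is not part of this metric's API.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Overall traditional counts taken from the CoNLL2003 table above.
precision, recall, f1 = precision_recall_f1(tp=5104, fp=534, fn=544)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Rounded to two decimals, the result agrees with the overall traditional F1 reported in the F1 table.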
|
143 |
+
|
144 |
+
And the fair count is:
|
145 |
+
|
146 |
+
| | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
|
147 |
+
|-----|-----------------------:|----------:|--------------:|--------------:|--------:|
|
148 |
+
| TP | 5104 ( - \| 90,36%) | 1545 | 561 | 1452 | 1546 |
|
149 |
+
| FP | 126 (18,47% \| 2,23%) | 20 | 48 | 47 | 11 |
|
150 |
+
| FN | 124 (18,18% \| 2,19%) | 13 | 47 | 47 | 17 |
|
151 |
+
| LE | 219 (32,11% \| 3,87%) | 62 | 41 | 73 | 43 |
|
152 |
+
| BE | 126 (18,47% \| 2,23%) | 16 | 46 | 53 | 11 |
|
153 |
+
| LBE | 87 (12,75% \| 1,54%) | 32 | 13 | 41 | 1 |
|
154 |
+
|
155 |
+
#### WNUT-17
|
156 |
Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
|
157 |
run on the test split of [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1-Scores:
|
158 |
|
159 |
| | overall | location | group | person | creative work | corporation | product |
|
160 |
|-----------------|---------:|----------:|--------:|--------:|---------------:|-------------:|---------:|
|
161 |
+
| traditional | 0,34 | 0,52 | 0,02 | 0,54 | 0,0 | 0,02 | 0,0 |
|
162 |
+
| fair | 0,37 | 0,58 | 0,02 | 0,58 | 0,0 | 0,02 | 0,0 |
|
163 |
+
| seqeval strict | 0,34 | 0,52 | 0,02 | 0,54 | 0,0 | 0,02 | 0,0 |
|
164 |
+
| seqeval relaxed | 0,33 | 0,49 | 0,02 | 0,54 | 0,0 | 0,02 | 0,0 |
|
165 |
|
166 |
+
The traditional error count is:
|
167 |
|
168 |
+
| | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
|
169 |
|----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
|
170 |
+
| TP | 255 ( - \| 23,63%) | 67 | 2 | 185 | 0 | 1 | 0 |
|
171 |
+
| FP | 135 (14,07% \| 12,51%) | 38 | 20 | 60 | 0 | 17 | 0 |
|
172 |
+
| FN | 824 (85,92% \| 76,36%) | 83 | 163 | 244 | 142 | 65 | 127 |
|
173 |
|
174 |
+
And the fair count is:
|
175 |
|
176 |
+
| | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
|
177 |
|-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
|
178 |
+
| TP | 255 ( - \| 23,63%) | 67 | 2 | 185 | 0 | 1 | 0 |
|
179 |
+
| FP | 31 (3,60% \| 2,87%) | 10 | 3 | 16 | 0 | 2 | 0 |
|
180 |
+
| FN | 725 (84,11% \| 67,19%) | 71 | 135 | 233 | 120 | 54 | 112 |
|
181 |
+
| LE | 47 (5,45% \| 4,35%) | 4 | 18 | 2 | 6 | 7 | 10 |
|
182 |
+
| BE | 30 (3,48% \| 2,78%) | 10 | 4 | 13 | 0 | 3 | 0 |
|
183 |
+
| LBE | 29 (3,36% \| 2,68%) | 1 | 6 | 0 | 16 | 1 | 5 |
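The two percentages in the "overall" column can be reproduced from the raw counts. The sketch below is illustrative only and is not the metric's implementation: the helper `fair_ratios` is hypothetical, and the entity-ratio denominator (1079) is an assumption inferred from the traditional table as TP + FN, which matches the reported values up to rounding.

```python
def fair_ratios(counts: dict[str, int], total_entities: int):
    """Return (error_ratio, entity_ratio) as fractions for each parameter."""
    # Every parameter except TP counts as an error.
    errors = {name: n for name, n in counts.items() if name != "TP"}
    total_errors = sum(errors.values())
    error_ratio = {name: n / total_errors for name, n in errors.items()}
    entity_ratio = {name: n / total_entities for name, n in counts.items()}
    return error_ratio, entity_ratio


# Overall fair counts taken from the WNUT-17 table above.
fair_counts = {"TP": 255, "FP": 31, "FN": 725, "LE": 47, "BE": 30, "LBE": 29}
# Assumption: reference entities = traditional TP + FN (255 + 824).
total_entities = 255 + 824

error_ratio, entity_ratio = fair_ratios(fair_counts, total_entities)
for name in fair_counts:
    err = f"{error_ratio[name]:.2%}" if name in error_ratio else "-"
    print(f"{name:>3}: error ratio {err:>7} | entity ratio {entity_ratio[name]:.2%}")
```

Because fair evaluation splits labeling and boundary errors out of FP and FN, the entity ratios are not expected to sum to 100% under this denominator.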
|
184 |
|
185 |
## Limitations and Bias
|
186 |
The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
|