Update README.md
README.md
@@ -43,7 +43,7 @@ Predicted sentences must have the same number of tokens as the references.
 
 The optional arguments are:
 - **mode** *(str)*: 'fair', 'traditional' or 'weighted'. Controls the desired output. The default value is 'fair'.
-- 'traditional': equivalent to seqeval's
+- 'traditional': equivalent to seqeval's 'strict' mode. Bear in mind that the default mode for seqeval is 'relaxed', which does not match any of the faireval modes.
 - 'fair': default fair score calculation. Fair will also show traditional scores for comparison.
 - 'weighted': custom score calculation with the weights passed. Weighted will also show traditional scores for comparison.
 - **weights** *(dict)*: dictionary with the weight of each error for the custom score calculation.
@@ -127,60 +127,44 @@ Computing the evaluation metrics on the results from [this model](https://huggin
 run on the test split of [CoNLL2003 dataset](https://huggingface.co/datasets/conll2003), we obtain the following F1-Scores:
 
 | F1 Scores | overall | location | miscellaneous | organization | person |
-
-| traditional | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
+|-----------------|--------:|---------:|-------------:|-------------:|-------:|
 | fair | 0,94 | 0,96 | 0,85 | 0,92 | 0,97 |
-
-| seqeval
-
-The traditional error count is:
-
-| | overall (error ratio \| entity ratio) | location | miscellaneous | organization | person |
-|----|-----------------------------------------:|----------:|--------------:|--------------:|--------:|
-| TP | 5104 ( - \| 90,36%) | 1545 | 561 | 1452 | 1546 |
-| FP | 534 (49,53% \| 9,45%) | 128 | 154 | 208 | 44 |
-| FN | 544 (50,46% \| 9,63%) | 123 | 141 | 209 | 71 |
+| traditional | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
+| seqeval strict | 0,90 | 0,92 | 0,79 | 0,87 | 0,96 |
+| seqeval relaxed | 0,90 | 0,92 | 0,78 | 0,87 | 0,96 |
 
-
+With error count (traditional on the left and fair on the right):
 
-| | overall
-
-| TP | 5104
-| FP | 126
-| FN | 124
-| LE | 219
-| BE | 126
-| LBE | 87
+| | overall | | location | | miscellaneous | | organization | | person | |
+|-----|--------:|-----:|---------:|-----:|-------------:|----:|-------------:|-----:|-------:|-----:|
+| TP | 5104 | 5104 | 1545 | 1545 | 561 | 561 | 1452 | 1452 | 1546 | 1546 |
+| FP | 534 | 126 | 128 | 20 | 154 | 48 | 208 | 47 | 44 | 11 |
+| FN | 544 | 124 | 123 | 13 | 141 | 47 | 209 | 47 | 71 | 17 |
+| LE | | 219 | | 62 | | 41 | | 73 | | 43 |
+| BE | | 126 | | 16 | | 46 | | 53 | | 11 |
+| LBE | | 87 | | 32 | | 13 | | 41 | | 1 |
 
 #### WNUT-17
 Computing the evaluation metrics on the results from [this model](https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner)
 run on the test split of [WNUT-17 dataset](https://huggingface.co/datasets/wnut_17), we obtain the following F1-Scores:
 
 | | overall | location | group | person | creative work | corporation | product |
-
-
-
-| seqeval strict | 0,
-| seqeval relaxed | 0,
-
-
-
-
-
-| TP
-| FP
-| FN
-
-
-
-| | overall (error ratio \| entity ratio) | location | group | person | creative work | corporation | product |
-|-----|---------:|----------:|-------:|--------:|---------------:|-------------:|---------:|
-| TP | 255 ( - \| 23,63%) | 67 | 2 | 185 | 0 | 1 | 0 |
-| FP | 31 (3,6% \| 2,87%) | 10 | 3 | 16 | 0 | 2 | 0 |
-| FN | 725 (84,11% \| 67,19%) | 71 | 135 | 233 | 120 | 54 | 112 |
-| LE | 47 (5,45% \| 4,35%) | 4 | 18 | 2 | 6 | 7 | 10 |
-| LBE | 29 (3,36% \| 2,68%) | 1 | 6 | 0 | 16 | 1 | 5 |
-| BE | 30 (3,48% \| 2,78%) | 10 | 4 | 13 | 0 | 3 | 0 |
+|-----------------|--------:|---------:|-------:|-------:|--------------:|------------:|--------:|
+| fair | 0,37 | 0,58 | 0,02 | 0,58 | 0,0 | 0,03 | 0,0 |
+| traditional | 0,35 | 0,53 | 0,02 | 0,55 | 0,0 | 0,02 | 0,0 |
+| seqeval strict | 0,35 | 0,53 | 0,02 | 0,55 | 0,0 | 0,02 | 0,0 |
+| seqeval relaxed | 0,34 | 0,49 | 0,02 | 0,55 | 0,0 | 0,02 | 0,0 |
+
+With error count:
+
+| | overall | | location | | group | | person | | creative work | | corporation | | product | |
+|-----|--------:|----:|---------:|---:|------:|----:|-------:|----:|--------------:|----:|------------:|---:|--------:|----:|
+| TP | 255 | 255 | 67 | 67 | 2 | 2 | 185 | 185 | 0 | 0 | 1 | 1 | 0 | 0 |
+| FP | 135 | 31 | 38 | 10 | 20 | 3 | 60 | 16 | 0 | 0 | 17 | 2 | 0 | 0 |
+| FN | 824 | 725 | 83 | 71 | 163 | 135 | 244 | 233 | 142 | 120 | 65 | 54 | 127 | 112 |
+| LE | | 47 | | 4 | | 18 | | 2 | | 6 | | 7 | | 10 |
+| BE | | 30 | | 10 | | 4 | | 13 | | 0 | | 3 | | 0 |
+| LBE | | 29 | | 1 | | 6 | | 0 | | 16 | | 1 | | 5 |
 
 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
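You can cross-check the F1 scores in the new tables against the error counts listed beside them. The Python sketch below does that. The traditional F1 uses the standard span-level definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN).

The fair variant is an assumption and is not taken from the faireval code. It counts each LE, BE and LBE error as half a false positive and half a false negative. That weighting was inferred only because it reproduces, to two decimals, the fair F1 values reported for CoNLL2003 and WNUT-17. Check the actual faireval implementation before relying on it. The helper names `f1_from_counts`, `traditional_f1` and `fair_f1_hypothesis` are hypothetical and are not part of faireval's API.

```python
"""Cross-check README F1 scores from their error counts (illustrative sketch).

traditional_f1 uses standard span-level precision and recall.
fair_f1_hypothesis is an ASSUMPTION inferred from the published numbers:
each LE, BE and LBE error counts as half a false positive and half a
false negative. It is not the faireval implementation.
"""


def f1_from_counts(tp: float, fp: float, fn: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def traditional_f1(tp: int, fp: int, fn: int) -> float:
    """Strict span-level F1 from TP, FP and FN counts."""
    return f1_from_counts(tp, fp, fn)


def fair_f1_hypothesis(tp: int, fp: int, fn: int, le: int, be: int, lbe: int) -> float:
    """Hypothetical fair F1: split every partial error evenly between FP and FN."""
    partial = 0.5 * (le + be + lbe)
    return f1_from_counts(tp, fp + partial, fn + partial)


if __name__ == "__main__":
    # Overall counts copied from the CoNLL2003 and WNUT-17 error-count tables.
    print(f"CoNLL2003 traditional: {traditional_f1(5104, 534, 544):.2f}")
    print(f"CoNLL2003 fair:        {fair_f1_hypothesis(5104, 126, 124, 219, 126, 87):.2f}")
    print(f"WNUT-17 traditional:   {traditional_f1(255, 135, 824):.2f}")
    print(f"WNUT-17 fair:          {fair_f1_hypothesis(255, 31, 725, 47, 30, 29):.2f}")
```

Running it prints 0.90, 0.94, 0.35 and 0.37. These match the overall traditional and fair scores in the tables, which are written with decimal commas. The same check holds for the per-entity columns, for example CoNLL2003 location (0,92 traditional and 0,96 fair).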
|