Add seqeval-relaxed comparison
README.md
CHANGED
@@ -135,20 +135,20 @@ The output for different modes and error_formats is:
 A basic [DistilBERT model](https://huggingface.co/docs/transformers/model_doc/distilbert) downstream-trained on the
 [WNUT-17](https://huggingface.co/datasets/wnut_17) dataset sheds the following F1 scores. Seqeval is shown for comparison.

-|
-
-| Traditional
-| Fair
-| Weighted
-| seqeval
+| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|-----------------|---------|----------|--------|--------|---------------|-------------|---------|
+| Traditional | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |
+| Fair | 0.3199 | 0.5247 | 0.0459 | 0.4643 | 0.0 | 0.2666 | 0.0 |
+| Weighted | 0.3842 | 0.5638 | 0.0681 | 0.5676 | 0.0 | 0.2910 | 0.0 |
+| seqeval strict | 0.2222 | 0.3425 | 0.0413 | 0.3598 | 0.0 | 0.0408 | 0.0 |
+| seqeval relaxed | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |

 ## Limitations and Bias
 The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
 label inputs (odd for Beginning, even for Inside and zero for Outside).

 The choice of custom weights for wheighted evaluation is subjective to the user. Neither weighted nor fair evaluations
-can be compared to traditional span-based metrics used in other pairs of datasets-models.
-be comparable to these classical span-based metrics, there is a noticeable gap to seqeval, for instance.
+can be compared to traditional span-based metrics used in other pairs of datasets-models.

 ## Citation
 Ortmann, Katrin. 2022. Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans. In *Proceedings of the Language Resources and Evaluation Conference (LREC)*, Marseille, France, pages 1400–1407. [PDF](https://aclanthology.org/2022.lrec-1.150.pdf)
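The "Limitations and Bias" section in this diff notes that numerical label inputs (odd for Beginning, even for Inside, zero for Outside) are not supported. A possible workaround is to convert such ids to IOB2 string tags before calling the metric. The sketch below is not part of the README or of this commit: the function name `numeric_to_iob`, the `ENTITY_TYPES` list and its ordering are assumptions made for illustration, and the ordering should be checked against the label list of the dataset actually in use.

```python
from typing import List, Sequence


def numeric_to_iob(ids: Sequence[int], entity_types: Sequence[str]) -> List[str]:
    """Map numeric labels (0 = Outside, odd = Beginning, even = Inside) to IOB2 tags.

    Assumed layout (hypothetical, verify against your dataset):
    id 2k+1 -> B-<entity_types[k]>, id 2k+2 -> I-<entity_types[k]>.
    """
    tags = []
    for label_id in ids:
        if label_id == 0:
            tags.append("O")
            continue
        type_index = (label_id - 1) // 2
        if label_id < 0 or type_index >= len(entity_types):
            raise ValueError(f"label id {label_id} has no matching entity type")
        prefix = "B" if label_id % 2 == 1 else "I"
        tags.append(f"{prefix}-{entity_types[type_index]}")
    return tags


# Illustrative entity type order; an assumption, not taken from the README.
ENTITY_TYPES = ["corporation", "creative-work", "group", "location", "person", "product"]

example_tags = numeric_to_iob([0, 7, 8, 0, 9], ENTITY_TYPES)
print(example_tags)
```

The resulting string tags can then be passed as predictions or references in place of the numeric ids, one converted list per sentence.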