illorca committed
Commit 2f1260e · 1 Parent(s): 14e865e

Add seqeval-relaxed comparison

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -135,20 +135,20 @@ The output for different modes and error_formats is:
  A basic [DistilBERT model](https://huggingface.co/docs/transformers/model_doc/distilbert) downstream-trained on the
  [WNUT-17](https://huggingface.co/datasets/wnut_17) dataset yields the following F1 scores. Seqeval is shown for comparison.

- | | Overall | Location | Group | Person | Creative Work | Corporation | Product |
- |-------------|---------|----------|--------|--------|---------------|-------------|---------|
- | Traditional | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |
- | Fair | 0.3199 | 0.5247 | 0.0459 | 0.4643 | 0.0 | 0.2666 | 0.0 |
- | Weighted | 0.3842 | 0.5638 | 0.0681 | 0.5676 | 0.0 | 0.2910 | 0.0 |
- | seqeval | 0.2222 | 0.3425 | 0.0413 | 0.3598 | 0.0 | 0.0408 | 0.0 |
+ | | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+ |-----------------|---------|----------|--------|--------|---------------|-------------|---------|
+ | Traditional | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |
+ | Fair | 0.3199 | 0.5247 | 0.0459 | 0.4643 | 0.0 | 0.2666 | 0.0 |
+ | Weighted | 0.3842 | 0.5638 | 0.0681 | 0.5676 | 0.0 | 0.2910 | 0.0 |
+ | seqeval strict | 0.2222 | 0.3425 | 0.0413 | 0.3598 | 0.0 | 0.0408 | 0.0 |
+ | seqeval relaxed | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |

  ## Limitations and Bias
  The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
  label inputs (odd for Beginning, even for Inside and zero for Outside).

  The choice of custom weights for weighted evaluation is left to the user. Neither weighted nor fair evaluations
- can be compared to traditional span-based metrics used on other dataset-model pairs. Although traditional mode should
- be comparable to these classical span-based metrics, there is a noticeable gap to seqeval, for instance.
+ can be compared to traditional span-based metrics used on other dataset-model pairs.

  ## Citation
  Ortmann, Katrin. 2022. Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans. In *Proceedings of the Language Resources and Evaluation Conference (LREC)*, Marseille, France, pages 1400–1407. [PDF](https://aclanthology.org/2022.lrec-1.150.pdf)
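
The gap between the two seqeval rows comes down to seqeval's matching mode: in its default (relaxed) mode, seqeval repairs ill-formed spans before matching, while `mode="strict"` only credits spans that are well-formed under a tagging scheme. Note that the relaxed scores coincide with the Traditional row, which is what the added comparison highlights. A minimal sketch of the mode comparison, using made-up WNUT-style tag sequences rather than the actual WNUT-17 predictions:

```python
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

# Illustrative gold and predicted tag sequences (not the WNUT-17 data itself).
y_true = [["B-person", "I-person", "O", "B-location"]]
y_pred = [["B-person", "I-person", "O", "I-location"]]

# Default (relaxed) mode: seqeval repairs the ill-formed span, so the lone
# I-location still counts as a location entity and matches the gold span.
relaxed_f1 = f1_score(y_true, y_pred)

# Strict mode: spans must be well-formed under the chosen scheme (IOB2 here),
# so the lone I-location prediction is not credited as a matching span.
strict_f1 = f1_score(y_true, y_pred, mode="strict", scheme=IOB2)

print(f"relaxed: {relaxed_f1:.4f}, strict: {strict_f1:.4f}")
```

With these toy sequences the relaxed score is 1.0 while the strict score drops to about 0.67, mirroring the direction of the gap between the two seqeval rows in the table.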
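On the numerical-label limitation: labels that arrive in the numerical scheme described in the README (odd for Beginning, even for Inside, zero for Outside) have to be mapped to string tags before evaluation. A hypothetical converter is sketched below; the helper name, the entity-type ordering, and the pairing of ids to types are all illustrative assumptions, not part of the metric.

```python
# Hypothetical helper (not part of the metric): map the numerical scheme the
# README describes (odd = Beginning, even = Inside, zero = Outside) onto the
# string tags the metric does accept. ENTITY_TYPES is an assumed ordering.
ENTITY_TYPES = ["location", "group", "person", "creative-work", "corporation", "product"]

def ids_to_bio(ids):
    tags = []
    for i in ids:
        if i == 0:
            tags.append("O")
        elif i % 2 == 1:  # odd -> Beginning of entity type (i - 1) // 2
            tags.append(f"B-{ENTITY_TYPES[(i - 1) // 2]}")
        else:             # even, nonzero -> Inside of entity type (i - 2) // 2
            tags.append(f"I-{ENTITY_TYPES[(i - 2) // 2]}")
    return tags

print(ids_to_bio([1, 2, 0, 5]))  # ['B-location', 'I-location', 'O', 'B-person']
```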