title: seqeval_with_fbeta
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
seqeval is a Python framework for sequence labeling evaluation. seqeval can
evaluate the performance of chunking tasks such as named-entity recognition,
part-of-speech tagging, semantic role labeling and so on.
This is well-tested by using the Perl script conlleval, which can be used for
measuring the performance of a system that has processed the CoNLL-2000 shared
task data.
seqeval supports following formats: IOB1 IOB2 IOE1 IOE2 IOBES
See the [README.md] file at https://github.com/chakki-works/seqeval for more
information.
Metric Card for seqeval
A modified version of the seqeval metric that includes an optional Fβ score. Please note that setting the optional parameter beta to a value less than or equal to 0 can lead to unexpected behaviour.
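To make the role of beta concrete, here is a minimal sketch of the standard Fβ definition, Fβ = (1 + β²)·P·R / (β²·P + R). The function name `fbeta_score` and the handling of a zero denominator are assumptions made for illustration; this is not taken from the metric's source code.

```python
def fbeta_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta: weighted harmonic mean of precision and recall."""
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Assumption for this sketch: report 0.0 when the score is undefined.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


p, r = 0.5, 0.8
print(fbeta_score(p, r, beta=1.0))   # F1: precision and recall weighted equally
print(fbeta_score(p, r, beta=2.0))   # beta > 1 weights recall more heavily
print(fbeta_score(p, r, beta=0.0))   # beta = 0 collapses to precision
print(fbeta_score(p, r, beta=-2.0))  # beta is squared, so the sign is silently lost
```

Under this definition, beta = 0 ignores recall entirely and a negative beta gives the same result as its absolute value, which is one way non-positive values can produce scores that look valid but are not what was intended.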
Metric description
seqeval is a Python framework for sequence labeling evaluation. seqeval can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling and so on.
How to use
Seqeval produces labelling scores along with its sufficient statistics from a source against one or more references.
It takes two mandatory arguments:
predictions: a list of lists of predicted labels, i.e. estimated targets as returned by a tagger.
references: a list of lists of reference labels, i.e. the ground truth/target values.
It can also take several optional arguments:
beta: the weight β of the micro Fβ score.
suffix (boolean): True if the IOB tag is a suffix (after type) instead of a prefix (before type), False otherwise. The default value is False, i.e. the IOB tag is a prefix (before type).
scheme: the target tagging scheme, which can be one of [IOB1, IOB2, IOE1, IOE2, IOBES, BILOU]. The default value is None.
mode: whether to count correct entity labels with incorrect I/B tags as true positives or not. If you want to only count exact matches, pass mode="strict" and a specific scheme value. The default is None.
sample_weight: an array-like of shape (n_samples,) that provides weights for individual samples. The default is None.
zero_division: which value to substitute as a metric value when encountering zero division. Should be one of [0, 1, "warn"]. "warn" acts as 0, but a warning is also raised.
>>> import evaluate
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = seqeval.compute(predictions=predictions, references=references)
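The suffix and zero_division arguments described above can be illustrated with a small standalone sketch. The helpers `split_tag` and `safe_divide` are hypothetical names introduced here; they only mirror the documented behaviour and are not the library's internals.

```python
import warnings


def split_tag(tag, suffix=False):
    """Split a label into (marker, entity_type).

    Prefix style (suffix=False): 'B-PER' -> ('B', 'PER')
    Suffix style (suffix=True):  'PER-B' -> ('B', 'PER')
    """
    if tag == "O":
        return "O", ""
    if suffix:
        entity_type, _, marker = tag.rpartition("-")
    else:
        marker, _, entity_type = tag.partition("-")
    return marker, entity_type


def safe_divide(numerator, denominator, zero_division="warn"):
    """Divide, substituting `zero_division` (0, 1 or "warn") when the denominator is 0."""
    if denominator != 0:
        return numerator / denominator
    if zero_division == "warn":
        # "warn" behaves like 0 but also emits a warning.
        warnings.warn("Zero division encountered; metric set to 0.0")
        return 0.0
    return float(zero_division)


print(split_tag("B-PER"))                # ('B', 'PER')
print(split_tag("PER-B", suffix=True))   # ('B', 'PER')
print(safe_divide(0, 0, zero_division=1))  # 1.0
```

A zero denominator typically arises when a type has no predicted entities (precision) or no reference entities (recall), which is when the choice of zero_division matters.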
Output values
This metric returns a dictionary with a summary of scores for overall and per type:
Overall:
accuracy: the average accuracy, on a scale between 0.0 and 1.0.
precision: the average precision, on a scale between 0.0 and 1.0.
recall: the average recall, on a scale between 0.0 and 1.0.
f1: the average F1 score, which is the harmonic mean of the precision and recall. It also has a scale of 0.0 to 1.0.
fbeta: the micro Fβ score.
Per type (e.g. MISC, PER, LOC, ...):
precision: the average precision, on a scale between 0.0 and 1.0.
recall: the average recall, on a scale between 0.0 and 1.0.
f1: the average F1 score, on a scale between 0.0 and 1.0.
fbeta: the micro Fβ score.
Values from popular papers
The 1995 paper "Text Chunking using Transformation-Based Learning" reported a baseline recall of 81.9% and a precision of 78.2% using methods that are not based on deep learning.
More recently, seqeval continues to be used for reporting performance on tasks such as named entity detection and information extraction.
Examples
Maximal values (full match):
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> print(results)
{'MISC': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1}, 'PER': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1}, 'overall': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'accuracy': 1.0}}
Minimal values (no match):
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'B-MISC', 'I-MISC'], ['B-PER', 'I-PER', 'O']]
>>> references = [['B-MISC', 'O', 'O'], ['I-PER', '0', 'I-PER']]
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> print(results)
{'MISC': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 1}, 'PER': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 2}, '_': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 1}, 'overall': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'accuracy': 0.0}}
Partial match:
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> print(results)
{'MISC': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 1}, 'PER': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1}, 'overall': {'precision': 0.5, 'recall': 0.5, 'f1-score': 0.5, 'accuracy': 0.8}}
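The partial-match numbers above can be reproduced by hand. The following is a simplified sketch of entity-level scoring for IOB2 tags only; `get_entities` and `score` are illustrative helpers written for this card, not the seqeval implementation, and stray I- tags without a preceding B- are simply ignored here.

```python
def get_entities(tags):
    """Return the set of (type, start, end) spans in one IOB2-tagged sequence."""
    entities = set()
    start, current = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        marker, _, etype = tag.partition("-")
        # Close the open entity unless this token continues it (I- with the same type).
        if current is not None and (marker != "I" or etype != current):
            entities.add((current, start, i - 1))
            start, current = None, None
        if marker == "B":
            start, current = i, etype
    return entities


def score(predictions, references):
    """Micro-averaged entity precision and recall, plus token-level accuracy."""
    pred_spans, ref_spans = set(), set()
    correct_tokens = total_tokens = 0
    for idx, (pred, ref) in enumerate(zip(predictions, references)):
        pred_spans |= {(idx, *span) for span in get_entities(pred)}
        ref_spans |= {(idx, *span) for span in get_entities(ref)}
        correct_tokens += sum(p == r for p, r in zip(pred, ref))
        total_tokens += len(ref)
    true_positives = len(pred_spans & ref_spans)  # exact span and type match
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(ref_spans) if ref_spans else 0.0
    accuracy = correct_tokens / total_tokens if total_tokens else 0.0
    return precision, recall, accuracy


predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
print(score(predictions, references))  # (0.5, 0.5, 0.8)
```

The predicted MISC span covers tokens 2 to 5 while the reference span covers tokens 3 to 5, so it counts as a miss even though most of its tokens overlap. Only the PER entity matches exactly, giving 1 of 2 entities for both precision and recall, while 8 of the 10 tokens carry the correct label.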
Limitations and bias
seqeval supports the following IOB formats (short for inside, outside, beginning): IOB1, IOB2, IOE1, IOE2, IOBES (only in strict mode) and BILOU (only in strict mode).
For more information about IOB formats, refer to the Wikipedia page and the description of the CoNLL-2000 shared task.
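As a rough illustration of how these formats differ, the sketch below converts an IOB2 sequence to IOBES, where the last token of a multi-token entity is marked E- and a single-token entity is marked S-. The function `iob2_to_iobes` is a hypothetical helper written for this example and is not part of seqeval.

```python
def iob2_to_iobes(tags):
    """Convert one IOB2 sequence to IOBES."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        marker, _, etype = tag.partition("-")
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{etype}"  # does the entity extend past this token?
        if marker == "B":
            converted.append(tag if continues else f"S-{etype}")
        else:  # marker == "I"
            converted.append(tag if continues else f"E-{etype}")
    return converted


print(iob2_to_iobes(['B-PER', 'I-PER', 'O', 'B-LOC']))
# ['B-PER', 'E-PER', 'O', 'S-LOC']
```

BILOU encodes the same information with L- (last) in place of E- and U- (unit) in place of S-.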
Citation
@inproceedings{ramshaw-marcus-1995-text,
title = "Text Chunking using Transformation-Based Learning",
author = "Ramshaw, Lance and
Marcus, Mitch",
booktitle = "Third Workshop on Very Large Corpora",
year = "1995",
url = "https://www.aclweb.org/anthology/W95-0107",
}
@misc{seqeval,
title={{seqeval}: A Python framework for sequence labeling evaluation},
url={https://github.com/chakki-works/seqeval},
note={Software available from https://github.com/chakki-works/seqeval},
author={Hiroki Nakayama},
year={2018},
}