---
title: seqeval_with_fbeta
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
description: >-
  seqeval is a Python framework for sequence labeling evaluation. seqeval can
  evaluate the performance of chunking tasks such as named-entity recognition,
  part-of-speech tagging, semantic role labeling and so on.

  This is well-tested by using the Perl script conlleval, which can be used for
  measuring the performance of a system that has processed the CoNLL-2000 shared
  task data.

  seqeval supports the following formats: IOB1, IOB2, IOE1, IOE2, IOBES.

  See the README.md file at https://github.com/chakki-works/seqeval for more
  information.
---

# Metric Card for seqeval

This is a modified version of the seqeval metric that includes an optional Fβ score. Please note that setting the optional parameter `beta` to a value less than or equal to 0 can lead to unexpected behaviour.

## Metric description

seqeval is a Python framework for sequence labeling evaluation. seqeval can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling and so on.

## How to use

Seqeval produces labeling scores, along with their sufficient statistics, from a source against one or more references.

It takes two mandatory arguments:

- `predictions`: a list of lists of predicted labels, i.e. estimated targets as returned by a tagger.
- `references`: a list of lists of reference labels, i.e. the ground truth/target values.

It can also take several optional arguments:

- `beta`: the weight β of the micro Fβ score, controlling how much more weight recall receives than precision (see the formula after this list).
- `suffix` (boolean): `True` if the IOB tag is a suffix (after type) instead of a prefix (before type), `False` otherwise. The default value is `False`, i.e. the IOB tag is a prefix (before type).
- `scheme`: the target tagging scheme, which can be one of `IOB1`, `IOB2`, `IOE1`, `IOE2`, `IOBES`, `BILOU`. The default value is `None`.
- `mode`: whether to count correct entity labels with incorrect I/B tags as true positives or not. If you want to count only exact matches, pass `mode="strict"` and a specific `scheme` value. The default is `None`.
- `sample_weight`: an array-like of shape (n_samples,) that provides weights for individual samples. The default is `None`.
- `zero_division`: which value to substitute as a metric value when encountering zero division. Should be one of `0`, `1`, or `"warn"`. `"warn"` acts as `0`, but a warning is also raised.
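
For reference, the micro Fβ score combines the overall precision P and recall R, with β determining how much more weight recall receives than precision:

$$F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{\beta^2 \cdot P + R}$$

With β = 1 this reduces to the usual F1 score; β > 1 favours recall, and β < 1 favours precision (which is why values of β ≤ 0 are not meaningful here).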

```python
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = seqeval.compute(predictions=predictions, references=references)
```
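
A sketch of combining the optional arguments described above (illustrative values, assuming the metric has been loaded as shown; `beta=2` weights recall twice as much as precision, and `mode="strict"` with an explicit `scheme` counts only exact span matches):

```python
>>> results = seqeval.compute(
...     predictions=predictions,
...     references=references,
...     beta=2,            # weight recall twice as much as precision in Fbeta
...     mode="strict",     # count only exact entity span/type matches
...     scheme="IOB2",     # a specific scheme is required with mode="strict"
... )
```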

## Output values

This metric returns a dictionary with a summary of scores, both overall and per type (see the access sketch after these lists):

Overall:

- `accuracy`: the average accuracy, on a scale between 0.0 and 1.0.
- `precision`: the average precision, on a scale between 0.0 and 1.0.
- `recall`: the average recall, on a scale between 0.0 and 1.0.
- `f1`: the average F1 score, which is the harmonic mean of the precision and recall. It also has a scale of 0.0 to 1.0.
- `fbeta`: the micro Fβ score.

Per type (e.g. MISC, PER, LOC, ...):

- `precision`: the average precision, on a scale between 0.0 and 1.0.
- `recall`: the average recall, on a scale between 0.0 and 1.0.
- `f1`: the average F1 score, on a scale between 0.0 and 1.0.
- `fbeta`: the micro Fβ score.
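
A minimal sketch of reading these values, assuming the dictionary layout shown in the example outputs below (per-type scores nested under each entity type, aggregate scores under an `overall` key):

```python
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> per_f1 = results['PER']['f1']                # F1 for the PER entity type
>>> overall_f1 = results['overall']['f1-score']  # aggregate F1 across all types
>>> overall_acc = results['overall']['accuracy']
```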

## Values from popular papers

The 1995 paper "Text Chunking using Transformation-Based Learning" reported a baseline recall of 81.9% and a precision of 78.2%, using non-deep-learning-based methods.

More recently, seqeval continues to be used to report performance on tasks such as named-entity recognition and information extraction.

## Examples

Maximal values (full match):

```python
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> print(results)
{'MISC': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1}, 'PER': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1}, 'overall': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'accuracy': 1.0}}
```

Minimal values (no match):

```python
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'B-MISC', 'I-MISC'], ['B-PER', 'I-PER', 'O']]
>>> references = [['B-MISC', 'O', 'O'], ['I-PER', '0', 'I-PER']]
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> print(results)
{'MISC': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 1}, 'PER': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 2}, '_': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 1}, 'overall': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'accuracy': 0.0}}
```

Partial match:

```python
>>> seqeval = evaluate.load('seqeval')
>>> predictions = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> references = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> results = seqeval.compute(predictions=predictions, references=references)
>>> print(results)
{'MISC': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 1}, 'PER': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1}, 'overall': {'precision': 0.5, 'recall': 0.5, 'f1-score': 0.5, 'accuracy': 0.8}}
```

Here the predicted MISC span is offset by one token from the reference, so the entity-level MISC scores drop to 0, while token-level accuracy stays at 8/10 = 0.8.

## Limitations and bias

seqeval supports the following IOB formats (short for inside, outside, beginning): IOB1, IOB2, IOE1, IOE2 and IOBES, as well as BILOU (only in strict mode).
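
As an illustration (a made-up sentence, not taken from the seqeval documentation), here is how the same two entities would be encoded under three of these schemes:

```python
# "John Smith lives in New York" tagged under three schemes.
# IOB2 marks every entity start with B-; IOBES additionally marks
# entity ends with E- (and single-token entities with S-); BILOU
# marks last tokens with L- and unit-length entities with U-.
iob2  = ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']
iobes = ['B-PER', 'E-PER', 'O', 'O', 'B-LOC', 'E-LOC']
bilou = ['B-PER', 'L-PER', 'O', 'O', 'B-LOC', 'L-LOC']
```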

For more information about IOB formats, refer to the Wikipedia page and the description of the CoNLL-2000 shared task.

## Citation

```bibtex
@inproceedings{ramshaw-marcus-1995-text,
    title = "Text Chunking using Transformation-Based Learning",
    author = "Ramshaw, Lance and Marcus, Mitch",
    booktitle = "Third Workshop on Very Large Corpora",
    year = "1995",
    url = "https://www.aclweb.org/anthology/W95-0107",
}

@misc{seqeval,
    title = {{seqeval}: A Python framework for sequence labeling evaluation},
    url = {https://github.com/chakki-works/seqeval},
    note = {Software available from https://github.com/chakki-works/seqeval},
    author = {Hiroki Nakayama},
    year = {2018},
}
```

## Further References