---
license: mit
pipeline_tag: token-classification
tags:
  - BERT
  - bioBERT
  - NER
  - medical
metrics:
  - f1
language:
  - en
---

Model

NER model for recognizing disease and treatment entities. The model and the data are used for educational purposes.

The original dataset tags have been augmented with "inside" tags to handle the sub-tokens produced by the WordPiece tokenizer. The following NER tags are used:

B-D, I-D: beginning and inside tags for diseases
B-T, I-T: beginning and inside tags for treatments
O: outside any entity (not relevant)
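A minimal sketch of the augmentation, assuming every word carries one word-level label (D, T or O) and that its WordPiece split is already known. The helper name `expand_to_subtokens` and the label-to-id order are hypothetical, not taken from the original training code; the rule itself (first sub-token gets B-, the remaining sub-tokens of the same word get I-) is inferred from the example predictions below.

```python
from typing import Dict, List

# Hypothetical label set and id mapping; the real model's ordering may differ.
LABELS: List[str] = ["O", "B-D", "I-D", "B-T", "I-T"]
LABEL2ID: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}


def expand_to_subtokens(subtokens_per_word: List[List[str]],
                        word_labels: List[str]) -> List[str]:
    """Turn one word-level label per word into one BIO tag per sub-token.

    The first sub-token of an entity word gets "B-<label>", every following
    sub-token of the same word gets "I-<label>", and "O" words stay "O".
    """
    if len(subtokens_per_word) != len(word_labels):
        raise ValueError("need exactly one label per word")
    tags: List[str] = []
    for pieces, label in zip(subtokens_per_word, word_labels):
        for i, _piece in enumerate(pieces):
            if label == "O":
                tags.append("O")
            else:
                tags.append(("B-" if i == 0 else "I-") + label)
    return tags


# Illustrative input: the split of "complicating" is assumed, not checked
# against the BioBERT vocabulary.
words = [["o", "##bs", "##truct", "##ive"], ["complicating"]]
tags = expand_to_subtokens(words, ["D", "O"])
print(tags)                          # ['B-D', 'I-D', 'I-D', 'I-D', 'O']
print([LABEL2ID[t] for t in tags])   # [1, 2, 2, 2, 0]
```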

# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood

# Real:
Acute           -> D
obstructive     -> D
hydrocephalus   -> D
bacterial       -> D
meningitis      -> D

# Predictions:
o##bs##truct##ive     -> B-D + I-D + I-D + I-D
h##ydro##ce##pha##lus -> B-D + I-D + I-D + I-D + I-D
bacterial             -> B-D
men##ing##itis        -> B-D + I-D + I-D
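Going the other way, sub-token predictions can be folded back into word-level labels so they are comparable with the gold annotation. This is a sketch under the assumption that continuation pieces carry the standard "##" prefix and that each word takes the label of its first piece; `merge_subtokens` is a hypothetical helper, not part of the model or of the transformers library.

```python
from typing import List, Tuple


def merge_subtokens(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Rebuild words from WordPiece pieces and keep the first piece's label.

    The "B-"/"I-" prefix is dropped, so the result uses the same word-level
    labels (D, T, O) as the gold annotation.
    """
    words: List[Tuple[str, str]] = []
    for token, tag in zip(tokens, tags):
        if token.startswith("##") and words:
            # Continuation piece: glue it onto the previous word, keep its label.
            text, label = words[-1]
            words[-1] = (text + token[2:], label)
        else:
            # New word: strip the B-/I- prefix ("O" has no prefix).
            label = tag.split("-", 1)[1] if "-" in tag else "O"
            words.append((token, label))
    return words


# Sub-tokens and tags copied from the predictions above.
tokens = ["o", "##bs", "##truct", "##ive", "h", "##ydro", "##ce", "##pha", "##lus"]
tags = ["B-D", "I-D", "I-D", "I-D", "B-D", "I-D", "I-D", "I-D", "I-D"]
print(merge_subtokens(tokens, tags))
# [('obstructive', 'D'), ('hydrocephalus', 'D')]
```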

Sources

This model is based on the pretrained dmis-lab/biobert-base-cased-v1.2 model and was fine-tuned on the relatively small BeHealthy Medical Entity dataset (1,550 training samples).

Performance

The model has not been extensively tuned. The quality of the dataset is unclear because the origin of the data and the annotation process are unknown.

Metric      Score
Precision   0.854523
Recall      0.859779
F1          0.857143
Accuracy    0.919590
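As a quick consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the two values in the table reproduces the reported F1 to six decimals. Whether these scores are token-level or entity-level (for example computed with seqeval) is not stated here, so this only checks the arithmetic.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above.
f1 = f1_score(0.854523, 0.859779)
print(round(f1, 6))  # 0.857143
```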