metadata
license: mit
pipeline_tag: token-classification
tags:
- BERT
- bioBERT
- NER
- medical
metrics:
- f1
language:
- en
Model
NER model for disease and treatment entity recognition. The model and the data are intended for educational purposes.
The original dataset tags have been augmented with "inside" tags in order to handle the sub-tokens produced by the WordPiece tokenizer. The following NER tags are used:
- B-D, I-D: begin and inside tags for disease
- B-T, I-T: begin and inside tags for treatment
- O: outside entities (irrelevant)
# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood
# Real:
Acute -> D
obstructive -> D
hydrocephalus -> D
bacterial -> D
meningitis -> D
# Predictions:
o##bs##truct##ive -> B-D + I-D + I-D + I-D
h##ydro##ce##pha##lus -> B-D + I-D + I-D + I-D + I-D
bacterial -> B-D
men##ing##itis -> B-D + I-D + I-D
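The tagging scheme in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the preprocessing code used for this model: the helper names `expand_word_tag` and `collapse_piece_tags` are hypothetical, and the rule that the first sub-token of every tagged word gets a `B-` tag while the remaining sub-tokens get `I-` tags is an assumption inferred from the predictions shown above. The sub-token splits are copied from that example rather than produced by a real tokenizer.

```python
from typing import List


def expand_word_tag(pieces: List[str], word_tag: str) -> List[str]:
    """Spread one word-level tag ("D", "T" or "O") over a word's sub-tokens.

    Assumed rule: first sub-token -> B-<tag>, remaining sub-tokens -> I-<tag>.
    Words tagged "O" keep "O" on every sub-token.
    """
    if not pieces:
        return []
    if word_tag == "O":
        return ["O"] * len(pieces)
    return [f"B-{word_tag}"] + [f"I-{word_tag}"] * (len(pieces) - 1)


def collapse_piece_tags(piece_tags: List[str]) -> str:
    """Recover the word-level tag from sub-token tags via the first sub-token."""
    first = piece_tags[0]
    return "O" if first == "O" else first.split("-", 1)[1]


# Sub-token splits copied from the example above ("##" marks a continuation).
words = [
    (["h", "##ydro", "##ce", "##pha", "##lus"], "D"),
    (["bacterial"], "D"),
    (["men", "##ing", "##itis"], "D"),
]

for pieces, word_tag in words:
    piece_tags = expand_word_tag(pieces, word_tag)
    print("".join(pieces), "->", " + ".join(piece_tags))
    # Collapsing the sub-token tags gives back the original word-level tag.
    assert collapse_piece_tags(piece_tags) == word_tag
```

Reading the word-level tag from the first sub-token only is one common convention; other aggregation strategies (for example, majority vote over sub-tokens) are possible.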
Sources
This pipeline is based on the dmis-lab/biobert-base-cased-v1.2 pretrained model, fine-tuned on the relatively small BeHealthy Medical Entity dataset (1,550 training samples).
Performance
The model has not been extensively tuned. The quality of the dataset is also uncertain, because the origin of the data and the annotation process are unknown.
Metric | Score |
---|---|
Precision | 0.854523 |
Recall | 0.859779 |
F1 | 0.857143 |
Accuracy | 0.919590 |
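
As a quick sanity check, the reported F1 can be recomputed from the precision and recall in the table, assuming F1 is the usual harmonic mean of the two. The function name `f1_score` below is illustrative and does not refer to any particular evaluation library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the table above.
precision = 0.854523
recall = 0.859779
reported_f1 = 0.857143

f1 = f1_score(precision, recall)
print(f"recomputed F1: {f1:.6f}")

# The recomputed value agrees with the reported one up to rounding.
assert abs(f1 - reported_f1) < 1e-5
```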