---
license: mit
pipeline_tag: token-classification
tags:
- BERT
- bioBERT
- NER
- medical
metrics:
- f1
language:
- en
---
# Model
NER model for recognizing disease and treatment entities. The model and data are intended for educational use.
The original dataset tags have been augmented with "inside" tags to handle the sub-tokens produced by the WordPiece tokenizer. The following NER tags are used:
* `B-D`, `I-D`: begin and inside tags for disease
* `B-T`, `I-T`: begin and inside tags for treatment
* `O`: outside any entity (irrelevant)
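
The tag augmentation described above can be sketched roughly as follows. This is a minimal illustration, not the original preprocessing code: the helper name `expand_word_tag` is hypothetical, and the rule (first sub-token gets `B-`, the rest get `I-`) is inferred from the example in this card.

```python
def expand_word_tag(label, pieces):
    """Expand one word-level label into one tag per WordPiece sub-token.

    Assumed rule (inferred from the card's example): the first sub-token
    of an entity word gets the B- tag, all remaining sub-tokens get I-.
    """
    if not pieces:
        return []
    if label == "O":
        return ["O"] * len(pieces)
    return [f"B-{label}"] + [f"I-{label}"] * (len(pieces) - 1)


# Sub-token split of "obstructive", copied from the example in this card.
pieces = ["o", "##bs", "##truct", "##ive"]
tags = expand_word_tag("D", pieces)
print(list(zip(pieces, tags)))
```

The example below shows the same expansion applied to a full sentence.
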
```
# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood
# Real:
Acute -> D
obstructive -> D
hydrocephalus -> D
bacterial -> D
meningitis -> D
# Predictions:
o##bs##truct##ive -> B-D + I-D + I-D + I-D
h##ydro##ce##pha##lus -> B-D + I-D + I-D + I-D + I-D
bacterial -> B-D
men##ing##itis -> B-D + I-D + I-D
```
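
To compare sub-token predictions with the word-level labels of the dataset, the tags of each word have to be reduced to a single label. A minimal sketch, assuming a "first sub-token wins" strategy (one common choice, not necessarily the one used for this model); the helper name `word_label` is hypothetical:

```python
def word_label(piece_tags):
    """Reduce the per-sub-token tags of one word to a word-level label.

    Assumption: the tag of the first sub-token decides; the B-/I- prefix
    is dropped so the result matches the dataset's labels (D, T or O).
    """
    first = piece_tags[0]
    return "O" if first == "O" else first.split("-", 1)[1]


# Predicted tags copied from the example above.
predictions = {
    "obstructive": ["B-D", "I-D", "I-D", "I-D"],
    "hydrocephalus": ["B-D", "I-D", "I-D", "I-D", "I-D"],
    "bacterial": ["B-D"],
    "meningitis": ["B-D", "I-D", "I-D"],
}
labels = {word: word_label(tags) for word, tags in predictions.items()}
print(labels)
```
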
# Sources
This pipeline is based on the [dmis-lab/biobert-base-cased-v1.2](https://huggingface.co/dmis-lab/biobert-base-cased-v1.2) pretrained model,
fine-tuned using the relatively small [BeHealthy Medical Entity](https://www.kaggle.com/datasets/arunagirirajan/medical-entity-recognition-ner)
dataset (1,550 training samples).
# Performance
The model has not been extensively tuned. The quality of the dataset is unclear, because the origin of the data and the annotation process are unknown.
|Metric   |Score     |
|---------|----------|
|Precision| 0.854523 |
|Recall   | 0.859779 |
|F1       | 0.857143 |
|Accuracy | 0.919590 |
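
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two values in the table. The variable names below are only illustrative:

```python
# Values copied from the table above.
precision = 0.854523
recall = 0.859779

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.6f}")
```
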