Model

NER model for disease/treatment/technology entity recognition. The model and its data are intended for educational use.

The original dataset tags have been augmented with "inside" tags to handle the sub-tokens produced by the WordPiece tokenizer. The following NER tags are used:

  • B-DISEASE, I-DISEASE: begin and inside tags for disease
  • B-TREATMENT, I-TREATMENT: begin and inside tags for treatment
  • B-TECHNOLOGY, I-TECHNOLOGY: begin and inside tags for technology
  • O: outside any entity (irrelevant)

# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood

# Real:
Acute           -> DISEASE
obstructive     -> DISEASE
hydrocephalus   -> DISEASE
bacterial       -> DISEASE
meningitis      -> DISEASE

# Predictions:
o##bs##truct##ive     -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
h##ydro##ce##pha##lus -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
bacterial             -> B-DISEASE
men##ing##itis        -> B-DISEASE + I-DISEASE + I-DISEASE
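
The expansion of word-level tags into sub-token tags, as shown in the example above, can be sketched as follows. This is only an illustration, assuming the WordPiece tokenizer that ships with the BioBERT checkpoint; it is not the exact preprocessing code used for this model.

# Sketch of the sub-token tag expansion described above: the first piece of a
# word keeps its word-level tag, every continuation piece gets the matching I- tag.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.2")

words = ["bacterial", "meningitis"]
word_tags = ["B-DISEASE", "B-DISEASE"]  # word-level tags, mirroring the example above

sub_tokens, sub_tags = [], []
for word, tag in zip(words, word_tags):
    pieces = tokenizer.tokenize(word)
    inside_tag = "I-" + tag.split("-", 1)[1] if tag != "O" else "O"
    sub_tokens.append(pieces)
    sub_tags.append([tag] + [inside_tag] * (len(pieces) - 1))

for pieces, tags in zip(sub_tokens, sub_tags):
    print(" ".join(pieces), "->", " + ".join(tags))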

Sources

This pipeline is based on the dmis-lab/biobert-base-cased-v1.2 pretrained model, fine-tuned on the relatively small BeHealthy Medical Entity dataset (1,550 training samples). An initial version of the model was then used to augment the medical technology dataset, and both datasets were subsequently used to train this model.
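
A rough sketch of how the underlying token-classification setup could be initialised, assuming the seven tags listed above; this is illustrative, not the exact training script used for this model.

from transformers import AutoTokenizer, AutoModelForTokenClassification

# The seven tags used by this model (see the tag list above).
labels = ["O", "B-DISEASE", "I-DISEASE", "B-TREATMENT", "I-TREATMENT",
          "B-TECHNOLOGY", "I-TECHNOLOGY"]
id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.2")
model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.2",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)
# The model would then be fine-tuned on the tagged sentences,
# e.g. with the transformers Trainer API.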

Performance

The model has not been extensively tuned. The quality of the dataset is unclear, since the origin of the data and the annotation process are unknown.

Metric      Score
Precision   0.836892
Recall      0.766610
F1          0.800211
Accuracy    0.935253
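
For reference, entity-level scores of this kind are commonly computed with the seqeval library; the snippet below is only a hypothetical illustration with made-up label sequences, not the evaluation code of this model.

from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up gold and predicted tag sequences, just to show the metric calls.
y_true = [["B-DISEASE", "I-DISEASE", "O", "B-TREATMENT", "O"]]
y_pred = [["B-DISEASE", "I-DISEASE", "O", "O", "O"]]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))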