license: mit
inference:
  parameters:
    aggregation_strategy: average
language:
- pt
pipeline_tag: token-classification
tags:
- medialbertina-ptpt
- deberta
- portuguese
- european portuguese
- medical
- clinical
- healthcare
- NER
- Named Entity Recognition
- IE
- Information Extraction
widget:
- text: >-
    Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os
    sinais vitais do utente, incluindo a pressão arterial, com leitura de
    120/87 mmHg e a frequência cardíaca, de 80 batimentos por minuto, foram
    monitorizados. Após a cirurgia o utente apresentava dor intensa no local
    e inchaço no tornozelo, mas os resultados da radiografia revelaram uma
    recuperação satisfatória. Foi prescrito ibuprofeno 600mg de 8 em 8 horas
    durante 3 dias.
  example_title: Example 1
- text: >-
    Durante o procedimento endoscópico, foram encontrados pólipos no cólon do
    paciente.
  example_title: Example 2
- text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
  example_title: Example 3
- text: >-
    Após as sessões de fisioterapia o paciente apresenta recuperação de
    mobilidade.
  example_title: Example 4
- text: >-
    O paciente está em Quimioterapia com uma dosagem específica de Cisplatina
    para o tratamento do cancro do pulmão.
  example_title: Example 5
- text: Monitorização da Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
  example_title: Example 6
- text: >-
    A ressonância magnética da utente revelou uma rotura no menisco lateral do
    joelho.
  example_title: Example 7
- text: >-
    A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com
    imunomoduladores.
  example_title: Example 8
MediAlbertina
The first publicly available medical language model trained with real European Portuguese data.
MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of PORTULAN's Albertina models on Electronic Medical Records (EMRs) shared by Portugal's largest public hospital.
Like their predecessors, MediAlbertina models are distributed under the MIT license.
Model Description
MediAlbertina PT-PT 1.5B NER was created by fine-tuning MediAlbertina PT-PT 1.5B on real European Portuguese EMRs that were hand-annotated for the following entities:
- Diagnostico (D): All types of diseases and conditions following the ICD-10-CM guidelines.
- Sintoma (S): Any complaints or evidence from healthcare professionals indicating that a patient is experiencing a medical condition.
- Medicamento (M): Anything administered to the patient (through any route), including drugs, specific food/drink, vitamins, or blood for transfusion.
- Dosagem (D): Dosage and frequency of medication administration.
- ProcedimentoMedico (PM): Anything healthcare professionals do related to patients, including exams, moving patients, administering something, or even surgeries.
- SinalVital (SV): Quantifiable indicators in a patient that can be measured, always associated with a specific result. Examples include cholesterol levels, diuresis, weight, or glycaemia.
- Resultado (R): Results associated with Medical Procedures and Vital Signs. A result can be a numerical value if something was measured (e.g., the value associated with blood pressure) or a descriptor that indicates the outcome (e.g., positive/negative, functional).
- Progresso (P): Describes the progress of the patient's condition. Typically, it includes verbs like improving, evolving, or regressing, and mentions of the patient's stability.
MediAlbertina PT-PT 1.5B NER outperformed the same fine-tuning applied to the non-medical Albertina PT-PT language models (F1 of 0.843 versus 0.838 at 1.5B parameters), indicating the effectiveness of this domain adaptation and its potential for medical AI in Portugal.
| Checkpoints | Precision | Recall | F1 |
|---|---|---|---|
| Albertina PT-PT 900M | 0.814 | 0.814 | 0.813 |
| Albertina PT-PT 1.5B | 0.833 | 0.845 | 0.838 |
| MediAlbertina PT-PT 900M | 0.84 | 0.828 | 0.832 |
| MediAlbertina PT-PT 1.5B | 0.842 | 0.845 | 0.843 |
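To make the size of the domain-adaptation effect explicit, the short sketch below computes the F1 difference between each MediAlbertina checkpoint and the Albertina checkpoint of the same size. The F1 values are copied from the table above; the dictionary layout and the `f1_gain` helper are illustrative names, not part of any released code.

```python
# F1 scores copied from the results table above.
f1_scores = {
    "Albertina PT-PT 900M": 0.813,
    "Albertina PT-PT 1.5B": 0.838,
    "MediAlbertina PT-PT 900M": 0.832,
    "MediAlbertina PT-PT 1.5B": 0.843,
}


def f1_gain(size: str) -> float:
    """Return the F1 difference (medical minus general) for one model size."""
    medical = f1_scores[f"MediAlbertina PT-PT {size}"]
    general = f1_scores[f"Albertina PT-PT {size}"]
    # Round to three decimals, the precision used in the table.
    return round(medical - general, 3)


gains = {size: f1_gain(size) for size in ("900M", "1.5B")}
print(gains)
```

On these numbers the gain is larger for the 900M checkpoints than for the 1.5B checkpoints, although the 1.5B medical model has the highest F1 overall.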
Data
MediAlbertina PT-PT 1.5B NER was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence.
How to use
from transformers import pipeline
# Load the NER pipeline; 'average' merges sub-word tokens into word-level entities.
ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_1.5b_NER', aggregation_strategy='average')
sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
entities = ner_pipeline(sentence)
# Print each predicted entity label followed by the text span it covers.
for entity in entities:
    print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
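The `aggregation_strategy='average'` setting controls how predictions for the sub-word tokens of a single word are merged: the per-label scores are averaged across the word's tokens, and the label with the highest average is assigned to the whole word. The sketch below reproduces that idea in plain Python, without loading the model. The label subset, the two-token split, and all scores are hypothetical values chosen for illustration; the labels actually emitted by the model may differ (for example, they may carry tagging-scheme prefixes).

```python
from typing import List, Tuple


def average_word_label(
    token_scores: List[List[float]], labels: List[str]
) -> Tuple[str, float]:
    """Average per-token label scores over one word, then pick the top label."""
    n_tokens = len(token_scores)
    # zip(*token_scores) iterates label by label across all tokens of the word.
    mean_scores = [sum(column) / n_tokens for column in zip(*token_scores)]
    best = max(range(len(labels)), key=lambda i: mean_scores[i])
    return labels[best], mean_scores[best]


# Hypothetical label subset and scores for one word split into two sub-tokens.
labels = ["O", "Diagnostico", "Sintoma"]
token_scores = [
    [0.10, 0.30, 0.60],  # first sub-token: "Sintoma" scores highest
    [0.05, 0.80, 0.15],  # second sub-token: "Diagnostico" scores highest
]

label, score = average_word_label(token_scores, labels)
print(label, round(score, 2))
```

In this made-up case the first sub-token alone would favour `Sintoma`, while the average over both sub-tokens favours `Diagnostico`, which is the kind of disagreement the averaging step resolves.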
Citation
MediAlbertina is developed by a joint team from ISCTE-IUL, Portugal, and Select Data, CA, USA. For a fully detailed description, check the respective publication:
Currently in the publishing process. The reference will be added soon.
Please use the above canonical reference when using or citing this model.