---
license: mit

inference:
  parameters:
    aggregation_strategy: "average"

language:
  - pt
pipeline_tag: token-classification
tags:
  - medialbertina-ptpt
  - deberta
  - portuguese
  - european portuguese
  - medical
  - clinical
  - healthcare
  - NER
  - Named Entity Recognition
  - IE
  - Information Extraction
widget:
  - text: Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os sinais vitais do utente, incluindo a pressão arterial, com leitura de 120/87 mmHg e a frequência cardíaca, de 80 batimentos por minuto, foram monitorizados. Após a cirurgia o utente apresentava  dor intensa no local e inchaço no tornozelo, mas os resultados da radiografia revelaram uma recuperação satisfatória. Foi prescrito ibuprofeno 600mg de 8 em 8 horas durante 3 dias.
    example_title: Example 1
  - text: Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.
    example_title: Example 2
  - text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
    example_title: Example 3
  - text: Após as sessões de fisioterapia o paciente apresenta recuperação de mobilidade.
    example_title: Example 4
  - text: O paciente está em Quimioterapia com uma dosagem específica de Cisplatina para o tratamento do cancro do pulmão.
    example_title: Example 5
  - text: Monitorização da  Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
    example_title: Example 6
  - text: A ressonância magnética da utente revelou uma rotura no menisco lateral do joelho.
    example_title: Example 7
  - text:  A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
    example_title: Example 8
---

# MediAlbertina
The first publicly available medical language model trained with real European Portuguese data.

MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on Electronic Medical Records (EMRs) shared by Portugal's largest public hospital.

Like their predecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m_NER/blob/main/LICENSE).



# Model Description

**MediAlbertina PT-PT 900M NER** was created through fine-tuning of [MediAlbertina PT-PT 900M](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m) on real European Portuguese EMRs that have been hand-annotated for the following entities:
- **Diagnostico (D)**: All types of diseases and conditions following the ICD-10-CM guidelines.
- **Sintoma (S)**: Any complaints or evidence from healthcare professionals indicating that a patient is experiencing a medical condition.
- **Medicamento (M)**: Anything administered to the patient (through any route), including drugs, specific food/drink, vitamins, or blood for transfusion.
- **Dosagem (DO)**: Dosage and frequency of medication administration.
- **ProcedimentoMedico (PM)**: Anything healthcare professionals do related to patients, including exams, moving patients, administering something, or even surgeries.
- **SinalVital (SV)**: Quantifiable indicators in a patient that can be measured, always associated with a specific result. Examples include cholesterol levels, diuresis, weight, or glycaemia.
- **Resultado (R)**: Results can be associated with Medical Procedures and Vital Signs. It can be a numerical value if something was measured (e.g., the value associated with blood pressure) or a descriptor to indicate the result (e.g., positive/negative, functional).
- **Progresso (P)**: Describes the progress of the patient's condition. Typically, it includes verbs like improving, evolving, or regressing, and mentions of the patient's stability.
  
**MediAlbertina PT-PT 900M NER** outperformed the same fine-tuning applied to a non-medical Portuguese language model on 13 of the 16 entity tags (the exceptions are I-SV, B-DO, and I-DO), demonstrating the effectiveness of this domain adaptation and its potential for medical AI in Portugal.
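
The table below reports one F1 score per `B-` (beginning) and `I-` (inside) tag, which suggests a BIO-style tagging scheme. As a minimal sketch of how such tags map back to entity spans, the following function groups consecutive tokens into labelled spans. The function name `bio_to_spans`, the whitespace tokenization, and the example tags are illustrative assumptions, not the model's actual output or the authors' evaluation code.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans.

    Assumes tags look like "B-M", "I-M", or "O" (hypothetical format,
    mirroring the column names of the results table).
    """
    spans = []
    current_label, current_tokens = None, []

    def flush():
        # Store the span collected so far, if any.
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with a new label) starts a new span.
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # An I- tag with the same label continues the open span.
            current_tokens.append(token)
        else:
            # "O" closes any open span.
            flush()
            current_label, current_tokens = None, []
    flush()
    return spans


# Hand-written example tags for illustration only (not model predictions).
tokens = ["Foi", "recomendada", "aspirina", "de", "500mg", "a", "cada", "4", "horas"]
tags = ["O", "O", "B-M", "O", "B-DO", "I-DO", "I-DO", "I-DO", "I-DO"]
print(bio_to_spans(tokens, tags))
```

An `I-` tag that does not follow a matching `B-` tag is treated leniently here and opens a new span; a stricter decoder could discard it instead.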

| Model                   | B-D | I-D | B-S | I-S | B-PM | I-PM | B-SV | I-SV | B-R | I-R | B-M | I-M | B-DO | I-DO | B-P | I-P | 
|-------------------------|:----:|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|                         | F1   | F1   | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  |
| albertina-900m-portuguese-ptpt-encoder|0.721|0.786|0.734|0.775|0.737|0.805|0.859|**0.811**|0.803|0.816|0.913|0.871|**0.853**|**0.895**|0.769|0.785|
| **medialbertina_pt-pt_900m** | **0.799**| **0.832**| **0.754**| **0.782**| **0.786**| **0.813**| **0.916**| 0.788| **0.821**| **0.83**| **0.926**| **0.895**|0.85|0.885| **0.779**| **0.807**|
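
To summarize the table in a single number per model, the per-tag scores can be averaged. The sketch below copies the F1 values from the table and computes an unweighted mean over the 16 tags, plus the number of tags on which the medical model scores higher. This unweighted mean is an illustrative summary, not a metric reported by the authors, and it ignores how often each tag occurs in the test data.

```python
# Per-tag F1 scores copied from the table above, in column order.
TAGS = ["B-D", "I-D", "B-S", "I-S", "B-PM", "I-PM", "B-SV", "I-SV",
        "B-R", "I-R", "B-M", "I-M", "B-DO", "I-DO", "B-P", "I-P"]

albertina = [0.721, 0.786, 0.734, 0.775, 0.737, 0.805, 0.859, 0.811,
             0.803, 0.816, 0.913, 0.871, 0.853, 0.895, 0.769, 0.785]
medialbertina = [0.799, 0.832, 0.754, 0.782, 0.786, 0.813, 0.916, 0.788,
                 0.821, 0.83, 0.926, 0.895, 0.85, 0.885, 0.779, 0.807]

# Unweighted mean over tags (an assumption of this sketch, see lead-in).
mean_albertina = sum(albertina) / len(albertina)
mean_medialbertina = sum(medialbertina) / len(medialbertina)

# Tags on which the medical model has the strictly higher F1.
wins = [tag for tag, base, medi in zip(TAGS, albertina, medialbertina) if medi > base]

print(f"albertina mean F1:     {mean_albertina:.3f}")
print(f"medialbertina mean F1: {mean_medialbertina:.3f}")
print(f"medialbertina is higher on {len(wins)} of {len(TAGS)} tags")
```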





## Data

**MediAlbertina PT-PT 900M NER** was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).


## How to use

```Python
from transformers import pipeline

ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER', aggregation_strategy='average')
sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
entities = ner_pipeline(sentence)
for entity in entities:
    print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
```
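
With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. A common next step is to drop low-confidence predictions and group the remaining text spans by entity type. The sketch below does this on a hand-written sample so that it runs without downloading the model. The helper names, the `0.5` threshold, the label strings, and the scores are assumptions for illustration; the model's real labels and scores may differ.

```python
from collections import defaultdict


def group_by_label(entities, sentence, min_score=0.5):
    """Group entity text by label, keeping predictions with score >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            text = sentence[entity["start"]:entity["end"]]
            grouped[entity["entity_group"]].append(text)
    return dict(grouped)


def make_sample_entity(sentence, label, text, score):
    """Build a dict shaped like an aggregated pipeline result (hypothetical values)."""
    start = sentence.index(text)
    return {"entity_group": label, "score": score, "word": text,
            "start": start, "end": start + len(text)}


sentence = "Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias."

# Hand-written stand-in for ner_pipeline(sentence); not real model output.
sample_entities = [
    make_sample_entity(sentence, "Medicamento", "aspirina", 0.99),
    make_sample_entity(sentence, "Dosagem", "500mg a cada 4 horas", 0.93),
    make_sample_entity(sentence, "Dosagem", "durante 3 dias", 0.42),
]

print(group_by_label(sample_entities, sentence))
```

To apply this to real predictions, pass the result of `ner_pipeline(sentence)` in place of `sample_entities` and choose a threshold suited to the application.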

## Citation

MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and [Select Data](https://selectdata.com/), CA, USA. For a full description, see the corresponding publication:

```latex
@article{MediAlbertina_PT-PT,
      title={MediAlbertina: An European Portuguese medical language model}, 
      author={Miguel Nunes and João Boné and João Ferreira
              and Pedro Chaves and Luís Elvas},
      year={2024},
      journal={Computers in Biology and Medicine},
      volume={182},
      url={https://doi.org/10.1016/j.compbiomed.2024.109233}
}
```
Please use the above canonical reference when using or citing this [model](https://www.sciencedirect.com/science/article/pii/S0010482524013180?via%3Dihub).

## Acknowledgements

This work was financially supported by Project Blockchain.PT – Decentralize Portugal with Blockchain Agenda (Project no 51), WP2, Call no 02/C05-i01.01/2022, funded by the Portuguese Recovery and Resilience Program (PRR), the Portuguese Republic, and the European Union (EU) under the framework of the Next Generation EU Program.