miguel6nunes committed "Update README.md"
---
license: mit

inference:
  parameters:
    aggregation_strategy: "average"

language:
- pt
pipeline_tag: token-classification
tags:
- medialbertina-ptpt
- deberta
- portuguese
- european portuguese
- medical
- clinical
- healthcare
- NER
- Named Entity Recognition
- IE
- Information Extraction
widget:
- text: Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os sinais vitais do utente, incluindo a pressão arterial, com leitura de 120/87 mmHg e a frequência cardíaca, de 80 batimentos por minuto, foram monitorizados. Após a cirurgia o utente apresentava dor intensa no local e inchaço no tornozelo, mas os resultados da radiografia revelaram uma recuperação satisfatória. Foi prescrito ibuprofeno 600mg de 8 em 8 horas durante 3 dias.
  example_title: Example 1
- text: Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.
  example_title: Example 2
- text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
  example_title: Example 3
- text: Após as sessões de fisioterapia o paciente apresenta recuperação de mobilidade.
  example_title: Example 4
- text: O paciente está em Quimioterapia com uma dosagem específica de Cisplatina para o tratamento do cancro do pulmão.
  example_title: Example 5
- text: Monitorização da Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
  example_title: Example 6
- text: A ressonância magnética da utente revelou uma rotura no menisco lateral do joelho.
  example_title: Example 7
- text: A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
  example_title: Example 8
---

# MediAlbertina

The first publicly available medical language model trained with real European Portuguese data.

MediAlbertina is a family of DeBERTa-V2-based encoders (a BERT-family architecture), obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on Electronic Medical Records (EMRs) shared by Portugal's largest public hospital.

Like their predecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_1.5b_NER/blob/main/LICENSE).

# Model Description

**MediAlbertina PT-PT 1.5B NER** was created by fine-tuning [MediAlbertina PT-PT 1.5B](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_1.5b) on real European Portuguese EMRs that were hand-annotated for the following entities:
- **Diagnostico (D)**: All types of diseases and conditions, following the ICD-10-CM guidelines.
- **Sintoma (S)**: Any complaints, or evidence from healthcare professionals, indicating that a patient is experiencing a medical condition.
- **Medicamento (M)**: Anything administered to the patient (by any route), including drugs, specific food or drink, vitamins, or blood for transfusion.
- **Dosagem (D)**: Dosage and frequency of medication administration.
- **ProcedimentoMedico (PM)**: Anything healthcare professionals do in relation to patients, including exams, moving patients, administering something, or surgery.
- **SinalVital (SV)**: Quantifiable, measurable indicators of a patient's state, always associated with a specific result. Examples include cholesterol levels, diuresis, weight, or glycaemia.
- **Resultado (R)**: Results associated with Medical Procedures or Vital Signs. A result can be a numerical value, if something was measured (e.g., the value associated with blood pressure), or a descriptor of the outcome (e.g., positive/negative, functional).
- **Progresso (P)**: The progress of the patient's condition. It typically includes verbs such as improving, evolving, or regressing, and mentions of the patient's stability.
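
When the model is run through the `transformers` token-classification pipeline with an aggregation strategy, each detected entity comes back as a dictionary with `entity_group`, `score`, `word`, `start` and `end` keys. The sketch below collects the extracted spans per entity type. It assumes that the `entity_group` values match the label names listed above (e.g. `Medicamento`, `Dosagem`), which is an assumption rather than something stated here; the helper `group_by_entity` and the mock output are hypothetical and hand-written, not produced by the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_entity(entities: List[dict], text: str) -> Dict[str, List[str]]:
    """Collect the text spans found for each entity type.

    `entities` is assumed to follow the aggregated token-classification
    pipeline format: one dict per entity with `entity_group`, `start`
    and `end` keys (character offsets into `text`).
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        # Slice the original text so each span keeps its exact casing and accents.
        grouped[entity["entity_group"]].append(text[entity["start"]:entity["end"]])
    return dict(grouped)


# Hypothetical pipeline output: labels and offsets written by hand for illustration.
sentence = "Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias."
mock_entities = [
    {"entity_group": "Medicamento", "score": 0.99, "word": "aspirina", "start": 16, "end": 24},
    {"entity_group": "Dosagem", "score": 0.97, "word": "500mg a cada 4 horas", "start": 28, "end": 48},
    {"entity_group": "Dosagem", "score": 0.95, "word": "durante 3 dias", "start": 50, "end": 64},
]

print(group_by_entity(mock_entities, sentence))
# {'Medicamento': ['aspirina'], 'Dosagem': ['500mg a cada 4 horas', 'durante 3 dias']}
```

Slicing the input sentence with `start`/`end`, instead of using the `word` field, avoids depending on how the tokenizer renders sub-word pieces.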

**MediAlbertina PT-PT 1.5B NER** outperformed the same fine-tuning applied to the general-domain (non-medical) Albertina PT-PT models in precision and F1, matching the recall of Albertina PT-PT 1.5B. This supports the effectiveness of the medical domain adaptation and its potential for medical AI in Portugal.

| Checkpoint | Precision | Recall | F1 |
|------------|-----------|--------|----|
| Albertina PT-PT 900M | 0.814 | 0.814 | 0.813 |
| Albertina PT-PT 1.5B | 0.833 | **0.845** | 0.838 |
| MediAlbertina PT-PT 900M | 0.840 | 0.828 | 0.832 |
| **MediAlbertina PT-PT 1.5B** | **0.842** | **0.845** | **0.843** |
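
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded P and R values in the table and reports the F1 gain of each MediAlbertina model over the general-domain Albertina model of the same size. The averaging scheme behind the reported F1 is not stated here (it may, for example, be averaged over entity types), so the recomputed values can differ from the table in the third decimal (e.g. 0.839 versus 0.838 for Albertina PT-PT 1.5B); treat this as an approximate consistency check only.

```python
# Precision, recall and F1 as reported in the results table.
RESULTS = {
    "Albertina PT-PT 900M": (0.814, 0.814, 0.813),
    "Albertina PT-PT 1.5B": (0.833, 0.845, 0.838),
    "MediAlbertina PT-PT 900M": (0.840, 0.828, 0.832),
    "MediAlbertina PT-PT 1.5B": (0.842, 0.845, 0.843),
}


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1) in RESULTS.items():
    print(f"{name}: reported F1 = {f1:.3f}, 2PR/(P+R) = {harmonic_f1(p, r):.3f}")

# F1 gain of each medical model over the general-domain model of the same size.
for size in ("900M", "1.5B"):
    gain = RESULTS[f"MediAlbertina PT-PT {size}"][2] - RESULTS[f"Albertina PT-PT {size}"][2]
    print(f"{size}: F1 gain = {gain:+.3f}")
```

The reported gains are +0.019 F1 at 900M and +0.005 F1 at 1.5B.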

## Data

**MediAlbertina PT-PT 1.5B NER** was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).

## How to use

```python
from transformers import pipeline

# Load the fine-tuned NER model; 'average' merges sub-word predictions into whole-word entities
ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_1.5b_NER', aggregation_strategy='average')

sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
entities = ner_pipeline(sentence)

# Each entity is a dict with 'entity_group', 'score', 'word', 'start' and 'end' keys
for entity in entities:
    print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
```
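
The `aggregation_strategy='average'` argument in the example above controls how sub-word predictions are merged into words. Per the `transformers` token-classification pipeline documentation, with `average` the label scores are first averaged across a word's tokens and the highest-scoring label is then assigned to the whole word; adjacent words with the same entity type are afterwards merged into one entity group. The following is a simplified, framework-free sketch of the word-level averaging step only. The label set, tokens and scores are made up for illustration, and this is not the library's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical BIO label set, for illustration only.
LABELS = ["O", "B-Medicamento", "I-Medicamento"]


def average_word_labels(
    token_scores: Sequence[Sequence[float]],
    word_ids: Sequence[Optional[int]],
) -> List[Tuple[int, str, float]]:
    """Average sub-word label scores per word, then pick the best label.

    token_scores[i][k] is the probability of LABELS[k] for token i;
    word_ids[i] is the word that token i belongs to (None for special tokens).
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for scores, word_id in zip(token_scores, word_ids):
        if word_id is None:
            continue  # skip special tokens such as [CLS] and [SEP]
        acc = sums.setdefault(word_id, [0.0] * len(LABELS))
        for k, score in enumerate(scores):
            acc[k] += score
        counts[word_id] = counts.get(word_id, 0) + 1

    results = []
    for word_id in sorted(sums):
        mean = [s / counts[word_id] for s in sums[word_id]]
        best = max(range(len(LABELS)), key=lambda k: mean[k])
        results.append((word_id, LABELS[best], mean[best]))
    return results


# "Tomou aspirina" with "aspirina" split into two hypothetical sub-words: "aspir" + "ina".
token_scores = [
    [1.0, 0.0, 0.0],        # [CLS]
    [0.875, 0.0625, 0.0625],  # "Tomou"
    [0.125, 0.75, 0.125],   # "aspir"
    [0.5, 0.25, 0.25],      # "ina" (on its own it would favour "O")
    [1.0, 0.0, 0.0],        # [SEP]
]
word_ids = [None, 0, 1, 1, None]

print(average_word_labels(token_scores, word_ids))
# [(0, 'O', 0.875), (1, 'B-Medicamento', 0.5)]
```

Averaging lets a confident sub-word ("aspir") outweigh an uncertain one ("ina"), so the whole word receives a single, consistent label instead of being split across entity types.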

## Citation

MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and [Select Data](https://selectdata.com/), CA, USA. For a full description, see the corresponding publication:

```latex
Publication in progress. The reference will be added soon.
```
Please use the canonical reference above, once available, when using or citing this model.