miguel6nunes committed on
Commit 16fedb3 · verified · 1 Parent(s): 07e8093

Update README.md

Files changed (1)
  1. README.md +99 -3
README.md CHANGED
@@ -1,3 +1,99 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+
+ inference:
+   parameters:
+     aggregation_strategy: "average"
+
+ language:
+ - pt
+ pipeline_tag: token-classification
+ tags:
+ - medialbertina-ptpt
+ - deberta
+ - portuguese
+ - european portuguese
+ - medical
+ - clinical
+ - healthcare
+ - NER
+ - Named Entity Recognition
+ - IE
+ - Information Extraction
+ widget:
+ - text: Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os sinais vitais do utente, incluindo a pressão arterial, com leitura de 120/87 mmHg e a frequência cardíaca, de 80 batimentos por minuto, foram monitorizados. Após a cirurgia o utente apresentava dor intensa no local e inchaço no tornozelo, mas os resultados da radiografia revelaram uma recuperação satisfatória. Foi prescrito ibuprofeno 600mg de 8 em 8 horas durante 3 dias.
+   example_title: Example 1
+ - text: Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.
+   example_title: Example 2
+ - text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
+   example_title: Example 3
+ - text: Após as sessões de fisioterapia o paciente apresenta recuperação de mobilidade.
+   example_title: Example 4
+ - text: O paciente está em Quimioterapia com uma dosagem específica de Cisplatina para o tratamento do cancro do pulmão.
+   example_title: Example 5
+ - text: Monitorização da Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
+   example_title: Example 6
+ - text: A ressonância magnética da utente revelou uma rotura no menisco lateral do joelho.
+   example_title: Example 7
+ - text: A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
+   example_title: Example 8
+ ---
+
+ # MediAlbertina
+ The first publicly available medical language model trained with real European Portuguese data.
+
+ MediAlbertina is a family of BERT-style encoders based on DeBERTaV2, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on electronic medical records (EMRs) shared by Portugal's largest public hospital.
+
+ Like their predecessors, the MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_1.5b_NER/blob/main/LICENSE).
+
+
+
+ # Model Description
+
+ **MediAlbertina PT-PT 1.5B NER** was created by fine-tuning [MediAlbertina PT-PT 1.5B](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_1.5b) on real European Portuguese EMRs that have been hand-annotated for the following entities:
+ - **Diagnostico (D)**: All types of diseases and conditions following the ICD-10-CM guidelines.
+ - **Sintoma (S)**: Any complaints or evidence from healthcare professionals indicating that a patient is experiencing a medical condition.
+ - **Medicamento (M)**: Something that is administered to the patient (through any route), including drugs, specific food/drink, vitamins, or blood for transfusion.
+ - **Dosagem (D)**: Dosage and frequency of medication administration.
+ - **ProcedimentoMedico (PM)**: Anything healthcare professionals do in relation to a patient, including exams, moving the patient, administering something, or surgery.
+ - **SinalVital (SV)**: Quantifiable indicators in a patient that can be measured, always associated with a specific result. Examples include cholesterol levels, diuresis, weight, or glycaemia.
+ - **Resultado (R)**: Results associated with medical procedures and vital signs. A result can be a numerical value if something was measured (e.g., the value associated with blood pressure) or a descriptor that indicates the outcome (e.g., positive/negative, functional).
+ - **Progresso (P)**: Describes the progress of the patient's condition. Typically, it includes verbs like improving, evolving, or regressing, and mentions of the patient's stability.
+
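For readers who do not speak Portuguese, the labels above can be mapped to English glosses. The sketch below assumes the model emits the label names exactly as written in the list (for example `Diagnostico`, without diacritics). The dictionary `LABEL_GLOSSES` and the helper `gloss` are illustrative names and are not part of the model or of `transformers`.

```python
# Illustrative English glosses for the entity labels listed above.
# Assumption: the pipeline returns these exact strings in 'entity_group'.
LABEL_GLOSSES = {
    "Diagnostico": "Diagnosis",
    "Sintoma": "Symptom",
    "Medicamento": "Medication",
    "Dosagem": "Dosage",
    "ProcedimentoMedico": "Medical procedure",
    "SinalVital": "Vital sign",
    "Resultado": "Result",
    "Progresso": "Progress",
}


def gloss(label):
    """Return the English gloss for a label, or the label itself if it is unknown."""
    return LABEL_GLOSSES.get(label, label)


print(gloss("SinalVital"))
```

Falling back to the raw label keeps the helper safe if the deployed model uses a label that is not in this list.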
+ **MediAlbertina PT-PT 1.5B NER** outperformed the same fine-tuning applied to the non-medical Albertina PT-PT models: it reached the highest precision and F1 and matched the best recall, demonstrating the effectiveness of this domain adaptation and its potential for medical AI in Portugal.
+
+ | Checkpoints | Precision | Recall | F1 |
+ |------------------------------|-----------|-----------|-----------|
+ | Albertina PT-PT 900M | 0.814 | 0.814 | 0.813 |
+ | Albertina PT-PT 1.5B | 0.833 | **0.845** | 0.838 |
+ | MediAlbertina PT-PT 900M | 0.84 | 0.828 | 0.832 |
+ | **MediAlbertina PT-PT 1.5B** | **0.842** | **0.845** | **0.843** |
+
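The size of the gain can be read directly off the table. The sketch below hard-codes the four rows (values copied from the table; the names `RESULTS` and `gain` are illustrative) and computes the difference between each MediAlbertina checkpoint and the Albertina checkpoint of the same size.

```python
# (precision, recall, F1) per checkpoint, copied from the table above.
RESULTS = {
    "Albertina PT-PT 900M": (0.814, 0.814, 0.813),
    "Albertina PT-PT 1.5B": (0.833, 0.845, 0.838),
    "MediAlbertina PT-PT 900M": (0.84, 0.828, 0.832),
    "MediAlbertina PT-PT 1.5B": (0.842, 0.845, 0.843),
}


def gain(size):
    """Return (precision, recall, F1) gains of MediAlbertina over Albertina for one size."""
    base = RESULTS[f"Albertina PT-PT {size}"]
    medical = RESULTS[f"MediAlbertina PT-PT {size}"]
    # Round to three decimals to remove floating-point noise from the subtraction.
    return tuple(round(m - b, 3) for m, b in zip(medical, base))


for size in ("900M", "1.5B"):
    p, r, f1 = gain(size)
    print(f"{size}: precision {p:+.3f}, recall {r:+.3f}, F1 {f1:+.3f}")
```

On these figures, the medical pre-training helps more at 900M (F1 +0.019) than at 1.5B (F1 +0.005), and recall at 1.5B is unchanged.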
+
+
+
+ ## Data
+
+ **MediAlbertina PT-PT 1.5B NER** was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).
+
+
+ ## How to use
+
+ ```python
+ from transformers import pipeline
+
+ # aggregation_strategy='average' averages token scores within each word, so every word gets a single label
+ ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_1.5b_NER', aggregation_strategy='average')
+ sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
+ entities = ner_pipeline(sentence)
+ # Print each entity's label followed by the text span it covers
+ for entity in entities:
+     print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
+ ```
+
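With an aggregation strategy set, each item returned by the pipeline is a dictionary with the keys `entity_group`, `score`, `word`, `start` and `end`. A minimal sketch of post-processing that output follows: it drops low-confidence entities and groups the remaining text spans by label. The helpers `group_by_label` and `fake_entity`, the `min_score` threshold and the sample `entities` list are hypothetical. The sample was written by hand to mimic the output format and is not real model output.

```python
from collections import defaultdict


def group_by_label(sentence, entities, min_score=0.5):
    """Group entity text spans by label, skipping entities scored below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            span = sentence[entity["start"]:entity["end"]]
            grouped[entity["entity_group"]].append(span)
    return dict(grouped)


sentence = "Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias."


def fake_entity(label, text, score):
    """Build a pipeline-shaped dictionary for a substring of the sentence (made-up label and score)."""
    start = sentence.index(text)
    return {"entity_group": label, "score": score, "word": text,
            "start": start, "end": start + len(text)}


# Hand-written stand-in for ner_pipeline(sentence); not real model output.
entities = [
    fake_entity("Medicamento", "aspirina", 0.98),
    fake_entity("Dosagem", "500mg a cada 4 horas", 0.91),
    fake_entity("Dosagem", "3 dias", 0.30),
]

print(group_by_label(sentence, entities))
```

To run this on real predictions, replace the hand-written list with the `entities` returned by `ner_pipeline(sentence)` and choose a `min_score` that suits the application.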
+ ## Citation
+
+ MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and [Select Data](https://selectdata.com/), CA, USA. For a detailed description, see the corresponding publication:
+
+ ```latex
+ Publication in progress. The reference will be added soon.
+ ```
+ Please use the above canonical reference when using or citing this model.