|
--- |
|
language: "pt" |
|
widget: |
|
- text: "O paciente recebeu no hospital e falou com a médica" |
|
- text: "COMO ESQUEMA DE MEDICAÇÃO PARA ICC PRESCRITO NO ALTA, RECEBE FUROSEMIDA 40 BID, ISOSSORBIDA 40 TID, DIGOXINA 0,25 /D, CAPTOPRIL 50 TID E ESPIRONOLACTONA 25 /D." |
|
- text: "ESTAVA EM USO DE FUROSEMIDA 40 BID, DIGOXINA 0,25 /D, SINVASTATINA 40 /NOITE, CAPTOPRIL 50 TID, ISOSSORBIDA 20 TID, AAS 100 /D E ESPIRONOLACTONA 25 /D." |
|
datasets: |
|
- MacMorpho |
|
--- |
|
|
|
# POS-Tagger Bio Portuguese |
|
|
|
We fine-tuned the BioBERTpt(all) model on the MacMorpho corpus for the POS-tagging task, training for 10 epochs and achieving an overall F1-score of 0.9818.
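The tagger can be used through the Hugging Face `transformers` token-classification pipeline. The snippet below is a minimal sketch; `MODEL_ID` is a placeholder, not the actual repository id of this model, so substitute the real id before running.

```python
# Placeholder repository id -- replace with this model's actual Hub id.
MODEL_ID = "HAILab-PUCPR/pos-tagger-bio-portuguese"  # hypothetical

def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline for POS tagging.

    Imported lazily so the function can be defined without
    `transformers` installed; calling it downloads the model.
    """
    from transformers import pipeline
    return pipeline("token-classification", model=model_id)

# Usage (downloads the model on first call):
# tagger = load_tagger()
# tagger("ESTAVA EM USO DE FUROSEMIDA 40 BID")
```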
|
|
|
Metrics: |
|
|
|
``` |
|
              precision    recall  f1-score   support

    accuracy                           0.98     38320
   macro avg       0.95      0.94     0.94     38320
weighted avg       0.98      0.98     0.98     38320

F1: 0.9818   Accuracy: 0.9818
|
``` |
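The macro average treats every tag class equally, while the weighted average weights each class's F1 by its support, which is why the two rows above can differ. The per-class values below are purely illustrative (the report only shows the aggregates), but the arithmetic is the one `sklearn.metrics.classification_report` applies:

```python
# Illustrative per-class F1 scores and supports (NOT the real per-tag values).
f1 = {"N": 0.99, "V": 0.97, "ADV": 0.90}
support = {"N": 20000, "V": 10000, "ADV": 5000}

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class's F1 weighted by its support.
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(round(macro_f1, 4), round(weighted_f1, 4))
```

With these toy numbers the rare `ADV` class drags the macro average (≈0.9533) below the weighted one (≈0.9714), mirroring the gap between the macro and weighted rows in the table.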
|
|
|
Parameters: |
|
|
|
``` |
|
nclasses = 27 |
|
nepochs_total = 30 |
|
nepochs_stop = 12 (stopped at the 12th epoch by early stopping)
|
batch_size = 32 |
|
batch_status = 32 |
|
learning_rate = 1e-5 |
|
early_stop = 3 |
|
max_length = 200 |
|
``` |
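The `early_stop = 3` setting means training halts once the validation metric fails to improve for 3 consecutive epochs, which is consistent with stopping at epoch 12 well before `nepochs_total = 30`. A minimal sketch of that patience logic (the loss values are synthetic, for illustration only):

```python
def epochs_until_stop(val_losses, patience=3):
    """Return the epoch at which early stopping with the given
    patience would halt training, given per-epoch validation losses."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # improvement: reset the patience counter
            best = loss
            wait = 0
        else:                    # no improvement this epoch
            wait += 1
            if wait >= patience: # patience exhausted: stop here
                return epoch
    return len(val_losses)       # ran all epochs without triggering

# Synthetic losses: improvement through epoch 9, then a plateau.
losses = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.30,
          0.31, 0.32, 0.33, 0.34]
stopped_at = epochs_until_stop(losses, patience=3)
print(stopped_at)
```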
|
|
|
## Acknowledgements |
|
|
|
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. |
|
|
|
## Citation |
|
``` |
|
@article{Schneider_Gumiel_Oliveira_Montenegro_Barzotto_Moro_Pagano_Paraiso_2023, |
|
place={Brasil}, |
|
title={Developing a Transformer-based Clinical Part-of-Speech Tagger for Brazilian Portuguese}, |
|
volume={15}, |
|
url={https://jhi.sbis.org.br/index.php/jhi-sbis/article/view/1086}, |
|
DOI={10.59681/2175-4411.v15.iEspecial.2023.1086}, |
|
abstractNote={<p>Electronic Health Records are a valuable source of information to be extracted by means of natural language processing (NLP) tasks, such as morphosyntactic word tagging. Although there have been significant advances in health NLP, such as the Transformer architecture, languages such as Portuguese are still underrepresented. This paper presents taggers developed for Portuguese texts, fine-tuned using BioBERtpt (clinical/biomedical) and BERTimbau (generic) models on a POS-tagged corpus. We achieved an accuracy of 0.9826, state-of-the-art for the corpus used. In addition, we performed a human-based evaluation of the trained models and others in the literature, using authentic clinical narratives. Our clinical model achieved 0.8145 in accuracy compared to 0.7656 for the generic model. It also showed competitive results compared to models trained specifically with clinical texts, evidencing domain impact on the base model in NLP tasks.</p>}, |
|
number={Especial}, |
|
journal={Journal of Health Informatics}, |
|
author={Schneider, Elisa Terumi Rubel and Gumiel, Yohan Bonescki and Oliveira, Lucas Ferro Antunes de and Montenegro, Carolina de Oliveira and Barzotto, Laura Rubel and Moro, Claudia and Pagano, Adriana and Paraiso, Emerson Cabrera}, |
|
year={2023}, |
|
month={jul.} |
|
} |
|
``` |
|
|
|
## Questions? |
|
|
|
Please post a GitHub issue in the [NLP Portuguese Chunking](https://github.com/HAILab-PUCPR/nlp-portuguese-chunking) repository.
|
|
|
|