---
language: "pt"
widget:
- text: "O principal [MASK] da COVID-19 é tosse seca."
- text: "O vírus da gripe apresenta um [MASK] constituído por segmentos de ácido ribonucleico."
datasets:
- biomedical literature from Scielo and Pubmed
thumbnail: "https://raw.githubusercontent.com/HAILab-PUCPR/BioBERTpt/master/logo-biobertpr1.png"
---

<img src="https://raw.githubusercontent.com/HAILab-PUCPR/BioBERTpt/master/logo-biobertpr1.png" alt="Logo BioBERTpt">

# BioBERTpt - Portuguese Clinical and Biomedical BERT

The paper [BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition](https://www.aclweb.org/anthology/2020.clinicalnlp-1.7/) introduces clinical and biomedical BERT-based models for Portuguese. The models are initialized from BERT-Multilingual-Cased and trained on clinical notes and biomedical literature.

This model card describes BioBERTpt(bio), the biomedical version of BioBERTpt, trained on Portuguese scientific papers from PubMed and SciELO.

## How to use the model

Load the model with the `transformers` library:

```python
from transformers import AutoTokenizer, AutoModel

# Download (or load from the local cache) the tokenizer and the encoder weights
tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-bio")
model = AutoModel.from_pretrained("pucpr/biobertpt-bio")
```
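
The widget sentences in this card's metadata contain a `[MASK]` token, which suggests the checkpoint can also be queried for masked-word predictions. The following is a minimal sketch using the `fill-mask` pipeline from `transformers`, assuming the published checkpoint includes its masked-language-model head. The helper names `fill_mask_candidates` and `demo` are illustrative and not part of the BioBERTpt repository.

```python
from typing import Dict, List

MODEL_NAME = "pucpr/biobertpt-bio"  # checkpoint named in this model card
MASK_TOKEN = "[MASK]"               # BERT-style mask token, as in the widget examples


def fill_mask_candidates(text: str, top_k: int = 5) -> List[Dict]:
    """Return the top_k candidate fillers for the single [MASK] in `text`."""
    # Validate first so a malformed input fails before any model download.
    if text.count(MASK_TOKEN) != 1:
        raise ValueError(f"expected exactly one {MASK_TOKEN} in the input")

    # Imported lazily so the check above works without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint provides masked-language-model head weights.
    fill_mask = pipeline("fill-mask", model=MODEL_NAME)
    return fill_mask(text, top_k=top_k)


def demo() -> None:
    """Print candidates for one of the widget sentences (downloads the model)."""
    sentence = "O principal [MASK] da COVID-19 é tosse seca."
    for candidate in fill_mask_candidates(sentence):
        print(f"{candidate['token_str']:>15}  {candidate['score']:.3f}")
```

Calling `demo()` prints each candidate token with its score. The pipeline is rebuilt on every call to keep the sketch short; in practice it would be created once and reused.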

## More Information

For additional details and results on Portuguese NER tasks, refer to the original paper, [BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition](https://www.aclweb.org/anthology/2020.clinicalnlp-1.7/).
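
To adapt this checkpoint to an NER task yourself, one common approach is to load the encoder with a token-classification head and fine-tune it on labeled data. The sketch below shows only the loading step and is not the authors' training setup. The tag set in `EXAMPLE_LABELS` is hypothetical, and the helper names `build_label_maps` and `load_for_ner` are illustrative.

```python
from typing import Dict, List, Tuple

MODEL_NAME = "pucpr/biobertpt-bio"  # checkpoint named in this model card

# Hypothetical BIO tag set, for illustration only; replace with your corpus's labels.
EXAMPLE_LABELS = ["O", "B-Disorder", "I-Disorder", "B-Procedure", "I-Procedure"]


def build_label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id2label / label2id mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_for_ner(labels: List[str]):
    """Load the encoder with a new, untrained token-classification head."""
    # Imported lazily so the label helpers can be used without transformers.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_NAME,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

The classification head returned by `load_for_ner(EXAMPLE_LABELS)` is randomly initialized, so its predictions are meaningless until the model has been fine-tuned on annotated text.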

## Questions?

Open a GitHub issue on the [BioBERTpt repo](https://github.com/HAILab-PUCPR/BioBERTpt).