---
language: pt
widget:
  - text: O principal [MASK] da COVID-19 é tosse seca.
  - text: >-
      O vírus da gripe apresenta um [MASK] constituído por segmentos de ácido
      ribonucleico.
datasets:
  - biomedical literature from Scielo and Pubmed
thumbnail: >-
  https://raw.githubusercontent.com/HAILab-PUCPR/BioBERTpt/master/logo-biobertpr1.png
---
![Logo BioBERTpt](https://raw.githubusercontent.com/HAILab-PUCPR/BioBERTpt/master/logo-biobertpr1.png)

# BioBERTpt - Portuguese Clinical and Biomedical BERT

The paper BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition presents clinical and biomedical BERT-based models for the Portuguese language, initialized from BERT-Multilingual-Cased and trained on clinical notes and biomedical literature.

This model card describes BioBERTpt(bio), the biomedical version of BioBERTpt, trained on Portuguese biomedical literature drawn from scientific papers in PubMed and SciELO.

## How to use the model

Load the model via the transformers library:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-bio")
model = AutoModel.from_pretrained("pucpr/biobertpt-bio")
```
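Since BioBERTpt(bio) is a masked-language model (as the widget examples above illustrate), you can also query it through the `fill-mask` pipeline. The snippet below is a minimal sketch using one of the example sentences from this card; it only assumes a standard transformers installation.

```python
from transformers import pipeline

# Minimal sketch: run the fill-mask pipeline with one of the example
# sentences from this model card's widget.
fill_mask = pipeline("fill-mask", model="pucpr/biobertpt-bio")

predictions = fill_mask("O principal [MASK] da COVID-19 é tosse seca.")
for pred in predictions:
    # Each prediction includes the candidate token and its score.
    print(pred["token_str"], round(pred["score"], 4))
```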

## More Information

Refer to the original paper, BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition, for additional details and performance on Portuguese NER tasks.

## Questions?

Post a GitHub issue on the BioBERTpt repo.