BERTikal (aka legalnlp-bert)

BERTikal [1] is a cased BERT-base model for Brazilian legal language. It was trained from the BERTimbau [2] checkpoint on Brazilian legal texts. More details on the datasets and training procedure can be found in [1].

For more resources on Brazilian Portuguese (PT-BR) legal natural language processing, see Legal-NLP (https://github.com/felipemaiapolo/legalnlp).

Please cite as Polo, Felipe Maia, et al. "LegalNLP-Natural Language Processing methods for the Brazilian Legal Language." Anais do XVIII Encontro Nacional de Inteligência Artificial e Computacional. SBC, 2021.

@inproceedings{polo2021legalnlp,
    title={LegalNLP-Natural Language Processing methods for the Brazilian Legal Language},
    author={Polo, Felipe Maia and Mendon{\c{c}}a, Gabriel Caiaffa Floriano and Parreira, Kau{\^e} Capellato J and Gianvechio, Lucka and Cordeiro, Peterson and Ferreira, Jonathan Batista and de Lima, Leticia Maria Paz and do Amaral Maia, Ant{\^o}nio Carlos and Vicente, Renato},
    booktitle={Anais do XVIII Encontro Nacional de Intelig{\^e}ncia Artificial e Computacional},
    pages={763--774},
    year={2021},
    organization={SBC}
}

Usage

Ex. Loading the model for general use

from transformers import AutoTokenizer  # or BertTokenizer
from transformers import AutoModelForPreTraining  # or BertForPreTraining, to load the pretraining heads
from transformers import AutoModel  # or BertModel, for BERT without the pretraining heads

# Load the model with its pretraining heads, plus the cased tokenizer
model = AutoModelForPreTraining.from_pretrained('felipemaiapolo/legalnlp-bert')
tokenizer = AutoTokenizer.from_pretrained('felipemaiapolo/legalnlp-bert', do_lower_case=False)

Ex. BERT embeddings

from transformers import pipeline

pipe = pipeline("feature-extraction", model='felipemaiapolo/legalnlp-bert')
encoded_sentence = pipe('Juíz negou o recurso.')
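A feature-extraction pipeline yields one vector per token, so a common next step is to pool them into a single sentence vector. The sketch below assumes `encoded_sentence` is a nested list shaped [1][num_tokens][hidden_size], which is the usual output for a single input string. The helper `mean_pool` is a hypothetical name, not part of transformers, and the toy input stands in for real model output so the snippet runs without downloading the model.

```python
from statistics import fmean


def mean_pool(features):
    """Average token vectors into one sentence vector.

    `features` is assumed to be a nested list shaped
    [1][num_tokens][hidden_size], as returned for a single sentence.
    """
    token_vectors = features[0]  # drop the batch dimension
    # zip(*...) transposes tokens x hidden into hidden x tokens,
    # so each `column` holds one hidden dimension across all tokens.
    return [fmean(column) for column in zip(*token_vectors)]


# Toy stand-in for `encoded_sentence`: 1 sentence, 3 tokens, 2 dimensions.
toy_features = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
sentence_vector = mean_pool(toy_features)
print(sentence_vector)
```

With the real pipeline, `mean_pool(encoded_sentence)` would return a list with one entry per hidden dimension. Mean pooling is only one option; using the first token's vector is another common choice.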

Ex. Masked language modeling prediction

from transformers import pipeline

pipe = pipeline('fill-mask', model='felipemaiapolo/legalnlp-bert')

pipe('Juíz negou o [MASK].')
#  [{'score': 0.6387444734573364,
#  'token': 7608,
#  'token_str': 'julgamento',
#  'sequence': 'juiz negou o julgamento.'},
# {'score': 0.09632532298564911,
#  'token': 7509,
#  'token_str': 'voto',
#  'sequence': 'juiz negou o voto.'},
# {'score': 0.06424401700496674,
#  'token': 17225,
#  'token_str': 'julgado',
#  'sequence': 'juiz negou o julgado.'},
# {'score': 0.05929475650191307,
#  'token': 8190,
#  'token_str': 'recurso',
#  'sequence': 'juiz negou o recurso.'},
# {'score': 0.011442390270531178,
#  'token': 6330,
#  'token_str': 'registro',
#  'sequence': 'juiz negou o registro.'}]
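The fill-mask pipeline returns a list of dictionaries, so the predictions can be post-processed with ordinary Python. As a minimal sketch, the snippet below keeps only candidates above a score threshold. The data is copied from the output shown above (trimmed to the `score` and `token_str` keys); the helper `candidates_above` and the 0.05 threshold are illustrative assumptions.

```python
# Predictions copied from the example output above, trimmed to two keys.
predictions = [
    {"score": 0.6387444734573364, "token_str": "julgamento"},
    {"score": 0.09632532298564911, "token_str": "voto"},
    {"score": 0.06424401700496674, "token_str": "julgado"},
    {"score": 0.05929475650191307, "token_str": "recurso"},
    {"score": 0.011442390270531178, "token_str": "registro"},
]


def candidates_above(preds, min_score):
    """Return (token, rounded score) pairs whose score is at least min_score."""
    return [
        (p["token_str"], round(p["score"], 3))
        for p in preds
        if p["score"] >= min_score
    ]


kept = candidates_above(predictions, min_score=0.05)
for token, score in kept:
    print(f"{token}: {score}")
```

In practice `predictions` would be the return value of the `pipe(...)` call, and the threshold would be tuned to the task.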

References

[1] Polo, Felipe Maia, et al. "LegalNLP-Natural Language Processing methods for the Brazilian Legal Language." Anais do XVIII Encontro Nacional de Inteligência Artificial e Computacional. SBC, 2021.

[2] Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: pretrained BERT models for Brazilian Portuguese. In 9th Brazilian Conference on Intelligent Systems, BRACIS, Rio Grande do Sul, Brazil, October 20-23.
