---
language: pt
license: mit
tags:
- sentence-transformers
---

# LegalBERTPT-br

LegalBERTPT-br is a sentence embedding model trained with SimCSE, a contrastive learning framework, on top of [BERTimbau](https://huggingface.co/neuralmind/bert-base-portuguese-cased), a pre-trained language model for Portuguese.
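
A minimal usage sketch, assuming the model is published in a format that the `sentence-transformers` library can load (as the `sentence-transformers` tag in this card's metadata suggests). `MODEL_ID` is a placeholder, not a confirmed Hub identifier, and `cosine_similarity`, `embed`, and `compare` are illustrative helper names rather than part of any released code.

```python
import math

# Placeholder, NOT a confirmed identifier: replace with the real Hub ID or a local path.
MODEL_ID = "path-or-hub-id-of-LegalBERTPT-br"


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(sentences, model_id=MODEL_ID):
    """Encode a list of sentences into embeddings.

    Requires `pip install sentence-transformers`; the import is lazy so that
    the similarity helper above can be used without the library installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(sentences)


def compare(sentence_a, sentence_b, model_id=MODEL_ID):
    """Embed two sentences and return their cosine similarity."""
    first, second = embed([sentence_a, sentence_b], model_id=model_id)
    return cosine_similarity(first, second)
```

Once `MODEL_ID` points at the actual model, calling `compare` on two Portuguese sentences returns a score where higher values indicate closer meaning.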

# Corpora

- From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we used the `Conteudo` column, which contains 215,713 comments. To avoid bias, we removed the comments from PL 3723/2019, PEC 471/2005, and the Hashtag Corpus.
- From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we also used 147,008 bills. For each bill, we used the summary field `txtEmenta` and the core text field `txtExplicacaoEmenta`.
- From Political Speeches, we used 462,831 texts, specifically the columns `sumario`, `textodiscurso`, and `indexacao`.

These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.
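
The segmentation tool used for these corpora is not stated here. As a rough sketch only, the segment-then-concatenate step could look like the following, where the regex-based `split_sentences` is a naive stand-in for a proper Portuguese sentence segmenter and `build_sentence_corpus` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Naive boundary (an assumption, not the authors' method): split on whitespace
# that follows '.', '!' or '?'. This will mis-split abbreviations such as "art. 5".
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split one document into sentences, dropping empty fragments."""
    parts = _BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


def build_sentence_corpus(corpora: Iterable[Iterable[str]]) -> List[str]:
    """Segment every document of every corpus and concatenate the sentences."""
    sentences: List[str] = []
    for corpus in corpora:
        for document in corpus:
            sentences.extend(split_sentences(document))
    return sentences
```

Each corpus is passed as an iterable of raw texts (for example, the values of the `Conteudo` column), and the result is a single flat list of sentences.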

# Citing and Authors

If you find this model helpful, feel free to cite our publication [Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_8):

```bibtex
@inproceedings{bracis,
  author = {Nádia Silva and Marília Silva and Fabíola Pereira and João Tarrega and João Beinotti and Márcio Fonseca and Francisco Andrade and André Carvalho},
  title = {Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies},
  booktitle = {Anais da X Brazilian Conference on Intelligent Systems},
  location = {Online},
  year = {2021},
  keywords = {},
  issn = {0000-0000},
  publisher = {SBC},
  address = {Porto Alegre, RS, Brasil},
  url = {https://sol.sbc.org.br/index.php/bracis/article/view/19061}
}
```