ulysses-camara
/

legal-bert-pt-br

sentence-transformers

Model card Files Files and versions Community

augusnunes commited on Jun 3, 2022

Commit

8cc9068

·

1 Parent(s): 265def2

Update README.md

Files changed (1) hide show

README.md +2 -0

README.md CHANGED Viewed

@@ -13,7 +13,9 @@ LegalBERTPT-br is a trained sentence embedding using SimCSE, a contrastive learn
 # Corpora
 – From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we used the column `Conteudo` with 215,713 comments. We removed the comments from PL 3723/2019, PEC 471/2005, and Hashtag Corpus, in order to avoid bias.
 – From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we also used 147,008 bills. From these projects, we used the summary field named `txtEmenta` and the project core text named `txtExplicacaoEmenta`.
 – From Political Speeches, we used 462,831 texts, specifically, we used the columns: `sumario`, `textodiscurso`, and `indexacao`.
 These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.

 # Corpora
 – From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we used the column `Conteudo` with 215,713 comments. We removed the comments from PL 3723/2019, PEC 471/2005, and Hashtag Corpus, in order to avoid bias.
 – From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we also used 147,008 bills. From these projects, we used the summary field named `txtEmenta` and the project core text named `txtExplicacaoEmenta`.
 – From Political Speeches, we used 462,831 texts, specifically, we used the columns: `sumario`, `textodiscurso`, and `indexacao`.
 These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.