augusnunes committed 8cc9068 (parent: 265def2)

Update README.md
README.md (changed)
@@ -13,7 +13,9 @@ LegalBERTPT-br is a trained sentence embedding using SimCSE, a contrastive learn
 # Corpora
 
 • From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we used the column `Conteudo` with 215,713 comments. We removed the comments from PL 3723/2019, PEC 471/2005, and Hashtag Corpus, in order to avoid bias.
+
 • From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we also used 147,008 bills. From these projects, we used the summary field named `txtEmenta` and the project core text named `txtExplicacaoEmenta`.
+
 • From Political Speeches, we used 462,831 texts, specifically, we used the columns: `sumario`, `textodiscurso`, and `indexacao`.
 
 These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.
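The README does not include its preprocessing code. The following is only a minimal sketch of the corpus-building step it describes: drop the excluded comment sources, pull the listed columns from each source, split every text into sentences, and concatenate the result. The column names come from the README. The `origin` key used for filtering, the in-memory list-of-dicts layout, and the regex sentence splitter are all assumptions for illustration. The README does not say which segmenter was used, and this naive regex mis-splits abbreviations such as "Art. 5".

```python
import re
from typing import Iterable

# Sources excluded from the comments corpus, as listed in the README.
EXCLUDED = {"PL 3723/2019", "PEC 471/2005", "Hashtag Corpus"}

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
# Assumption: the actual segmenter used for LegalBERTPT-br is not documented.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split a text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENT_BOUNDARY.split(text) if s.strip()]


def texts_from(rows: Iterable[dict], columns: list[str]) -> list[str]:
    """Collect the non-empty values of the given columns from every row."""
    out = []
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value:
                out.append(value)
    return out


def build_corpus(comments: list[dict], bills: list[dict], speeches: list[dict]) -> list[str]:
    """Filter, extract, segment, and concatenate the three sources."""
    # Hypothetical "origin" key marking which proposal/corpus a comment belongs to.
    kept_comments = [r for r in comments if r.get("origin") not in EXCLUDED]
    texts = (
        texts_from(kept_comments, ["Conteudo"])
        + texts_from(bills, ["txtEmenta", "txtExplicacaoEmenta"])
        + texts_from(speeches, ["sumario", "textodiscurso", "indexacao"])
    )
    # Segment every text and concatenate into one flat list of sentences.
    sentences = []
    for text in texts:
        sentences.extend(split_sentences(text))
    return sentences


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the real datasets.
    comments = [
        {"Conteudo": "Sou a favor. Aprovem logo!", "origin": "PL 1234/2020"},
        {"Conteudo": "Este comentário é excluído.", "origin": "PEC 471/2005"},
    ]
    bills = [{"txtEmenta": "Altera a Lei X.", "txtExplicacaoEmenta": ""}]
    speeches = [{"sumario": "Discurso sobre saúde.", "textodiscurso": "Bom dia. Obrigado.", "indexacao": None}]
    sentences = build_corpus(comments, bills, speeches)
    print(len(sentences))  # 6
```

Keeping segmentation separate from extraction makes it easy to swap the regex for a proper Portuguese sentence tokenizer without touching the filtering logic.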