hplisiecki
committed on
Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ Our research utilizes a comprehensive database of Polish political texts from so
 - YouTube: 42,252 comments
 - Facebook: 414,595 posts
 
-The texts were processed to fit transformer models' length constraints. Facebook texts were split into sentences, and all texts longer than 280 characters were removed. Non-Polish texts were filtered out using the `langdetect` software, and all online links and usernames were replaced with placeholders. We focused on texts with higher emotional content for training, resulting in a final dataset of 10,000 texts, annotated by 20 expert annotators.
+The texts were processed to fit transformer models' length constraints. Facebook texts were split into sentences, and all texts longer than 280 characters were removed. Non-Polish texts were filtered out using the `langdetect` software, and all online links and usernames were replaced with placeholders. We focused on texts with higher emotional content for training, which we have filtered using a lexicon approach, resulting in a final dataset of 10,000 texts, annotated by 20 expert annotators.
 
 ### Annotation Process
 
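The preprocessing described in the changed README paragraph (sentence splitting of Facebook posts, a 280-character cap, Polish-only filtering, link/username placeholders, and a lexicon-based emotion filter) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the regex sentence splitter, the `<URL>`/`<USER>` placeholder tokens, the toy emotion lexicon and the `min_score` threshold are all hypothetical, and language detection is passed in as a callable so the sketch runs without `langdetect` installed.

```python
import re
from typing import Callable, Iterable, List, Set

# Hypothetical placeholder tokens; the README does not say which placeholders were used.
URL_PLACEHOLDER = "<URL>"
USER_PLACEHOLDER = "<USER>"

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
# Naive sentence splitter (assumption): split after '.', '!' or '?' followed by whitespace.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")

MAX_CHARS = 280  # length cap stated in the README


def split_sentences(text: str) -> List[str]:
    """Split a Facebook post into sentences using a simple regex heuristic."""
    return [s.strip() for s in SENTENCE_RE.split(text) if s.strip()]


def replace_links_and_users(text: str) -> str:
    """Replace URLs first, then @usernames, with placeholder tokens."""
    text = URL_RE.sub(URL_PLACEHOLDER, text)
    return USER_RE.sub(USER_PLACEHOLDER, text)


def emotion_score(text: str, lexicon: Set[str]) -> int:
    """Count lowercase word tokens that appear in a (hypothetical) emotion lexicon."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for t in tokens if t in lexicon)


def preprocess(
    youtube: Iterable[str],
    facebook: Iterable[str],
    detect_language: Callable[[str], str],
    lexicon: Set[str],
    min_score: int = 1,  # hypothetical threshold for "higher emotional content"
) -> List[str]:
    # YouTube comments are kept whole; Facebook posts are split into sentences.
    texts = list(youtube)
    for post in facebook:
        texts.extend(split_sentences(post))
    # Drop texts longer than the character cap.
    texts = [t for t in texts if len(t) <= MAX_CHARS]
    # Keep only texts the detector labels as Polish ("pl").
    texts = [t for t in texts if detect_language(t) == "pl"]
    # Mask links and usernames.
    texts = [replace_links_and_users(t) for t in texts]
    # Keep texts with at least `min_score` lexicon hits.
    return [t for t in texts if emotion_score(t, lexicon) >= min_score]
```

To mirror the README's use of `langdetect`, one could pass `langdetect.detect` as `detect_language` (it returns ISO 639-1 codes such as `"pl"`). Setting `DetectorFactory.seed = 0` makes its output deterministic, and because it raises `LangDetectException` on inputs with no detectable features (for example a text that is only a link), a wrapper that catches that exception may be needed.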