applied to any news text in a language supported by the `xlm-roberta-large` model.

Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek), the model achieves a micro-F1 score of 0.733, a macro-F1 score of 0.745 and an accuracy of 0.733, and it outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting. If we use only the labels that are predicted with a confidence score equal to or higher than 0.90, the model achieves micro-F1 and macro-F1 of 0.80.

## Intended use and limitations

## Performance

The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek), consisting of 1,129 instances. The test set contains similar amounts of texts from the four languages and is more or less balanced across the labels.

The model was shown to achieve a micro-F1 score of 0.733 and a macro-F1 score of 0.745. The results for the entire test set and per language:

| | Micro-F1 | Macro-F1 | Accuracy | No. of instances |
|:---|-----------:|-----------:|-----------:|-----------:|
| All (combined) | 0.733392 | 0.744633 | 0.733392 | 1129 |
| Croatian | 0.728522 | 0.733725 | 0.728522 | 291 |
| Catalan | 0.715356 | 0.722304 | 0.715356 | 267 |
| Slovenian | 0.758865 | 0.764784 | 0.758865 | 282 |
| Greek | 0.730104 | 0.742099 | 0.730104 | 289 |

Performance per label:

| | Precision | Recall | F1-score | Support |
|:------------------------------------------|------------:|---------:|-----------:|----------:|
| arts, culture, entertainment and media | 0.602 | 0.875 | 0.713 | 64 |
| conflict, war and peace | 0.611 | 0.917 | 0.733 | 36 |
| crime, law and justice | 0.862 | 0.812 | 0.836 | 69 |
| disaster, accident and emergency incident | 0.691 | 0.887 | 0.777 | 53 |
| economy, business and finance | 0.779 | 0.508 | 0.615 | 118 |
| education | 0.847 | 0.735 | 0.787 | 68 |
| environment | 0.589 | 0.754 | 0.662 | 57 |
| health | 0.797 | 0.797 | 0.797 | 59 |
| human interest | 0.552 | 0.673 | 0.607 | 55 |
| labour | 0.855 | 0.831 | 0.843 | 71 |
| lifestyle and leisure | 0.769 | 0.465 | 0.580 | 86 |
| politics | 0.568 | 0.735 | 0.641 | 68 |
| religion | 0.842 | 0.941 | 0.889 | 51 |
| science and technology | 0.638 | 0.800 | 0.710 | 55 |
| society | 0.918 | 0.500 | 0.647 | 112 |
| sport | 0.824 | 0.968 | 0.891 | 63 |
| weather | 0.932 | 0.932 | 0.932 | 44 |
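A note on the metrics in the tables above: in single-label classification, every misclassified instance counts as exactly one false positive and one false negative, so micro-averaged F1 reduces to plain accuracy (which is why the Micro-F1 and Accuracy columns are identical), while macro-F1 is the unweighted mean of the per-label F1 scores. A minimal pure-Python sketch with toy labels (not the actual test data):

```python
def per_label_f1(y_true, y_pred, label):
    """F1 score for a single label, computed from TP/FP/FN counts."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    # By convention, F1 is 0 when the label is never predicted correctly.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_label_f1(y_true, y_pred, l) for l in labels) / len(labels)

def micro_f1(y_true, y_pred):
    """For single-label tasks, micro-F1 equals plain accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example, not the actual test set:
y_true = ["sport", "sport", "health", "weather"]
y_pred = ["sport", "health", "health", "weather"]
print(round(micro_f1(y_true, y_pred), 3))  # 0.75
print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```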

For downstream tasks, **we advise you to use only the labels that were predicted with a confidence score higher than or equal to 0.90, which further improves the performance**.

When we remove the instances predicted with lower confidence (229 instances, i.e. 20% of the test set), the model yields a micro-F1 of 0.798 and a macro-F1 of 0.80:

| | Micro-F1 | Macro-F1 | Accuracy |
|:---|-----------:|-----------:|-----------:|
| All (combined) | 0.797777 | 0.802403 | 0.797777 |
| Croatian | 0.773504 | 0.772084 | 0.773504 |
| Catalan | 0.811224 | 0.806885 | 0.811224 |
| Slovenian | 0.805085 | 0.804491 | 0.805085 |
| Greek | 0.803419 | 0.809598 | 0.803419 |

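To apply the advised threshold, predictions can be filtered by their confidence score. The sketch below assumes predictions in the list-of-dicts format returned by a Hugging Face `text-classification` pipeline; the example texts, labels and scores are made up for illustration, not real model outputs:

```python
CONFIDENCE_THRESHOLD = 0.90  # cut-off advised above

def keep_confident(texts, predictions, threshold=CONFIDENCE_THRESHOLD):
    """Keep (text, label) pairs whose top prediction meets the threshold."""
    return [
        (text, pred["label"])
        for text, pred in zip(texts, predictions)
        if pred["score"] >= threshold
    ]

# Illustrative inputs; in practice `predictions` would come from running
# the fine-tuned model on `texts`.
texts = ["Central bank raises interest rates.", "Local team wins the derby."]
predictions = [
    {"label": "economy, business and finance", "score": 0.97},
    {"label": "sport", "score": 0.74},  # below 0.90, discarded downstream
]

print(keep_confident(texts, predictions))
# [('Central bank raises interest rates.', 'economy, business and finance')]
```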
## Fine-tuning hyperparameters