TajaKuzman
committed on
Update README.md
README.md
CHANGED
@@ -134,7 +134,7 @@ The model can be used for classification into topic labels from the
 applied to any news text in a language, supported by the `xlm-roberta-large`.
 
 Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek),
-the model achieves
+the model achieves macro-F1 score of 0.746, micro-F1 score of 0.734, and accuracy of 0.734,
 and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
 If we use only labels that are predicted with a confidence score equal or higher than 0.90,
 the model achieves micro-F1 and macro-F1 of 0.80.
@@ -248,37 +248,39 @@ The model was evaluated on a manually-annotated test set in four languages (Croa
 consisting of 1,129 instances.
 The test set contains similar amounts of texts from the four languages and is more or less balanced across labels.
 
-The model was shown to achieve micro-F1 score of 0.
+The model was shown to achieve micro-F1 score of 0.734, and macro-F1 score of 0.746. The results for the entire test set and per language:
 
 | | Micro-F1 | Macro-F1 | Accuracy | No. of instances |
 |:---|-----------:|-----------:|-----------:|-----------:|
-| All (combined) | 0.
-| Croatian |
+| All (combined) | 0.734278 | 0.745864 | 0.734278 | 1129 |
+| Croatian | 0.728522 | 0.733725 | 0.728522 | 291 |
 | Catalan | 0.715356 | 0.722304 | 0.715356 | 267 |
 | Slovenian | 0.758865 | 0.764784 | 0.758865 | 282 |
-| Greek |
+| Greek | 0.733564 | 0.747129 | 0.733564 | 289 |
+
 
 Performance per label:
 
-| | precision | recall | f1-score |
-
-| arts, culture, entertainment and media |
-| conflict, war and peace |
-| crime, law and justice |
-| disaster, accident and emergency incident |
-| economy, business and finance |
-| education |
-| environment |
-| health |
-| human interest |
-| labour |
-| lifestyle and leisure |
-| politics |
-| religion |
-| science and technology |
-| society |
-| sport |
-| weather |
+| | precision | recall | f1-score | support |
+|:------------------------------------------|------------:|---------:|-----------:|------------:|
+| arts, culture, entertainment and media | 0.602151 | 0.875 | 0.713376 | 64 |
+| conflict, war and peace | 0.611111 | 0.916667 | 0.733333 | 36 |
+| crime, law and justice | 0.861538 | 0.811594 | 0.835821 | 69 |
+| disaster, accident and emergency incident | 0.691176 | 0.886792 | 0.77686 | 53 |
+| economy, business and finance | 0.779221 | 0.508475 | 0.615385 | 118 |
+| education | 0.847458 | 0.735294 | 0.787402 | 68 |
+| environment | 0.589041 | 0.754386 | 0.661538 | 57 |
+| health | 0.79661 | 0.79661 | 0.79661 | 59 |
+| human interest | 0.552239 | 0.672727 | 0.606557 | 55 |
+| labour | 0.855072 | 0.830986 | 0.842857 | 71 |
+| lifestyle and leisure | 0.773585 | 0.476744 | 0.589928 | 86 |
+| politics | 0.568182 | 0.735294 | 0.641026 | 68 |
+| religion | 0.842105 | 0.941176 | 0.888889 | 51 |
+| science and technology | 0.637681 | 0.8 | 0.709677 | 55 |
+| society | 0.918033 | 0.5 | 0.647399 | 112 |
+| sport | 0.824324 | 0.968254 | 0.890511 | 63 |
+| weather | 0.953488 | 0.931818 | 0.942529 | 44 |
+
 
 For downstream tasks, **we advise you to use only labels that were predicted with confidence score
 higher or equal to 0.90 which further improves the performance**.
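Reviewer note: the README's advice to keep only labels predicted with a confidence score of 0.90 or higher could be applied downstream roughly as follows. This is a minimal sketch, not the authors' code. It assumes predictions arrive as dicts with `label` and `score` keys (the shape returned by Hugging Face text-classification pipelines); the helper name `keep_confident` and the sample predictions are hypothetical.

```python
# Threshold taken from the README's recommendation.
CONFIDENCE_THRESHOLD = 0.90


def keep_confident(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Return only the predictions whose score is at least `threshold`."""
    return [p for p in predictions if p["score"] >= threshold]


# Hypothetical classifier outputs, for illustration only.
example_predictions = [
    {"label": "sport", "score": 0.97},
    {"label": "politics", "score": 0.62},
    {"label": "weather", "score": 0.90},
]

confident = keep_confident(example_predictions)
print([p["label"] for p in confident])  # ['sport', 'weather']
```

Texts whose prediction falls below the threshold are dropped rather than relabelled, so this trades coverage for the higher micro-F1 and macro-F1 (0.80) the README reports for confident predictions.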