TajaKuzman committed (verified)
Commit 8c99438 · 1 Parent(s): 26db48a

Update README.md

Files changed (1): README.md (+47 −25)
README.md CHANGED
@@ -134,8 +134,9 @@ The model can be used for classification into topic labels from the
 applied to any news text in a language supported by `xlm-roberta-large`.
 
 Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek),
-the model achieves micro-F1 score of 0.734, macro-F1 score of 0.746 and accuracy of 0.734,
-and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
+the model achieves a micro-F1 score of 0.733, a macro-F1 score of 0.745 and an accuracy of 0.733,
+and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
+If we use only labels that are predicted with a confidence score equal to or higher than 0.90, the model achieves a micro-F1 and macro-F1 of 0.80.
 
 ## Intended use and limitations
 
@@ -241,33 +242,54 @@ Label distribution in the training dataset:
 
 ## Performance
 
-The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek), consisting of 1,130 instances.
+The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek),
+consisting of 1,129 instances.
 The test set contains similar amounts of text from the four languages and is more or less balanced across labels.
 
-The model was shown to achieve accuracy of 0.78 and macro-F1 scores of 0.72. The results for the entire test set and per language:
-
-| Language | Accuracy | Macro-F1 | No. of instances |
-|:-------|-----------:|-----------:|-----------:|
-| All (combined) | 0.784071 | 0.723079 | 1130 |
-| | | | |
-| Croatian | 0.786942 | 0.732721 | 291 |
-| Catalan | 0.752809 | 0.676812 | 267 |
-| Slovenian | 0.80212 | 0.736939 | 283 |
-| Greek | 0.792388 | 0.725062 | 289 |
+The model achieves a micro-F1 score of 0.733 and a macro-F1 score of 0.745. The results for the entire test set and per language:
+
+| Language | Micro-F1 | Macro-F1 | Accuracy | No. of instances |
+|:---|-----------:|-----------:|-----------:|-----------:|
+| All (combined) | 0.733392 | 0.744633 | 0.733392 | 1129 |
+| Croatian | 0.728522 | 0.733725 | 0.728522 | 291 |
+| Catalan | 0.715356 | 0.722304 | 0.715356 | 267 |
+| Slovenian | 0.758865 | 0.764784 | 0.758865 | 282 |
+| Greek | 0.730104 | 0.742099 | 0.730104 | 289 |
+
+Performance per label:
+
+| Label | Precision | Recall | F1-score | Support |
+|:------------------------------------------|------------:|---------:|-----------:|----------:|
+| arts, culture, entertainment and media | 0.602 | 0.875 | 0.713 | 64 |
+| conflict, war and peace | 0.611 | 0.917 | 0.733 | 36 |
+| crime, law and justice | 0.862 | 0.812 | 0.836 | 69 |
+| disaster, accident and emergency incident | 0.691 | 0.887 | 0.777 | 53 |
+| economy, business and finance | 0.779 | 0.508 | 0.615 | 118 |
+| education | 0.847 | 0.735 | 0.787 | 68 |
+| environment | 0.589 | 0.754 | 0.662 | 57 |
+| health | 0.797 | 0.797 | 0.797 | 59 |
+| human interest | 0.552 | 0.673 | 0.607 | 55 |
+| labour | 0.855 | 0.831 | 0.843 | 71 |
+| lifestyle and leisure | 0.769 | 0.465 | 0.580 | 86 |
+| politics | 0.568 | 0.735 | 0.641 | 68 |
+| religion | 0.842 | 0.941 | 0.889 | 51 |
+| science and technology | 0.638 | 0.800 | 0.710 | 55 |
+| society | 0.918 | 0.500 | 0.647 | 112 |
+| sport | 0.824 | 0.968 | 0.891 | 63 |
+| weather | 0.932 | 0.932 | 0.932 | 44 |
 
 For downstream tasks, **we advise you to use only labels that were predicted with a confidence score
-higher than 0.90 which further improves the performance**.
-
-When we remove instances predicted with lower confidence (229 instances - 20%), the scores are the following:
-
-| Language | Accuracy | Macro-F1 | No. of instances |
-|:-------|-----------:|-----------:|-----------:|
-| All (combined) | 0.835738 | 0.778166 | 901 |
-| | | | |
-| Croatian | 0.82906 | 0.767518 | 234 |
-| Catalan | 0.836735 | 0.75111 | 196 |
-| Slovenian | 0.835443 | 0.783873 | 237 |
-| Greek | 0.84188 | 0.785525 | 234 |
+higher than or equal to 0.90, which further improves the performance**.
+
+When we remove the instances predicted with lower confidence (229 instances, 20%), the model yields a micro-F1 of 0.798 and a macro-F1 of 0.80.
+
+| Language | Micro-F1 | Macro-F1 | Accuracy |
+|:---|-----------:|-----------:|-----------:|
+| All (combined) | 0.797777 | 0.802403 | 0.797777 |
+| Croatian | 0.773504 | 0.772084 | 0.773504 |
+| Catalan | 0.811224 | 0.806885 | 0.811224 |
+| Slovenian | 0.805085 | 0.804491 | 0.805085 |
+| Greek | 0.803419 | 0.809598 | 0.803419 |
 
 ## Fine-tuning hyperparameters
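
In the updated tables, the Micro-F1 and Accuracy columns are always identical: for single-label multiclass classification, micro-averaged F1 reduces exactly to accuracy, while macro-F1 weights every class equally regardless of its support. A minimal pure-Python sketch of this relationship, using toy labels rather than the actual test set:

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Compute micro- and macro-averaged F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    # Micro: pool counts over all classes. With exactly one label per
    # instance, total FP == total FN, so micro-F1 collapses to accuracy.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    # Macro: unweighted mean of per-class F1 scores.
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(f1s) / len(f1s)
    return micro, macro

# Toy data (illustrative only):
y_true = ["sport", "health", "sport", "weather", "health"]
y_pred = ["sport", "sport", "sport", "weather", "health"]
micro, macro = micro_macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
assert micro == accuracy  # micro-F1 == accuracy for single-label tasks
```

This is why the report quotes micro-F1 and accuracy as the same number, while macro-F1 diverges whenever minority classes (e.g. `conflict, war and peace`, support 36) perform differently from majority ones.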
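
The confidence-threshold advice above can be sketched as a small post-processing step. This is a hypothetical helper, assuming predictions arrive as `(label, score)` pairs (a Hugging Face `text-classification` pipeline returns equivalent `label`/`score` dicts); 0.90 is the inclusive threshold recommended in the model card:

```python
# Hypothetical post-processing: keep only predictions whose confidence
# score is >= 0.90, as the model card recommends for downstream use.
CONFIDENCE_THRESHOLD = 0.90

def filter_confident(predictions, threshold=CONFIDENCE_THRESHOLD):
    """predictions: list of (label, score) pairs; returns the confident subset."""
    return [(label, score) for label, score in predictions if score >= threshold]

# Toy predictions (illustrative values, not actual model output):
predictions = [
    ("sport", 0.98),
    ("politics", 0.62),   # discarded: below the 0.90 threshold
    ("weather", 0.93),
    ("society", 0.90),    # kept: the threshold is inclusive
]
confident = filter_confident(predictions)
```

In a real setup the pairs would come from the fine-tuned model, and the low-confidence instances would be discarded or routed to manual review; on this test set the threshold removed 229 instances (about 20%) and raised micro-F1 from 0.733 to 0.798.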