TajaKuzman committed
Commit ad2fac9 · verified · 1 parent: 8c99438

Update README.md

Files changed (1):
1. README.md (+4 −2)
README.md CHANGED

@@ -136,7 +136,8 @@ The model can be used for classification into topic labels from the
 Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek),
 the model achieves micro-F1 score of 0.733, macro-F1 score of 0.745 and accuracy of 0.733,
 and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
-If we use only labels that are predicted with a confidence score equal or higher than 0.90, the model achieves micro-F1 and macro-F1 of 0.80.
+If we use only labels that are predicted with a confidence score equal or higher than 0.90,
+the model achieves micro-F1 and macro-F1 of 0.80.
 
 ## Intended use and limitations
 
@@ -216,7 +217,8 @@ and enriched with information which specific subtopics belong to the top-level t
 
 The model was fine-tuned on a training dataset consisting of 15,000 news in four languages (Croatian, Slovenian, Catalan and Greek).
 The news texts were extracted from the [MaCoCu web corpora](https://macocu.eu/) based on the "News" genre label, predicted with the [X-GENRE classifier](https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier).
-The training dataset was automatically annotated with the IPTC Media Topic labels by the [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) model (with prediction accuracy of 0.78 and macro-F1 scores of 0.72).
+The training dataset was automatically annotated with the IPTC Media Topic labels by
+the [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) model (yielding 0.72 micro-F1 and 0.73 macro-F1 on the test dataset).
 
 Label distribution in the training dataset:
 
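The first hunk documents a 0.90 confidence cut-off. The kind of filtering it describes can be sketched as below; the prediction records and their `label`/`score` fields are illustrative assumptions, not the classifier's actual output format.

```python
# Sketch: keep only predictions whose confidence clears a threshold,
# mirroring the README's 0.90 cut-off. The records below are made up
# for illustration, not actual output of the IPTC topic classifier.
THRESHOLD = 0.90

predictions = [
    {"label": "sport", "score": 0.97},
    {"label": "politics", "score": 0.62},
    {"label": "economy, business and finance", "score": 0.91},
]

confident = [p for p in predictions if p["score"] >= THRESHOLD]
print([p["label"] for p in confident])
# → ['sport', 'economy, business and finance']
```

In practice such a cut-off trades coverage for precision: low-confidence items are left unlabeled (or routed to review) rather than mislabeled.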
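Both hunks quote micro- and macro-F1 scores. A small sketch of how the two averages differ, on toy labels (illustrative data, not the actual test set); note that for single-label multiclass evaluation, micro-F1 coincides with plain accuracy, which is consistent with the README reporting 0.733 for both.

```python
# Toy gold/predicted labels for illustration only.
gold = ["sport", "politics", "sport", "economy", "politics", "sport"]
pred = ["sport", "sport", "sport", "economy", "politics", "economy"]

labels = sorted(set(gold) | set(pred))

def f1_per_label(label):
    # Per-class precision/recall/F1 from true/false positives and false negatives.
    tp = sum(g == p == label for g, p in zip(gold, pred))
    fp = sum(p == label and g != label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Macro-F1: unweighted mean of per-class F1 (sensitive to rare classes).
macro_f1 = sum(f1_per_label(l) for l in labels) / len(labels)

# Micro-F1 for single-label multiclass data reduces to accuracy.
micro_f1 = sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

Macro-F1 weights every topic equally, so it is the stricter number when some IPTC topics are rare in the test set.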