Update README.md
README.md
CHANGED
@@ -108,8 +108,7 @@ tags:
 # Multilingual IPTC Media Topic Classifier
 
 Text classification model based on [`xlm-roberta-large`](https://huggingface.co/FacebookAI/xlm-roberta-large)
-and fine-tuned on a news corpus in 4 languages (Croatian, Slovenian, Catalan and Greek),
-model with the [top-level IPTC
+and fine-tuned on a news corpus in 4 languages (Croatian, Slovenian, Catalan and Greek), annotated with the [top-level IPTC
 Media Topic NewsCodes labels](https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html).
 
 The model can be used for classification into topic labels from the
@@ -198,7 +197,7 @@ and enriched with information which specific subtopics belong to the top-level t
 
 The model was fine-tuned on a training dataset consisting of 15,000 news in four languages (Croatian, Slovenian, Catalan and Greek).
 The news texts were extracted from the [MaCoCu web corpora](https://macocu.eu/) based on the "News" genre label, predicted with the [X-GENRE classifier](https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier).
-The training dataset was automatically annotated with the IPTC Media Topic labels by the GPT-4o model (with prediction accuracy of 0.78 and macro-F1 scores of 0.72).
+The training dataset was automatically annotated with the IPTC Media Topic labels by the [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) model (with prediction accuracy of 0.78 and macro-F1 scores of 0.72).
 
 Label distribution in the training dataset:
 