TajaKuzman
committed
Update README.md
README.md CHANGED
@@ -144,11 +144,15 @@ results = classifier(texts)
 for result in results:
     print(result)

+## Output
+## {'label': 'sport', 'score': 0.9985264539718628}
+## {'label': 'disaster, accident and emergency incident', 'score': 0.9957459568977356}
+
 ```

 ## IPTC Media Topic categories

-The classifier uses the top-level of the IPTC Media Topic NewsCodes schema, consisting of 17 labels.
+The classifier uses the top-level of the [IPTC Media Topic NewsCodes](https://iptc.org/std/NewsCodes/guidelines/#_what_are_the_iptc_newscodes) schema, consisting of 17 labels.

 List of labels:
 ```

@@ -166,7 +170,7 @@ labels_map={0: 'education', 1: 'human interest', 2: 'society', 3: 'sport', 4: 'c
 Description of labels:

 The descriptions of the labels are based on the descriptions provided in the [IPTC Media Topic NewsCodes schema](https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html)
-and enriched with information which specific subtopics belong to the top-level topics, based on the IPTC Media Topic hierarchy.
+and enriched with information which specific subtopics belong to the top-level topics, based on the IPTC Media Topic label hierarchy.

 | Label | Description |
 |:------|:------------|

@@ -218,8 +222,8 @@ Label distribution in the training dataset:

 ## Performance

-The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek), consisting of 1
-The test set contains
+The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek), consisting of 1,130 instances.
+The test set contains similar amounts of texts from the four languages and is more or less balanced across labels.

 The model was shown to achieve accuracy of 0.78 and macro-F1 scores of 0.72. The results for the entire test set and per language:

@@ -235,7 +239,7 @@ The model was shown to achieve accuracy of 0.78 and macro-F1 scores of 0.72. The
 For downstream tasks, **we advise you to use only labels that were predicted with confidence score
 higher than 0.90 which further improves the performance**.

-When we remove instances
+When we remove instances predicted with lower confidence (229 instances - 20%), the scores are the following:

 | Language | Accuracy | Macro-F1 | No. of instances |
 |:-------|-----------:|-----------:|-----------:|