TajaKuzman committed
Commit 39952f5 · verified · 1 Parent(s): 822bd30

Update README.md

Files changed (1):
  1. README.md +9 -5
README.md CHANGED
@@ -144,11 +144,15 @@ results = classifier(texts)
 for result in results:
     print(result)
 
+## Output
+## {'label': 'sport', 'score': 0.9985264539718628}
+## {'label': 'disaster, accident and emergency incident', 'score': 0.9957459568977356}
+
 ```
 
 ## IPTC Media Topic categories
 
-The classifier uses the top-level of the IPTC Media Topic NewsCodes schema, consisting of 17 labels.
+The classifier uses the top-level of the [IPTC Media Topic NewsCodes](https://iptc.org/std/NewsCodes/guidelines/#_what_are_the_iptc_newscodes) schema, consisting of 17 labels.
 
 List of labels:
 ```
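The snippet in this hunk prints one dictionary per input text. A minimal, self-contained sketch of the surrounding `transformers` pipeline call is shown below; the model ID and the example texts are assumptions made for illustration, not taken from this commit.

```python
from transformers import pipeline

# Assumed model ID -- substitute the identifier of the repository this README belongs to.
MODEL_ID = "classla/multilingual-IPTC-news-topic-classifier"

# Text-classification pipeline; long articles are truncated to the model's input size.
classifier = pipeline("text-classification", model=MODEL_ID, max_length=512, truncation=True)

# Made-up example texts used only to illustrate the output format.
texts = [
    "The home team clinched the title with a late goal in the final match of the season.",
    "Hundreds of residents were evacuated overnight after floods cut off the main road.",
]

results = classifier(texts)

for result in results:
    # Each result is a dict such as {'label': 'sport', 'score': 0.9985}
    print(result)
```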
@@ -166,7 +170,7 @@ labels_map={0: 'education', 1: 'human interest', 2: 'society', 3: 'sport', 4: 'c
 Description of labels:
 
 The descriptions of the labels are based on the descriptions provided in the [IPTC Media Topic NewsCodes schema](https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html)
-and enriched with information which specific subtopics belong to the top-level topics, based on the IPTC Media Topic hierarchy.
+and enriched with information which specific subtopics belong to the top-level topics, based on the IPTC Media Topic label hierarchy.
 
 | Label | Description |
 |:------------------------------------------|:--------------------------------------------|
@@ -218,8 +222,8 @@ Label distribution in the training dataset:
 
 ## Performance
 
-The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek), consisting of 1.130 instances.
-The test set contains equal amounts of texts from the four languages and is more or less balanced across labels.
+The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek), consisting of 1,130 instances.
+The test set contains similar amounts of texts from the four languages and is more or less balanced across labels.
 
 The model was shown to achieve accuracy of 0.78 and macro-F1 scores of 0.72. The results for the entire test set and per language:
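The accuracy and macro-F1 figures cited in this hunk are the standard classification metrics. Purely as an illustration, with made-up gold and predicted labels rather than the actual test data, they could be computed like this:

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up gold and predicted labels -- not the test set described in the README.
y_true = ["sport", "sport", "society", "education", "crime, law and justice"]
y_pred = ["sport", "society", "society", "education", "crime, law and justice"]

accuracy = accuracy_score(y_true, y_pred)             # fraction of exact matches
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-label F1

print(f"accuracy: {accuracy:.2f}, macro-F1: {macro_f1:.2f}")
```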
 
@@ -235,7 +239,7 @@ The model was shown to achieve accuracy of 0.78 and macro-F1 scores of 0.72. The
 For downstream tasks, **we advise you to use only labels that were predicted with confidence score
 higher than 0.90 which further improves the performance**.
 
-When we remove instances, predicted with lower confidence from the test set (229 instances - 20%), the scores are the following:
+When we remove instances predicted with lower confidence (229 instances - 20%), the scores are the following:
 
 | Language | Accuracy | Macro-F1 | No. of instances |
 |:-------|-----------:|-----------:|-----------:|
 
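The 0.90 confidence threshold recommended in the last hunk can be applied directly to the pipeline output. A small sketch, reusing the dictionary format shown in the README's output example with made-up scores:

```python
CONFIDENCE_THRESHOLD = 0.90

# Output dictionaries in the format shown above, with made-up scores for illustration.
results = [
    {"label": "sport", "score": 0.9985},
    {"label": "society", "score": 0.5521},
    {"label": "disaster, accident and emergency incident", "score": 0.9957},
]

# Keep only high-confidence predictions; treat the rest as unlabelled for downstream use.
confident = [r for r in results if r["score"] >= CONFIDENCE_THRESHOLD]
low_confidence = [r for r in results if r["score"] < CONFIDENCE_THRESHOLD]

print(f"kept {len(confident)} predictions, set aside {len(low_confidence)}")
```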