TajaKuzman
committed on
Update README.md
README.md
CHANGED
@@ -121,7 +121,7 @@ If we do not filter out instances, predicted with lower confidence score, the mo
## Intended use and limitations
-For reliable results, the classifier should be applied to documents of sufficient length (the rule of
Use example:
@@ -154,7 +154,8 @@ for result in results:
The classifier uses the top-level of the [IPTC Media Topic NewsCodes](https://iptc.org/std/NewsCodes/guidelines/#_what_are_the_iptc_newscodes) schema, consisting of 17 labels.
-List of labels
```
labels_list=['education', 'human interest', 'society', 'sport', 'crime, law and justice',
'disaster, accident and emergency incident', 'arts, culture, entertainment and media', 'politics',
@@ -167,7 +168,7 @@ labels_map={0: 'education', 1: 'human interest', 2: 'society', 3: 'sport', 4: 'c
11: 'health', 12: 'labour', 13: 'religion', 14: 'weather', 15: 'environment', 16: 'conflict, war and peace'}
```
-Description of labels
The descriptions of the labels are based on the descriptions provided in the [IPTC Media Topic NewsCodes schema](https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html)
and enriched with information which specific subtopics belong to the top-level topics, based on the IPTC Media Topic label hierarchy.
@@ -250,7 +251,7 @@ When we remove instances predicted with lower confidence (229 instances - 20%),
| Slovenian | 0.835443 | 0.783873 | 237 |
| Greek | 0.84188 | 0.785525 | 234 |
-
Fine-tuning was performed with `simpletransformers`.
Beforehand, a brief hyperparameter optimization was performed and the presumed optimal hyperparameters are:
## Intended use and limitations
+For reliable results, the classifier should be applied to documents of sufficient length (the rule of thumb is at least 75 words).
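The 75-word rule of thumb can be checked before any text is passed to the classifier. The following is a minimal sketch, assuming that whitespace-separated tokens are an acceptable proxy for words; the helper names `is_long_enough` and `split_by_length` and the `min_words` parameter are hypothetical and are not part of the model or of the README.

```python
def is_long_enough(text, min_words=75):
    """Return True if `text` has at least `min_words` whitespace-separated tokens.

    Assumption: splitting on whitespace approximates the word count well enough
    for the README's rule of thumb.
    """
    return len(text.split()) >= min_words


def split_by_length(documents, min_words=75):
    """Partition documents into (kept, skipped) lists by the length check."""
    kept, skipped = [], []
    for doc in documents:
        if is_long_enough(doc, min_words):
            kept.append(doc)
        else:
            skipped.append(doc)
    return kept, skipped
```

Only the documents in `kept` would then be sent to the classifier; those in `skipped` could be logged or handled separately.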
Use example:
The classifier uses the top-level of the [IPTC Media Topic NewsCodes](https://iptc.org/std/NewsCodes/guidelines/#_what_are_the_iptc_newscodes) schema, consisting of 17 labels.
+### List of labels
+
```
labels_list=['education', 'human interest', 'society', 'sport', 'crime, law and justice',
'disaster, accident and emergency incident', 'arts, culture, entertainment and media', 'politics',
11: 'health', 12: 'labour', 13: 'religion', 14: 'weather', 15: 'environment', 16: 'conflict, war and peace'}
```
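The relationship between `labels_list` and `labels_map` can be illustrated with a short sketch. It assumes that the map simply enumerates the list in order, which matches the entries visible in this diff, and it uses only the first five labels shown in the excerpt; the full README list has 17 entries. The names `visible_labels`, `label_to_id`, and `id_to_label` are hypothetical.

```python
# Only the first five labels visible in this excerpt; the real list has 17.
visible_labels = ['education', 'human interest', 'society', 'sport',
                  'crime, law and justice']

# Assumption: labels_map is the list enumerated in order (index -> label).
labels_map = dict(enumerate(visible_labels))

# Reverse lookup (label -> index), useful when preparing gold labels.
label_to_id = {label: idx for idx, label in labels_map.items()}


def id_to_label(idx, mapping):
    """Translate a predicted class index into its label name.

    Raises KeyError for an index outside the mapping instead of guessing.
    """
    return mapping[idx]
```

With the complete 17-label list in place of `visible_labels`, the same two dictionaries cover every class index.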
+### Description of labels
The descriptions of the labels are based on the descriptions provided in the [IPTC Media Topic NewsCodes schema](https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html)
and enriched with information which specific subtopics belong to the top-level topics, based on the IPTC Media Topic label hierarchy.
| Slovenian | 0.835443 | 0.783873 | 237 |
| Greek | 0.84188 | 0.785525 | 234 |
+## Fine-tuning hyperparameters
Fine-tuning was performed with `simpletransformers`.
Beforehand, a brief hyperparameter optimization was performed and the presumed optimal hyperparameters are: