---
library_name: transformers
tags: []
---
norbert3-small fine-tuned for named entity recognition on wikiann (fo/is), sucx3 (sv), dane (da) and norne (nb/nn).
A custom classification head is added, together with a character-level CNN that contributes a small extra signal to the classification.
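
As a rough illustration of the idea, the sketch below concatenates per-token encoder states with character-CNN features before a linear classifier. It is not the code used for this model: the class names `CharCNN` and `TokenClassifierHead`, the hidden size of 384, the 9 labels, the character vocabulary size and all other hyperparameters are assumptions chosen only to make the example run.

```python
import torch
import torch.nn as nn


class CharCNN(nn.Module):
    """Encode each token from its characters: embed, convolve, max-pool over characters."""

    def __init__(self, n_chars=256, char_dim=16, out_dim=32, kernel_size=3):
        super().__init__()
        # Index 0 is reserved for padding (assumption for this sketch).
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_token_len) integer character ids
        b, s, w = char_ids.shape
        x = self.embed(char_ids.reshape(b * s, w))  # (b*s, w, char_dim)
        x = self.conv(x.transpose(1, 2))            # (b*s, out_dim, w)
        x = torch.relu(x).max(dim=2).values         # (b*s, out_dim)
        return x.reshape(b, s, -1)                  # (batch, seq_len, out_dim)


class TokenClassifierHead(nn.Module):
    """Linear classifier over [encoder hidden state ; character features]."""

    def __init__(self, hidden_size=384, char_out=32, num_labels=9, dropout=0.1):
        super().__init__()
        self.char_cnn = CharCNN(out_dim=char_out)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size + char_out, num_labels)

    def forward(self, hidden_states, char_ids):
        char_feats = self.char_cnn(char_ids)
        features = torch.cat([hidden_states, char_feats], dim=-1)
        return self.classifier(self.dropout(features))


# Random stand-ins for encoder output and character ids (hypothetical shapes).
head = TokenClassifierHead()
hidden = torch.randn(2, 5, 384)
chars = torch.randint(1, 256, (2, 5, 12))
logits = head(hidden, chars)
print(tuple(logits.shape))  # (2, 5, 9)
```

In this sketch the character features are simply concatenated to the encoder output, so the classifier can weight them as lightly as it needs to.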
Evaluation results per dataset and subset:
```text
Eval on wikiann - fo
index 0
tokens [Byrta, -, Aftur, og, aftur]
ner_tags [3, 0, 0, 0, 0]
subset fo
dataset wikiann
Name: 0, dtype: object
shape: (100, 5)
100%
 5/5 [00:01<00:00,  3.92it/s]
Loss: 0.2276667356491089
O O
B-ORG B-ORG
B-ORG B-ORG
O O
O O
O O
O O
O O
O O
O O
Validation Loss: 0.26530784368515015
Validation Accuracy: 0.9228951181745751
label         precision  recall  f1-score  support
LOC                0.86    0.81      0.83      154
ORG                0.67    0.73      0.70      125
PER                0.87    0.91      0.89       79
micro avg          0.79    0.80      0.80      358
macro avg          0.80    0.82      0.81      358
weighted avg       0.79    0.80      0.80      358
________________________________________
Eval on wikiann - is
index 100
tokens [Beltaþyrill, ''Ceryle, alcyon, '', Sjaldséð]
ner_tags [5, 0, 0, 0, 0]
subset is
dataset wikiann
Name: 0, dtype: object
shape: (1000, 5)
100%
 50/50 [00:10<00:00,  5.02it/s]
Loss: 0.22668001055717468
O O
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
O O
O O
O O
Validation Loss: 0.2526825902983546
Validation Accuracy: 0.9360383541181041
label         precision  recall  f1-score  support
LOC                0.84    0.85      0.84     1983
ORG                0.81    0.80      0.80     1762
PER                0.89    0.89      0.89     1020
micro avg          0.84    0.84      0.84     4765
macro avg          0.84    0.85      0.85     4765
weighted avg       0.84    0.84      0.84     4765
________________________________________
Eval on dane - default
index 1100
tokens [To, kendte, russiske, historikere, Andronik, ...
ner_tags [0, 0, 7, 0, 1, 2, 0, 1, 2, 0, 0, 0, 0, 5, 0, ...
subset default
dataset dane
Name: 0, dtype: object
shape: (565, 5)
100%
 29/29 [00:06<00:00,  4.75it/s]
Loss: 0.12037135660648346
O O
O O
O O
O O
O O
B-MISC B-MISC
O O
O O
B-PER B-PER
B-PER B-PER
Validation Loss: 0.11113663488228259
Validation Accuracy: 0.972018408457994
label         precision  recall  f1-score  support
LOC                0.78    0.86      0.82      225
MISC               0.72    0.52      0.61      333
ORG                0.72    0.69      0.71      379
PER                0.96    0.92      0.94      298
micro avg          0.80    0.73      0.76     1235
macro avg          0.80    0.75      0.77     1235
weighted avg       0.79    0.73      0.76     1235
________________________________________
Eval on norne - bokmaal-7
index 1665
tokens [Honnørordene, er, ", dristig, formspråk, ", ,...
ner_tags [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
subset bokmaal-7
dataset norne
Name: 0, dtype: object
shape: (1939, 5)
100%
 97/97 [00:20<00:00,  4.56it/s]
Loss: 0.0011819382198154926
O O
O O
O O
O O
O O
O O
O O
O O
O O
O O
Validation Loss: 0.04194018930858649
Validation Accuracy: 0.9876322465792248
label         precision  recall  f1-score  support
LOC                0.85    0.90      0.87      498
MISC               0.81    0.74      0.78      363
ORG                0.77    0.83      0.80      499
PER                0.93    0.96      0.95      845
micro avg          0.86    0.88      0.87     2205
macro avg          0.84    0.86      0.85     2205
weighted avg       0.86    0.88      0.87     2205
________________________________________
Eval on norne - nynorsk-7
index 3604
tokens [Den, er, mettande, og, smakfull, ,, og, det, ...
ner_tags [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
subset nynorsk-7
dataset norne
Name: 0, dtype: object
shape: (1511, 5)
100%
 76/76 [00:15<00:00,  5.82it/s]
Loss: 0.0790824368596077
O O
O O
O O
O O
O O
O O
O O
O O
O O
O O
Validation Loss: 0.05325472676725583
Validation Accuracy: 0.9867293689853402
label         precision  recall  f1-score  support
LOC                0.77    0.91      0.84      365
MISC               0.80    0.76      0.78      295
ORG                0.83    0.82      0.82      397
PER                0.98    0.95      0.97      664
micro avg          0.87    0.88      0.87     1721
macro avg          0.85    0.86      0.85     1721
weighted avg       0.87    0.88      0.87     1721
________________________________________
Eval on sucx3_ner - original_cased
index 5115
tokens [Just, i, dag, är, Saabs, företagsledning, där...
ner_tags [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
subset original_cased
dataset sucx3_ner
Name: 0, dtype: object
shape: (14383, 5)
100%
 720/720 [02:36<00:00,  5.02it/s]
Loss: 0.04177908971905708
Loss: 0.08230985613484489
Loss: 0.08399457804886486
Loss: 0.06163447560524267
Loss: 0.04787629511204947
Loss: 0.03949779063830233
Loss: 0.03397762095776484
Loss: 0.030040143460689266
O O
O O
O O
O O
O O
B-ORG B-ORG
B-ORG B-ORG
O O
O O
O O
Validation Loss: 0.02938824465528948
Validation Accuracy: 0.9919830972756728
label         precision  recall  f1-score  support
LOC                0.88    0.91      0.90     4202
MISC               0.65    0.59      0.62     1899
ORG                0.74    0.76      0.75     3015
PER                0.92    0.93      0.92     5778
micro avg          0.84    0.84      0.84    14894
macro avg          0.80    0.80      0.80    14894
weighted avg       0.84    0.84      0.84    14894
________________________________________
```
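
To help read the summary rows in the reports above, the sketch below recomputes the macro and weighted F1 averages from the per-class values of the wikiann (fo) report. The helper names `macro_avg` and `weighted_avg` are made up for this example, and the inputs are the rounded numbers from the log, so the results are only accurate to about two decimals. The micro average is not reproduced because it needs the raw true-positive, false-positive and false-negative counts, which the log does not include.

```python
def macro_avg(scores):
    """Unweighted mean over classes: every entity type counts equally."""
    return sum(scores) / len(scores)


def weighted_avg(scores, supports):
    """Mean over classes weighted by support (number of gold entities per type)."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Per-class F1 and support for LOC, ORG, PER from the wikiann (fo) report.
f1 = [0.83, 0.70, 0.89]
support = [154, 125, 79]

macro = macro_avg(f1)
weighted = weighted_avg(f1, support)
print(f"macro F1:    {macro:.2f}")     # macro F1:    0.81
print(f"weighted F1: {weighted:.2f}")  # weighted F1: 0.80
```

Both values match the `macro avg` and `weighted avg` rows of that report.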