---
library_name: transformers
tags: []
---

norbert3-small fine-tuned for named entity recognition on wikiann (fo/is), sucx3 (sv), dane (da) and norne (nb/nn).

A custom classification head was added, together with a character-level CNN that contributes a small extra signal to the classification.
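
The card does not include the head's code. The sketch below shows one common way to wire such a head and makes several assumptions. Each word's characters go through embedding, a 1-D convolution, ReLU, and max-pooling over positions. The resulting vector is concatenated with the encoder's token vector and passed to a linear layer with a softmax. All sizes, the label list, the random weights, and the function names (`char_cnn`, `classify_token`) are hypothetical. A real implementation would use PyTorch with trained weights, and this model's actual head may differ.

```python
import math
import random

# Every size, name and weight here is hypothetical; the card does not give them.
CHAR_DIM = 4    # size of each character embedding
N_FILTERS = 3   # number of convolution filters = size of the char feature vector
KERNEL = 3      # convolution width in characters (odd, so "same" padding works)
HIDDEN = 8      # stand-in for the encoder's hidden size
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


# Randomly initialised parameters; a trained model would learn these.
char_emb = {}  # character -> embedding, created on first use
conv_w = [[rand_vec(CHAR_DIM) for _ in range(KERNEL)] for _ in range(N_FILTERS)]
conv_b = [0.0] * N_FILTERS
clf_w = [rand_vec(HIDDEN + N_FILTERS) for _ in LABELS]
clf_b = [0.0] * len(LABELS)


def embed_char(c):
    if c not in char_emb:
        char_emb[c] = rand_vec(CHAR_DIM)
    return char_emb[c]


def char_cnn(word):
    """Hypothetical char-CNN: convolve over characters, ReLU, max-pool. word must be non-empty."""
    pad = [[0.0] * CHAR_DIM] * (KERNEL // 2)  # zero padding on both sides
    seq = pad + [embed_char(c) for c in word] + pad
    features = []
    for f in range(N_FILTERS):
        activations = []
        for start in range(len(seq) - KERNEL + 1):
            z = conv_b[f] + sum(
                w * x
                for k in range(KERNEL)
                for w, x in zip(conv_w[f][k], seq[start + k])
            )
            activations.append(max(z, 0.0))  # ReLU
        features.append(max(activations))  # max-pool over character positions
    return features


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify_token(token_vec, word):
    """Hypothetical head: concatenate encoder vector and char features, then linear + softmax."""
    features = token_vec + char_cnn(word)
    logits = [b + sum(w * x for w, x in zip(row, features)) for row, b in zip(clf_w, clf_b)]
    return softmax(logits)


# Usage: a random vector stands in for the norbert3-small output of one token.
token_vec = rand_vec(HIDDEN)
probs = classify_token(token_vec, "Tórshavn")
print(LABELS[probs.index(max(probs))], round(sum(probs), 6))
```

Max-pooling makes the character features independent of word length. The CNN can therefore pick up sub-word cues such as capitalisation or suffixes, whatever the word's tokenization.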

results:

```text
Eval on wikiann - fo
index 0
tokens [Byrta, -, Aftur, og, aftur]
ner_tags [3, 0, 0, 0, 0]
subset fo
dataset wikiann
Name: 0, dtype: object
shape: (100, 5)
100%
5/5 [00:01<00:00, 3.92it/s]
Loss: 0.2276667356491089
O O
B-ORG B-ORG
B-ORG B-ORG
O O
O O
O O
O O
O O
O O
O O
Validation Loss: 0.26530784368515015
Validation Accuracy: 0.9228951181745751
              precision    recall  f1-score   support

         LOC       0.86      0.81      0.83       154
         ORG       0.67      0.73      0.70       125
         PER       0.87      0.91      0.89        79

   micro avg       0.79      0.80      0.80       358
   macro avg       0.80      0.82      0.81       358
weighted avg       0.79      0.80      0.80       358

________________________________________
Eval on wikiann - is
index 100
tokens [Beltaþyrill, ''Ceryle, alcyon, '', Sjaldséð]
ner_tags [5, 0, 0, 0, 0]
subset is
dataset wikiann
Name: 0, dtype: object
shape: (1000, 5)
100%
50/50 [00:10<00:00, 5.02it/s]
Loss: 0.22668001055717468
O O
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
B-LOC B-LOC
O O
O O
O O
Validation Loss: 0.2526825902983546
Validation Accuracy: 0.9360383541181041
              precision    recall  f1-score   support

         LOC       0.84      0.85      0.84      1983
         ORG       0.81      0.80      0.80      1762
         PER       0.89      0.89      0.89      1020

   micro avg       0.84      0.84      0.84      4765
   macro avg       0.84      0.85      0.85      4765
weighted avg       0.84      0.84      0.84      4765

________________________________________
Eval on dane - default
index 1100
tokens [To, kendte, russiske, historikere, Andronik, ...
ner_tags [0, 0, 7, 0, 1, 2, 0, 1, 2, 0, 0, 0, 0, 5, 0, ...
subset default
dataset dane
Name: 0, dtype: object
shape: (565, 5)
100%
29/29 [00:06<00:00, 4.75it/s]
Loss: 0.12037135660648346
O O
O O
O O
O O
O O
B-MISC B-MISC
O O
O O
B-PER B-PER
B-PER B-PER
Validation Loss: 0.11113663488228259
Validation Accuracy: 0.972018408457994
              precision    recall  f1-score   support

         LOC       0.78      0.86      0.82       225
        MISC       0.72      0.52      0.61       333
         ORG       0.72      0.69      0.71       379
         PER       0.96      0.92      0.94       298

   micro avg       0.80      0.73      0.76      1235
   macro avg       0.80      0.75      0.77      1235
weighted avg       0.79      0.73      0.76      1235

________________________________________
Eval on norne - bokmaal-7
index 1665
tokens [Honnørordene, er, ", dristig, formspråk, ", ,...
ner_tags [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
subset bokmaal-7
dataset norne
Name: 0, dtype: object
shape: (1939, 5)
100%
97/97 [00:20<00:00, 4.56it/s]
Loss: 0.0011819382198154926
O O
O O
O O
O O
O O
O O
O O
O O
O O
O O
Validation Loss: 0.04194018930858649
Validation Accuracy: 0.9876322465792248
              precision    recall  f1-score   support

         LOC       0.85      0.90      0.87       498
        MISC       0.81      0.74      0.78       363
         ORG       0.77      0.83      0.80       499
         PER       0.93      0.96      0.95       845

   micro avg       0.86      0.88      0.87      2205
   macro avg       0.84      0.86      0.85      2205
weighted avg       0.86      0.88      0.87      2205

________________________________________
Eval on norne - nynorsk-7
index 3604
tokens [Den, er, mettande, og, smakfull, ,, og, det, ...
ner_tags [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
subset nynorsk-7
dataset norne
Name: 0, dtype: object
shape: (1511, 5)
100%
76/76 [00:15<00:00, 5.82it/s]
Loss: 0.0790824368596077
O O
O O
O O
O O
O O
O O
O O
O O
O O
O O
Validation Loss: 0.05325472676725583
Validation Accuracy: 0.9867293689853402
              precision    recall  f1-score   support

         LOC       0.77      0.91      0.84       365
        MISC       0.80      0.76      0.78       295
         ORG       0.83      0.82      0.82       397
         PER       0.98      0.95      0.97       664

   micro avg       0.87      0.88      0.87      1721
   macro avg       0.85      0.86      0.85      1721
weighted avg       0.87      0.88      0.87      1721

________________________________________
Eval on sucx3_ner - original_cased
index 5115
tokens [Just, i, dag, är, Saabs, företagsledning, där...
ner_tags [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
subset original_cased
dataset sucx3_ner
Name: 0, dtype: object
shape: (14383, 5)
100%
720/720 [02:36<00:00, 5.02it/s]
Loss: 0.04177908971905708
Loss: 0.08230985613484489
Loss: 0.08399457804886486
Loss: 0.06163447560524267
Loss: 0.04787629511204947
Loss: 0.03949779063830233
Loss: 0.03397762095776484
Loss: 0.030040143460689266
O O
O O
O O
O O
O O
B-ORG B-ORG
B-ORG B-ORG
O O
O O
O O
Validation Loss: 0.02938824465528948
Validation Accuracy: 0.9919830972756728
              precision    recall  f1-score   support

         LOC       0.88      0.91      0.90      4202
        MISC       0.65      0.59      0.62      1899
         ORG       0.74      0.76      0.75      3015
         PER       0.92      0.93      0.92      5778

   micro avg       0.84      0.84      0.84     14894
   macro avg       0.80      0.80      0.80     14894
weighted avg       0.84      0.84      0.84     14894

________________________________________
```
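
The per-entity tables in the log have the layout of seqeval's `classification_report`. That layout suggests entity-level scores over BIO spans rather than per-token scores, which would also explain why the token-level "Validation Accuracy" sits well above the F1 scores. The card does not say which library produced them. The following is therefore a minimal standard-library sketch of how such a report can be computed, assuming exact span matching. It first decodes integer `ner_tags` with the wikiann label order (O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC), which is consistent with the log: tag 3 is shown as B-ORG and tag 5 as B-LOC. It then extracts spans and computes precision, recall and F1 per entity type. These are averaged three ways: micro (pooled counts), macro (unweighted mean over types) and weighted (by gold support). The helper names `get_spans`, `entity_report` and `print_report` are hypothetical.

```python
from collections import Counter

# wikiann's integer ner_tags order (consistent with the log: 3 -> B-ORG, 5 -> B-LOC).
WIKIANN_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def get_spans(tags):
    """Return (type, start, end) spans from BIO tags; an I-X that cannot continue opens a new span."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes any open span
        prefix, _, t = tag.partition("-")
        if etype is not None and not (prefix == "I" and t == etype):
            spans.append((etype, start, i - 1))
            etype = None
        if prefix == "B" or (prefix == "I" and etype is None):
            start, etype = i, t
    return spans


def prf(tp, n_pred, n_true):
    """Precision, recall and F1 from raw counts (0.0 where undefined)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_true if n_true else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def entity_report(gold_seqs, pred_seqs):
    """Hypothetical helper: rows of (precision, recall, f1, support); assumes >= 1 gold entity."""
    tp, n_pred, n_true = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(get_spans(gold)), set(get_spans(pred))
        n_true.update(s[0] for s in g)
        n_pred.update(s[0] for s in p)
        tp.update(s[0] for s in g & p)  # a hit needs the same type and the same boundaries
    rows = {}
    for etype in sorted(set(n_true) | set(n_pred)):
        rows[etype] = (*prf(tp[etype], n_pred[etype], n_true[etype]), n_true[etype])
    types = list(rows)
    total = sum(n_true.values())
    # micro: pool the counts over all types before computing the scores
    rows["micro avg"] = (*prf(sum(tp.values()), sum(n_pred.values()), total), total)
    # macro: unweighted mean of the per-type scores
    macro = [sum(rows[e][i] for e in types) / len(types) for i in range(3)]
    # weighted: per-type scores weighted by their gold support
    weighted = [sum(rows[e][i] * rows[e][3] for e in types) / total for i in range(3)]
    rows["macro avg"] = (*macro, total)
    rows["weighted avg"] = (*weighted, total)
    return rows


def print_report(rows):
    print(f"{'':>12}  {'precision':>9} {'recall':>9} {'f1-score':>9} {'support':>9}")
    for name, (p, r, f, s) in rows.items():
        print(f"{name:>12}  {p:>9.2f} {r:>9.2f} {f:>9.2f} {s:>9}")


# Toy data: the first gold sentence is the wikiann-fo sample row from the log,
# everything else is made up for illustration.
gold_ids = [[3, 0, 0, 0, 0], [5, 6, 0, 1, 2]]
pred_ids = [[3, 0, 0, 0, 0], [5, 0, 0, 1, 2]]
gold = [[WIKIANN_LABELS[i] for i in s] for s in gold_ids]
pred = [[WIKIANN_LABELS[i] for i in s] for s in pred_ids]
print_report(entity_report(gold, pred))
```

On the toy data, the truncated LOC prediction scores 0 while ORG and PER score 1.00. As a result, the micro, macro and weighted averages all come out at 0.67. The macro averaging can also be checked against the log: for wikiann-fo, the macro precision (0.86 + 0.67 + 0.87) / 3 = 0.80 matches the reported value.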