---
language: "en"
widget:
- text: "On the other hand, a decline of the arsenic content in hair and nail was observed after withdrawal of the drug."
- text: "These differences in gene expression have not been molecularly defined."
- text: "p65 was detected in the cytoplasm of FDC , whereas nuclei were negative."
datasets:
- Genia
---

## A Biomedical POS Tagger for English

Trained on the GENIA corpus.

Eval:

```
              precision    recall  f1-score   support

           0       0.98      1.00      0.99       263
           3       0.93      1.00      0.97        14
           5       1.00      1.00      1.00         8
           6       0.99      0.99      0.99       169
           7       1.00      1.00      1.00       203
           8       0.99      1.00      1.00       195
           9       0.95      0.78      0.85        98
          10       0.83      1.00      0.91         5
          11       0.96      0.97      0.96       532
          12       1.00      1.00      1.00       252
          13       0.99      0.98      0.99      1575
          14       0.95      0.95      0.95       133
          15       0.89      0.89      0.89         9
          16       1.00      1.00      1.00         3
          18       0.99      1.00      0.99        69
          19       1.00      0.95      0.98        22
          20       0.99      1.00      1.00       395
          22       1.00      1.00      1.00      1328
          23       1.00      1.00      1.00       987
          24       1.00      1.00      1.00         6
          25       0.00      0.00      0.00         0
          26       1.00      1.00      1.00       620
          27       0.00      0.00      0.00         1
          28       1.00      1.00      1.00        39
          29       0.98      0.99      0.98      5674
          30       0.97      0.96      0.96      2075
          31       1.00      0.71      0.83         7
          32       1.00      0.80      0.89         5
          33       1.00      1.00      1.00        58
          34       1.00      1.00      1.00         2
          35       0.96      0.96      0.96       336
          37       0.99      1.00      1.00      1579
          38       1.00      1.00      1.00      1446
          39       1.00      0.98      0.99        57

    accuracy                           0.99     18165
   macro avg       0.92      0.91      0.91     18165
weighted avg       0.99      0.99      0.99     18165

F1: 0.985267446136761 Accuracy: 0.9853564547206166
```
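
The macro average (0.91) sits well below the weighted average (0.99) because a few very rare labels, such as 25 and 27 with support 0 and 1, count as much as the frequent ones in an unweighted mean. The sketch below re-derives both averages from the rounded per-label F1 and support values in the report. Since the inputs are rounded to two decimals, the results only approximate the reported figures; the variable names are illustrative and not taken from the training code.

```python
# (label id, f1-score, support), copied from the report above (rounded values).
rows = [
    (0, 0.99, 263), (3, 0.97, 14), (5, 1.00, 8), (6, 0.99, 169),
    (7, 1.00, 203), (8, 1.00, 195), (9, 0.85, 98), (10, 0.91, 5),
    (11, 0.96, 532), (12, 1.00, 252), (13, 0.99, 1575), (14, 0.95, 133),
    (15, 0.89, 9), (16, 1.00, 3), (18, 0.99, 69), (19, 0.98, 22),
    (20, 1.00, 395), (22, 1.00, 1328), (23, 1.00, 987), (24, 1.00, 6),
    (25, 0.00, 0), (26, 1.00, 620), (27, 0.00, 1), (28, 1.00, 39),
    (29, 0.98, 5674), (30, 0.96, 2075), (31, 0.83, 7), (32, 0.89, 5),
    (33, 1.00, 58), (34, 1.00, 2), (35, 0.96, 336), (37, 1.00, 1579),
    (38, 1.00, 1446), (39, 0.99, 57),
]

f1_scores = [f1 for _, f1, _ in rows]
supports = [n for _, _, n in rows]
total_support = sum(supports)

# Macro average: every label counts equally, however rare it is.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each label contributes in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in zip(f1_scores, supports)) / total_support

print(f"labels: {len(rows)}, tokens: {total_support}")
print(f"macro F1 ~ {macro_f1:.3f}, weighted F1 ~ {weighted_f1:.3f}")
```

The two labels with an F1 of 0.00 account for at most one token in total, so they barely move the weighted figure but pull the macro figure down noticeably.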

Tags:

```
{0: 'VBD',
 1: 'N',
 2: 'XT',
 3: 'JJS',
 4: 'E2A',
 5: 'WRB',
 6: 'VB',
 7: 'TO',
 8: 'VBP',
 9: 'FW',
 10: 'EX',
 11: 'VBN',
 12: 'VBZ',
 13: 'NNS',
 14: 'VBG',
 15: 'RBR',
 16: 'WP',
 17: 'CT',
 18: 'PRP',
 19: 'JJR',
 20: 'CC',
 21: 'NNPS',
 22: 'CD',
 23: 'DT',
 24: 'NNP',
 25: 'PDT',
 26: 'LS',
 27: 'PP',
 28: 'PRP$',
 29: 'NN',
 30: 'JJ',
 31: 'RP',
 32: 'RBS',
 33: 'MD',
 34: 'WP$',
 35: 'RB',
 36: 'SYM',
 37: 'IN',
 38: 'PUNCT',
 39: 'WDT',
 40: 'POS',
 41: '<pad>'}
```
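
A minimal sketch of how this mapping might be used, assuming the numeric row labels in the evaluation report are keys of this dictionary. The names `id2tag`, `reported_ids`, and `decode` are illustrative and not taken from the original code.

```python
# Tag mapping copied from the model card.
id2tag = {
    0: 'VBD', 1: 'N', 2: 'XT', 3: 'JJS', 4: 'E2A', 5: 'WRB', 6: 'VB',
    7: 'TO', 8: 'VBP', 9: 'FW', 10: 'EX', 11: 'VBN', 12: 'VBZ', 13: 'NNS',
    14: 'VBG', 15: 'RBR', 16: 'WP', 17: 'CT', 18: 'PRP', 19: 'JJR',
    20: 'CC', 21: 'NNPS', 22: 'CD', 23: 'DT', 24: 'NNP', 25: 'PDT',
    26: 'LS', 27: 'PP', 28: 'PRP$', 29: 'NN', 30: 'JJ', 31: 'RP',
    32: 'RBS', 33: 'MD', 34: 'WP$', 35: 'RB', 36: 'SYM', 37: 'IN',
    38: 'PUNCT', 39: 'WDT', 40: 'POS', 41: '<pad>',
}

# Label ids that appear as rows in the evaluation report.
reported_ids = [
    0, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20,
    22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39,
]


def decode(label_ids):
    """Translate a sequence of predicted label ids into tag strings."""
    return [id2tag[i] for i in label_ids]


# Tags defined in the mapping that have no row in the report.
unreported_tags = [id2tag[i] for i in sorted(set(id2tag) - set(reported_ids))]

print(decode([23, 29, 12]))
print(unreported_tags)
```

Under this assumption, the weakest rows in the report correspond to rare tags: label 9 is `FW`, label 25 is `PDT`, and label 27 is `PP`.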

Parameters:

```
nepochs = 30 (stopped at epoch 18)
batch_size = 32
batch_status = 32
learning_rate = 1e-5
early_stop = 3
max_length = 200
checkpoint: dmis-lab/biobert-base-cased-v1.2
```
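
Training ended at epoch 18 of 30, which is consistent with patience-based early stopping. The sketch below shows one way such a rule could work, assuming `early_stop = 3` means "stop after 3 consecutive epochs without improvement on the validation score". That reading is an assumption, the function `run_with_early_stopping` is hypothetical, and the scores are invented for illustration rather than taken from the actual training run.

```python
def run_with_early_stopping(val_scores, nepochs=30, early_stop=3):
    """Return (last_epoch_run, best_score) under a patience-based stop rule.

    val_scores stands in for the per-epoch validation score that a real
    training loop would compute after each epoch.
    """
    best_score = float("-inf")
    epochs_without_gain = 0
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:nepochs], start=1):
        last_epoch = epoch
        if score > best_score:
            best_score = score
            epochs_without_gain = 0  # improvement: reset the patience counter
        else:
            epochs_without_gain += 1
        if epochs_without_gain >= early_stop:
            break  # patience exhausted
    return last_epoch, best_score


# Invented scores: steady gains for 15 epochs, then no further improvement.
scores = [0.90 + 0.005 * i for i in range(15)] + [0.90] * 15
stopped_at, best = run_with_early_stopping(scores)
print(f"stopped at epoch {stopped_at}, best score {best:.3f}")
```

With these made-up scores the best epoch is 15, and three stale epochs later the loop stops at epoch 18.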

See more at: https://github.com/lisaterumi/postagger-bio-english