DunnBC22 committed on
Commit fc20c64 · 1 Parent(s): 385a0ed

Update README.md

Files changed (1)
  1. README.md +52 -23
README.md CHANGED
@@ -2,42 +2,71 @@
  license: apache-2.0
  tags:
  - generated_from_trainer
+ - biology
  datasets:
  - bionlp2004
  model-index:
  - name: bert-base-cased-finetuned-ner-bio_nlp_2004
  results: []
+ language:
+ - en
+ metrics:
+ - seqeval
+ - f1
+ - recall
+ - precision
+ pipeline_tag: token-classification
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # bert-base-cased-finetuned-ner-bio_nlp_2004

- This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the bionlp2004 dataset.
+ This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased).
+
  It achieves the following results on the evaluation set:
  - Loss: 0.2066
- - Dna: {'precision': 0.6619127516778524, 'recall': 0.7471590909090909, 'f1': 0.7019572953736656, 'number': 1056}
- - Rna: {'precision': 0.589041095890411, 'recall': 0.7288135593220338, 'f1': 0.6515151515151515, 'number': 118}
- - Cell Line: {'precision': 0.4758522727272727, 'recall': 0.67, 'f1': 0.5564784053156145, 'number': 500}
- - Cell Type: {'precision': 0.7294117647058823, 'recall': 0.7100468505986466, 'f1': 0.7195990503824848, 'number': 1921}
- - Protein: {'precision': 0.6657656225155033, 'recall': 0.8263272153147819, 'f1': 0.7374075378654457, 'number': 5067}
- - Overall Precision: 0.6628
- - Overall Recall: 0.7805
- - Overall F1: 0.7169
- - Overall Accuracy: 0.9367
+ - Dna:
+   - Precision: 0.6619127516778524
+   - Recall: 0.7471590909090909
+   - F1: 0.7019572953736656
+   - Number: 1056
+ - Rna:
+   - Precision: 0.589041095890411
+   - Recall: 0.7288135593220338
+   - F1: 0.6515151515151515
+   - Number: 118
+ - Cell Line:
+   - Precision: 0.4758522727272727
+   - Recall: 0.67
+   - F1: 0.5564784053156145
+   - Number: 500
+ - Cell Type:
+   - Precision: 0.7294117647058823
+   - Recall: 0.7100468505986466
+   - F1: 0.7195990503824848
+   - Number: 1921
+ - Protein:
+   - Precision: 0.6657656225155033
+   - Recall: 0.8263272153147819
+   - F1: 0.7374075378654457
+   - Number: 5067
+ - Overall:
+   - Precision: 0.6628
+   - Recall: 0.7805
+   - F1: 0.7169
+   - Accuracy: 0.9367

  ## Model description

- More information needed
+ For more information on how it was created, check out the following link: https://github.com/DunnBC22/NLP_Projects/blob/main/Token%20Classification/Monolingual/tner-bionlp2004/NER%20Project%20Using%20tner-bionlp%202004%20Dataset%20(BERT-Base).ipynb

  ## Intended uses & limitations

- More information needed
+ This model is intended to demonstrate my ability to solve a complex problem using technology.

  ## Training and evaluation data

- More information needed
+ Dataset Source: https://huggingface.co/datasets/tner/bionlp2004

  ## Training procedure
@@ -53,17 +82,17 @@ The following hyperparameters were used during training:
  - num_epochs: 3

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Dna | Rna | Cell Line | Cell Type | Protein | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
- |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- | 0.1701 | 1.0 | 1039 | 0.1927 | {'precision': 0.6152610441767068, 'recall': 0.7253787878787878, 'f1': 0.6657974793568013, 'number': 1056} | {'precision': 0.6616541353383458, 'recall': 0.7457627118644068, 'f1': 0.7011952191235059, 'number': 118} | {'precision': 0.46697388632872505, 'recall': 0.608, 'f1': 0.5282363162467419, 'number': 500} | {'precision': 0.6997455470737913, 'recall': 0.7157730348776679, 'f1': 0.7076685537828101, 'number': 1921} | {'precision': 0.6602894693062719, 'recall': 0.783303730017762, 'f1': 0.716555334897996, 'number': 5067} | 0.6499 | 0.7506 | 0.6966 | 0.9352 |
- | 0.145 | 2.0 | 2078 | 0.1981 | {'precision': 0.6364372469635627, 'recall': 0.7443181818181818, 'f1': 0.6861632474901789, 'number': 1056} | {'precision': 0.6408450704225352, 'recall': 0.7711864406779662, 'f1': 0.7000000000000002, 'number': 118} | {'precision': 0.4606896551724138, 'recall': 0.668, 'f1': 0.5453061224489797, 'number': 500} | {'precision': 0.7375615090213231, 'recall': 0.7022384174908901, 'f1': 0.7194666666666666, 'number': 1921} | {'precision': 0.6758880340481257, 'recall': 0.8148805999605289, 'f1': 0.7389047959914101, 'number': 5067} | 0.6662 | 0.7722 | 0.7153 | 0.9364 |
- | 0.1116 | 3.0 | 3117 | 0.2066 | {'precision': 0.6619127516778524, 'recall': 0.7471590909090909, 'f1': 0.7019572953736656, 'number': 1056} | {'precision': 0.589041095890411, 'recall': 0.7288135593220338, 'f1': 0.6515151515151515, 'number': 118} | {'precision': 0.4758522727272727, 'recall': 0.67, 'f1': 0.5564784053156145, 'number': 500} | {'precision': 0.7294117647058823, 'recall': 0.7100468505986466, 'f1': 0.7195990503824848, 'number': 1921} | {'precision': 0.6657656225155033, 'recall': 0.8263272153147819, 'f1': 0.7374075378654457, 'number': 5067} | 0.6628 | 0.7805 | 0.7169 | 0.9367 |
-
+ | Training Loss | Epoch | Step | Validation Loss | Dna Precision | Dna Recall | Dna F1 | Dna Number | Rna Precision | Rna Recall | Rna F1 | Rna Number | Cell Line Precision | Cell Line Recall | Cell Line F1 | Cell Line Number | Cell Type Precision | Cell Type Recall | Cell Type F1 | Cell Type Number | Protein Precision | Protein Recall | Protein F1 | Protein Number | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
+ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | 0.1701 | 1.0 | 1039 | 0.1927 | 0.6153 | 0.7254 | 0.6658 | 1056 | 0.6617 | 0.7458 | 0.7012 | 118 | 0.4670 | 0.608 | 0.5282 | 500 | 0.6997 | 0.7158 | 0.7077 | 1921 | 0.6603 | 0.7833 | 0.7166 | 5067 | 0.6499 | 0.7506 | 0.6966 | 0.9352 |
+ | 0.145 | 2.0 | 2078 | 0.1981 | 0.6364 | 0.7443 | 0.6862 | 1056 | 0.6408 | 0.7712 | 0.7000 | 118 | 0.4607 | 0.668 | 0.5453 | 500 | 0.7376 | 0.7022 | 0.7195 | 1921 | 0.6759 | 0.8149 | 0.7389 | 5067 | 0.6662 | 0.7722 | 0.7153 | 0.9364 |
+ | 0.1116 | 3.0 | 3117 | 0.2066 | 0.6619 | 0.7472 | 0.7020 | 1056 | 0.5890 | 0.7288 | 0.6515 | 118 | 0.4759 | 0.67 | 0.5565 | 500 | 0.7294 | 0.7100 | 0.7196 | 1921 | 0.6658 | 0.8263 | 0.7374 | 5067 | 0.6628 | 0.7805 | 0.7169 | 0.9367 |
+
+ * Metrics shown above are rounded to the nearest ten-thousandth.

  ### Framework versions

  - Transformers 4.28.1
  - Pytorch 2.0.0
  - Datasets 2.11.0
- - Tokenizers 0.13.3
+ - Tokenizers 0.13.3
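
The per-label metrics in this card are entity-level scores of the kind seqeval (listed in the new metadata) produces: a predicted entity counts as a true positive only when both its type and its span boundaries match the gold annotation, and precision, recall, and F1 follow from those span counts. A minimal sketch of that scoring scheme, simplified and not the seqeval implementation itself, with illustrative BIO tags:

```python
# Simplified entity-level scoring in the spirit of seqeval
# (illustration only; the real library handles more tagging schemes).

def extract_entities(tags):
    """Collect (type, start, end) spans from a BIO-tagged sequence."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        if tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        ):
            if etype is not None:
                entities.append((etype, start, i))
            etype, start = (tag[2:], i) if tag.startswith("B-") else (None, None)
    return entities

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy example: one protein and one DNA span in gold; the prediction
# adds a spurious extra DNA span.
gold = ["B-protein", "I-protein", "O", "B-DNA", "O"]
pred = ["B-protein", "I-protein", "O", "B-DNA", "B-DNA"]

gold_set = set(extract_entities(gold))
pred_set = set(extract_entities(pred))
tp = len(gold_set & pred_set)
precision = tp / len(pred_set)  # 2 correct of 3 predicted
recall = tp / len(gold_set)     # 2 correct of 2 gold

# Sanity check against the card: overall F1 is the harmonic mean of
# overall precision (0.6628) and recall (0.7805).
print(round(f1(0.6628, 0.7805), 4))  # 0.7169
```

Note that because matching is span-exact, a boundary error costs both a false positive and a false negative, which is why entity-level F1 sits well below the 0.9367 token-level accuracy reported above.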