metadata

license: apache-2.0
tags:
  - generated_from_trainer
  - biology
datasets:
  - bionlp2004
model-index:
  - name: bert-base-cased-finetuned-ner-bio_nlp_2004
    results: []
language:
  - en
metrics:
  - seqeval
  - f1
  - recall
  - precision
pipeline_tag: token-classification

bert-base-cased-finetuned-ner-bio_nlp_2004

This model is a fine-tuned version of bert-base-cased for named entity recognition (token classification) on the BioNLP 2004 dataset.

It achieves the following results on the evaluation set:

Loss: 0.2066
Dna:
- Precision: 0.6619127516778524
- Recall: 0.7471590909090909
- F1: 0.7019572953736656
- Number: 1056
Rna:
- Precision: 0.589041095890411
- Recall: 0.7288135593220338
- F1: 0.6515151515151515
- Number: 118
Cell Line:
- Precision: 0.4758522727272727
- Recall: 0.67
- F1: 0.5564784053156145
- Number: 500
Cell Type:
- Precision: 0.7294117647058823
- Recall: 0.7100468505986466
- F1: 0.7195990503824848
- Number: 1921
Protein:
- Precision: 0.6657656225155033
- Recall: 0.8263272153147819
- F1: 0.7374075378654457
- Number: 5067
Overall:
- Precision: 0.6628
- Recall: 0.7805
- F1: 0.7169
- Accuracy: 0.9367
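
The per-entity and overall figures above can be cross-checked with a little arithmetic. The sketch below recomputes F1 as the harmonic mean of precision and recall, then recovers entity counts from the reported ratios and pools them across classes. It assumes the overall scores are micro-averaged (the usual seqeval behaviour); the helper names `f1` and `micro_average` are illustrative and not part of the original notebook.

```python
# Per-entity (precision, recall, support) copied from the results above.
PER_ENTITY = {
    "DNA": (0.6619127516778524, 0.7471590909090909, 1056),
    "RNA": (0.589041095890411, 0.7288135593220338, 118),
    "Cell Line": (0.4758522727272727, 0.67, 500),
    "Cell Type": (0.7294117647058823, 0.7100468505986466, 1921),
    "Protein": (0.6657656225155033, 0.8263272153147819, 5067),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def micro_average(per_entity):
    """Recover entity counts from the ratios, then pool them across classes.

    Assumption: the overall scores are micro-averaged over all entities.
    """
    true_pos = predicted = gold = 0
    for precision, recall, support in per_entity.values():
        tp = round(recall * support)        # correctly predicted entities
        true_pos += tp
        predicted += round(tp / precision)  # all predicted entities of this class
        gold += support
    p, r = true_pos / predicted, true_pos / gold
    return p, r, f1(p, r)


dna_f1 = f1(*PER_ENTITY["DNA"][:2])
overall_p, overall_r, overall_f1 = micro_average(PER_ENTITY)
print(f"DNA F1: {dna_f1:.4f}")
print(f"Overall P/R/F1: {overall_p:.4f} / {overall_r:.4f} / {overall_f1:.4f}")
```

Under that assumption the pooled values round to 0.6628 / 0.7805 / 0.7169, which agrees with the Overall block above.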

Model description

For details on how this model was created, see the project notebook: https://github.com/DunnBC22/NLP_Projects/blob/main/Token%20Classification/Monolingual/tner-bionlp2004/NER%20Project%20Using%20tner-bionlp%202004%20Dataset%20(BERT-Base).ipynb

Intended uses & limitations

This model is intended to demonstrate my ability to solve a complex problem using technology.
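
For readers who want to try the model, here is a minimal inference sketch using the Transformers token-classification pipeline. The Hub ID in `MODEL_ID` is an assumption (author handle taken from the GitHub link, model name from this card), and `load_ner_pipeline` and `summarize_entities` are hypothetical helper names introduced only for illustration.

```python
from typing import Any, Dict, List

# Assumption: the checkpoint is published on the Hugging Face Hub under this ID.
MODEL_ID = "DunnBC22/bert-base-cased-finetuned-ner-bio_nlp_2004"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # lazy import so the helper below works offline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize_entities(
    predictions: List[Dict[str, Any]], min_score: float = 0.5
) -> List[str]:
    """Turn aggregated pipeline output into 'text [LABEL] (score)' lines."""
    return [
        f"{p['word']} [{p['entity_group']}] ({p['score']:.2f})"
        for p in predictions
        if p["score"] >= min_score
    ]
```

A typical call would be `summarize_entities(load_ner_pipeline()("IL-2 gene expression requires NF-kappa B activation in T cells."))`. Given the per-entity precision reported above (roughly 0.48 to 0.73), predictions are best treated as candidates to review, and `min_score` can be raised to trade recall for precision.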

Training and evaluation data

Dataset Source: https://huggingface.co/datasets/tner/bionlp2004

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
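
As a rough guide to reproducing this setup, the hyperparameters above could be mapped onto `transformers.TrainingArguments` as sketched below. This is not the author's exact script: it assumes a single device (so the per-device batch size equals the listed batch size) and no learning-rate warmup, neither of which is stated in this card. `linear_lr` and `build_training_args` are hypothetical helper names; the 1039 steps per epoch come from the training results table.

```python
BASE_LR = 2e-5
NUM_EPOCHS = 3
STEPS_PER_EPOCH = 1039  # step count per epoch reported in the training results
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 3117


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule (assumes no warmup unless given)."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - warmup_steps)


def build_training_args(output_dir: str = "bert-base-cased-finetuned-ner-bio_nlp_2004"):
    """Map the listed hyperparameters onto transformers.TrainingArguments."""
    from transformers import TrainingArguments  # lazy import; needs transformers + torch

    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=BASE_LR,
        per_device_train_batch_size=16,  # assumes training on one device
        per_device_eval_batch_size=16,
        seed=42,
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        lr_scheduler_type="linear",
        num_train_epochs=NUM_EPOCHS,
    )
```

With these assumptions the learning rate starts at 2e-05, falls to two thirds of that after the first epoch (step 1039), and reaches zero at step 3117.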

Training results

Training Loss	Epoch	Step	Validation Loss	Dna Precision	Dna Recall	Dna F1	Dna Number	Rna Precision	Rna Recall	Rna F1	Rna Number	Cell Line Precision	Cell Line Recall	Cell Line F1	Cell Line Number	Cell Type Precision	Cell Type Recall	Cell Type F1	Cell Type Number	Protein Precision	Protein Recall	Protein F1	Protein Number	Overall Precision	Overall Recall	Overall F1	Overall Accuracy
0.1701	1.0	1039	0.1927	0.6153	0.7254	0.6658	1056	0.6617	0.7458	0.7012	118	0.4670	0.608	0.5282	500	0.6997	0.7158	0.7077	1921	0.6603	0.7833	0.7166	5067	0.6499
0.145	2.0	2078	0.1981	0.6364	0.7443	0.6862	1056	0.6408	0.7712	0.7000	118	0.4607	0.668	0.5453	500	0.7376	0.7022	0.7195	1921	0.6759	0.8149	0.7389	5067	0.6662
0.1116	3.0	3117	0.2066	0.6619	0.7472	0.7020	1056	0.5890	0.7288	0.6515	118	0.4759	0.67	0.5565	500	0.7294	0.7100	0.7196	1921	0.6658	0.8263	0.7374	5067	0.6628

Metrics shown above are rounded to the nearest ten-thousandth.

Framework versions

Transformers 4.28.1
PyTorch 2.0.0
Datasets 2.11.0
Tokenizers 0.13.3