	---
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	widget:
	- text: >-
	    SAMPLE 32,441 archived appendix samples fixed in formalin and embedded in
	    paraffin and tested for the presence of abnormal prion protein (PrP).
	base_model: dmis-lab/biobert-base-cased-v1.1
	model-index:
	- name: BioBert-PubMed200kRCT
	  results: []
	license: cc-by-nc-3.0
	---

	# BioBert-PubMed200kRCT

	This model is a fine-tuned version of [dmis-lab/biobert-base-cased-v1.1](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) on the [PubMed200kRCT](https://github.com/Franck-Dernoncourt/pubmed-rct/tree/master/PubMed_200k_RCT) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2832
	- Accuracy: 0.8934

	## Model description

	More information needed

	## Intended uses & limitations

	The model can be used to classify text from Randomized Controlled Trial (RCT) abstracts that have no explicit section structure. Each input text is assigned one of the following labels:
	* BACKGROUND
	* CONCLUSIONS
	* METHODS
	* OBJECTIVE
	* RESULTS

	The model can be directly used like this:

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

	# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
	model = AutoModelForSequenceClassification.from_pretrained("pritamdeka/BioBert-PubMed200kRCT")
	tokenizer = AutoTokenizer.from_pretrained("pritamdeka/BioBert-PubMed200kRCT")

	# return_all_scores=True returns a score for every label, not only the top one.
	pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
	pipe("Treatment of 12 healthy female subjects with CDCA for 2 days resulted in increased BAT activity.")
	```
	Results will be shown as follows:

	```python
	[[{'label': 'BACKGROUND', 'score': 0.0027583304326981306},
	{'label': 'CONCLUSIONS', 'score': 0.044541116803884506},
	{'label': 'METHODS', 'score': 0.19493348896503448},
	{'label': 'OBJECTIVE', 'score': 0.003996663726866245},
	{'label': 'RESULTS', 'score': 0.7537703514099121}]]
	```
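
To turn the per-label scores into a single prediction, one option is to take the highest-scoring entry for each sentence. The sketch below is only an illustration: `top_label` is a hypothetical helper, not part of the Transformers API, and `results` is copied from the example output above so the snippet runs without downloading the model.

```python
def top_label(scores):
    """Return (label, score) for the highest-scoring entry of one sentence."""
    best = max(scores, key=lambda entry: entry["score"])
    return best["label"], best["score"]


# Output copied from the example above: one inner list per input sentence.
results = [[
    {"label": "BACKGROUND", "score": 0.0027583304326981306},
    {"label": "CONCLUSIONS", "score": 0.044541116803884506},
    {"label": "METHODS", "score": 0.19493348896503448},
    {"label": "OBJECTIVE", "score": 0.003996663726866245},
    {"label": "RESULTS", "score": 0.7537703514099121},
]]

for sentence_scores in results:
    label, score = top_label(sentence_scores)
    print(f"{label}: {score:.3f}")  # prints "RESULTS: 0.754"
```

The same helper can be applied to each inner list returned by `pipe(...)` when several sentences are passed at once.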

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 2.0
	- mixed_precision_training: Native AMP
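
For anyone trying to reproduce a similar run, the values above map roughly onto keyword arguments of `transformers.TrainingArguments`. This is a sketch, not the original training script: it assumes the reported batch sizes are per device, and `build_training_args` with its default `output_dir` is a hypothetical helper introduced here for illustration.

```python
# Keyword arguments mirroring the hyperparameters listed above.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 64,  # assumption: reported size is per device
    "per_device_eval_batch_size": 64,   # assumption: reported size is per device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 2.0,
    "fp16": True,  # "Native AMP" mixed precision; needs a CUDA device
}


def build_training_args(output_dir="biobert-pubmed200krct"):
    """Hypothetical helper: build TrainingArguments from the values above."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **training_kwargs)
```

Keeping the values in a plain dictionary makes it easy to compare them against the list above before any training objects are created.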

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:-----:|:---------------:|:--------:|
	| 0.3587 | 0.14 | 5000 | 0.3137 | 0.8834 |
	| 0.3318 | 0.29 | 10000 | 0.3100 | 0.8831 |
	| 0.3286 | 0.43 | 15000 | 0.3033 | 0.8864 |
	| 0.3236 | 0.58 | 20000 | 0.3037 | 0.8862 |
	| 0.3182 | 0.72 | 25000 | 0.2939 | 0.8876 |
	| 0.3129 | 0.87 | 30000 | 0.2910 | 0.8885 |
	| 0.3078 | 1.01 | 35000 | 0.2914 | 0.8887 |
	| 0.2791 | 1.16 | 40000 | 0.2975 | 0.8874 |
	| 0.2723 | 1.3 | 45000 | 0.2913 | 0.8906 |
	| 0.2724 | 1.45 | 50000 | 0.2879 | 0.8904 |
	| 0.27 | 1.59 | 55000 | 0.2874 | 0.8911 |
	| 0.2681 | 1.74 | 60000 | 0.2848 | 0.8928 |
	| 0.2672 | 1.88 | 65000 | 0.2832 | 0.8934 |
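
As a quick check on the table, the snippet below picks the evaluation step with the lowest validation loss. The rows are copied from the table above; `eval_log` is just an illustrative name. The selected row matches the loss and accuracy reported at the top of this card.

```python
# (step, validation_loss, accuracy) rows copied from the table above.
eval_log = [
    (5000, 0.3137, 0.8834),
    (10000, 0.3100, 0.8831),
    (15000, 0.3033, 0.8864),
    (20000, 0.3037, 0.8862),
    (25000, 0.2939, 0.8876),
    (30000, 0.2910, 0.8885),
    (35000, 0.2914, 0.8887),
    (40000, 0.2975, 0.8874),
    (45000, 0.2913, 0.8906),
    (50000, 0.2879, 0.8904),
    (55000, 0.2874, 0.8911),
    (60000, 0.2848, 0.8928),
    (65000, 0.2832, 0.8934),
]

# Select the row with the smallest validation loss.
best_step, best_loss, best_acc = min(eval_log, key=lambda row: row[1])
print(best_step, best_loss, best_acc)  # prints "65000 0.2832 0.8934"
```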


	### Framework versions

	- Transformers 4.18.0.dev0
	- Pytorch 1.10.0+cu111
	- Datasets 1.18.4
	- Tokenizers 0.11.6


	## Citing & Authors

	<!--- Describe where people can find more information -->

	If you use this model, please cite the following work:

	```
	@inproceedings{deka2022evidence,
	title={Evidence Extraction to Validate Medical Claims in Fake News Detection},
	author={Deka, Pritam and Jurek-Loughrey, Anna and others},
	booktitle={International Conference on Health Information Science},
	pages={3--15},
	year={2022},
	organization={Springer}
	}
	```