	---
	license: openrail
	language:
	- en
	pipeline_tag: fill-mask
	tags:
	- biology
	- medical
	widget:
	- text: "poc all well. wound healed. No [MASK] on exam. Microchip working. Sign off, resee if worried."
	  example_title: "Post operative Checkup"
	- text: "other 2 degu's unwell recently want health check for this one appears well for age blood [MASK] 3.8. offer to reweigh and monitor weight"
	  example_title: "Blood Glucose check"
	---


	# VetBERT Pretrained model for Veterinary Clinical Tasks

	This is the pretrained VetBERT model from the GitHub repository: [https://github.com/havocy28/VetBERT](https://github.com/havocy28/VetBERT)

	This pretrained model is designed for NLP tasks on veterinary clinical notes. The paper [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) (Hur et al., BioNLP 2020) introduced the VetBERT model: a BERT model initialized from ClinicalBERT (Bio+Clinical BERT) and further pretrained on the [VetCompass Australia](https://www.vetcompass.com.au/) corpus for tasks specific to veterinary medicine. The paper also discusses [VetBERTDx](https://huggingface.co/havocy28/VetBERTDx), the version of VetBERT finetuned for the disease classification task.

	## Pretraining Data

	The VetBERT model was initialized from the [Bio_ClinicalBERT model](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT), which was itself initialized from BERT. VetBERT was then trained on over 15 million veterinary clinical records, totalling 1.3 billion tokens.

	## Pretraining Hyperparameters

	During the pretraining phase for VetBERT, we used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5 × 10⁻⁵. The dup factor, which sets how many times the input data is duplicated with different masks, was set to 5. All other parameters were left at their defaults (specifically, masked language model probability = 0.15 and max predictions per sequence = 20).
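To make the interaction between the dup factor and the masking parameters concrete, here is a minimal sketch. It is not the script used to pretrain VetBERT. The helper names `mask_tokens` and `duplicate_with_masks` are hypothetical, the tokens are whitespace-split words rather than WordPiece tokens, and every selected token is replaced by `[MASK]`, whereas BERT's own procedure also substitutes random tokens or keeps the original for a share of the selected positions.

```python
import random

MASKED_LM_PROB = 0.15          # fraction of tokens selected for prediction
MAX_PREDICTIONS_PER_SEQ = 20   # upper bound on masked positions per sequence
DUP_FACTOR = 5                 # masked copies generated per input sequence


def mask_tokens(tokens, rng):
    """Return (masked_tokens, positions) with a random subset set to [MASK].

    Hypothetical helper for illustration only; it simplifies BERT's masking
    by always writing [MASK] at the selected positions.
    """
    num_to_mask = min(
        MAX_PREDICTIONS_PER_SEQ,
        max(1, int(round(len(tokens) * MASKED_LM_PROB))),
    )
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = "[MASK]"
    return masked, positions


def duplicate_with_masks(tokens, seed=0):
    """Create DUP_FACTOR copies of one sequence, each with its own random mask."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rng) for _ in range(DUP_FACTOR)]


# Made-up example note, not taken from the training corpus.
note = "poc all well wound healed no swelling on exam microchip working".split()
for masked, positions in duplicate_with_masks(note):
    print(positions, " ".join(masked))
```

With 11 tokens and a masking probability of 0.15, each copy masks 2 tokens. A full 512-token sequence would select about 77 tokens at that rate, so the cap of 20 predictions per sequence takes effect.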

	## VetBERT Finetuning

	VetBERT was further finetuned on a set of 5002 annotated clinical notes to classify the disease syndrome associated with each note, as outlined in the paper [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17).

	## How to use the model

	Load the model via the transformers library:

	```
	from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

	# Load the pretrained tokenizer and masked language model from the Hub.
	tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
	model = AutoModelForMaskedLM.from_pretrained("havocy28/VetBERT")

	# Build a fill-mask pipeline and predict the token hidden behind [MASK].
	VetBERT_masked = pipeline("fill-mask", model=model, tokenizer=tokenizer)
	VetBERT_masked("Suspected pneumonia, will require an [MASK] but in the meantime will prescribe antibiotics")
	```
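For an input with a single `[MASK]`, the fill-mask pipeline returns a list of candidate dictionaries, each with a `score` and a `token_str` among its keys. The sketch below shows one way to turn that list into a short ranked summary. The helper names `format_predictions` and `predict_top_tokens` are hypothetical and not part of the VetBERT repository. Nothing is downloaded until `predict_top_tokens` is called.

```python
def format_predictions(predictions, top_n=3):
    """Return 'token (score)' strings for the top_n highest-scoring candidates."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str']} ({p['score']:.3f})" for p in ranked[:top_n]]


def predict_top_tokens(text, top_n=3):
    """Run VetBERT on `text` (one [MASK] expected) and summarise the candidates.

    Hypothetical convenience wrapper: it downloads the model from the Hub on
    first use, so it needs network access and the transformers library.
    """
    from transformers import pipeline  # imported lazily to keep the helper above dependency-free

    fill_mask = pipeline("fill-mask", model="havocy28/VetBERT")
    return format_predictions(fill_mask(text, top_k=top_n), top_n=top_n)
```

Calling `predict_top_tokens("poc all well. wound healed. No [MASK] on exam.")` returns up to three formatted candidates for the masked position.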

	## Citation

	Please cite this article: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17). In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics.