---
library_name: transformers
tags:
- trl
- sft
datasets:
- dmedhi/wiki_medical_terms
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
# Fine-Tuned Llama 3.1 8B Instruct for Medical Terms (QLoRA)
## Model Details
### Model Description
**Fine-Tuned Llama 3.1 8B Instruct with Medical Terms using QLoRA**

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

This repository contains a fine-tuned version of **Meta's Llama 3.1 8B Instruct** model, optimized for medical term comprehension using **QLoRA** (Quantized Low-Rank Adaptation). The model was fine-tuned on the **dmedhi/wiki_medical_terms** dataset to improve its responses to questions about medical terminology and healthcare.

The fine-tuning process uses **QLoRA** to adapt the pre-trained model while keeping memory use and compute manageable: the base weights are quantized to 4-bit **NF4** (Normal Float 4) and only small low-rank adapter matrices are trained, which makes fine-tuning a model of this scale feasible on a single GPU.
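For reference, the 4-bit NF4 loading described above can be reproduced with `BitsAndBytesConfig` from 🤗 transformers. The flags below are common QLoRA defaults and should be read as a sketch, not the verified training configuration:

```python
# Minimal sketch: load the base model in 4-bit NF4 (assumed QLoRA defaults).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # Normal Float 4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
    bnb_4bit_use_double_quant=True,         # nested quantization for extra savings
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```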
- **Fine-tuned by:** Karthik Manjunath Hadagali
- **Model type:** Text generation (causal language model)
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Fine-tuned from model:** meta-llama/Llama-3.1-8B-Instruct
- **Fine-tuning method:** QLoRA
- **Target task:** Medical knowledge augmentation for causal language modeling (CAUSAL_LM)
- **Quantization:** 4-bit NF4 (Normal Float 4)
- **Hardware used:** Single NVIDIA A100 40 GB GPU, with 4-bit quantization to reduce memory use
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hub
model_id = "Karthik2510/Medi_terms_Llama3_1_8B_instruct_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place weights on the available GPU(s)
)

# Example query
input_text = "What is the medical definition of pneumonia?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)  # follow the model's device
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
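The *Speeds, Sizes, Times* section below notes that the LoRA adapters are stored separately. If you are working with the adapters rather than merged weights, they can be attached to the base model with 🤗 PEFT; the snippet below is a sketch that assumes the adapter files live in this repository:

```python
# Sketch: attach separately stored LoRA adapters to the base model with PEFT.
# Assumes the adapters are published in this repository; adjust IDs as needed.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Karthik2510/Medi_terms_Llama3_1_8B_instruct_model")
model = model.merge_and_unload()  # optional: merge adapters for faster inference
```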
## Training Details
### Training Data
The model has been fine-tuned on the **dmedhi/wiki_medical_terms** dataset. This dataset is designed to improve medical terminology comprehension and consists of:
✅ Medical definitions and terminologies
✅ Disease symptoms and conditions
✅ Healthcare and clinical knowledge from Wikipedia's medical articles

This coverage helps the fine-tuned model understand and answer medical queries more accurately.
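The dataset can be pulled straight from the Hub with the 🤗 `datasets` library for inspection (assuming the usual `train` split; column names vary, so print an example first):

```python
# Load and inspect the fine-tuning dataset from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("dmedhi/wiki_medical_terms")
print(dataset)              # available splits and row counts
print(dataset["train"][0])  # one raw record: a medical term and its description
```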
### Training Procedure
#### Preprocessing
- The dataset was cleaned and tokenized with the Llama 3.1 tokenizer, taking care to keep medical terms intact.
- Specialized medical terminology was preserved during preprocessing to maintain context.
- Records were formatted into a question-answer style to match the instruction-tuned format of Llama 3.1 8B Instruct (a formatting sketch follows below).
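As a rough illustration of that formatting step, one option is the tokenizer's built-in chat template. The column names `term` and `definition` below are hypothetical stand-ins for the dataset's actual fields:

```python
# Sketch: cast one record into Llama 3.1's instruction/chat format.
# `term` and `definition` are hypothetical column names; map them to the
# actual fields of dmedhi/wiki_medical_terms.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def to_chat_text(example):
    messages = [
        {"role": "user", "content": f"What is {example['term']}?"},
        {"role": "assistant", "content": example["definition"]},
    ]
    example["text"] = tokenizer.apply_chat_template(messages, tokenize=False)
    return example

# dataset = dataset.map(to_chat_text)  # apply over the loaded dataset
```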
#### Training Hyperparameters
- **Training regime:** bf16 mixed precision (to balance efficiency and precision)
- **Batch Size:** 1 per device
- **Gradient Accumulation Steps:** 4 (to simulate a larger batch size)
- **Learning Rate:** 2e-4
- **Warmup Steps:** 100
- **Epochs:** 3
- **Optimizer:** paged_adamw_8bit (efficient low-memory optimizer)
- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **LoRA Dropout:** 0.05
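Below is a sketch of how the hyperparameters above might be wired together with 🤗 PEFT and TRL's `SFTTrainer`. It assumes the 4-bit `base_model` and formatted `dataset` from the earlier sketches are in scope, and TRL's API details vary across versions:

```python
# Sketch: QLoRA fine-tuning setup mirroring the hyperparameters listed above.
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./llama31-medical-qlora",  # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,         # simulates a larger effective batch
    learning_rate=2e-4,
    warmup_steps=100,
    num_train_epochs=3,
    optim="paged_adamw_8bit",              # low-memory paged optimizer
    bf16=True,                             # bf16 mixed precision
)

trainer = SFTTrainer(
    model=base_model,               # 4-bit NF4 base model (see loading sketch)
    train_dataset=dataset["train"],
    peft_config=lora_config,        # LoRA adapters injected by PEFT
    args=training_args,
)
trainer.train()
```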
#### Speeds, Sizes, Times
- **Training Hardware:** Single NVIDIA A100 40 GB GPU
- **Model Size after Fine-Tuning:** ~8B parameters (base model) plus LoRA adapters
- **Training Time:** ~3-4 hours per epoch on the A100 40 GB GPU
- **Final Checkpoint Size:** ~2.8 GB (LoRA adapters stored separately)
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** A100 40 GB GPU
- **Hours used:** Approximately 3 to 4 per epoch (roughly 9 to 12 hours across 3 epochs)
- **Cloud Provider:** Google Colab
- **Compute Region:** US-East
- **Carbon Emitted:** [More Information Needed]
## Limitations & Considerations
❗ Not a substitute for professional medical advice
❗ May contain biases inherited from the training data
❗ Limited knowledge scope (not updated in real time)
## Citation
If you use this model, please consider citing:
```bibtex
@misc{llama31_medical_qlora,
  title={Fine-tuned Llama 3.1 8B Instruct for Medical Knowledge with QLoRA},
  author={Karthik Manjunath Hadagali},
  year={2024},
  howpublished={Hugging Face Model Repository}
}
```
## Acknowledgments
- Meta AI for the Llama 3.1 8B Instruct model.
- Hugging Face PEFT and bitsandbytes for the QLoRA implementation.
- Contributors to the dmedhi/wiki_medical_terms dataset.