---
library_name: transformers
tags:
- trl
- sft
datasets:
- dmedhi/wiki_medical_terms
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for Medi_terms_Llama3_1_8B_instruct_model

## Model Details

### Model Description

**Fine-Tuned Llama 3.1 8B Instruct with Medical Terms using QLoRA**

This is the model card of a πŸ€— transformers model that has been pushed to the Hub.

This repository contains a fine-tuned version of **Meta's Llama 3.1 8B Instruct** model, optimized for medical term comprehension using **QLoRA** (Quantized Low-Rank Adaptation). The model has been fine-tuned on the **dmedhi/wiki_medical_terms** dataset, improving its ability to generate accurate responses to questions about medical terminology and healthcare.

QLoRA adapts the pre-trained model while keeping memory use and compute requirements low: the base weights are loaded with **NF4** 4-bit quantization and only small low-rank adapter matrices are trained, which makes fine-tuning a model of this scale feasible on a single GPU.

- **Developed by:** Karthik Manjunath Hadagali
- **Model type:** Text generation
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Fine-tuned from model:** meta-llama/Llama-3.1-8B-Instruct
- **Fine-tuning method:** QLoRA
- **Target task:** Medical knowledge augmentation for causal language modeling (CAUSAL_LM)
- **Quantization:** 4-bit NF4 (Normal Float 4)
- **Hardware used:** Single A100 40 GB GPU, with 4-bit quantization for memory efficiency

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model
model_id = "Karthik2510/Medi_terms_Llama3_1_8B_instruct_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example query
input_text = "What is the medical definition of pneumonia?"
# Move inputs to wherever device_map placed the model (avoids hard-coding "cuda")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model has been fine-tuned on the **dmedhi/wiki_medical_terms** dataset, which is designed to improve medical terminology comprehension and consists of:

βœ… Medical definitions and terminologies

βœ… Disease symptoms and conditions

βœ… Healthcare and clinical knowledge from Wikipedia's medical articles

Fine-tuning on this dataset improves the model's accuracy when understanding and responding to medical queries.

### Training Procedure

#### Preprocessing

- The dataset was cleaned and tokenized using the Llama 3.1 tokenizer, ensuring that medical terms were preserved.
- Special medical terminologies were handled carefully to maintain context.
- The dataset was formatted into a question-answer style to align with the instruction-tuned nature of Llama 3.1 8B Instruct.

A sketch of the corresponding QLoRA model setup follows.
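The exact fine-tuning code is not part of this card, but the description above pins down the key pieces: NF4 4-bit quantization of the base weights plus trainable LoRA adapters. Below is a minimal sketch of that setup using transformers' `BitsAndBytesConfig` and the PEFT library, with the LoRA rank, alpha, and dropout taken from the hyperparameters listed in the next section. The `target_modules` selection is an illustrative assumption, not taken from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-3.1-8B-Instruct"

# 4-bit NF4 quantization of the base weights, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter matching the hyperparameters reported in the next section;
# target_modules is an assumed, commonly used choice for Llama-style models
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```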
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision (to balance efficiency and precision)
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 4 (to simulate a larger effective batch size)
- **Learning rate:** 2e-4
- **Warmup steps:** 100
- **Epochs:** 3
- **Optimizer:** paged_adamw_8bit (memory-efficient paged 8-bit AdamW)
- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.05

A sketch of how these settings could be wired into a training script is given at the end of this card.

#### Speeds, Sizes, Times

- **Training hardware:** Single NVIDIA A100 40 GB GPU
- **Model size after fine-tuning:** Approx. 8B base parameters plus LoRA adapters
- **Training time:** ~3–4 hours per epoch on an A100 40 GB GPU
- **Final checkpoint size:** ~2.8 GB (LoRA adapters stored separately)

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware type:** A100 40 GB GPU
- **Hours used:** Approximately 3 to 4 hours
- **Cloud provider:** Google Colab
- **Compute region:** US-East
- **Carbon emitted:** [More Information Needed]

## Limitations & Considerations

❗ Not a substitute for professional medical advice

❗ May contain biases from the training data

❗ Limited knowledge scope (not updated in real time)

## Citation

If you use this model, please consider citing:

```bibtex
@misc{llama3.1_medical_qlora,
  title        = {Fine-tuned Llama 3.1 8B Instruct for Medical Knowledge with QLoRA},
  author       = {Karthik Manjunath Hadagali},
  year         = {2024},
  howpublished = {Hugging Face Model Repository}
}
```

## Acknowledgments

- Meta AI for the Llama 3.1 8B Instruct model.
- Hugging Face PEFT for the QLoRA implementation.
- dmedhi/wiki_medical_terms dataset contributors.
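## Training Script Sketch

The original training script is not included in this repository. As an illustration only, the sketch below shows how the hyperparameters reported above could be passed to trl's `SFTTrainer` (this repository's `trl`/`sft` tags indicate supervised fine-tuning with trl). It continues from the QLoRA preparation sketch under *Preprocessing*; the `output_dir`, `logging_steps`, split name, and the assumption that examples were rendered into a single `text` column are illustrative, not taken from this repo.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# `model` and `tokenizer` as prepared in the QLoRA sketch earlier in this card.
# Assumes a "train" split and that preprocessing produced a question-answer
# formatted "text" column, as described under Preprocessing.
dataset = load_dataset("dmedhi/wiki_medical_terms", split="train")

training_args = SFTConfig(
    output_dir="./llama31-8b-medical-qlora",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=100,
    num_train_epochs=3,
    optim="paged_adamw_8bit",
    bf16=True,
    dataset_text_field="text",  # assumed column name
    logging_steps=25,           # illustrative
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions use the `tokenizer` argument
)
trainer.train()
```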