---
tags:
- text-generation
- causal-lm
- LoRA
- QLoRA
- transformer
license: apache-2.0
datasets:
- syubraj/medical-chat-phi-3.5-instruct-1k
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---
# Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)
This is a **LoRA adapter** for the `microsoft/Phi-3.5-mini-instruct` model, fine-tuned using **QLoRA** on medical instruction-following data. **This is NOT a standalone model**: you must load it on top of the base model.
## 🔥 How to Use the LoRA Adapter
To use this adapter, you need the base model **`microsoft/Phi-3.5-mini-instruct`**. Load it with `peft`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Define the base model and the fine-tuned LoRA checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",  # device_map already places the quantized model, so no .to(device) call is needed
)

# Attach the LoRA adapter and merge it into the base weights for inference
model = PeftModel.from_pretrained(base_model, lora_model_path)
model = model.merge_and_unload()
print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_length=1024):
    if system_message is None:
        system_message = ("You are a trusted AI-powered medical assistant. "
                          "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
                          "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary.")

    # Prepare the input prompt in the Phi-3.5 chat format
    prompt = f"<|system|> {system_message} <|end|>\n<|user|> {user_query} <|end|>\n<|assistant|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_length=max_length)

    # Decode only the newly generated tokens (everything after the prompt)
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, How can someone let go of fever?")
    print(res)
```
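The helper above relies on the model's default decoding settings. If you want to control the length or sampling behaviour of the answers, you can swap the `generate` call inside `generate_response` for one with explicit parameters; the values below are illustrative, not settings this adapter was tuned or evaluated with:

```python
# Illustrative decoding settings (drop-in replacement for the generate call above);
# the values are assumptions, not the ones used to evaluate this adapter.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,   # limit the length of the answer rather than the whole sequence
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
```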
---
## 💡 Training Details
- **Base Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
- **Fine-Tuned On:** [syubraj/medical-chat-phi-3.5-instruct-1k](https://huggingface.co/datasets/syubraj/medical-chat-phi-3.5-instruct-1k) (medical conversations and instruction-following data)
- **Fine-Tuning Method:** **QLoRA** (a configuration sketch is shown below this list)
- **Precision:** 4-bit (`bitsandbytes`)
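For reference, here is a minimal sketch of a comparable QLoRA setup with `peft`: the base model is loaded in 4-bit NF4 precision and LoRA adapters are trained on top of it. The LoRA hyperparameters (rank, alpha, dropout, target modules) are illustrative assumptions, not the exact values used to train this adapter.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset

# Same 4-bit NF4 quantization settings as the inference example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters; NOT the values used to train this adapter
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training data listed in the model card metadata
dataset = load_dataset("syubraj/medical-chat-phi-3.5-instruct-1k")
```

From here, training proceeds with any standard causal-LM trainer (for example `transformers.Trainer` or `trl`'s `SFTTrainer`).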
---
## 📌 License & Credits
- This adapter follows the **Apache-2.0 License**.
- **Credits:** [syubraj](https://huggingface.co/syubraj) for fine-tuning.
---
## 🚀 Citation
If you use this model, please cite:
```bibtex
@misc{syubraj2024phi3.5medical,
title={Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)},
author={syubraj},
year={2024},
url={https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-adapter}
}
```