---
|
tags: |
|
- text-generation |
|
- causal-lm |
|
- LoRA |
|
- QLoRA |
|
- transformer |
|
license: apache-2.0 |
|
datasets: |
|
- syubraj/medical-chat-phi-3.5-instruct-1k |
|
language: |
|
- en |
|
base_model: |
|
- microsoft/Phi-3.5-mini-instruct |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter) |
|
|
|
This is a **LoRA adapter** for the `microsoft/Phi-3.5-mini-instruct` model, fine-tuned with **QLoRA** on a medical instruction-following dataset. **This is NOT a standalone model**; you must load it together with the base model.
|
|
|
## How to Use the LoRA Adapter
|
|
|
To use this adapter, load the base model **`microsoft/Phi-3.5-mini-instruct`** and attach the adapter with `peft`:
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Define the base model and the fine-tuned LoRA checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with the same 4-bit quantization settings used for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base_model, lora_model_path)

# Optionally merge the adapter into the base weights for faster inference.
# Merging into a 4-bit base requires a recent `peft` version. Note that
# `.to(device)` is not supported for bitsandbytes models, and
# `device_map="auto"` has already placed the weights.
model = model.merge_and_unload()

print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_new_tokens=1024):
    if system_message is None:
        system_message = ("You are a trusted AI-powered medical assistant. "
                          "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
                          "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary.")

    # Build the prompt in the Phi-3.5 chat format
    prompt = f"<|system|> {system_message} <|end|>\n<|user|> {user_query} <|end|>\n<|assistant|>"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens; `skip_special_tokens=True` strips
    # the <|assistant|>/<|end|> markers, so splitting on them would not work.
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, how can someone get rid of a fever?")
    print(res)
```
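
The prompt above is built by hand. The `microsoft/Phi-3.5-mini-instruct` tokenizer also ships with a chat template, so an equivalent and less error-prone option is `tokenizer.apply_chat_template`. A minimal sketch, reusing the `tokenizer` and `model` objects from the snippet above:

```python
# Build the prompt from the tokenizer's built-in chat template instead of
# hand-writing the <|system|>/<|user|>/<|assistant|> markers.
messages = [
    {"role": "system", "content": "You are a trusted AI-powered medical assistant."},
    {"role": "user", "content": "Hi, how can someone get rid of a fever?"},
]

# `add_generation_prompt=True` appends the assistant tag so the model
# continues with its answer rather than echoing the conversation.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```

Using the template keeps the special-token layout in sync with what the base model expects, which is easy to get subtly wrong by hand.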
|
|
|
--- |
|
|
|
## Training Details
|
- **Base Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) |
|
- **Fine-Tuned On:** [syubraj/medical-chat-phi-3.5-instruct-1k](https://huggingface.co/datasets/syubraj/medical-chat-phi-3.5-instruct-1k) (medical conversations and instruction-following data)
|
- **Fine-Tuning Method:** **QLoRA** (see the configuration sketch below)
|
- **Precision:** 4-bit (`bitsandbytes`) |
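
The exact training hyperparameters were not published on this card. For orientation, the sketch below shows what a typical QLoRA setup with `peft` looks like; the rank, alpha, dropout, and target modules are illustrative assumptions, not the values used to train this adapter.

```python
# Illustrative QLoRA training setup; all hyperparameters below are
# assumptions, not the actual values used to train this adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: frozen 4-bit NF4 base weights
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Cast norms, enable gradient checkpointing, etc., for stable k-bit training
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,                                   # assumed LoRA rank
    lora_alpha=32,                          # assumed scaling factor
    lora_dropout=0.05,                      # assumed dropout
    target_modules=["qkv_proj", "o_proj"],  # assumed attention projections in Phi-3.5
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()          # only the small LoRA matrices train
```

From here the model can be trained with any standard causal-LM trainer (e.g. `trl`'s `SFTTrainer`); only the adapter weights are saved, which is why this repository must be loaded on top of the base model.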
|
|
|
--- |
|
|
|
## License & Credits
|
- This adapter is released under the **Apache-2.0 License**.
|
- **Credits:** [syubraj](https://huggingface.co/syubraj) for fine-tuning. |
|
|
|
--- |
|
|
|
## Citation
|
If you use this model, please cite: |
|
|
|
```bibtex |
|
@misc{syubraj2024phi3.5medical, |
|
title={Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)}, |
|
author={syubraj}, |
|
year={2024}, |
|
  url={https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA}
|
} |
|
``` |