---
tags:
- text-generation
- causal-lm
- LoRA
- QLoRA
- transformer
license: apache-2.0
datasets:
- syubraj/medical-chat-phi-3.5-instruct-1k
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---

# Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)

This is a **LoRA adapter** for the `microsoft/Phi-3.5-mini-instruct` model, fine-tuned using **QLoRA** on a medical instruction-following dataset. **This is not a standalone model**; it must be loaded on top of the base model.

## 🔥 How to Use the LoRA Adapter

To use this adapter, you need the base model **`microsoft/Phi-3.5-mini-instruct`** and the `peft` library. Load the base model in 4-bit, then attach the adapter:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Base model and the fine-tuned LoRA checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with 4-bit (NF4) quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto"  # places the model on GPU/CPU automatically
)

# Attach the LoRA adapter and merge it into the base weights.
# Note: no .to(device) call here; quantized bitsandbytes models are
# already placed by device_map and do not support .to().
model = PeftModel.from_pretrained(base_model, lora_model_path)
model = model.merge_and_unload()
print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_new_tokens=1024):
    if system_message is None:
        system_message = ("You are a trusted AI-powered medical assistant. "
                          "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
                          "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary.")

    # Prepare input prompt in the Phi-3.5 chat format
    prompt = f"<|system|> {system_message} <|end|>\n<|user|> {user_query} <|end|>\n<|assistant|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens (exclude the prompt)
    generated_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, how can someone get rid of a fever?")
    print(res)
```

---

## 💡 Training Details

- **Base Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
- **Dataset:** [syubraj/medical-chat-phi-3.5-instruct-1k](https://huggingface.co/datasets/syubraj/medical-chat-phi-3.5-instruct-1k) (medical conversations and instruction-following data)
- **Fine-Tuning Method:** **QLoRA**
- **Precision:** 4-bit NF4 (`bitsandbytes`)

An illustrative training setup is sketched at the end of this card.

---

## 📌 License & Credits

- This adapter is released under the **Apache-2.0** license.
- **Credits:** [syubraj](https://huggingface.co/syubraj) for fine-tuning.

---

## 🚀 Citation

If you use this model, please cite:

```bibtex
@misc{syubraj2024phi3.5medical,
  title={Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)},
  author={syubraj},
  year={2024},
  url={https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-adapter}
}
```
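
---

## 🧪 Illustrative QLoRA Training Setup

The exact hyperparameters behind this adapter are not published in this card. The snippet below is only a minimal sketch of how a QLoRA fine-tune of `microsoft/Phi-3.5-mini-instruct` on the dataset listed above could be set up with `peft` and `bitsandbytes`; the LoRA rank, alpha, dropout, and target modules shown here are assumptions, not the values used to produce this checkpoint.

```python
# Minimal QLoRA setup sketch -- illustrative only, not the exact recipe
# used to train this adapter.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_name = "microsoft/Phi-3.5-mini-instruct"

# Same 4-bit NF4 quantization as used for inference above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training, then attach LoRA adapters.
# Rank, alpha, dropout, and target modules are assumed values.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable

# Dataset listed in this card's metadata (assumes a "train" split)
dataset = load_dataset("syubraj/medical-chat-phi-3.5-instruct-1k", split="train")
```

From here, any standard causal-LM trainer (for example TRL's `SFTTrainer`) can train the LoRA parameters, and `model.save_pretrained(...)` then writes out only the small adapter weights.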