---
tags:
- text-generation
- causal-lm
- LoRA
- QLoRA
- transformer
license: apache-2.0
datasets:
- syubraj/medical-chat-phi-3.5-instruct-1k
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---

# Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)

This is a **LoRA adapter** for the `microsoft/Phi-3.5-mini-instruct` model, fine-tuned using **QLoRA** on a medical instruction-following dataset. **This is not a standalone model**; it must be loaded on top of the base model.

## 🔥 How to Use the LoRA Adapter

To use this adapter, you need the base model **`microsoft/Phi-3.5-mini-instruct`** and the `peft` library. Load the base model in 4-bit, then attach the adapter:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Base model and the fine-tuned LoRA checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with 4-bit (NF4) quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto"  # places the model on GPU/CPU automatically
)

# Attach the LoRA adapter and merge it into the base weights.
# Note: no .to(device) call here; quantized bitsandbytes models are
# already placed by device_map and do not support .to().
model = PeftModel.from_pretrained(base_model, lora_model_path)
model = model.merge_and_unload()
print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_new_tokens=1024):
    if system_message is None:
        system_message = ("You are a trusted AI-powered medical assistant. "
                          "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
                          "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary.")

    # Prepare input prompt in the Phi-3.5 chat format
    prompt = f"<|system|> {system_message} <|end|>\n<|user|> {user_query} <|end|>\n<|assistant|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens (exclude the prompt)
    generated_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, how can someone get rid of a fever?")
    print(res)
```

---

## 💡 Training Details

- **Base Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
- **Dataset:** [syubraj/medical-chat-phi-3.5-instruct-1k](https://huggingface.co/datasets/syubraj/medical-chat-phi-3.5-instruct-1k) (medical conversations and instruction-following data)
- **Fine-Tuning Method:** **QLoRA**
- **Precision:** 4-bit NF4 (`bitsandbytes`)

An illustrative training setup is sketched at the end of this card.

---

## 📌 License & Credits

- This adapter is released under the **Apache-2.0** license.
- **Credits:** [syubraj](https://huggingface.co/syubraj) for fine-tuning.

---

## 🚀 Citation

If you use this model, please cite:

```bibtex
@misc{syubraj2024phi3.5medical,
  title={Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)},
  author={syubraj},
  year={2024},
  url={https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-adapter}
}
```
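
---

## 🧪 Illustrative QLoRA Training Setup

The exact hyperparameters behind this adapter are not published in this card. The snippet below is only a minimal sketch of how a QLoRA fine-tune of `microsoft/Phi-3.5-mini-instruct` on the dataset listed above could be set up with `peft` and `bitsandbytes`; the LoRA rank, alpha, dropout, and target modules shown here are assumptions, not the values used to produce this checkpoint.

```python
# Minimal QLoRA setup sketch -- illustrative only, not the exact recipe
# used to train this adapter.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_name = "microsoft/Phi-3.5-mini-instruct"

# Same 4-bit NF4 quantization as used for inference above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training, then attach LoRA adapters.
# Rank, alpha, dropout, and target modules are assumed values.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable

# Dataset listed in this card's metadata (assumes a "train" split)
dataset = load_dataset("syubraj/medical-chat-phi-3.5-instruct-1k", split="train")
```

From here, any standard causal-LM trainer (for example TRL's `SFTTrainer`) can train the LoRA parameters, and `model.save_pretrained(...)` then writes out only the small adapter weights.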