---
tags:
- text-generation
- causal-lm
- LoRA
- QLoRA
- transformer
license: apache-2.0
datasets:
- syubraj/medical-chat-phi-3.5-instruct-1k
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---
# Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)
This is a **LoRA adapter** for the `microsoft/Phi-3.5-mini-instruct` model, fine-tuned using **QLoRA** on medical instruction-following data. **This is NOT a standalone model**: you must load it on top of the base model.
## 🔥 How to Use the LoRA Adapter
To use this adapter, you need the base model **`microsoft/Phi-3.5-mini-instruct`**. Load it with `peft`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Define the base model and the fine-tuned LoRA checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",  # device_map already places the quantized model, so no .to(device) call is needed
)

# Attach the LoRA adapter and merge it into the base weights for inference
model = PeftModel.from_pretrained(base_model, lora_model_path)
model = model.merge_and_unload()
print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_length=1024):
    if system_message is None:
        system_message = ("You are a trusted AI-powered medical assistant. "
                          "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
                          "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary.")

    # Prepare the input prompt in the Phi-3.5 chat format
    prompt = f"<|system|> {system_message} <|end|>\n<|user|> {user_query} <|end|>\n<|assistant|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_length=max_length)

    # Decode only the newly generated tokens (everything after the prompt)
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, How can someone let go of fever?")
    print(res)
```
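The helper above relies on the model's default decoding settings. If you want to control the length or sampling behaviour of the answers, you can swap the `generate` call inside `generate_response` for one with explicit parameters; the values below are illustrative, not settings this adapter was tuned or evaluated with:

```python
# Illustrative decoding settings (drop-in replacement for the generate call above);
# the values are assumptions, not the ones used to evaluate this adapter.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,   # limit the length of the answer rather than the whole sequence
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
```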
---
## 💡 Training Details
- **Base Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
- **Fine-Tuned On:** [syubraj/medical-chat-phi-3.5-instruct-1k](https://huggingface.co/datasets/syubraj/medical-chat-phi-3.5-instruct-1k) (medical conversations and instruction-following data)
- **Fine-Tuning Method:** **QLoRA** (a configuration sketch is shown below this list)
- **Precision:** 4-bit (`bitsandbytes`)
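For reference, here is a minimal sketch of a comparable QLoRA setup with `peft`: the base model is loaded in 4-bit NF4 precision and LoRA adapters are trained on top of it. The LoRA hyperparameters (rank, alpha, dropout, target modules) are illustrative assumptions, not the exact values used to train this adapter.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset

# Same 4-bit NF4 quantization settings as the inference example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters; NOT the values used to train this adapter
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training data listed in the model card metadata
dataset = load_dataset("syubraj/medical-chat-phi-3.5-instruct-1k")
```

From here, training proceeds with any standard causal-LM trainer (for example `transformers.Trainer` or `trl`'s `SFTTrainer`).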
---
## 📌 License & Credits
- This adapter follows the **Apache-2.0 License**.
- **Credits:** [syubraj](https://huggingface.co/syubraj) for fine-tuning.
---
## 🚀 Citation
If you use this model, please cite:
```bibtex
@misc{syubraj2024phi3.5medical,
title={Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)},
author={syubraj},
year={2024},
url={https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-adapter}
}
```