	---
	base_model:
	- Undi95/Llama-3-Unholy-8B
	- Locutusque/llama-3-neural-chat-v1-8b
	- ruslanmv/Medical-Llama3-8B-16bit
	library_name: transformers
	tags:
	- mergekit
	- merge
	license: llama2
	language:
	- en
	---

	### Medichat-Llama3-8B

	![img](https://huggingface.co/sethuiyer/Medichat-Llama3-8B/resolve/main/medichat_llam3.webp)

	The following YAML configuration was used to produce this model:

	```yaml
	models:
	  - model: Undi95/Llama-3-Unholy-8B
	    parameters:
	      weight: [0.25, 0.35, 0.45, 0.35, 0.25]
	      density: [0.1, 0.25, 0.5, 0.25, 0.1]
	  - model: Locutusque/llama-3-neural-chat-v1-8b
	  - model: ruslanmv/Medical-Llama3-8B-16bit
	    parameters:
	      weight: [0.55, 0.45, 0.35, 0.45, 0.55]
	      density: [0.1, 0.25, 0.5, 0.25, 0.1]
	merge_method: dare_ties
	base_model: Locutusque/llama-3-neural-chat-v1-8b
	parameters:
	  int8_mask: true
	dtype: bfloat16
	```
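
The five-element `weight` and `density` lists in the configuration are gradients: mergekit spreads the values across the model's layers rather than applying a single number everywhere. The sketch below illustrates the idea with plain linear interpolation over 32 layers (the layer count of Llama 3 8B). The helper `expand_gradient` is hypothetical and written only for this illustration; mergekit's own interpolation may map anchors to layers slightly differently.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across num_layers.

    Hypothetical helper for illustration only; it is not part of mergekit.
    """
    if num_layers == 1 or len(anchors) == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


# Gradients copied from the configuration above; 32 layers is an assumption
unholy_weight = expand_gradient([0.25, 0.35, 0.45, 0.35, 0.25], 32)
medical_weight = expand_gradient([0.55, 0.45, 0.35, 0.45, 0.55], 32)

for layer in (0, 8, 16, 24, 31):
    print(f"layer {layer:2d}: unholy={unholy_weight[layer]:.3f} "
          f"medical={medical_weight[layer]:.3f}")
```

Under this reading, the medical model contributes most at the first and last layers, while the Unholy model's weight peaks in the middle of the stack.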

	### Usage:
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained("sethuiyer/Medichat-Llama3-8B")
	model = AutoModelForCausalLM.from_pretrained("sethuiyer/Medichat-Llama3-8B").to("cuda")

	# Format the question with the chat template and generate a response
	def askme(question):
	    sys_message = '''
	    You are an AI Medical Assistant trained on a vast dataset of health information. Please be thorough and
	    provide an informative answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
	    '''

	    # Create messages structured for the chat template
	    messages = [{"role": "system", "content": sys_message}, {"role": "user", "content": question}]

	    # Apply the chat template
	    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
	    outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)  # Increase max_new_tokens for longer responses

	    # Decode only the newly generated tokens, dropping the prompt and special tokens
	    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
	    answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
	    return answer

	# Example usage
	question = '''
	Symptoms:
	Dizziness, headache and nausea.

	What is the differential diagnosis?
	'''
	print(askme(question))
	```