	---
	license: apache-2.0
	datasets:
	- FreedomIntelligence/PubMedVision
	language:
	- en
	- zh
	pipeline_tag: text-generation
	tags:
	- vision
	- image-text-to-text
	---
	<div align="center">
	<h1>
	HuatuoGPT-Vision-7B
	</h1>
	</div>

	<div align="center">
	<a href="https://github.com/FreedomIntelligence/HuatuoGPT-Vision" target="_blank">GitHub</a> | <a href="https://arxiv.org/abs/2406.19280" target="_blank">Paper</a>
	</div>

	## Introduction
	We converted HuatuoGPT-Vision to the Hugging Face LLaVA format, so you can run the model with vLLM or other frameworks. The original model is available here: [HuatuoGPT-Vision-7B](https://huggingface.co/FreedomIntelligence/HuatuoGPT-Vision-7B).

	## Quick Start

	### 1. Deploy the model using [vLLM](https://github.com/vllm-project/vllm/tree/main)
	```bash
	python -m vllm.entrypoints.openai.api_server \
	--model huatuogpt_vision_model_path \
	--tensor-parallel-size 1 \
	--gpu-memory-utilization 0.8 \
	--served-model-name huatuogpt_vision_7b \
	--chat-template "{%- if messages[0]['role'] == 'system' -%}\n {%- set system_message = messages[0]['content'] -%}\n {%- set messages = messages[1:] -%}\n{%- else -%}\n {% set system_message = '' -%}\n{%- endif -%}\n\n{%- for message in messages -%}\n {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}\n {%- endif -%}\n\n {%- if message['role'] == 'user' -%}\n {{ '<|user|>\n' + message['content'] + '\n' }}\n {%- elif message['role'] == 'assistant' -%}\n {{ '<|assistant|>\n' + message['content'] + '\n' }}\n {%- endif -%}\n{%- endfor -%}\n\n{%- if add_generation_prompt -%}\n {{ '<|assistant|>' }}\n{% endif %}" \
	--port 9559 --max-model-len 2048 > vllm_openai_server.log 2>&1 &
	```
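Because the command above starts the server in the background and redirects its output to `vllm_openai_server.log`, it is not obvious when the model has finished loading. The sketch below polls the server's `/v1/models` endpoint until it responds. The helper name `wait_for_server` and its default timeout and polling interval are illustrative assumptions, not part of the original instructions.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:9559/v1/models", timeout=300.0, interval=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: the defaults assume the port used in the
    deployment command above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` before sending requests returns `True` once the server is reachable. If it returns `False`, the log file is the first place to look.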

	### 2. Model inference
	```python
	from openai import OpenAI
	from PIL import Image
	import base64
	import io

	def get_image(image_path):
	    """Load an image and return (base64-encoded bytes, lowercase format name)."""
	    image = Image.open(image_path)
	    # Read the format before convert(): the converted copy has format None.
	    img_type = image.format or 'PNG'
	    image = image.convert('RGB')
	    byte_arr = io.BytesIO()
	    image.save(byte_arr, format=img_type)
	    image = base64.b64encode(byte_arr.getvalue()).decode()
	    return image, img_type.lower()


	# vLLM accepts any API key unless the server was started with --api-key.
	client = OpenAI(
	    base_url="http://localhost:9559/v1",
	    api_key="token-abc123"
	)
	image_path = 'your_image_path'
	image, img_type = get_image(image_path)

	# The text part carries the <image> placeholder; the image is sent as a base64 data URL.
	inputcontent = [{
	    "type": "text",
	    "text": '<image>\nWhat does the picture show?'
	}]

	inputcontent.append({
	    "type": "image_url",
	    "image_url": {
	        "url": f"data:image/{img_type};base64,{image}"
	    }
	})

	response = client.chat.completions.create(
	    model="huatuogpt_vision_7b",  # must match --served-model-name
	    messages=[
	        {"role": "user", "content": inputcontent}
	    ],
	    temperature=0.2
	)
	print(response.choices[0].message.content)
	```
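When sending many questions or images, it can help to wrap the message construction from the example above in a small function. The following is a minimal sketch that uses only the standard library and takes raw image bytes. The name `build_user_content` is hypothetical, and the `<image>` prefix simply mirrors the prompt used above.

```python
import base64


def build_user_content(image_bytes, img_type, question):
    """Build the `content` list for one user turn: a text part plus a data-URL image part.

    Hypothetical helper; `img_type` is the lowercase image subtype, e.g. "png" or "jpeg".
    """
    encoded = base64.b64encode(image_bytes).decode()
    return [
        {"type": "text", "text": f"<image>\n{question}"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/{img_type};base64,{encoded}"},
        },
    ]
```

The returned list can be passed as the `content` of a `{"role": "user", ...}` message in `client.chat.completions.create`.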

	## <span id="Start">Citation</span>

	```
	@misc{chen2024huatuogptvisioninjectingmedicalvisual,
	      title={HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale},
	      author={Junying Chen and Ruyi Ouyang and Anningzhe Gao and Shunian Chen and Guiming Hardy Chen and Xidong Wang and Ruifei Zhang and Zhenyang Cai and Ke Ji and Guangjun Yu and Xiang Wan and Benyou Wang},
	      year={2024},
	      eprint={2406.19280},
	      archivePrefix={arXiv},
	      primaryClass={cs.CV},
	      url={https://arxiv.org/abs/2406.19280},
	}
	```