---
license: apache-2.0
datasets:
- ai4colonoscopy/ColonINST-v1
language:
- en
metrics:
- accuracy
base_model:
- microsoft/phi-1_5
library_name: adapter-transformers
pipeline_tag: image-text-to-text
tags:
- medical
- colonoscopy
- polyp
---

# ColonGPT (A colonoscopy-specific multimodal language model)

<p align="center">
<img src="./assert/web_ui.gif" width="666px"/> <br />
<em>The Gradio Web UI lets you run our examples or upload your own images for inference.</em>
</p>

📖 [Paper](https://arxiv.org/abs/2410.17241) \| 🏠 [Home](https://github.com/ai4colonoscopy/IntelliScope)

> This repository contains the merged weights of [ColonGPT-v1-phi1.5-siglip-lora](https://drive.google.com/drive/folders/1Emi7o7DpN0zlCPIYqsCfNMr9LTPt3SCT?usp=sharing), including the vision encoder (SigLIP), the language model (Phi-1.5), and the remaining weights fine-tuned on our ColonINST dataset.
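
The checkpoint name ends in `-lora`, which suggests the fine-tuned weights were trained as LoRA adapters and then folded into the base weights. Assuming the standard LoRA merge rule (as described in the LoRA paper and implemented in Hugging Face PEFT), each adapted weight matrix becomes `W' = W + (alpha / r) * B A`, so the merged checkpoint loads like an ordinary model. The toy, stdlib-only sketch below only illustrates that arithmetic; the function name, matrix sizes, and `alpha`/`rank` values are hypothetical and not taken from ColonGPT's training configuration.

```python
def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a) for nested-list matrices.

    base:   out_dim x in_dim  (frozen pretrained weight)
    lora_a: rank x in_dim     (low-rank down-projection)
    lora_b: out_dim x rank    (low-rank up-projection)
    """
    scaling = alpha / rank
    merged = []
    for i, base_row in enumerate(base):
        row = []
        for j, w in enumerate(base_row):
            # (B @ A)[i][j]: sum over the low-rank dimension k
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            row.append(w + scaling * delta)
        merged.append(row)
    return merged


# Hypothetical 2x2 weight with a rank-1 update.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # 1 x 2
lora_b = [[1.0], [0.0]]  # 2 x 1
print(merge_lora(base, lora_a, lora_b, alpha=2, rank=1))
# [[3.0, 4.0], [0.0, 1.0]]
```

After merging, the low-rank matrices are no longer needed at inference time, which is one reason a single set of merged weights is convenient for a quick start.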

Our ColonGPT is a standard multimodal language model with four basic components: a language tokenizer, a visual encoder (🤗 [SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model (🤗 [Phi-1.5](https://huggingface.co/microsoft/phi-1_5)). This Hugging Face page provides a quick start for new users. For further details about ColonGPT, we highly recommend visiting our [homepage](https://github.com/ai4colonoscopy/IntelliScope), where you will find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy.
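
To make the roles of these components concrete, here is a toy, stdlib-only sketch of a generic LLaVA-style flow, assuming ColonGPT follows the design that the `-200` image placeholder id in the quick-start snippet points to: the visual encoder turns the image into feature vectors, the connector projects them into the language model's embedding space, and the projected features are spliced into the text sequence where the placeholder sits. All functions, sizes, and weights are hypothetical; the real SigLIP encoder produces many more, higher-dimensional features, and the real connector is learned.

```python
IMAGE_TOKEN_INDEX = -200  # image placeholder id, as used in the quick-start snippet


def vision_encoder(patches):
    """Toy stand-in for SigLIP: one feature vector per image patch."""
    return [list(p) for p in patches]


def connector(features, weight):
    """Toy linear connector: project each vision feature into the LM embedding size."""
    hidden = len(weight[0])
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(hidden)]
        for f in features
    ]


def build_inputs_embeds(input_ids, image_embeds, embedding_table):
    """Replace the placeholder with all image embeddings; embed other ids normally."""
    sequence = []
    for token_id in input_ids:
        if token_id == IMAGE_TOKEN_INDEX:
            sequence.extend(image_embeds)
        else:
            sequence.append(embedding_table[token_id])
    return sequence


# Hypothetical sizes: 2-dim vision features, 3-dim language-model embeddings.
patches = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
embedding_table = {0: [0.1, 0.1, 0.1], 1: [0.2, 0.2, 0.2]}
input_ids = [0, IMAGE_TOKEN_INDEX, 1]

embeds = build_inputs_embeds(input_ids, connector(vision_encoder(patches), weight), embedding_table)
print(len(embeds))  # 4: one text token, two image features, one text token
```

In a real model, the resulting sequence of embeddings would then be passed to the language model (Phi-1.5 here) in place of ordinary token embeddings.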


# Quick start

Here is a code snippet showing how to quickly try out our ColonGPT model with transformers. For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick-start snippet; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.

- Before running the snippet, you only need to install the following minimum dependencies.
```shell
conda create -n quickstart python=3.10
conda activate quickstart
pip install torch transformers accelerate pillow
```
- Then run `python script/quick_start/quickstart.py` to start.
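
If the snippet fails at import time, a small pre-flight check can help narrow things down. The helper below is a hypothetical, stdlib-only sketch and is not part of the ColonGPT repository: it reports which of the packages installed above cannot be found (Pillow is imported as `PIL`) and, only when `torch` is available, picks the device/dtype pair that the snippet's comments suggest (`cuda` with float16, otherwise `cpu` with float32).

```python
import importlib.util

# Import names of the packages installed above (Pillow is imported as PIL).
REQUIRED_MODULES = ["torch", "transformers", "accelerate", "PIL"]


def missing_packages(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_device_and_dtype(cuda_available):
    """Mirror the snippet's comments: float16 on GPU, float32 on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch

        device, dtype = pick_device_and_dtype(torch.cuda.is_available())
        print(f"device='{device}', torch_dtype=torch.{dtype}")
```

Checking with `importlib.util.find_spec` avoids importing the heavy packages just to see whether they are installed.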


```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria
from PIL import Image
import warnings

# Silence verbose logs, progress bars, and warnings for a clean demo output.
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

device = 'cuda'  # or 'cpu'
torch.set_default_device(device)

model_name = "ai4colonoscopy/ColonGPT-v1"

# trust_remote_code=True loads the custom model code shipped with the merged weights.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # or torch.float32 for cpu
    device_map='auto',
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)


class KeywordsStoppingCriteria(StoppingCriteria):
    """Stop generation once a token id of the stop keyword appears at the end of the sequence."""

    def __init__(self, keyword, tokenizer, input_ids):
        self.keyword_id = tokenizer(keyword).input_ids
        self.tokenizer = tokenizer
        self.start_len = input_ids.shape[1]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Only inspect the last len(keyword_id) tokens of the first sequence in the batch.
        for keyword_id in self.keyword_id:
            if keyword_id in input_ids[0, -len(self.keyword_id):]:
                return True
        return False


prompt = "Describe what you see in the image."
text = f"USER: <image>\n{prompt} ASSISTANT:"
# Tokenize the text around the <image> tag and insert -200 as the image placeholder id.
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)

# Convert to RGB so that PNGs with an alpha channel also yield 3-channel input.
image = Image.open('cache/examples/example2.png').convert('RGB')
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

# End-of-text token used as the stop string.
stop_str = "<|endoftext|>"
stopping_criteria = KeywordsStoppingCriteria(stop_str, tokenizer, input_ids)

output_ids = model.generate(
    input_ids,
    images=image_tensor,
    do_sample=False,  # greedy decoding
    temperature=0,
    max_new_tokens=512,
    use_cache=True,
    stopping_criteria=[stopping_criteria]
)

# Decode only the newly generated tokens and strip the stop string.
outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).replace("<|endoftext|>", "").strip()
print(outputs)
```
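
To see what `KeywordsStoppingCriteria` actually checks, here is its logic replayed on plain Python lists: generation stops as soon as any token id of the stop string appears among the last `len(stop_ids)` positions of the sequence. This is a minimal sketch for illustration; the function names and token ids are hypothetical and do not come from the real Phi-1.5 vocabulary.

```python
def should_stop(sequence_ids, stop_ids):
    """Replay of KeywordsStoppingCriteria.__call__ on a plain list of token ids."""
    window = sequence_ids[-len(stop_ids):]  # only the trailing positions are inspected
    return any(stop_id in window for stop_id in stop_ids)


def clean_output(text, stop_str="<|endoftext|>"):
    """Remove the stop string and surrounding whitespace, as done after decoding."""
    return text.replace(stop_str, "").strip()


STOP_IDS = [7]  # hypothetical single-token stop string

print(should_stop([3, 5, 9], STOP_IDS))  # False
print(should_stop([3, 5, 7], STOP_IDS))  # True
print(should_stop([7, 5, 9], STOP_IDS))  # False: earlier occurrences are ignored
print(clean_output(" example answer<|endoftext|>"))  # example answer
```

Because the check tests membership rather than an exact suffix match, a multi-token stop string could end generation on a partial match. For a stop string that tokenizes to a single special token, as we would expect `<|endoftext|>` to do, the window is just the most recent token.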

# License
This project uses certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
The content of this project itself is licensed under the Apache License 2.0.