---
license: apache-2.0
datasets:
- 5CD-AI/LLaVA-CoT-o1-Instruct
- HuggingFaceM4/the_cauldron
- AnyModal/flickr30k
- openbmb/RLAIF-V-Dataset
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- google/vit-large-patch32-384
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- vqa
- vlm
---

<p align="center">
  <img src="https://github.com/mkturkcan/deepseek-vlm/blob/main/assets/logo.png?raw=true" width="180" />
</p>
<h1 align="center">
  <p>mehmetkeremturkcan/DeepSeek-LLaVA-Instruct</p>
</h1>
<h3 align="center">
  <p>DeepSeer: Vision Language Models with Reasoning</p>
</h3>

Vision language models with chain-of-thought reasoning are just starting to emerge. This model is a proof of concept: a vision language model trained with thinking-enabled chat templates based on the DeepSeek-R1 models.

Note that this model will not always use thinking tokens, due to the current lack of high-quality CoT data in non-science contexts.

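When the model does think, it inherits the R1-style `<think>...</think>` markers from the base model's chat template. Below is a minimal sketch for separating the reasoning span from the final answer; the helper name and the sample string are illustrative, and it assumes the markers survive decoding:

```python
# Minimal sketch: split R1-style reasoning from the final answer.
# ASSUMPTION: the model inherits DeepSeek-R1's <think>...</think> markers.
def split_thinking(decoded: str) -> tuple[str, str]:
    start, end = "<think>", "</think>"
    if start in decoded and end in decoded:
        thought = decoded.split(start, 1)[1].split(end, 1)[0].strip()
        answer = decoded.split(end, 1)[1].strip()
        return thought, answer
    return "", decoded.strip()  # no thinking tokens were emitted

thought, answer = split_thinking(
    "<think>The photo shows a dog on a beach.</think>A dog running on a beach."
)
```
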
## Setup

```bash
pip install git+https://github.com/facebookresearch/schedule_free.git
pip install peft
git clone https://github.com/mkturkcan/seers.git
cd seers/seers/
git clone https://huggingface.co/mehmetkeremturkcan/DeepSeek-LLaVA-Instruct
```

## Test

From the `seers/seers` folder, run:

```bash
python predict_llava.py
```

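If you would rather load the checkpoint directly with `transformers`, the sketch below shows one way to do it. It assumes the checkpoint follows the standard LLaVA layout (`AutoProcessor` plus `LlavaForConditionalGeneration`) and uses a hypothetical local image path; if the custom vision stack does not load this way, fall back to `predict_llava.py`:

```python
# Minimal inference sketch. ASSUMPTION: the checkpoint loads with the standard
# transformers LLaVA classes; otherwise use predict_llava.py from seers.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mehmetkeremturkcan/DeepSeek-LLaVA-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # hypothetical local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What is happening in this image?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```
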
## Train

[seers](https://github.com/mkturkcan/seers) training code is public! Run:

```bash
python train_cot_mixed.py
```

## Training Details

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) on the [5CD-AI/LLaVA-CoT-o1-Instruct](https://huggingface.co/datasets/5CD-AI/LLaVA-CoT-o1-Instruct) dataset.

It has been trained using [seers](https://github.com/mkturkcan/seers).
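
For orientation, the Setup step installs `peft` and Meta's schedule-free optimizer, so a loop in the spirit of `train_cot_mixed.py` would pair a LoRA adapter with `AdamWScheduleFree`. The sketch below is illustrative only; the rank, learning rate, and target modules are assumptions, not the values seers actually uses:

```python
# Illustrative sketch: LoRA + schedule-free AdamW, as implied by the Setup deps.
# ASSUMPTION: all hyperparameters here are placeholders, not seers defaults.
from peft import LoraConfig, get_peft_model
from schedulefree import AdamWScheduleFree
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]
))
optimizer = AdamWScheduleFree(model.parameters(), lr=1e-4)

model.train()
optimizer.train()   # schedule-free optimizers track train/eval mode explicitly
# ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
optimizer.eval()    # switch modes before evaluation or checkpointing
```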