---
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
library_name: transformers
license: mit
pipeline_tag: video-text-to-text
---

# VideoChat2-TPO

This model is based on the paper [Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment](https://huggingface.co/papers/2412.19326).

## 🏃 Installation

```
pip install -r requirements.txt
python app.py
```

## 🔧 Usage

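```
from transformers import AutoModel, AutoTokenizer
# Local module, assumed to be the tokenizer.py shipped with the model repository.
from tokenizer import MultimodalLlamaTokenizer

model_path = "OpenGVLab/VideoChat-TPO"

# trust_remote_code=True lets transformers run the custom code stored in the repo.
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
    use_fast=False,
)

# Pass the tokenizer created above (the original `self.tokenizer` is undefined
# outside a class), then switch the model to inference mode.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, _tokenizer=tokenizer).eval()
```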
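
The loading calls above can be wrapped in a small helper that also picks a device. This is a minimal sketch, not part of the official repository: `pick_device` and `load_videochat_tpo` are hypothetical names, and it assumes the model returned by the remote code behaves like a standard `torch.nn.Module`, so that `.to(device)` works. The README does not confirm that assumption.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


def load_videochat_tpo(model_path: str = "OpenGVLab/VideoChat-TPO"):
    """Hypothetical helper: load tokenizer and model, then move the model to a device."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        model_path,
        trust_remote_code=True,
        use_fast=False,
    )
    # `_tokenizer` is the keyword used in the Usage snippet of this README.
    model = AutoModel.from_pretrained(
        model_path,
        trust_remote_code=True,
        _tokenizer=tokenizer,
    ).eval()
    # Assumption: the custom model class supports the usual nn.Module `.to()` call.
    return tokenizer, model.to(pick_device())
```

Calling `tokenizer, model = load_videochat_tpo()` downloads the weights on first use, so it is best run once and the returned objects reused.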