---
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
library_name: transformers
license: mit
pipeline_tag: video-text-to-text
---

# VideoChat2-TPO

This model is based on the paper [Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment](https://huggingface.co/papers/2412.19326).

## Installation
```
pip install -r requirements.txt
python app.py
```

## Usage
```
from transformers import AutoModel, AutoTokenizer
from tokenizer import MultimodalLlamaTokenizer  # presumably the repository's local tokenizer module

model_path = "OpenGVLab/VideoChat-TPO"

# Load the slow tokenizer; trust_remote_code allows the repository's custom code to run.
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True,
                                          use_fast=False)
# Pass the tokenizer to the model and switch to inference mode.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, _tokenizer=tokenizer).eval()
```