Video-LLaVA-Seg

Project page | arXiv

This is the official baseline implementation for the ViCaS dataset. This repository hosts the pretrained model, which was trained for video captioning on a subset of WebVid10M and Panda70M. The final model, finetuned on ViCaS, is hosted here.

For details about setting up the model, refer to the Video-LLaVA-Seg GitHub repo.
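As a minimal sketch of fetching the weights before following that setup guide (this assumes the checkpoint is hosted on the Hugging Face Hub; the repository id below is a placeholder, not the confirmed id):

```python
# Minimal sketch: download the pretrained checkpoint from the Hugging Face Hub,
# then follow the setup instructions in the Video-LLaVA-Seg GitHub repo.
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with this model's actual Hub repository id.
REPO_ID = "your-org/Video-LLaVA-Seg"

local_path = snapshot_download(
    repo_id=REPO_ID,
    local_dir="./checkpoints/video-llava-seg",  # where the Safetensors shards are written
)
print(f"Checkpoint files downloaded to: {local_path}")
```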

For details about downloading and evaluating the dataset benchmark, refer to the ViCaS GitHub repo.

Model size: 8.7B parameters (Safetensors; tensor types I64 and BF16)