Video-LLaVA-Seg

Project page | arXiv

This is the official baseline implementation for the ViCaS dataset. This repository hosts the pretrained model, which was trained for video captioning on a subset of WebVid10M and Panda70M. The final model, finetuned on ViCaS, is hosted here.

For details about setting up the model, refer to the Video-LLaVA-Seg GitHub repo.
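As a minimal sketch of fetching the weights before following that setup guide (this assumes the checkpoint is hosted on the Hugging Face Hub; the repository id below is a placeholder, not the confirmed id):

```python
# Minimal sketch: download the pretrained checkpoint from the Hugging Face Hub,
# then follow the setup instructions in the Video-LLaVA-Seg GitHub repo.
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with this model's actual Hub repository id.
REPO_ID = "your-org/Video-LLaVA-Seg"

local_path = snapshot_download(
    repo_id=REPO_ID,
    local_dir="./checkpoints/video-llava-seg",  # where the Safetensors shards are written
)
print(f"Checkpoint files downloaded to: {local_path}")
```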

For details about downloading and evaluating the dataset benchmark, refer to the ViCaS GitHub repo.

Model size: 8.7B parameters (Safetensors; tensor types I64 and BF16)