Probing Visual Language Priors in VLMs

ImageDPO Finetuned Model

This page provides the ImageDPO-finetuned checkpoint of LLaVA-v1.5-13B used in *Probing Visual Language Priors in VLMs*. We offer both the LoRA parameters and the merged model weights.
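If you start from the LoRA parameters rather than the merged weights, they can be combined with the base model using the merge script shipped in the LLaVA repository. The repository and output paths below are illustrative placeholders, not the exact names published on this page:

```shell
# Merge LoRA parameters into the base LLaVA-v1.5-13B checkpoint.
# Paths/repo IDs here are examples; substitute the actual LoRA repo name.
python scripts/merge_lora_weights.py \
    --model-path path/to/ImageDPO-lora \
    --model-base liuhaotian/llava-v1.5-13b \
    --save-model-path ./LLaVA-v1.5-13b-ImageDPO-merged
```

The merged output directory can then be passed directly as `--model-path` to the inference command below.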

Usage

First, install the LLaVA-v1.5 codebase (https://github.com/haotian-liu/LLaVA).

Then run the following command to try the model:

python -m llava.eval.run_llava \
    --model-path ViLP/LLaVA-v1.5-13b-ImageDPO \
    --image-file 'images/llava_logo.png' \
    --query 'Please caption this image.' \
    --conv-mode llava_v1
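The same inference can be driven from Python using the `eval_model` helper exposed by the LLaVA codebase. This is a sketch following the usage pattern documented in the LLaVA repository; it requires the codebase installed and a GPU with enough memory for the 13B checkpoint:

```python
from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

model_path = "ViLP/LLaVA-v1.5-13b-ImageDPO"

# Build the argument object expected by eval_model
# (mirrors the CLI flags of llava.eval.run_llava).
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Please caption this image.",
    "conv_mode": "llava_v1",
    "image_file": "images/llava_logo.png",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

# Downloads the checkpoint (if needed), runs generation,
# and prints the model's response.
eval_model(args)
```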

Citation Information

Please consider citing the ViLP paper if you find our resources helpful!

@article{luo2024probing,
    title={Probing Visual Language Priors in VLMs},
    author={Luo, Tiange and Cao, Ang and Lee, Gunhee and Johnson, Justin and Lee, Honglak},
    journal={arXiv preprint arXiv:2501.00569},
    year={2024}
}