Tags: Text Generation · Transformers · Safetensors · llava_llama · Inference Endpoints

This model trains LLaVA-1.5-7B on the Co-Instruct-562K dataset, for users who prefer the LLaVA structure.

It is notably less accurate than the main version (https://huggingface.co/q-future/co-instruct); please use that checkpoint if you need a more accurate model.
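Since this checkpoint follows the llava_llama architecture, it is intended to be used with the original LLaVA codebase (https://github.com/haotian-liu/LLaVA). Below is a minimal loading sketch under that assumption; the arguments follow LLaVA's `load_pretrained_model` builder and may differ between LLaVA versions.

```python
# Minimal loading sketch (assumes the LLaVA codebase is installed).
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "q-future/co-instruct-llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,  # full checkpoint, so no separate base model is needed
    model_name=get_model_name_from_path(model_path),
)
```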

Preliminary Results:

| Benchmark | This model | Co-Instruct-Main | GPT-4V-Turbo | Q-Instruct-LLaVA-v1.5 | LLaVA-v1.5 |
|---|---|---|---|---|---|
| Q-Bench-Single-MCQ (A1, test) | 73.38% | 77.11% | 74.10% | 67.42% | 60.07% |
| Q-Bench-Pair-MCQ (A1, test) | 75.88% | 80.18% | 78.07% | 54.50% | 52.25% |

We are working on improving it, but we also note that this structure (direct projection) may not handle multi-image scenarios well.
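To illustrate the multi-image usage that the direct-projection structure has to handle, here is a hedged two-image inference sketch building on the loading snippet above. The conversation template name (`llava_v1`), the image paths, and the question wording are illustrative assumptions, and exact multi-image behavior depends on the installed LLaVA version.

```python
# Hedged sketch of a two-image comparison query; `tokenizer`, `model`,
# and `image_processor` come from the loading sketch above.
import torch
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

# Each "<image>" placeholder below is replaced by one image's projected tokens.
images = [Image.open(p).convert("RGB") for p in ("first.jpg", "second.jpg")]
image_tensor = process_images(images, image_processor, model.config).to(
    model.device, dtype=torch.float16
)

question = (
    f"{DEFAULT_IMAGE_TOKEN}\n{DEFAULT_IMAGE_TOKEN}\n"
    "Which of the two images has better clarity?"
)
conv = conv_templates["llava_v1"].copy()  # template name is an assumption
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)

input_ids = tokenizer_image_token(
    conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=128)

# Depending on the LLaVA version, the output may or may not include the prompt tokens.
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip())
```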

Model size: 7.06B params · Tensor type: BF16 · Format: Safetensors

Dataset used to train q-future/co-instruct-llava-v1.5-7b: Co-Instruct-562K