Tags: Text Generation · Transformers · Safetensors · llava_llama · Inference Endpoints

This model trains LLaVA-1.5-7B on the Co-Instruct-562K dataset, for users who prefer the LLaVA structure.

It is notably less accurate than the main version (https://huggingface.co/q-future/co-instruct); please use that checkpoint if you need a more accurate model.
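Since this checkpoint follows the llava_llama architecture, it is intended to be used with the original LLaVA codebase (https://github.com/haotian-liu/LLaVA). Below is a minimal loading sketch under that assumption; the arguments follow LLaVA's `load_pretrained_model` builder and may differ between LLaVA versions.

```python
# Minimal loading sketch (assumes the LLaVA codebase is installed).
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "q-future/co-instruct-llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,  # full checkpoint, so no separate base model is needed
    model_name=get_model_name_from_path(model_path),
)
```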

Preliminary Results:

| Benchmark | This model | Co-Instruct-Main | GPT-4V-Turbo | Q-Instruct-LLaVA-v1.5 | LLaVA-v1.5 |
|---|---|---|---|---|---|
| Q-Bench-Single-MCQ (A1, test) | 73.38% | 77.11% | 74.10% | 67.42% | 60.07% |
| Q-Bench-Pair-MCQ (A1, test) | 75.88% | 80.18% | 78.07% | 54.50% | 52.25% |

We are working on improving it, but we also note that this structure (direct projection) may not handle multi-image scenarios well.
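To illustrate the multi-image usage that the direct-projection structure has to handle, here is a hedged two-image inference sketch building on the loading snippet above. The conversation template name (`llava_v1`), the image paths, and the question wording are illustrative assumptions, and exact multi-image behavior depends on the installed LLaVA version.

```python
# Hedged sketch of a two-image comparison query; `tokenizer`, `model`,
# and `image_processor` come from the loading sketch above.
import torch
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

# Each "<image>" placeholder below is replaced by one image's projected tokens.
images = [Image.open(p).convert("RGB") for p in ("first.jpg", "second.jpg")]
image_tensor = process_images(images, image_processor, model.config).to(
    model.device, dtype=torch.float16
)

question = (
    f"{DEFAULT_IMAGE_TOKEN}\n{DEFAULT_IMAGE_TOKEN}\n"
    "Which of the two images has better clarity?"
)
conv = conv_templates["llava_v1"].copy()  # template name is an assumption
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)

input_ids = tokenizer_image_token(
    conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=128)

# Depending on the LLaVA version, the output may or may not include the prompt tokens.
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip())
```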

Model size: 7.06B params · Tensor type: BF16 · Format: Safetensors

Dataset used to train q-future/co-instruct-llava-v1.5-7b: Co-Instruct-562K