Working & tested quants for Qwen2.5 VL 7B.
Made using QuantBench!
Get QuantBench on GitHub:
https://github.com/Independent-AI-Labs/local-super-agents/tree/main/quantbench
The models have been tested on the latest llama.cpp, built with CLIP hardware acceleration manually enabled!
Consult the following post for more details: https://github.com/ggml-org/llama.cpp/issues/11483#issuecomment-2676422772
For now, only single CLI calls are supported:
llama-qwen2vl-cli -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf --n_gpu_layers 9999 -p "Describe the image." --image ~/Pictures/test_small.png
We're working on a wrapper API solution until multimodal support is added back to llama.cpp.
The API will be published here: https://github.com/Independent-AI-Labs/local-super-agents
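In the meantime, the CLI can be scripted directly. Below is a minimal bash sketch, not the upcoming API: the `describe_image` helper is purely illustrative, and the GGUF paths are placeholders taken from the example command above.

```bash
#!/usr/bin/env bash
# Illustrative wrapper around llama-qwen2vl-cli (placeholder paths - adjust to your setup).
MODEL=~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf
MMPROJ=~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf

describe_image() {
    local image="$1"
    local prompt="${2:-Describe the image.}"
    llama-qwen2vl-cli -m "$MODEL" --mmproj "$MMPROJ" \
        --n_gpu_layers 9999 -p "$prompt" --image "$image"
}

# Example call:
describe_image ~/Pictures/test_small.png "Extract all textual information from the image."
```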
Let us know if you need a specific quant!
Benchmarking Update:
The latest llama.cpp main looks stable with Vulkan CLIP and any model we've thrown at it so far. Some preliminary insights:
1200x1200 is the maximum image size you can encode with 16 GB of VRAM; clip.cpp does not seem to support multi-GPU Vulkan yet.
Larger images will hit an OOM, so make sure to pre-process them accordingly!
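One way to pre-process is to cap the longer side before the call. A rough sketch, assuming ImageMagick is installed and using placeholder filenames (the 1200x1200 cap is the limit observed above):

```bash
# Shrink only images larger than 1200x1200 (the '>' geometry flag), preserving aspect ratio.
convert ~/Pictures/test_large.png -resize '1200x1200>' ~/Pictures/test_small.png
```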
A 4060 Ti-class GPU delivers 20-30 t/s with the Q8_0 quant and roughly double that with Q4 at 16-32K context.
Batching (multiple prompts) in a single CLI call seems to work fine:
llama-qwen2vl-cli --ctx-size 16000 -n 16000 -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf --n_gpu_layers 9999 -p "Describe the image in detail. Extract all textual information from it. Output as detailed JSON." -p "Analyze the image." --image ~/Pictures/test_small.png --image ~/Pictures/test_small.png
Output quality looks very promising! We'll release all of the benchmark code when ready, so the process can be streamlined for other models.
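Until then, here is a rough sketch for building one batched call over a whole folder of images, assuming bash, a placeholder ~/Pictures/batch/ directory, and that repeated -p/--image flags pair up in order as in the command above:

```bash
# Collect one -p/--image pair per PNG in the folder, then run a single batched call.
ARGS=()
for img in ~/Pictures/batch/*.png; do
    ARGS+=(-p "Describe the image in detail." --image "$img")
done

llama-qwen2vl-cli --ctx-size 16000 -n 16000 \
    -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf \
    --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf \
    --n_gpu_layers 9999 "${ARGS[@]}"
```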
Base model: Qwen/Qwen2.5-VL-7B-Instruct