Pixtral-12B-2409: int4 Weight Quant

A W4A16 quantization of mistral-community/pixtral-12b, produced with the kylesayrs/gptq-partition branch of LLM Compressor, for optimized inference on vLLM.

The vision_tower is kept at FP16; the language_model weights are quantized to 4-bit.

Calibration was performed on 512 Flickr samples.
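
For reference, below is a minimal sketch of how a recipe like this can be expressed with LLM Compressor's oneshot API. The ignore patterns, dataset handling, collator, and sequence length are illustrative assumptions (modeled on the library's published multimodal examples), not the exact script used to produce this checkpoint, and import paths may differ between LLM Compressor versions.

```python
# Sketch of a W4A16 GPTQ recipe that quantizes only the language model,
# leaving the vision tower and projector at their original precision.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"

model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)


def data_collator(batch):
    # Assumption: calibration batches of size 1, tensorized per feature.
    assert len(batch) == 1
    return {key: torch.tensor(value) for key, value in batch[0].items()}


recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
)

oneshot(
    model=model,
    dataset="flickr30k",                 # assumption: Flickr-style calibration set
    splits={"calibration": "test[:512]"},
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    data_collator=data_collator,
    output_dir="pixtral-12b-2409-W4A16-G128",
)
```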

Example vLLM usage

```
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
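
Once the server is up it exposes an OpenAI-compatible API. The snippet below is a minimal sketch of an image-plus-text request; the base URL, API key, and image URL are placeholders that assume default vLLM server settings.

```python
# Query the vLLM server started above via the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-W4A16-G128",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/some-image.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```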

If you want a more advanced, fully featured chat template, you can use this Jinja template (it can be passed to vLLM via the --chat-template argument).
