Pixtral-12B-2409: int4 Weight Quant

A W4A16 quantization of mistral-community/pixtral-12b, produced with the kylesayrs/gptq-partition branch of LLM Compressor, for optimized inference on vLLM.

The vision_tower is kept at FP16; the language_model weights are quantized to 4-bit.

Calibration was performed on 512 Flickr samples.
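
For reference, below is a minimal sketch of how a recipe like this can be expressed with LLM Compressor's oneshot API. The ignore patterns, dataset handling, collator, and sequence length are illustrative assumptions (modeled on the library's published multimodal examples), not the exact script used to produce this checkpoint, and import paths may differ between LLM Compressor versions.

```python
# Sketch of a W4A16 GPTQ recipe that quantizes only the language model,
# leaving the vision tower and projector at their original precision.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"

model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)


def data_collator(batch):
    # Assumption: calibration batches of size 1, tensorized per feature.
    assert len(batch) == 1
    return {key: torch.tensor(value) for key, value in batch[0].items()}


recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
)

oneshot(
    model=model,
    dataset="flickr30k",                 # assumption: Flickr-style calibration set
    splits={"calibration": "test[:512]"},
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    data_collator=data_collator,
    output_dir="pixtral-12b-2409-W4A16-G128",
)
```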

Example vLLM usage

```
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
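
Once the server is up it exposes an OpenAI-compatible API. The snippet below is a minimal sketch of an image-plus-text request; the base URL, API key, and image URL are placeholders that assume default vLLM server settings.

```python
# Query the vLLM server started above via the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-W4A16-G128",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/some-image.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```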

If you want a more advanced, fully featured chat template, you can use this Jinja template (it can be passed to vLLM via the --chat-template argument).
