Pixtral-12B-2409: int4 Weight Quant
A W4A16 quantization of mistral-community/pixtral-12b, produced with the kylesayrs/gptq-partition branch of LLM Compressor for optimised inference with vLLM.
The vision_tower is kept at FP16; only the language_model weights are quantized to 4-bit. Calibration used 512 Flickr samples.
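For reference, here is a minimal sketch of what the quantization run might look like with LLM Compressor's oneshot API. The gptq-partition branch may differ in detail, and the dataset alias, ignore patterns, and import paths below are assumptions, not the exact recipe used for this repo:

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # import path varies across llmcompressor versions

MODEL_ID = "mistral-community/pixtral-12b"
SAVE_DIR = "pixtral-12b-2409-W4A16-G128"

model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Quantize only the language model's Linear layers to W4A16 (4-bit weights,
# group size 128 by default); keep the vision tower, the multimodal
# projector, and lm_head at full precision.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["re:vision_tower.*", "re:multi_modal_projector.*", "lm_head"],
)

oneshot(
    model=model,
    dataset="flickr30k",  # assumed dataset alias; substitute your own calibration set if unavailable
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# llmcompressor patches save_pretrained to accept save_compressed
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```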
Example vLLM usage

```shell
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
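Once the server is running it can be queried through vLLM's OpenAI-compatible API. A minimal sketch, assuming the default localhost port and a placeholder image URL:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-W4A16-G128",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```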
If you want a more advanced, fully featured chat template, you can use this Jinja template.
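The template file itself is not reproduced here. As a hedged sketch, a local Jinja file can be supplied to the server at startup via vLLM's --chat-template flag, or rendered offline with transformers as below; the filename is a placeholder:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nintwentydo/pixtral-12b-2409-W4A16-G128")

# "pixtral_chat_template.jinja" is a hypothetical path to whatever
# template file you choose to use.
with open("pixtral_chat_template.jinja") as f:
    custom_template = f.read()

messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages,
    chat_template=custom_template,  # overrides the template bundled with the tokenizer
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```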
Base model: mistral-community/pixtral-12b