Quant of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b).
FP8 dynamic quant of the language model, and FP8 quant of the KV cache. `multi_modal_projector` and `vision_tower` are left in FP16 since they are a small part of the model.
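
To illustrate what dynamic FP8 quantization does, here is a simplified sketch (not the actual kernel: real e4m3 rounding is non-uniform across the exponent range, and this stand-in snaps to uniform steps instead). The only thing taken from this card is the e4m3 format itself; the helper names are made up for the example:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def fp8_dynamic_quant(x: np.ndarray):
    """Per-tensor dynamic quant: the scale is derived from the tensor's
    absmax at runtime, so no pre-computed range is needed for activations."""
    scale = np.abs(x).max() / E4M3_MAX
    # Stand-in for e4m3 rounding: snap to uniform steps of `scale`.
    q = np.round(x / scale)
    return q, scale

def dequant(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

# Round-trip error is bounded by half a quantization step.
x = np.linspace(-4.0, 4.0, 9)
q, scale = fp8_dynamic_quant(x)
assert np.allclose(dequant(q, scale), x, atol=scale)
```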

Calibrated on 2048 ultrachat samples.

Example vLLM usage:

```
vllm serve nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache --quantization fp8 --kv-cache-dtype fp8
```
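
Once that server is running, vLLM exposes an OpenAI-compatible endpoint, so the model can be queried over plain HTTP. A minimal sketch using only the standard library (the host and port assume vLLM's defaults; the image URL is a placeholder):

```python
import json
from urllib.request import Request, urlopen

# OpenAI-style chat completion payload; Pixtral accepts image_url
# content parts alongside text in a single user message.
payload = {
    "model": "nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
    "max_tokens": 256,
}

# vLLM serves its OpenAI-compatible API on port 8000 by default.
req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = json.load(urlopen(req))  # requires the server started above
```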

Supported on NVIDIA GPUs with compute capability ≥ 8.9 (Ada Lovelace, Hopper).