nintwentydo committed cb89c5a · verified · 1 parent: 72f5202

Update README.md

Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -28,4 +28,11 @@ Quant of [mistral-community/pixtral-12b](https://huggingface.co/mistral-communit
 
 FP8 dynamic quant on language model, and FP8 quant of KV cache. multi_modal_projector and vision_tower left in FP16 since it's a small part of the model.
 
-Calibrated on 2048 ultrachat samples.
+Calibrated on 2048 ultrachat samples.
+
+Example vLLM usage:
+```
+vllm serve nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache --quantization fp8 --kv-cache-dtype fp8
+```
+
+Supported on NVIDIA GPUs with compute capability 8.9 or higher (Ada Lovelace, Hopper).
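The README's "FP8 dynamic quant" refers to quantizing activations with scales computed at runtime from the values being quantized, rather than with scales fixed ahead of time. The sketch below illustrates that idea in plain Python for a single row of activations. It assumes the OCP FP8 E4M3 format (largest finite value 448, saturating on overflow), one scale per row (per token), and round-to-nearest-even. The names `round_to_e4m3`, `dynamic_fp8_quant`, and `dequant` are hypothetical, and this is not vLLM's kernel or the exact method used to produce this checkpoint.

```python
import math

E4M3_MAX = 448.0          # largest finite FP8 E4M3 value (OCP "fn" variant)
E4M3_MIN_NORMAL_EXP = -6  # smallest normal E4M3 value is 2**-6
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(v: float) -> float:
    """Round a finite float to the nearest E4M3 value, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = abs(v)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so the leading-bit exponent is e - 1.
    # Subnormals share the minimum normal exponent (fixed spacing of 2**-9).
    exp = max(math.frexp(a)[1] - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of representable values here
    q = round(a / step) * step  # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)  # saturate instead of overflowing


def dynamic_fp8_quant(row: list[float]) -> tuple[list[float], float]:
    """Per-row dynamic quantization: the scale comes from this row's max |x|."""
    amax = max((abs(x) for x in row), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(x / scale) for x in row], scale


def dequant(q: list[float], scale: float) -> list[float]:
    """Map quantized values back to the original range."""
    return [x * scale for x in q]
```

For example, `dynamic_fp8_quant([0.5, -2.0, 1.0])` picks a scale of 2.0 / 448 and returns the quantized row `[112.0, -448.0, 224.0]`; multiplying back by that scale approximately recovers the inputs. Because the scale is recomputed for every row, an outlier in one token does not shrink the usable range of the others.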
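Compute capability is a (major, minor) version pair, so the README's hardware requirement of 8.9 or higher maps naturally onto a tuple comparison. This is a minimal sketch that encodes only the requirement stated in the README. `FP8_MIN_CAPABILITY` and `has_native_fp8` are hypothetical names, and the example GPUs are listed purely for illustration.

```python
# Minimum compute capability stated in the README (Ada Lovelace is 8.9).
FP8_MIN_CAPABILITY = (8, 9)


def has_native_fp8(capability: tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is 8.9 or higher."""
    # Tuples compare element by element, so (9, 0) > (8, 9) > (8, 6).
    return tuple(capability) >= FP8_MIN_CAPABILITY


# Illustrative GPUs and their compute capabilities.
EXAMPLES = {
    "A100 (Ampere)": (8, 0),
    "RTX 3090 (Ampere)": (8, 6),
    "RTX 4090 (Ada Lovelace)": (8, 9),
    "H100 (Hopper)": (9, 0),
}

for name, cc in EXAMPLES.items():
    print(f"{name}: compute capability {cc[0]}.{cc[1]}, meets README requirement: {has_native_fp8(cc)}")
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns such a (major, minor) tuple, which could be passed to `has_native_fp8` directly.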