Quant of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b).
FP8 dynamic quant of the language model, and FP8 quant of the KV cache. `multi_modal_projector` and `vision_tower` are left in FP16 since they are a small part of the model.
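
To illustrate what dynamic FP8 quantization does, here is a simplified sketch (not the actual kernel: real e4m3 rounding is non-uniform across the exponent range, and this stand-in snaps to uniform steps instead). The only thing taken from this card is the e4m3 format itself; the helper names are made up for the example:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def fp8_dynamic_quant(x: np.ndarray):
    """Per-tensor dynamic quant: the scale is derived from the tensor's
    absmax at runtime, so no pre-computed range is needed for activations."""
    scale = np.abs(x).max() / E4M3_MAX
    # Stand-in for e4m3 rounding: snap to uniform steps of `scale`.
    q = np.round(x / scale)
    return q, scale

def dequant(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

# Round-trip error is bounded by half a quantization step.
x = np.linspace(-4.0, 4.0, 9)
q, scale = fp8_dynamic_quant(x)
assert np.allclose(dequant(q, scale), x, atol=scale)
```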

Calibrated on 2048 ultrachat samples.

Example vLLM usage:

```
vllm serve nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache --quantization fp8 --kv-cache-dtype fp8
```
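
Once that server is running, vLLM exposes an OpenAI-compatible endpoint, so the model can be queried over plain HTTP. A minimal sketch using only the standard library (the host and port assume vLLM's defaults; the image URL is a placeholder):

```python
import json
from urllib.request import Request, urlopen

# OpenAI-style chat completion payload; Pixtral accepts image_url
# content parts alongside text in a single user message.
payload = {
    "model": "nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
    "max_tokens": 256,
}

# vLLM serves its OpenAI-compatible API on port 8000 by default.
req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = json.load(urlopen(req))  # requires the server started above
```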

Supported on NVIDIA GPUs with compute capability ≥ 8.9 (Ada Lovelace, Hopper).