---
tags:
  - fp8
  - vllm
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
pipeline_tag: image-text-to-text
license: apache-2.0
library_name: vllm
base_model:
  - mistral-community/pixtral-12b
  - mistralai/Pixtral-12B-2409
base_model_relation: quantized
datasets:
  - HuggingFaceH4/ultrachat_200k
---

# Pixtral-12B-2409: FP8 Dynamic Quant + FP8 KV Cache

Quantized version of mistral-community/pixtral-12b, produced with LLM Compressor for optimised inference on vLLM.
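
A minimal inference sketch with vLLM is shown below. The model path is a placeholder for this repository's Hub ID, and the prompt, image URL, and sampling settings are illustrative only.

```python
from vllm import LLM, SamplingParams

# Placeholder: replace with this repo's model ID on the Hugging Face Hub.
MODEL_ID = "<this-repo-id>"

# kv_cache_dtype="fp8" makes use of the FP8 KV cache scales stored in this checkpoint.
llm = LLM(model=MODEL_ID, kv_cache_dtype="fp8", max_model_len=8192)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }
]

outputs = llm.chat(messages, SamplingParams(temperature=0.2, max_tokens=128))
print(outputs[0].outputs[0].text)
```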

FP8 dynamic quantization is applied to the language model, and the KV cache is also quantized to FP8. The `multi_modal_projector` and `vision_tower` are left in FP16, since they account for only a small fraction of the model.

Calibrated on 2048 samples from HuggingFaceH4/ultrachat_200k.
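
For reference, the sketch below shows how such a checkpoint can be produced with LLM Compressor's `oneshot` API. It is not the exact script used for this model: the preprocessing, sequence length, per-tensor/per-channel strategies in the recipe, and output directory are assumptions; only the FP8 dynamic scheme, the FP8 KV cache, the FP16 `vision_tower`/`multi_modal_projector`, and the 2048 ultrachat_200k calibration samples come from this card.

```python
from datasets import load_dataset
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"
NUM_CALIBRATION_SAMPLES = 2048
MAX_SEQUENCE_LENGTH = 2048  # assumption: not stated in this card

model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Text-only calibration data from ultrachat_200k, rendered through the chat template.
ds = load_dataset(
    "HuggingFaceH4/ultrachat_200k", split=f"train_sft[:{NUM_CALIBRATION_SAMPLES}]"
)

def preprocess(example):
    text = processor.tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return processor.tokenizer(
        text, truncation=True, max_length=MAX_SEQUENCE_LENGTH, add_special_tokens=False
    )

ds = ds.map(preprocess, remove_columns=ds.column_names)

# FP8 dynamic weights/activations on Linear layers of the language model,
# static FP8 scales for the KV cache; lm_head, vision_tower and
# multi_modal_projector are excluded and stay in higher precision.
recipe = """
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights: {num_bits: 8, type: float, strategy: channel, dynamic: false, symmetric: true}
          input_activations: {num_bits: 8, type: float, strategy: token, dynamic: true, symmetric: true}
      kv_cache_scheme: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
"""

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir="pixtral-12b-fp8-dynamic-fp8-kv",
)
```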