---
tags:
- fp8
- vllm
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: image-text-to-text
license: apache-2.0
library_name: vllm
base_model:
- mistral-community/pixtral-12b
- mistralai/Pixtral-12B-2409
base_model_relation: quantized
datasets:
- HuggingFaceH4/ultrachat_200k
---
# Pixtral-12B-2409: FP8 Dynamic Quant + FP8 KV Cache
Quant of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b) using [LLM Compressor](https://github.com/vllm-project/llm-compressor) for optimised inference on vLLM.

FP8 dynamic quant is applied to the language model, plus FP8 quant of the KV cache. The multi_modal_projector and vision_tower are left in FP16, since they make up only a small part of the model.

Calibrated on 2,048 samples from [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).
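For reference, below is a minimal sketch of how a quant like this can be produced, modelled on LLM Compressor's FP8 KV-cache example flow. The recipe layout, `ignore` patterns, sequence length, and output path are assumptions for illustration, not the exact settings used for this checkpoint.

```python
# Sketch: FP8 dynamic weights/activations + static FP8 KV cache scales
# with LLM Compressor. Settings below are assumptions, not the exact
# recipe used for this checkpoint.
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"

model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8 dynamic scheme for the language model's Linear layers; static FP8
# scheme for the KV cache. lm_head, vision_tower and multi_modal_projector
# are excluded and stay in their original precision.
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 8
                        type: float
                        strategy: channel
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: token
                        dynamic: true
                        symmetric: true
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

# 2048 calibration samples; with dynamic activation quant the calibration
# pass mainly serves to fit the static KV cache scales.
oneshot(
    model=model,
    dataset="ultrachat_200k",
    splits={"calibration": "train_sft[:2048]"},
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=2048,
    output_dir="pixtral-12b-fp8-dynamic-fp8-kv",
)
```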
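And a hedged example of running the result on vLLM. The repo id and image URL are placeholders; `kv_cache_dtype="fp8"` is what tells vLLM to use the FP8 KV cache with the scales stored in the checkpoint.

```python
# Sketch: offline inference with vLLM. Model id and image URL are
# placeholders, not guaranteed paths.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nintwentydo/pixtral-12b-fp8-dynamic",  # hypothetical repo id
    kv_cache_dtype="fp8",                         # enable the FP8 KV cache
    limit_mm_per_prompt={"image": 4},
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

The same applies when serving over HTTP: pass `--kv-cache-dtype fp8` to `vllm serve`.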