Mistral-Large-Instruct-2407 FP8
This repository contains the quantized weights for Mistral-Large-Instruct-2407.
The model has been quantized to FP8: weights, activations, and KV cache all use FP8. It can be loaded with either vLLM or Aphrodite Engine.
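As a rough sketch of the vLLM route, the snippet below builds the engine arguments and runs a single generation. The `engine_kwargs` and `generate` helpers are hypothetical names introduced here for illustration, and the GPU count and sampling settings are assumptions you would adapt to your own hardware. `kv_cache_dtype="fp8"` asks vLLM to keep the KV cache in FP8.

```python
MODEL_ID = "alpindale/Mistral-Large-Instruct-2407-FP8"


def engine_kwargs(num_gpus: int) -> dict:
    """Build keyword arguments for vllm.LLM (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        # Shard the model across this many GPUs; pick a value that fits your machine.
        "tensor_parallel_size": num_gpus,
        # Store the KV cache in FP8.
        "kv_cache_dtype": "fp8",
    }


def generate(prompt: str, num_gpus: int) -> str:
    """Load the model with vLLM and return one completion for `prompt`."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(num_gpus))
    # Sampling values are placeholders, not recommendations from the model author.
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate("...", num_gpus=N)` passes the prompt to the model as-is, so for chat-style use you would format the prompt with the model's chat template first.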
Quantization Method
The model was quantized with llm-compressor. Install it with:
pip install llmcompressor
Then run this script:
from datasets import load_dataset
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
MODEL_ID = "mistralai/Mistral-Large-Instruct-2407"
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Select calibration dataset.
DATASET_ID = "HuggingFaceH4/ultrachat_200k" # Or use your own dataset
DATASET_SPLIT = "train_sft"
# You can increase the number of samples to improve accuracy
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048
ds = load_dataset(DATASET_ID, split=DATASET_SPLIT)
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
def process_and_tokenize(example):
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(
        text,
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )
ds = ds.map(process_and_tokenize, remove_columns=ds.column_names)
# Configure the quantization algorithm and scheme.
# In this case, we:
# * quantize the weights to fp8 with per-tensor scales
# * quantize the activations to fp8 with per-tensor scales
# * quantize the kv cache to fp8 with per-tensor scales
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: tensor
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: tensor
                        dynamic: false
                        symmetric: true
                    targets: ["Linear"]
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""
# Apply algorithms.
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
# Save to disk compressed.
SAVE_DIR = "./Mistral-Large-Instruct-2407-FP8"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
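The recipe in the script uses static, symmetric, per-tensor FP8 scales. To make that concrete, here is a small pure-Python sketch of what per-tensor symmetric quantization does numerically. It assumes the E4M3 variant of FP8 (3 mantissa bits, largest finite value 448) and takes the scale from the tensor's own maximum absolute value. The function names are hypothetical, and llm-compressor's real implementation runs on tensors and derives activation and KV cache scales from the calibration data, so treat this only as an illustration of the arithmetic.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in E4M3 (assumed FP8 format)


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest E4M3-representable value, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), FP8_E4M3_MAX)
    # frexp returns a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    # Exponents below -6 fall into the subnormal range, which shares one step size.
    exp = max(math.frexp(a)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits give 8 steps per power of two
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Symmetric per-tensor quantization: one scale shared by every element."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map FP8 grid values back to the original range."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 2.0]
    q, scale = quantize_per_tensor(weights)
    print(q, scale, dequantize(q, scale))
```

Because a single scale covers the whole tensor, the element with the largest magnitude maps to 448 and everything else is rounded onto the coarser FP8 grid relative to it. A value that is not on the grid gets snapped to its nearest neighbour; for example, `round_to_e4m3(0.3)` gives `0.3125`.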
Model tree for alpindale/Mistral-Large-Instruct-2407-FP8
Base model
mistralai/Mistral-Large-Instruct-2407