Quantization Script

#1
by GusPuffy - opened

Hello, I am having issues quantizing 70B models. Would you be able to provide the script and list the hardware or service provider you used to quantize this model? I have rented H100s on RunPod and have still been unable to quantize a 70B model with AWQ. Thank you!

Knowledge and Data Driven Personalized Medicine at the Point of Care org

Hello,

We used the code below to quantize the base model. Two H100s were enough to do it in 1-2 hours. I hope this helps.

import os

# Restrict the job to two GPUs (adjust the indices for your machine).
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, AwqConfig

# AWQ settings: 4-bit weights, group size 128, GEMM kernels, zero points.
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

# Mirror the same settings in a transformers-compatible config so the
# saved checkpoint can later be loaded with AutoModelForCausalLM.
quantization_config = AwqConfig(
    bits=quant_config["w_bit"],
    group_size=quant_config["q_group_size"],
    zero_point=quant_config["zero_point"],
    version=quant_config["version"].lower(),
).to_dict()

model_dir = "VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct"
model = AutoAWQForCausalLM.from_pretrained(
    model_dir, cache_dir="cache_dir", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# Run the AWQ calibration and quantization pass.
model.quantize(
    tokenizer,
    quant_config=quant_config,
)

# Attach the config, then save the quantized model and tokenizer.
quant_path = "quant_path"
model.model.config.quantization_config = quantization_config
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
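As a rough sanity check on the hardware requirement (my own estimate, not from the script above): with 4-bit weights and group size 128, each group of 128 weights also carries a scale and a zero point, so the effective bits per weight land slightly above 4. Assuming one fp16 scale and one packed 4-bit zero point per group, a 70B model's quantized weights come out around 34 GiB, which is why two 80 GB H100s leave comfortable headroom for activations during calibration:

```python
def awq_weight_gib(n_params: float, w_bit: int = 4, group_size: int = 128,
                   scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Approximate size of AWQ-quantized weights in GiB.

    Assumes one fp16 scale and one packed 4-bit zero point per group;
    ignores embeddings and any layers kept in higher precision, so real
    checkpoints run somewhat larger.
    """
    bits_per_weight = w_bit + (scale_bits + zero_bits) / group_size
    return n_params * bits_per_weight / 8 / 2**30

print(round(awq_weight_gib(70e9), 1))  # ~33.9 GiB for the weights alone
```

Smaller group sizes trade more per-group overhead (and a bigger file) for finer-grained quantization; group size 128 with `zero_point=True` is the common default for AWQ.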

Thank you! I will try it out

GusPuffy changed discussion status to closed
