Quantization Script
#1 opened by GusPuffy
Hello, I am having issues quantizing 70B models. Would you be able to share the script and list the hardware or service provider you used to quantize this model? I have rented H100s on RunPod and have still been unable to quantize a 70B model with AWQ. Thank you!
Hello,
We used the code below to quantize the base model. Two H100s were enough to complete it in 1-2 hours. I hope this helps.
import os

# Restrict the run to two GPUs (indices 6 and 7 on our machine;
# adjust to your setup, e.g. "0,1" on a two-GPU pod).
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, AwqConfig

# AWQ quantization settings: 4-bit weights, group size 128, GEMM kernels.
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

# Transformers-compatible version of the same config, attached before
# saving so the checkpoint can be loaded directly with transformers.
quantization_config = AwqConfig(
    bits=quant_config["w_bit"],
    group_size=quant_config["q_group_size"],
    zero_point=quant_config["zero_point"],
    version=quant_config["version"].lower(),
).to_dict()

model_dir = "VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct"

# Load the full-precision base model across the visible GPUs, plus its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_dir, cache_dir="cache_dir", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# Run the AWQ calibration and quantization pass.
model.quantize(
    tokenizer,
    quant_config=quant_config,
)

# Attach the quantization config, then save the quantized model and tokenizer.
quant_path = "quant_path"
model.model.config.quantization_config = quantization_config
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
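In case it helps, here is a minimal sketch of loading the result for inference afterwards. Because the script attaches a transformers-compatible quantization config before saving, plain transformers should pick it up (this assumes autoawq is installed and reuses the placeholder quant_path directory from above).

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the AWQ-quantized checkpoint saved by the script above.
# "quant_path" is the placeholder output directory from that script.
quant_path = "quant_path"
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_path)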
Thank you! I will try it out.
GusPuffy changed discussion status to closed