Quantization Script
#1 opened by GusPuffy
Hello, I am having issues quantizing 70B models. Would you be able to share the script and list the hardware or service provider you used to quantize this model? I have rented H100s on RunPod and have still been unable to quantize a 70B model with AWQ. Thank you!
Hello,
We used the code below to quantize the base model. Two H100s were enough to complete it in 1-2 hours. I hope this helps.
import os

# Restrict the run to two GPUs (indices 6 and 7 on our machine;
# adjust to your setup, e.g. "0,1" on a two-GPU pod).
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, AwqConfig

# AWQ quantization settings: 4-bit weights, group size 128, GEMM kernels.
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

# Transformers-compatible version of the same config, attached before
# saving so the checkpoint can be loaded directly with transformers.
quantization_config = AwqConfig(
    bits=quant_config["w_bit"],
    group_size=quant_config["q_group_size"],
    zero_point=quant_config["zero_point"],
    version=quant_config["version"].lower(),
).to_dict()

model_dir = "VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct"

# Load the full-precision base model across the visible GPUs, plus its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_dir, cache_dir="cache_dir", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# Run the AWQ calibration and quantization pass.
model.quantize(
    tokenizer,
    quant_config=quant_config,
)

# Attach the quantization config, then save the quantized model and tokenizer.
quant_path = "quant_path"
model.model.config.quantization_config = quantization_config
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
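In case it helps, here is a minimal sketch of loading the result for inference afterwards. Because the script attaches a transformers-compatible quantization config before saving, plain transformers should pick it up (this assumes autoawq is installed and reuses the placeholder quant_path directory from above).

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the AWQ-quantized checkpoint saved by the script above.
# "quant_path" is the placeholder output directory from that script.
quant_path = "quant_path"
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_path)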
Thank you! I will try it out.
GusPuffy changed discussion status to closed