---
base_model:
- meta-llama/Llama-3.2-1B-Instruct
license: llama3.2
---

This model is a 4-bit GPTQ-quantized version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct). The code used to produce it is as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"

# The tokenizer is loaded first: GPTQ needs it to tokenize the
# calibration dataset named in `dataset` below.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization, calibrated on the C4 dataset
quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="c4",
    desc_act=False,
    tokenizer=tokenizer,
)

quant_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```
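Once quantized, the checkpoint can be used like any other `transformers` causal LM. A minimal inference sketch is below; `your-username/Llama-3.2-1B-Instruct-GPTQ` is a placeholder repo id standing in for wherever this quantized model is hosted, and it assumes a CUDA-capable environment with the GPTQ kernels installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the actual repo id of this quantized model.
quantized_id = "your-username/Llama-3.2-1B-Instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(quantized_id)
# The quantization config is stored in the checkpoint, so no
# GPTQConfig is needed at load time.
model = AutoModelForCausalLM.from_pretrained(quantized_id, device_map="auto")

messages = [{"role": "user", "content": "What is 4-bit quantization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```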