Llama 2 7B quantized to 3-bit with GPTQ.

from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer
import torch

w = 3  # number of bits per quantized weight
model_path = "meta-llama/Llama-2-7b-hf"

# Load the full-precision model (in FP16) and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# Calibrate GPTQ on C4 samples of 4,096 tokens (Llama 2's context length)
quantizer = GPTQQuantizer(bits=w, dataset="c4", model_seqlen=4096)
quantized_model = quantizer.quantize_model(model, tokenizer)
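
To make `bits=w` (here 3) concrete: each weight is replaced by an integer code with only 2^3 = 8 possible values, plus a floating-point scale and zero point shared by a group of weights. The sketch below is plain round-to-nearest (RTN) quantization, not GPTQ itself: GPTQ quantizes the columns of each weight matrix one at a time and updates the not-yet-quantized weights to compensate for each rounding error, using second-order statistics computed from the calibration data. The group size of 128 mirrors the `GPTQQuantizer` default; `quantize_group`, `dequantize_group`, and `quantize_row` are hypothetical helpers written for illustration only.

```python
import random


def quantize_group(weights, bits=3):
    """Asymmetric round-to-nearest: map floats to integer codes in [0, 2**bits - 1]."""
    qmax = 2 ** bits - 1                      # 3 bits -> 8 levels, codes 0..7
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0     # fall back to 1.0 for a constant group
    zero = round(-w_min / scale)              # integer code that represents 0.0
    codes = [min(max(round(w / scale) + zero, 0), qmax) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row, bits=3, group_size=128):
    """Quantize one weight row in consecutive groups, each with its own scale and zero point."""
    return [quantize_group(row[i:i + group_size], bits) for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(512)]   # synthetic weights, 4 groups of 128
    groups = quantize_row(row, bits=3, group_size=128)
    restored = [w for codes, scale, zero in groups for w in dequantize_group(codes, scale, zero)]
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, max abs error {max_err:.5f}")
```

With RTN, the reconstruction error of each weight is at most half of its group's scale; smaller groups give tighter scales at the cost of storing more scales and zero points.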

Safetensors. Model size: 927M params. Tensor types: I32, FP16.
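
The 927M figure and the I32/FP16 tensor types reflect how the checkpoint is stored, not the size of the original 7B network: 3-bit codes do not fit any native dtype, so they are packed into 32-bit integers. The sketch below assumes an AutoGPTQ-style layout (packed int32 `qweight` and `qzeros`, one FP16 scale per group of 128 input channels, an int32 `g_idx`, with embeddings, `lm_head`, and norms left in FP16) and the Llama 2 7B shapes (hidden size 4096, MLP size 11008, 32 layers, 32,000-token vocabulary). `pack_codes`, `unpack_codes`, and `estimated_stored_elements` are hypothetical helpers, and the contiguous bit order used here may differ from the one real GPTQ kernels use.

```python
def pack_codes(codes, bits=3):
    """Pack small unsigned codes into 32-bit words as one contiguous bit stream (lowest bits first)."""
    words, acc, nbits = [], 0, 0
    for c in codes:
        acc |= c << nbits                # append the code above the bits already buffered
        nbits += bits
        while nbits >= 32:               # flush each full 32-bit word
            words.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits:                            # flush the final, partially filled word
        words.append(acc)
    return words


def unpack_codes(words, count, bits=3):
    """Inverse of pack_codes: recover the first `count` codes."""
    stream = 0
    for i, word in enumerate(words):
        stream |= word << (32 * i)
    mask = (1 << bits) - 1
    return [(stream >> (bits * i)) & mask for i in range(count)]


def estimated_stored_elements(bits=3, group_size=128):
    """Rough count of tensor elements in a GPTQ checkpoint of Llama 2 7B (assumed layout)."""
    hidden, inter, layers, vocab = 4096, 11008, 32, 32000
    # (in_features, out_features) of q/k/v/o, gate/up, and down projections in each block
    linears = [(hidden, hidden)] * 4 + [(hidden, inter)] * 2 + [(inter, hidden)]
    per_layer = 0
    for in_f, out_f in linears:
        n_groups = in_f // group_size
        per_layer += in_f * out_f * bits // 32        # qweight: packed int32
        per_layer += n_groups * out_f                 # scales: one FP16 value per group and output
        per_layer += n_groups * out_f * bits // 32    # qzeros: packed int32
        per_layer += in_f                             # g_idx: int32 group index per input channel
    total = per_layer * layers
    total += 2 * vocab * hidden                       # embed_tokens and lm_head, kept in FP16
    total += (2 * layers + 1) * hidden                # RMSNorm weights
    return total


if __name__ == "__main__":
    codes = [5, 0, 7, 3] * 8                          # 32 three-bit codes = 96 bits
    words = pack_codes(codes)
    assert unpack_codes(words, len(codes)) == codes
    print(len(words), "int32 words for", len(codes), "codes")
    print(f"~{estimated_stored_elements() / 1e6:.0f}M stored elements")
```

Under these assumptions the estimate is about 926M elements, close to the 927M displayed; the small gap presumably comes from tensors this sketch leaves out.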

Collection including kaitchup/Llama-2-7b-hf-gptq-3bit: GPTQ, which contains Llama 2 7B, Llama 2 13B, Llama 3 8B, and Mistral 7B quantized with GPTQ to 2-bit, 3-bit, 4-bit, and 8-bit (16 items, updated May 6, 2024).