---
license: mit
datasets:
  - wikitext
---

pythia-1.4b quantized to 4-bit using AutoGPTQ.
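The repo name encodes the key settings: 4-bit weights with a group size of 128. For context, here is a hedged sketch of how such a checkpoint can be produced with AutoGPTQ; the calibration text is a placeholder, and the exact calibration data and any extra options used for this checkpoint are not documented here.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "EleutherAI/pythia-1.4b"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# bits=4 and group_size=128 match the "4bit-128g" suffix in this repo's name.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# GPTQ calibrates on tokenized examples; a single placeholder sample here.
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]
model.quantize(examples)
model.save_quantized("pythia-1.4b-4bit-128g")
```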

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g"
model = AutoGPTQForCausalLM.from_quantized(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
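Once loaded, the quantized model supports the usual `generate` API. A minimal sketch, where the prompt and `max_new_tokens` value are illustrative:

```python
# Tokenize a prompt, move it to the model's device, and decode a completion.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```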
|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|---|---|---|---|
|smpanaro/pythia-160m-AutoGPTQ-4bit-128g|33.4375|23.3024|10.1351|
|smpanaro/pythia-410m-AutoGPTQ-4bit-128g|21.4688|13.9838|7.485|
|smpanaro/pythia-1b-AutoGPTQ-4bit-128g|12.0391|11.6178|0.4213|
|smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g|10.9609|10.4391|0.5218|

Wikitext perplexity is measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/perplexity); lower is better.
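For reference, a hedged sketch of that strided sliding-window evaluation, following the pattern in the docs; the `max_length` and `stride` values are assumptions (Pythia's context length is 2048), not necessarily the exact settings used to produce the table above.

```python
import torch
from datasets import load_dataset

# Concatenate the wikitext test split into one long token sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length, stride = 2048, 512  # assumed window and stride sizes
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens new to this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask context tokens out of the loss

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)
    nlls.append(outputs.loss)

    prev_end = end
    if end == seq_len:
        break

# Averaging per-window losses is the docs' approximation of the true NLL.
ppl = torch.exp(torch.stack(nlls).mean())
print(f"wikitext perplexity: {ppl.item():.4f}")
```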