
How to quantize bloom to 4-bit

#268
by char-1ee - opened

Hi, I noticed that bloom-int8 and bloom-fp16 models already exist. Does anyone know where I can find a bloom-int4 model, or how I can quantize the model to 4-bit locally?

BigScience Workshop org

Hi @char-1ee

If you have enough CPU RAM to load the entire BLOOM model, you can easily quantize it on the fly to 4-bit using bitsandbytes and the latest transformers package.

pip install -U bitsandbytes transformers

Simply pass load_in_4bit=True when calling from_pretrained; that will quantize the model to 4-bit precision at load time.
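
For reference, a minimal sketch of what that looks like. The model ID, device_map="auto", and the prompt are just illustrative assumptions; any BLOOM checkpoint on the Hub should work the same way, and a CUDA GPU is needed since bitsandbytes runs the quantized layers on GPU:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom"  # assumption: swap in the checkpoint you want
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # quantize weights to 4-bit on the fly via bitsandbytes
    device_map="auto",   # assumption: let accelerate place layers across GPU/CPU
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))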

Let me know how that goes for you!
