Quantization
Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types like 8-bit integers (int8). This lets you load larger models that normally wouldn't fit into memory, and it can speed up inference. Diffusers supports 8-bit and 4-bit quantization with bitsandbytes.
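To make the idea of lower-precision representation concrete, here is a minimal sketch of symmetric "absmax" int8 quantization in plain Python. It is only an illustration of the general principle and is not how bitsandbytes or Diffusers implement quantization internally; the function names `quantize_absmax_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_absmax_int8(values):
    """Map floats to integers in [-127, 127] using one shared absmax scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q_weights, scale = quantize_absmax_int8(weights)
restored = dequantize(q_weights, scale)

# Worst-case rounding error introduced by quantization.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage cost: 4 bytes per float32 value versus 1 byte per int8 value.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print(q_weights, scale, max_error)
print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per value. Production schemes typically refine this with per-block scales and outlier handling.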
Quantization techniques that aren't supported in Transformers can be added with the [DiffusersQuantizer] class.
Learn how to quantize models in the Quantization guide.
BitsAndBytesConfig
[[autodoc]] BitsAndBytesConfig
GGUFQuantizationConfig
[[autodoc]] GGUFQuantizationConfig
TorchAoConfig
[[autodoc]] TorchAoConfig
DiffusersQuantizer
[[autodoc]] quantizers.base.DiffusersQuantizer