---
license: apache-2.0
---

T5-XXL text encoder of FLUX.1 [schnell], quantized with Hugging Face [optimum-quanto](https://github.com/huggingface/optimum-quanto).

### Quantize

```py
import torch
from transformers import T5EncoderModel
from optimum.quanto import (
    QuantizedTransformersModel,
    qfloat8_e4m3fn,
    qfloat8_e5m2,
    qint8,
    qint4,
)

REPO_NAME = "black-forest-labs/FLUX.1-schnell"
TEXT_ENCODER = "text_encoder_2"

# Load the full-precision T5-XXL text encoder from the FLUX.1 [schnell] repo.
model = T5EncoderModel.from_pretrained(
    REPO_NAME, subfolder=TEXT_ENCODER, torch_dtype=torch.bfloat16
)

# Quantize the weights; qfloat8_e5m2, qint8, or qint4 can be used instead.
qmodel = QuantizedTransformersModel.quantize(
    model,
    weights=qfloat8_e4m3fn,
)
qmodel.save_pretrained("./t5_xxl/qfloat8_e4m3fn")
```

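As a quick sanity check on the savings, you can compare the on-disk size of the quantized checkpoint with the original bf16 encoder. T5-XXL has roughly 4.7B parameters, so bf16 weights are on the order of 9.5 GB and 8-bit weights about half that. A minimal sketch (the path assumes the save above):

```python
import os


def dir_size_gb(path: str) -> float:
    """Sum the sizes of all files under `path`, in GB."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e9


print(f"quantized checkpoint: {dir_size_gb('./t5_xxl/qfloat8_e4m3fn'):.2f} GB")
```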
### Load

Currently `QuantizedTransformersModel` [does not support](https://github.com/huggingface/optimum-quanto/blob/601dc193ce0ed381c479fde54a81ba546bdf64d1/optimum/quanto/models/transformers_models.py#L151) loading a quantized model directly from the Hugging Face Hub, so the saved model is reloaded from a local directory instead.

```py
from transformers import AutoModelForTextEncoding
from optimum.quanto import QuantizedTransformersModel

MODEL_PATH = "./t5_xxl/qfloat8_e4m3fn"


# QuantizedTransformersModel needs a concrete auto_class to reload the model.
class QuantizedModelForTextEncoding(QuantizedTransformersModel):
    auto_class = AutoModelForTextEncoding


qmodel = QuantizedModelForTextEncoding.from_pretrained(MODEL_PATH)
```