---
license: apache-2.0
---
Quantized T5-XXL text encoder of FLUX.1 [schnell], produced with Hugging Face [optimum-quanto](https://github.com/huggingface/optimum-quanto).
### Quantize
```py
import torch
from transformers import T5EncoderModel
from optimum.quanto import (
    QuantizedTransformersModel,
    qfloat8_e4m3fn,
    qfloat8_e5m2,
    qint8,
    qint4,
)

REPO_NAME = "black-forest-labs/FLUX.1-schnell"
TEXT_ENCODER = "text_encoder_2"  # FLUX's T5-XXL text encoder subfolder

# Load the original bf16 encoder.
model = T5EncoderModel.from_pretrained(
    REPO_NAME, subfolder=TEXT_ENCODER, torch_dtype=torch.bfloat16
)

# Quantize the weights (swap in qfloat8_e5m2, qint8 or qint4 for the other variants).
qmodel = QuantizedTransformersModel.quantize(
    model,
    weights=qfloat8_e4m3fn,
)
qmodel.save_pretrained("./t5_xxl/qfloat8_e4m3fn")
```
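The import list above already names the other supported weight types. To export every variant in one pass, the same call can be looped; a minimal sketch, where the output paths are illustrative and the base model is reloaded on each iteration because quantization modifies it in place:

```py
# Sketch: export all four variants under ./t5_xxl/ (paths are assumptions).
for name, qtype in [
    ("qfloat8_e4m3fn", qfloat8_e4m3fn),
    ("qfloat8_e5m2", qfloat8_e5m2),
    ("qint8", qint8),
    ("qint4", qint4),
]:
    # Reload the bf16 weights each time: quantize() mutates the model in place.
    model = T5EncoderModel.from_pretrained(
        REPO_NAME, subfolder=TEXT_ENCODER, torch_dtype=torch.bfloat16
    )
    qmodel = QuantizedTransformersModel.quantize(model, weights=qtype)
    qmodel.save_pretrained(f"./t5_xxl/{name}")
```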
### Load
Currently `QuantizedTransformersModel` [does not support](https://github.com/huggingface/optimum-quanto/blob/601dc193ce0ed381c479fde54a81ba546bdf64d1/optimum/quanto/models/transformers_models.py#L151) loading a quantized model directly from the Hugging Face Hub, so download (or clone) the files first and load them from a local path.
```py
from transformers import AutoModelForTextEncoding
from optimum.quanto import QuantizedTransformersModel

MODEL_PATH = "./t5_xxl/qfloat8_e4m3fn"  # local copy of the quantized weights

# QuantizedTransformersModel needs an auto_class to know how to rebuild the model.
class QuantizedModelForTextEncoding(QuantizedTransformersModel):
    auto_class = AutoModelForTextEncoding

qmodel = QuantizedModelForTextEncoding.from_pretrained(MODEL_PATH)
```
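As a quick smoke test, the loaded encoder can embed a prompt with FLUX's T5 tokenizer (`tokenizer_2` in the base repo). A minimal sketch; `forward` is called explicitly because the quanto wrapper is not itself an `nn.Module`:

```py
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="tokenizer_2"
)
inputs = tokenizer("a cat holding a sign", return_tensors="pt")
with torch.no_grad():
    # The wrapper forwards to the underlying T5EncoderModel.
    embeds = qmodel.forward(input_ids=inputs.input_ids).last_hidden_state
print(embeds.shape)  # torch.Size([1, seq_len, 4096]) — T5-XXL hidden size
```

From here the encoder can be swapped into a `diffusers` FLUX pipeline (e.g. assigned as its `text_encoder_2`), though that wiring is outside the scope of this card.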