---
license: apache-2.0
---
Quantized T5-XXL text encoder of FLUX.1 [schnell], produced with Hugging Face [optimum-quanto](https://github.com/huggingface/optimum-quanto).
### Quantize
```py
import torch
from transformers import T5EncoderModel
from optimum.quanto import (
    QuantizedTransformersModel,
    qfloat8_e4m3fn,
    qfloat8_e5m2,
    qint8,
    qint4,
)

REPO_NAME = "black-forest-labs/FLUX.1-schnell"
TEXT_ENCODER = "text_encoder_2"  # FLUX's T5-XXL text encoder subfolder

# Load the original bf16 encoder.
model = T5EncoderModel.from_pretrained(
    REPO_NAME, subfolder=TEXT_ENCODER, torch_dtype=torch.bfloat16
)

# Quantize the weights (swap in qfloat8_e5m2, qint8 or qint4 for the other variants).
qmodel = QuantizedTransformersModel.quantize(
    model,
    weights=qfloat8_e4m3fn,
)
qmodel.save_pretrained("./t5_xxl/qfloat8_e4m3fn")
```
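The import list above already names the other supported weight types. To export every variant in one pass, the same call can be looped; a minimal sketch, where the output paths are illustrative and the base model is reloaded on each iteration because quantization modifies it in place:

```py
# Sketch: export all four variants under ./t5_xxl/ (paths are assumptions).
for name, qtype in [
    ("qfloat8_e4m3fn", qfloat8_e4m3fn),
    ("qfloat8_e5m2", qfloat8_e5m2),
    ("qint8", qint8),
    ("qint4", qint4),
]:
    # Reload the bf16 weights each time: quantize() mutates the model in place.
    model = T5EncoderModel.from_pretrained(
        REPO_NAME, subfolder=TEXT_ENCODER, torch_dtype=torch.bfloat16
    )
    qmodel = QuantizedTransformersModel.quantize(model, weights=qtype)
    qmodel.save_pretrained(f"./t5_xxl/{name}")
```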
### Load
Currently `QuantizedTransformersModel` [does not support](https://github.com/huggingface/optimum-quanto/blob/601dc193ce0ed381c479fde54a81ba546bdf64d1/optimum/quanto/models/transformers_models.py#L151) loading a quantized model directly from the Hugging Face Hub, so download (or clone) the files first and load them from a local path.
```py
from transformers import AutoModelForTextEncoding
from optimum.quanto import QuantizedTransformersModel

MODEL_PATH = "./t5_xxl/qfloat8_e4m3fn"  # local copy of the quantized weights

# QuantizedTransformersModel needs an auto_class to know how to rebuild the model.
class QuantizedModelForTextEncoding(QuantizedTransformersModel):
    auto_class = AutoModelForTextEncoding

qmodel = QuantizedModelForTextEncoding.from_pretrained(MODEL_PATH)
```
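As a quick smoke test, the loaded encoder can embed a prompt with FLUX's T5 tokenizer (`tokenizer_2` in the base repo). A minimal sketch; `forward` is called explicitly because the quanto wrapper is not itself an `nn.Module`:

```py
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="tokenizer_2"
)
inputs = tokenizer("a cat holding a sign", return_tensors="pt")
with torch.no_grad():
    # The wrapper forwards to the underlying T5EncoderModel.
    embeds = qmodel.forward(input_ids=inputs.input_ids).last_hidden_state
print(embeds.shape)  # torch.Size([1, seq_len, 4096]) — T5-XXL hidden size
```

From here the encoder can be swapped into a `diffusers` FLUX pipeline (e.g. assigned as its `text_encoder_2`), though that wiring is outside the scope of this card.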