---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-large
- text-to-image
- onnx
- model-optimizer
- fp8
- quantization
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/raw/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox
language:
- en
---
# Stable Diffusion 3.5 Large TensorRT
## Introduction
This repository hosts the TensorRT version of **Stable Diffusion 3.5 Large**, created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized versions deliver substantial improvements in inference speed and efficiency.
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
## Model Details
### Model Description
This repository holds the ONNX exports of the CLIP, T5, MMDiT, and VAE models in BF16 precision, as well as the MMDiT model in FP8 precision. The transformer model was quantized to FP8 using [NVIDIA/TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
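For reference, FP8 post-training quantization with Model Optimizer follows the general pattern below. This is an illustrative sketch only: the `diffusers` pipeline wiring, the toy calibration prompt, and the use of the library's default FP8 config are assumptions, not the exact recipe used to produce the checkpoints in this repository.
```python
# Sketch of FP8 post-training quantization with nvidia-modelopt
# (TensorRT-Model-Optimizer). Illustrative only; not the exact recipe
# used to produce the checkpoints hosted here.
import torch
import modelopt.torch.quantization as mtq
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

def forward_loop(transformer):
    # Run a few denoising steps so activation ranges can be calibrated.
    pipe.transformer = transformer
    pipe("A photo of a cat", num_inference_steps=4)

# Quantize the MMDiT transformer's weights and activations to FP8.
pipe.transformer = mtq.quantize(pipe.transformer, mtq.FP8_DEFAULT_CFG, forward_loop)
```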
## Performance using TensorRT 10.11
### Timings for 30 steps at 1024x1024
| Accelerator | Precision | CLIP-G | CLIP-L | T5 | MMDiT x 30 | VAE Decoder | Total |
|-------------|-----------|------------|--------------|--------------|-----------------------|---------------------|------------------------|
| H100 | BF16 | 4.02 ms | 1.21 ms | 9.74 ms | 11444.8 ms | 109.2 ms | 11586.98 ms |
| H100 | FP8 | 3.68 ms | 1.2 ms | 8.82 ms | 5831.44 ms | 79.44 ms | 5940.05 ms |
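As a sanity check on the totals above, FP8 works out to roughly a 2x end-to-end speedup over BF16, almost entirely driven by the MMDiT denoising loop:
```python
# Illustrative arithmetic from the timing table above (H100, 30 steps, 1024x1024).
bf16_total_ms = 11586.98
fp8_total_ms = 5940.05
print(f"FP8 end-to-end speedup: {bf16_total_ms / fp8_total_ms:.2f}x")  # ~1.95x

# The MMDiT loop dominates the runtime, so its speedup is similar:
print(f"MMDiT speedup: {11444.8 / 5831.44:.2f}x")  # ~1.96x
```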
## Usage Example
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/10.11/demo/Diffusion/README.md) for launching a TensorRT NGC container.
```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/10.11
# Mount the TensorRT source tree into the NGC PyTorch container
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```
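Once inside the container, it can be worth confirming the GPU is actually visible before installing anything. This is an optional sanity check, not part of the official instructions:
```python
# Optional: confirm CUDA and the GPU are visible inside the NGC container.
import torch

assert torch.cuda.is_available(), "No CUDA device visible; check --gpus all"
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA H100 80GB HBM3"
```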
2. Install libraries and requirements
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12==10.11.0
```
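To verify that the pinned TensorRT build was picked up rather than the container's default, you can print the version from Python:
```python
# Check that the installed TensorRT Python package matches the 10.11 release.
import tensorrt as trt

print(trt.__version__)  # expected: 10.11.x
```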
3. Generate a HuggingFace user access token
To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) page.
You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See [instructions](https://huggingface.co/docs/hub/security-tokens).
```bash
export HF_TOKEN=<your access token>
```
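Optionally, you can confirm the token has access to the gated repository before building engines. A small sketch using `huggingface_hub`; this check is an extra convenience, not part of the official steps:
```python
# Optional: confirm the token can see the gated SD3.5 Large repository.
import os
from huggingface_hub import model_info

info = model_info("stabilityai/stable-diffusion-3.5-large", token=os.environ["HF_TOKEN"])
print(info.id)  # prints the repo id if access was granted
```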
4. Perform TensorRT-optimized inference:
- **Stable Diffusion 3.5 Large in BF16 precision**
```shell
python3 demo_txt2img_sd35.py \
"A chic urban apartment interior highlighting mid-century modern furniture, vibrant abstract art pieces on clean white walls, and large windows providing a stunning view of the bustling city below." \
--version=3.5-large \
--bf16 \
--download-onnx-models \
--denoising-steps=30 \
--guidance-scale 3.5 \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN
```
- **Stable Diffusion 3.5 Large using FP8 quantization**
```shell
python3 demo_txt2img_sd35.py \
"A chic urban apartment interior highlighting mid-century modern furniture, vibrant abstract art pieces on clean white walls, and large windows providing a stunning view of the bustling city below." \
--version=3.5-large \
--fp8 \
--denoising-steps=30 \
--guidance-scale 3.5 \
--download-onnx-models \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN \
--onnx-dir onnx_fp8 \
--engine-dir engine_fp8
```