---
license: apache-2.0
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
tags:
- text-generation-inference
- stable-diffusion
- text-to-image
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
---

# Stable-fast-xl

Stable-fast is an ultra-lightweight inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs. It delivers very fast inference by combining several key optimization techniques.

This repository contains a compact installation of the [stable-fast](https://github.com/chengzeyi/stable-fast) compiler and inference examples for [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1).

![image.png](https://cdn-uploads.huggingface.co/production/uploads/670503434c094132b2282e63/Xib4SHo9PX7-oSWP3Or3Y.png)

![image.png](https://cdn-uploads.huggingface.co/production/uploads/670503434c094132b2282e63/-a7V70NkS09TeMSZAKgVB.png)

# Run SDXL inference 30%+ faster

## Differences With Other Acceleration Libraries

#### Fast

stable-fast is specially optimized for HuggingFace Diffusers and achieves high performance compared with many other acceleration libraries. Its compilation is also very fast, taking only a few seconds, which is significantly faster than **torch.compile**, **TensorRT** and **AITemplate**.

#### Minimal

stable-fast works as a plugin framework for **PyTorch**. It builds on existing PyTorch functionality and infrastructure, and it is compatible with other acceleration techniques as well as popular fine-tuning techniques and deployment solutions.
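The "30%+ faster" figure will vary with GPU, resolution and step count, so it is worth measuring on your own setup. Below is a minimal timing sketch. The helpers `time_call` and `speedup` are hypothetical names for illustration, not part of stable-fast or Diffusers. It assumes you pass a zero-argument callable such as `lambda: pipe(prompt=prompt).images[0]`. Warm-up runs are excluded from the timing because the first calls after `compile` trigger tracing and CUDA graph capture and are much slower than later calls.

```python
import statistics
import time
from typing import Callable, Optional


def time_call(
    fn: Callable[[], object],
    warmup: int = 2,
    iters: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time in seconds of fn() over `iters` runs.

    Warm-up runs are executed first and not timed, so one-off costs such as
    tracing or CUDA graph capture do not distort the result. `sync` is an
    optional barrier (e.g. torch.cuda.synchronize) that waits for queued GPU work.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # include asynchronous GPU work in the measured interval
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Throughput ratio: 1.3 means the optimized run is 30% faster."""
    if optimized_s <= 0:
        raise ValueError("optimized_s must be positive")
    return baseline_s / optimized_s
```

For example, time the pipeline from the examples in this README once before `pipe = compile(pipe, config)` and once after, passing `sync=torch.cuda.synchronize`, then compare the two medians with `speedup`. Using the median rather than the mean keeps a single slow outlier run from skewing the comparison.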
# How to use

### Install dependencies

```bash
pip install diffusers transformers safetensors accelerate sentencepiece
```

### Download the repository and run the stable-fast installation script

```bash
git clone https://huggingface.co/artemtumch/stable-fast-xl
cd stable-fast-xl
```

Open **install_stable-fast.sh** and replace both occurrences of `cp311` in this line with the tag for your Python version:

`pip install -q https://github.com/chengzeyi/stable-fast/releases/download/v0.0.15/stable_fast-0.0.15+torch210cu118-cp311-cp311-manylinux2014_x86_64.whl`

The tag has the form `cp<major><minor>`: **cp311** is for **Python 3.11** and **cp38** is for **Python 3.8**. The `torch210cu118` part of the file name indicates that the wheel is built for PyTorch 2.1.0 with CUDA 11.8.

Then run the script:

```bash
sh install_stable-fast.sh
```

## Generate image

```py
import torch
from diffusers import DiffusionPipeline
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile,
    CompilationConfig,
)

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Optional: use the tiny VAE to reduce GPU VRAM usage (~30%)
# (requires `from diffusers import AutoencoderTiny`)
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

config = CompilationConfig.Default()
# Enable xformers and Triton only if they are installed
try:
    import xformers  # noqa: F401
    config.enable_xformers = True
except ImportError:
    print("xformers not installed, skip")
try:
    import triton  # noqa: F401
    config.enable_triton = True
except ImportError:
    print("Triton not installed, skip")
config.enable_cuda_graph = True
pipe = compile(pipe, config)

prompt = "An astronaut riding a green horse"
# The first calls trigger compilation and are slow; later calls are fast
image = pipe(prompt=prompt).images[0]
image.save("astronaut.png")
```

## Inpainting

```py
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile,
    CompilationConfig,
)

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Optional: use the tiny VAE to reduce GPU VRAM usage (~30%)
# (requires `from diffusers import AutoencoderTiny`)
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

config = CompilationConfig.Default()
# Enable xformers and Triton only if they are installed
try:
    import xformers  # noqa: F401
    config.enable_xformers = True
except ImportError:
    print("xformers not installed, skip")
try:
    import triton  # noqa: F401
    config.enable_triton = True
except ImportError:
    print("Triton not installed, skip")
config.enable_cuda_graph = True
pipe = compile(pipe, config)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

result = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]
result.save("tiger.png")
```

## GitHub repository

https://github.com/reznya22/stable-fast-xl