---
license: apache-2.0
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
tags:
- text-generation-inference
- stable-diffusion
- text-to-image
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
---
# Stable-fast-xl
Stable-fast is an ultra-lightweight inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs. It delivers fast inference by combining several key optimization techniques.
This repository contains a compact installation of the [stable-fast](https://github.com/chengzeyi/stable-fast) compiler and example inference code for [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1).


# Run SDXL inference 30%+ faster!
## Differences With Other Acceleration Libraries
#### Fast:
stable-fast is specially optimized for HuggingFace Diffusers. It achieves high inference performance and compiles a pipeline in only a few seconds, significantly faster than **torch.compile**, **TensorRT** and **AITemplate** in compilation time.
#### Minimal:
stable-fast works as a plugin framework for **PyTorch**. It utilizes existing PyTorch functionality and infrastructure and is compatible with other acceleration techniques, as well as popular fine-tuning techniques and deployment solutions. For example, a LoRA fine-tune can be loaded with the standard Diffusers API before compilation, as sketched below.
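A minimal sketch of this, assuming an SDXL LoRA of your own (the LoRA repo id below is a placeholder, not something shipped with this repository):
```py
from diffusers import DiffusionPipeline
import torch
from sfast.compilers.stable_diffusion_pipeline_compiler import compile, CompilationConfig

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
# Placeholder repo id -- replace with your own fine-tuned SDXL LoRA.
pipe.load_lora_weights("your-username/your-sdxl-lora")
pipe.fuse_lora()  # bake the LoRA into the base weights before compiling
pipe.to("cuda")

pipe = compile(pipe, CompilationConfig.Default())
```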
# How to use
### Install dependencies
```bash
pip install diffusers transformers safetensors accelerate sentencepiece
```
### Download the repository and run the stable-fast installation script
```bash
git clone https://huggingface.co/artemtumch/stable-fast-xl
cd stable-fast-xl
```
Open the **install_stable-fast.sh** file and replace **cp311** with the tag matching your Python version in this line:
```bash
pip install -q https://github.com/chengzeyi/stable-fast/releases/download/v0.0.15/stable_fast-0.0.15+torch210cu118-cp311-cp311-manylinux2014_x86_64.whl
```
where **cp311** corresponds to **Python 3.11** and **cp38** to **Python 3.8**.
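If you are unsure which tag matches your interpreter, this one-liner (just a convenience check, not part of the install script) prints it:
```py
import sys

# Prints the wheel tag of the current interpreter, e.g. "cp311" for Python 3.11
print(f"cp{sys.version_info.major}{sys.version_info.minor}")
```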
Then run the script:
```bash
sh install_stable-fast.sh
```
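After the script finishes, a quick sanity check (assuming a CUDA-capable GPU is visible) confirms that stable-fast imports cleanly alongside your PyTorch build:
```py
import torch
import sfast  # should import without errors if the wheel matched your setup

# stable-fast targets NVIDIA GPUs, so CUDA must be available.
print(torch.__version__, torch.cuda.is_available())
```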
## Generate image
```py
from diffusers import DiffusionPipeline
import torch

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

import xformers
import triton

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Enable to reduce GPU VRAM usage (~30%):
# from diffusers import AutoencoderTiny
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

# If using torch < 2.0:
# pipe.enable_xformers_memory_efficient_attention()

# Enable all available stable-fast optimizations.
config = CompilationConfig.Default()
config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True

pipe = compile(pipe, config)

prompt = "An astronaut riding a green horse"
image = pipe(prompt=prompt).images[0]
```
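With `enable_cuda_graph`, the first call also pays the compilation and graph-capture cost, so it is much slower than later calls. A small optional follow-up (the warm-up pattern and filename are illustrative, not part of the original script):
```py
# Warm-up call: absorbs the one-time compilation / CUDA graph capture overhead.
_ = pipe(prompt=prompt, num_inference_steps=10).images

# Subsequent calls run at the optimized speed.
image = pipe(prompt=prompt).images[0]
image.save("astronaut.png")  # example filename
```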
## Inpainting
```py
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image
import torch

from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig
)

import xformers
import triton

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Enable to reduce GPU VRAM usage (~30%):
# from diffusers import AutoencoderTiny
# pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)

pipe.to("cuda")

# Enable all available stable-fast optimizations.
config = CompilationConfig.Default()
config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True

pipe = compile(pipe, config)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]
```
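To inspect the result next to the input, `make_image_grid` from recent diffusers releases can be handy (an optional sketch; the output filename is only an example):
```py
from diffusers.utils import make_image_grid

# Arrange the source image, the mask and the inpainted output in one row.
source = load_image(img_url).resize((1024, 1024))
grid = make_image_grid([source, mask_image, image], rows=1, cols=3)
grid.save("inpainting_result.png")  # example filename
```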
## GitHub repository: https://github.com/reznya22/stable-fast-xl