
lyraDiff: An Out-of-box Acceleration Engine for Diffusion and DiT Models

Sa Xiao*, Yibo Lu*, Kangjian Wu*, Bin Wu, Haoxiong Su, Mian Peng, Qiwen Mao, Wenjiang Zhou†
(*Co-first author), (†Corresponding author, [email protected])
Lyra Lab, Tencent Music Entertainment

GitHub: https://github.com/TMElyralab/lyraDiff

Introduction

🌈 lyraDiff is currently the fastest diffusion acceleration engine that requires no recompilation, even with dynamic input shapes.

The core features include:

  • 🚀 State-of-the-art Inference Speed: lyraDiff combines multiple techniques, including Quantization, Fused GEMM Kernels, Flash Attention, and NHWC & Fused GroupNorm, to achieve up to 2x speedup in model inference.
  • 🔥 Memory Efficiency: lyraDiff uses a buffer-based DRAM reuse strategy and multiple types of quantization (FP8/INT8/INT4) to save 10-40% of DRAM usage.
  • 🔥 Extensive Model Support: lyraDiff supports a wide range of generative/SR models such as SD1.5, SDXL, FLUX, S3Diff, and SUPIR, as well as the most commonly used plugins such as LoRA, ControlNet, and IP-Adapter.
  • 🔥 Zero Compilation Deployment: Unlike TensorRT or AITemplate, which take minutes to compile, lyraDiff eliminates runtime recompilation overhead even when model inputs have dynamic shapes (see the sketch after this list).
  • 🔥 Image Gen Consistency: The outputs of lyraDiff are aligned with those of HF diffusers at the pixel level, even across LoRA switches in quantization mode.
  • 🚀 Fast Plugin Hot-swap: lyraDiff provides super-fast model hot-swap for ControlNet and LoRA, which greatly benefits real-time image generation services.
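
As a quick illustration of the zero-compilation point: once a pipeline is built (see the Example section below), the same engine instance can serve requests at different resolutions back to back, with no compile step in between. A minimal sketch, assuming the pipe object constructed in the Example:

# Minimal sketch (assumes the `pipe` object from the Example section below):
# the same lyraDiff-backed pipeline handles several resolutions in a row,
# with no recompilation between calls.
for h, w in [(1024, 1024), (768, 1152), (512, 512)]:
    image = pipe(prompt="a beautiful girl, cartoon style",
                 height=h,
                 width=w,
                 num_inference_steps=20)[0][0]
    image.save(f"dynamic_{h}x{w}.png")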


lyraDiff-IP-Adapters are converted from the standard IP-Adapter weights using this script to be compatible with lyraDiff, and contain both the SD1.5 and SDXL versions of the converted IP-Adapter.
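
For reference, a standard IP-Adapter checkpoint is a plain PyTorch state dict with two groups of weights, which the conversion script remaps into the layout lyraDiff expects. A minimal inspection sketch (the path is a placeholder; this is not the conversion script itself):

import torch

# Inspect a stock IP-Adapter checkpoint before conversion. Standard
# IP-Adapter .bin files contain an 'image_proj' group (the image projection
# model) and an 'ip_adapter' group (the per-layer cross-attention weights).
state_dict = torch.load("/path/to/ip-adapter-plus_sdxl_vit-h.bin", map_location="cpu")
print(list(state_dict.keys()))        # ['image_proj', 'ip_adapter']
print(len(state_dict["ip_adapter"]))  # number of adapter weight tensors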

Usage

We provide a reference implementation of the lyraDiff versions of SD1.5/SDXL, as well as sampling code, in a dedicated GitHub repository.

Example

We provide a minimal script for running SDXL models with IP-Adapter using lyraDiff:

import os
import time

import torch
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline
from diffusers.utils import load_image
from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer

from lyradiff.lyradiff_model.lyradiff_unet_model import LyraDiffUNet2DConditionModel
from lyradiff.lyradiff_model.lyradiff_vae_model import LyraDiffVaeModel
from lyradiff.lyradiff_model.module.lyradiff_ip_adapter import LyraIPAdapter

model_path = "/path/to/sdxl/model/"
vae_model_path = "/path/to/sdxl/sdxl-vae-fp16-fix"

text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder").to(torch.float16).to(torch.device("cuda"))
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(model_path, subfolder="text_encoder_2").to(torch.float16).to(torch.device("cuda"))
tokenizer = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer")
tokenizer_2 = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer_2")

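# lyraDiff drop-in replacements for the diffusers UNet and VAE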
unet = LyraDiffUNet2DConditionModel(is_sdxl=True)
vae = LyraDiffVaeModel(scaling_factor=0.13025, is_upcast=False)

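# Load the original diffusers-format weights directly; no compilation step is needed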
unet.load_from_diffusers_model(os.path.join(model_path, "unet"))
vae.load_from_diffusers_model(vae_model_path)

scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_path, subfolder="scheduler", timestep_spacing="linspace")

pipe = StableDiffusionXLPipeline(
    vae=vae,
    unet=unet,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    scheduler=scheduler
)

ip_ckpt = "/path/to/sdxl/ip_ckpt/ip-adapter-plus_sdxl_vit-h.bin"
image_encoder_path = "/path/to/sdxl/ip_ckpt/image_encoder"

# Create LyraIPAdapter
ip_adapter = LyraIPAdapter(
    unet_model=unet.model,
    sdxl=True,
    device=torch.device("cuda"),
    ip_ckpt=ip_ckpt,
    ip_plus=True,
    image_encoder_path=image_encoder_path,
    num_ip_tokens=16,
    ip_projection_dim=1024,
)

# Load the IP-Adapter reference image
ip_image = load_image("https://cdn-uploads.huggingface.co/production/uploads/6461b412846a6c8c8305319d/8U6yNHTPLaOC3gIWJZWGL.png")
ip_scale = 0.5

# Compute the IP-Adapter image embedding once and pass it to the pipeline
ip_image_embedding = [ip_adapter.get_image_embeds_lyradiff(ip_image)['ip_hidden_states']]
# Set the IP-Adapter scale directly on the UNet object, since it cannot be
# set through the diffusers pipeline
unet.set_ip_adapter_scale(ip_scale)

for i in range(3):
    generator = torch.Generator("cuda").manual_seed(123)
    start = time.perf_counter()
    images = pipe(prompt="a beautiful girl, cartoon style",
                  height=1024,
                  width=1024,
                  num_inference_steps=20,
                  num_images_per_prompt=1,
                  guidance_scale=7.5,
                  negative_prompt="NSFW",
                  generator=generator,
                  ip_adapter_image_embeds=ip_image_embedding
                  )[0]
    print(f"Run {i}: {time.perf_counter() - start:.2f}s")
    images[0].save(f"sdxl_ip_{i}.png")
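
Since the IP-Adapter scale lives on the UNet object itself, it can also be changed between generations without reloading any weights. A small usage sketch reusing the objects defined above:

# Sweep the IP-Adapter influence between calls; only the scale changes,
# nothing is reloaded.
for scale in (0.0, 0.5, 1.0):
    unet.set_ip_adapter_scale(scale)
    image = pipe(prompt="a beautiful girl, cartoon style",
                 height=1024,
                 width=1024,
                 num_inference_steps=20,
                 generator=torch.Generator("cuda").manual_seed(123),
                 ip_adapter_image_embeds=ip_image_embedding)[0][0]
    image.save(f"sdxl_ip_scale_{scale}.png")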

Citation

@Misc{lyraDiff_2025,
  author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Sa Xiao and Qiwen Mao and Mian Peng and Bin Wu and Wenjiang Zhou},
  title =        {lyraDiff: Accelerating Diffusion Models with best flexibility},
  howpublished = {\url{https://github.com/TMElyralab/lyraDiff}},
  year =         {2025}
}