Single Trajectory Distillation for Accelerating Image and Video Style Transfer


Authors: Sijie Xu¹, Runqi Wang¹,², Wei Zhu¹, Dejia Song¹, Nemo Chen¹, Xu Tang¹, Yao Hu¹
Affiliations: ¹Xiaohongshu, ²ShanghaiTech University

🖼️ Visual Results

Method Overview

Qualitative Comparison

Visual comparison with LCM, TCD, PCM, and other baselines at NFE=8 (CFG=6).

Metric Analysis

Performance under different CFG values (2 to 8). Our method (red line) achieves the best balance between style and content.
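A sweep like the one plotted above could be reproduced by running the same inputs at each CFG value and collecting the outputs. The sketch below is only an illustration: `sweep_guidance` and `run_once` are hypothetical names, and the integer grid 2 through 8 is an assumption about how the range was sampled.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def sweep_guidance(
    run_once: Callable[[float], T],
    scales: Iterable[float] = range(2, 9),  # assumed grid: 2, 3, ..., 8
) -> Dict[float, T]:
    """Call run_once(guidance_scale) for each scale and key the results by scale."""
    results = {}
    for scale in scales:
        results[scale] = run_once(scale)
    return results
```

With the Quick Start pipeline, `run_once` might be a small wrapper that calls `pipe(...)` with a fixed prompt, source image, and style image, passes its argument as `guidance_scale`, and returns `.images[0]`.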

🚀 Quick Start

Inference Demo (Image-to-Image)

# !pip install opencv-python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, TCDScheduler
from PIL import Image

device = "cuda"
std_lora_path = "weights/std/std_sdxl_i2i_eta0.75.safetensors"

# Initialize pipeline
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "weights/dreamshaper_XL_v21", 
    torch_dtype=torch.float16, 
    variant="fp16"
).to(device)

# Load STD components
pipe.scheduler = TCDScheduler.from_config(
    pipe.scheduler.config, 
    timestep_spacing='leading', 
    steps_offset=1
)
pipe.load_lora_weights(std_lora_path, adapter_name="std")
pipe.fuse_lora()

# Prepare inputs
prompt = "Stick figure abstract nostalgic style."
n_prompt = "worst face, NSFW, nudity, nipples, (worst quality, low quality:1.4), blurred, low resolution, pixelated, dull colors, overly simplistic, harsh lighting, lack of detail, poorly composed, dark and gloomy atmosphere, (malformed hands:1.4), (poorly drawn hands:1.4), (mutated fingers:1.4), (extra limbs:1.35), (poorly  drawn face:1.4), missing legs, (extra legs:1.4), missing arms, extra arm, ugly, fat, (close shot:1.1), explicit content, sexual content, pornography, adult content, inappropriate, indecent, obscene, vulgar, suggestive, erotic, lewd, provocative, mature content"
src_img = Image.open("doc/imgs/src_img.jpg").convert("RGB").resize((960, 1280))
style_img = Image.open("doc/imgs/style_img.png").convert("RGB")  # drop any alpha channel

# Run inference
image = pipe(
    prompt=prompt, 
    negative_prompt=n_prompt,
    num_inference_steps=11,  # int(11 * 0.75) = 8 denoising steps actually run (NFE=8)
    guidance_scale=6,
    strength=0.75,
    image=src_img,
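    # ip_adapter_image requires IP-Adapter weights to be loaded beforehand with
    # pipe.load_ip_adapter(...); the checkpoint to use is not specified here.
    ip_adapter_image=style_img,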
).images[0]

image.save("std_output.png")
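The relation between `num_inference_steps`, `strength`, and the number of denoising steps that actually run can be worked out ahead of time. The sketch below mirrors the rule diffusers img2img pipelines are understood to apply, `min(int(num_inference_steps * strength), num_inference_steps)`. The helper names are hypothetical and are not part of diffusers or of this repository.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps an img2img call actually runs for a given strength."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


def steps_for_nfe(target_nfe: int, strength: float) -> int:
    """Smallest num_inference_steps whose effective step count reaches target_nfe."""
    n = target_nfe  # strength <= 1, so fewer than target_nfe steps can never suffice
    while effective_steps(n, strength) < target_nfe:
        n += 1
    return n


print(effective_steps(11, 0.75))  # 8
print(steps_for_nfe(8, 0.75))     # 11
```

This is why the demo requests 11 steps at `strength=0.75` to obtain NFE=8: requesting 10 would run only 7 steps.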

📦 Model Zoo

We provide pretrained models for both image-to-image and video-to-video tasks with different η values. All models are hosted on Hugging Face.
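A weight file could be fetched from the Hub with `huggingface_hub`, roughly as sketched below. Several things here are assumptions rather than documented facts: the repository id is taken from the page hosting this card, the file name pattern is extrapolated from the single Quick Start path (`std_sdxl_i2i_eta0.75.safetensors`), and the files are assumed to sit at the repository root. The helper names are hypothetical. Check the repository file list before relying on any of this.

```python
from pathlib import Path

# Assumption: repository id as shown on the hosting page.
REPO_ID = "SecondComming/Single-Trajectory-Distillation"


def lora_filename(eta: str) -> str:
    """Image-to-image LoRA file name, following the Quick Start naming (assumed to generalize)."""
    return f"std_sdxl_i2i_eta{eta}.safetensors"


def download_lora(eta: str, local_dir: str = "weights/std") -> Path:
    """Download one LoRA file; raises if the assumed file name does not exist in the repo."""
    # Imported lazily so the file name helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=REPO_ID,
        filename=lora_filename(eta),
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_lora("0.75")` would then place the file where the Quick Start script expects it, provided the assumed name matches the hosted file.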

Image-to-Image Models

Video-to-Video Models

📚 Citation

@article{xu2024single,
  title={Single Trajectory Distillation for Accelerating Image and Video Style Transfer},
  author={Xu, Sijie and Wang, Runqi and Zhu, Wei and Song, Dejia and Chen, Nemo and Tang, Xu and Hu, Yao},
  journal={arXiv preprint arXiv:2412.18945},
  year={2024}
}