GuanjieChen committed
Commit d064a8f · verified · 1 Parent(s): d2d09b6

Update README.md

Files changed (1)
1. README.md +18 -21
README.md CHANGED
@@ -1,11 +1,20 @@
# Accelerating Vision Diffusion Transformers with Skip Branches

- ### About
- This repository contains the official PyTorch implementation of the paper **[Accelerating Vision Diffusion Transformers with Skip Branches](https://arxiv.org/abs/2411.17616)**. In this work, we enhance standard DiT models by introducing **Skip-DiT**, which incorporates skip branches to improve feature smoothness. We also propose **Skip-Cache**, a method that leverages skip branches to cache DiT features across timesteps during inference. The effectiveness of our approach is validated on various DiT backbones for both video and image generation, demonstrating how skip branches preserve generation quality while achieving significant speedup. Experimental results show that **Skip-Cache** provides a $1.5\times$ speedup with minimal computational cost and a $2.2\times$ speedup with only a slight reduction in quantitative metrics. All code and checkpoints are publicly available on [Hugging Face](https://huggingface.co/GuanjieChen/Skip-DiT/tree/main) and [GitHub](https://github.com/OpenSparseLLMs/Skip-DiT.git). More visualizations can be found [here](#visualization).
 
### Pipeline
![pipeline](visuals/pipeline.jpg)
- Illustration of Skip-DiT and Skip-Cache for DiT visual generation caching. (a) The vanilla DiT block for image and video generation. (b) Skip-DiT modifies the vanilla DiT model with skip branches that connect shallow and deep DiT blocks. (c) Given a Skip-DiT with $L$ layers, during inference at step $t-1$, the first-layer output ${x'}^{t-1}_{0}$ and the cached $(L-1)$-th layer output ${x'}^{t}_{L-1}$ are forwarded through the skip branch to the final DiT block to produce the denoising output, without executing DiT blocks $2$ to $L-1$.
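To make the cached step concrete, below is a minimal PyTorch sketch of the idea: on a full step all $L$ blocks run and the output of block $L-1$ is cached; on a cached step only the first and last blocks execute, with the skip branch fusing the fresh shallow feature and the cached deep feature. The names (`SkipDiTSketch`, `skip_fuse`, `full_step`) and the fusion details are illustrative assumptions, not the repository's actual modules.

```python
import torch
import torch.nn as nn

class SkipDiTSketch(nn.Module):
    # Minimal sketch of Skip-Cache inference; names and fusion details are
    # illustrative assumptions, not the repository's actual implementation.
    def __init__(self, blocks: nn.ModuleList, skip_fuse: nn.Module):
        super().__init__()
        self.blocks = blocks        # the L DiT blocks
        self.skip_fuse = skip_fuse  # skip-branch fusion, e.g. nn.Linear(2 * d, d)
        self.cache = None           # output of block L-1 from the last full step

    def forward(self, x: torch.Tensor, full_step: bool) -> torch.Tensor:
        h0 = self.blocks[0](x)  # block 1 always executes
        if full_step or self.cache is None:
            h = h0
            for blk in self.blocks[1:-1]:  # blocks 2 .. L-1
                h = blk(h)
            self.cache = h  # cache {x'}^t_{L-1} for later cached steps
        else:
            h = self.cache  # cached step: skip blocks 2 .. L-1 entirely
        # Skip branch: fuse the shallow feature with the deep feature
        # before the final block produces the denoising output.
        return self.blocks[-1](self.skip_fuse(torch.cat([h0, h], dim=-1)))
```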
 
### Feature Smoothness
![feature](visuals/feature.jpg)
@@ -24,23 +33,11 @@ Feature smoothness analysis of DiT in the class-to-video generation task using D
The pretrained text-to-image model of [HunYuan-DiT](https://github.com/Tencent/HunyuanDiT) can be found on [Hugging Face](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2/tree/main/t2i/model) and [Tencent Cloud](https://dit.hunyuan.tencent.com/download/HunyuanDiT/model-v1_2.zip).

### Demo
- <div align="center">
- <img src="visuals/video-demo.gif" width="90%"></img>
- <br>
- <em>
- (Results of Latte with skip branches on text-to-video and class-to-video tasks. Left: text-to-video with 1.7x and 2.0x speedup. Right: class-to-video with 2.2x and 2.5x speedup. Latency is measured on one A100.)
- </em>
- </div>
- <br>
-
- <div align="center">
- <img src="visuals/image-demo.jpg" width="100%"></img>
- <br>
- <em>
- (Results of HunYuan-DiT with skip branches on the text-to-image task. Latency is measured on one A100.)
- </em>
- </div>
- <br>
 
### Acknowledgement
Skip-DiT has been greatly inspired by the following amazing works and teams: [DeepCache](https://arxiv.org/abs/2312.00858), [Latte](https://github.com/Vchitect/Latte), [DiT](https://github.com/facebookresearch/DiT), and [HunYuan-DiT](https://github.com/Tencent/HunyuanDiT). We thank all the contributors for open-sourcing their work.
@@ -57,4 +54,4 @@ The code and model weights are licensed under [LICENSE](./class-to-image/LICENSE
#### Text-to-image
![text-to-image visualizations](visuals/case_t2i.jpg)
#### Class-to-image
- ![class-to-image visualizations](visuals/case_c2i.jpg)
 
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model:
+ - maxin-cn/Latte-1
+ - facebook/DiT-XL-2-256
+ - Tencent-Hunyuan/HunyuanDiT
+ ---
# Accelerating Vision Diffusion Transformers with Skip Branches

+ This repository contains all the model checkpoints for the paper **[Accelerating Vision Diffusion Transformers with Skip Branches](https://arxiv.org/abs/2411.17616)**. In this work, we enhance standard DiT models by introducing **Skip-DiT**, which incorporates skip branches to improve feature smoothness. We also propose **Skip-Cache**, a method that leverages skip branches to cache DiT features across timesteps during inference. The effectiveness of our approach is validated on various DiT backbones for both video and image generation, demonstrating how skip branches preserve generation quality while achieving significant speedup. Experimental results show that **Skip-Cache** provides a 1.5x speedup with minimal computational cost and a 2.2x speedup with only a slight reduction in quantitative metrics. All code and checkpoints are publicly available on [Hugging Face](https://huggingface.co/GuanjieChen/Skip-DiT/tree/main) and [GitHub](https://github.com/OpenSparseLLMs/Skip-DiT.git). More visualizations can be found [here](#visualization).
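As a sketch of how caching across timesteps fits into a sampler, the loop below runs the full network only every few steps and reuses the cached deep feature otherwise. The interface (the `full_step` flag, the `step_fn` sampler update, and the interval value) is an assumption for illustration, not the repository's actual API; larger intervals trade a little quality for more speedup.

```python
import torch

@torch.no_grad()
def skip_cache_sampling(model, x, timesteps, step_fn, cache_interval=2):
    # model: a skip-branch DiT exposing a `full_step` flag, as sketched earlier
    # step_fn: any standard sampler update, e.g. a DDPM/DDIM step
    # cache_interval: how often to refresh the cache of deep features
    for i, t in enumerate(timesteps):
        full_step = (i % cache_interval == 0)  # refresh the cache periodically
        eps = model(x, full_step=full_step)    # cached steps skip blocks 2..L-1
        x = step_fn(x, eps, t)                 # denoising update
    return x
```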
 

### Pipeline
![pipeline](visuals/pipeline.jpg)
+ Illustration of Skip-DiT and Skip-Cache for DiT visual generation caching. (a) The vanilla DiT block for image and video generation. (b) Skip-DiT modifies the vanilla DiT model using skip branches to connect shallow and deep DiT blocks. (c) Pipeline of Skip-Cache.

### Feature Smoothness
![feature](visuals/feature.jpg)
 
The pretrained text-to-image model of [HunYuan-DiT](https://github.com/Tencent/HunyuanDiT) can be found on [Hugging Face](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2/tree/main/t2i/model) and [Tencent Cloud](https://dit.hunyuan.tencent.com/download/HunyuanDiT/model-v1_2.zip).
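For convenience, one way to fetch just those weights with `huggingface_hub` is sketched below; the target directory and file pattern are illustrative, not prescribed by the repository.

```python
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Download only the t2i/model weights from the repository linked above.
local_dir = snapshot_download(
    repo_id="Tencent-Hunyuan/HunyuanDiT-v1.2",
    allow_patterns=["t2i/model/*"],
    local_dir="./ckpts/HunyuanDiT-v1.2",  # example path
)
print(f"Checkpoints downloaded to {local_dir}")
```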

### Demo
+ ![demo1](visuals/video-demo.gif)
+ (Results of Latte with skip branches on text-to-video and class-to-video tasks. Left: text-to-video with 1.7x and 2.0x speedup. Right: class-to-video with 2.2x and 2.5x speedup. Latency is measured on one A100.)
+
+ ![demo2](visuals/image-demo.jpg)
+ (Results of HunYuan-DiT with skip branches on the text-to-image task. Latency is measured on one A100.)
 
### Acknowledgement
Skip-DiT has been greatly inspired by the following amazing works and teams: [DeepCache](https://arxiv.org/abs/2312.00858), [Latte](https://github.com/Vchitect/Latte), [DiT](https://github.com/facebookresearch/DiT), and [HunYuan-DiT](https://github.com/Tencent/HunyuanDiT). We thank all the contributors for open-sourcing their work.
 
#### Text-to-image
![text-to-image visualizations](visuals/case_t2i.jpg)
#### Class-to-image
+ ![class-to-image visualizations](visuals/case_c2i.jpg)