---
license: apache-2.0
language:
- en
- zh
base_model:
- maxin-cn/Latte-1
- facebook/DiT-XL-2-256
- Tencent-Hunyuan/HunyuanDiT
---
# Accelerating Vision Diffusion Transformers with Skip Branches

This repository contains all the checkpoints of the models from the paper: **[Accelerating Vision Diffusion Transformers with Skip Branches](https://arxiv.org/abs/2411.17616)**. In this work, we enhance standard DiT models by introducing **Skip-DiT**, which incorporates skip branches to improve feature smoothness. We also propose **Skip-Cache**, a method that leverages skip branches to cache DiT features across timesteps during inference. The effectiveness of our approach is validated on various DiT backbones for both video and image generation, demonstrating how skip branches preserve generation quality while achieving significant speedup. Experimental results show that **Skip-Cache** provides a 1.5x speedup with minimal computational cost and a 2.2x speedup with only a slight reduction in quantitative metrics. All code and checkpoints are publicly available on [Hugging Face](https://huggingface.co/GuanjieChen/Skip-DiT/tree/main) and [GitHub](https://github.com/OpenSparseLLMs/Skip-DiT.git). More visualizations can be found [here](#visualization).
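Conceptually, at a cached timestep Skip-Cache recomputes only the outermost blocks and reuses the cached deep feature, which the skip branch reconnects to a fresh shallow feature. The toy sketch below illustrates this idea only; it is not the official implementation, and the block functions, skip pairing, and cache policy are illustrative assumptions.

```python
def make_block(scale):
    # Stand-in for a DiT block; the real blocks are transformer layers.
    return lambda x: x * scale + 0.1

class SkipDiT:
    """Toy DiT with skip branches between mirrored shallow/deep blocks."""

    def __init__(self, depth=6):
        assert depth % 2 == 0
        self.depth = depth
        self.blocks = [make_block(1.0 + 0.01 * i) for i in range(depth)]
        self.cache = None  # deep feature saved at a full timestep

    def forward(self, x, use_cache=False):
        half = self.depth // 2
        if use_cache and self.cache is not None:
            # Cached timestep: only the first and last blocks run; the
            # fresh shallow feature reaches the output via its skip branch.
            s0 = self.blocks[0](x)
            return self.blocks[-1](self.cache) + s0
        skips = []
        for blk in self.blocks[:half]:
            x = blk(x)
            skips.append(x)  # skip branch to the mirrored deep block
        for blk in self.blocks[half:-1]:
            x = blk(x) + skips.pop()
        self.cache = x  # feature entering the final block, reused later
        return self.blocks[-1](x) + skips.pop()

model = SkipDiT(depth=6)
full = model.forward(1.0)                 # full pass, fills the cache
fast = model.forward(1.0, use_cache=True) # cached pass, 2 blocks computed
```

In the real model, features at adjacent denoising timesteps are similar, so reusing the cached deep feature while skipping most blocks introduces only a small error.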

### Pipeline
![pipeline](visuals/pipeline.jpg)

Illustration of Skip-DiT and Skip-Cache for DiT visual generation caching. (a) The vanilla DiT block for image and video generation. (b) Skip-DiT modifies the vanilla DiT model using skip branches to connect shallow and deep DiT blocks. (c) Pipeline of Skip-Cache.

### Feature Smoothness
![feature](visuals/feature.jpg)

The pretrained text-to-image model of [HunYuan-DiT](https://github.com/Tencent/HunyuanDiT) can be found on [Hugging Face](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2/tree/main/t2i/model) and [Tencent Cloud](https://dit.hunyuan.tencent.com/download/HunyuanDiT/model-v1_2.zip).

### Demo
![demo1](visuals/video-demo.gif)

(Results of Latte with skip-branches on text-to-video and class-to-video tasks. Left: text-to-video with 1.7x and 2.0x speedup. Right: class-to-video with 2.2x and 2.5x speedup. Latency is measured on one A100.)

![demo2](visuals/image-demo.jpg)

(Results of HunYuan-DiT with skip-branches on text-to-image task. Latency is measured on one A100.)

### Acknowledgement
Skip-DiT has been greatly inspired by the following amazing works and teams: [DeepCache](https://arxiv.org/abs/2312.00858), [Latte](https://github.com/Vchitect/Latte), [DiT](https://github.com/facebookresearch/DiT), and [HunYuan-DiT](https://github.com/Tencent/HunyuanDiT). We thank all the contributors for open-sourcing.

#### Text-to-image
![text-to-image visualizations](visuals/case_t2i.jpg)

#### Class-to-image
![class-to-image visualizations](visuals/case_c2i.jpg)