Update README.md
README.md
CHANGED
@@ -1,3 +1,95 @@
---
license: mit
---

## [NeurIPS24] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

### [NEWS] [9.26] Our FlowDCN is accepted by NeurIPS 2024!
### [NEWS] [11.22] Our FlowDCN models and code are now available in the official repo!

## Pretrained Models
Our models consistently achieve state-of-the-art sFID results compared to SiT/DiT.

### Metrics
Our models consistently have fewer parameters and GFLOPs than their Transformer counterparts.
Our code also supports LogNorm and VAR (Various Aspect Ratio) training.

| Model-iters        | Resolution | Solver          | NFE-CFG | FID  | sFID | Params | Link |
|:------------------:|:----------:|:---------------:|:-------:|:----:|:----:|:------:|:----:|
| FlowDCN-S-400k     | 256x256    | EulerSDE-250    | 250x2   | 54.6 | 8.8  | 30.3M  |      |
| FlowDCN-B-400k     | 256x256    | EulerSDE-250    | 250x2   | 28.5 | 6.09 | 120M   |      |
| VAR-FlowDCN-B-400k | 256x256    | EulerSDE-250    | 250x2   | 23.6 | 7.72 | 120M   |      |
| FlowDCN-L-400k     | 256x256    | EulerSDE-250    | 250x2   | 13.8 | 4.69 | 421M   |      |
| FlowDCN-XL-2M      | 256x256    | EulerODE-250    | 250x2   | 2.01 | 4.33 | 618M   |      |
| FlowDCN-XL-2M      | 256x256    | EulerSDE-250    | 250x2   | 2.00 | 4.37 | 618M   |      |
| FlowDCN-XL-2M      | 256x256    | NeuralSolver-10 | 10x2    | 2.35 | 5.07 | 618M   |      |
| FlowDCN-XL-100k    | 512x512    | EulerODE-50     | 50x2    | 2.76 | 5.29 | 618M   |      |
| FlowDCN-XL-100k    | 512x512    | EulerSDE-250    | 250x2   | 2.44 | 4.53 | 618M   |      |
| FlowDCN-XL-100k    | 512x512    | NeuralSolver-10 | 10x2    | 2.77 | 4.68 | 618M   |      |
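
The NFE-CFG column reads as "steps x 2", presumably because classifier-free guidance (CFG) evaluates the network twice per step, once with the class condition and once without. The sketch below shows that accounting for a plain Euler ODE sampler. It is an illustration under stated assumptions, not the repository's sampler: `toy_velocity`, the guidance scale, and the time convention (integrating from t=0 to t=1) are all hypothetical.

```python
class CountingModel:
    """Wraps a velocity function and counts how often it is called (NFE)."""

    def __init__(self, fn):
        self.fn = fn
        self.nfe = 0

    def __call__(self, x, t, cond):
        self.nfe += 1
        return self.fn(x, t, cond)


def toy_velocity(x, t, cond):
    # Hypothetical stand-in for a trained network: pulls x toward a
    # class-dependent target. cond=None plays the role of the null class.
    target = 0.0 if cond is None else float(cond)
    return target - x


def euler_ode_cfg(model, x, cond, num_steps, guidance=1.5):
    """Euler integration of dx/dt = v(x, t) from t=0 to t=1 with CFG."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v_cond = model(x, t, cond)    # evaluation 1: conditional
        v_uncond = model(x, t, None)  # evaluation 2: unconditional
        v = v_uncond + guidance * (v_cond - v_uncond)
        x = x + dt * v
    return x


model = CountingModel(toy_velocity)
x1 = euler_ode_cfg(model, 0.5, cond=3, num_steps=250)
print(model.nfe)  # 500, i.e. 250x2
```

With 250 Euler steps the wrapped model is called 500 times, which matches the `250x2` notation used in the table.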

### Visualizations

### Various Resolution Extension
| Models           | 256x256 FID | 256x256 sFID | 256x256 IS | 320x320 FID | 320x320 sFID | 320x320 IS | 224x448 FID | 224x448 sFID | 224x448 IS | 160x480 FID | 160x480 sFID | 160x480 IS |
|------------------|-------------|--------------|------------|-------------|--------------|------------|-------------|--------------|------------|-------------|--------------|------------|
| DiT-B            | 44.83       | 8.49         | 32.05      | 95.47       | 108.68       | 18.38      | 109.1       | 110.71       | 14.00      | 143.8       | 122.81       | 8.93       |
| with EI          | 44.83       | 8.49         | 32.05      | 81.48       | 62.25        | 20.97      | 133.2       | 72.53        | 11.11      | 160.4       | 93.91        | 7.30       |
| with PI          | 44.83       | 8.49         | 32.05      | 72.47       | 54.02        | 24.15      | 133.4       | 70.29        | 11.73      | 156.5       | 93.80        | 7.80       |
| FiT-B (+VAR)     | 36.36       | 11.08        | 40.69      | 61.35       | 30.71        | 31.01      | 44.67       | 24.09        | 37.1       | 56.81       | 22.07        | 25.25      |
| with VisionYaRN  | 36.36       | 11.08        | 40.69      | 44.76       | 38.04        | 44.70      | 41.92       | 42.79        | 45.87      | 62.84       | 44.82        | 27.84      |
| with VisionNTK   | 36.36       | 11.08        | 40.69      | 57.31       | 31.31        | 33.97      | 43.84       | 26.25        | 39.22      | 56.76       | 24.18        | 26.40      |
| FlowDCN-B        | 28.5        | 6.09         | 51         | 34.4        | 27.2         | 52.2       | 71.7        | 62.0         | 23.7       | 211         | 111          | 5.83       |
| FlowDCN-B (+VAR) | 23.6        | 7.72         | 62.8       | 29.1        | 15.8         | 69.5       | 31.4        | 17.0         | 62.4       | 44.7        | 17.8         | 35.8       |

## Linear-Multi-step Solvers and NeuralSolvers
We also provide an Adams-like linear multi-step solver for rectified flow sampling. The related configs are named with `adam2` or `adam4`. The solver code is in `./src/diffusion/flow_matching/adam_sampling.py`.

Compared to Heun/RK4, the linear multi-step solver is more stable and faster: it reuses velocity evaluations from earlier steps, so each step needs only one new network evaluation.
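
To make the cost difference concrete, here is a minimal sketch that compares Heun's method with a two-step Adams-Bashforth update on a toy ODE and counts function evaluations. It assumes uniform step sizes and the textbook Adams-Bashforth coefficients; the helper names are hypothetical and the code in `adam_sampling.py` may differ.

```python
def make_counter(f):
    """Wrap f(x, t) so the number of function evaluations can be read back."""
    calls = {"n": 0}

    def wrapped(x, t):
        calls["n"] += 1
        return f(x, t)

    return wrapped, calls


def heun(f, x, t0, t1, steps):
    """Heun's method: two evaluations of f per step."""
    h = (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * h
        k1 = f(x, t)
        k2 = f(x + h * k1, t + h)
        x = x + 0.5 * h * (k1 + k2)
    return x


def adams_bashforth2(f, x, t0, t1, steps):
    """Two-step Adams-Bashforth: one new evaluation of f per step.

    The first step falls back to Euler because no history exists yet.
    """
    h = (t1 - t0) / steps
    prev = None
    for i in range(steps):
        t = t0 + i * h
        cur = f(x, t)
        if prev is None:
            x = x + h * cur
        else:
            x = x + h * (1.5 * cur - 0.5 * prev)
        prev = cur
    return x


def decay(x, t):
    # Toy ODE dx/dt = -x with x(0) = 1; the exact solution is x(1) = exp(-1).
    return -x


f_heun, heun_calls = make_counter(decay)
f_ab2, ab2_calls = make_counter(decay)
x_heun = heun(f_heun, 1.0, 0.0, 1.0, steps=16)
x_ab2 = adams_bashforth2(f_ab2, 1.0, 0.0, 1.0, steps=16)
print(heun_calls["n"], ab2_calls["n"])  # 32 16
```

For the same number of steps, the multi-step update uses half as many evaluations as Heun. This is the pattern behind the NFE column of the solver table, where Heun with 8 steps and Adam2 with 16 steps both cost `16x2`.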

In some experiments, we were surprised to find that the linear multi-step solver achieves results comparable to FlowTurbo.

Since the two are distinct methods, we believe FlowTurbo could become even more powerful when armed with an Adams solver.

We also provide some "magic" solvers for rectified flow sampling. These solvers are heavily inspired by linear multi-step methods and consist of just a few **Magic Numbers**.
These solvers are really powerful and interesting. We place the related code in `./src/diffusion/flow_matching/ns_sampling.py`.
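
One way to picture a solver that "consists of just some numbers" is a multi-step update whose weights come from a lookup table. The sketch below is only an illustration of that idea, not the method in `ns_sampling.py`: `table_solver` and `ab2_table` are hypothetical names, and the lower-triangular layout (step i weights the i + 1 velocities seen so far) is an assumption. Such a table has n(n+1)/2 entries, which equals the Extra-Parameters values reported for 6, 7, 8 and 10 steps but not the value for 15 steps, so the actual parameterization may differ.

```python
def table_solver(f, x, timesteps, coeffs):
    """Integrate dx/dt = f(x, t) using a fixed table of numbers.

    timesteps: n + 1 increasing times t_0 < ... < t_n
    coeffs:    n rows; row i holds i + 1 weights applied to the
               velocities evaluated at t_0 .. t_i
    """
    history = []
    for i in range(len(timesteps) - 1):
        history.append(f(x, timesteps[i]))  # one evaluation per step
        h = timesteps[i + 1] - timesteps[i]
        x = x + h * sum(c * v for c, v in zip(coeffs[i], history))
    return x


def ab2_table(n):
    """Coefficient table that reproduces two-step Adams-Bashforth."""
    rows = [[1.0]]  # the first step is plain Euler
    for i in range(1, n):
        rows.append([0.0] * (i - 1) + [-0.5, 1.5])
    return rows


n = 6
timesteps = [i / n for i in range(n + 1)]
coeffs = ab2_table(n)
num_params = sum(len(row) for row in coeffs)
x_end = table_solver(lambda x, t: -x, 1.0, timesteps, coeffs)
print(num_params)  # 21
```

Filling the table with Adams-Bashforth weights recovers a classical solver; replacing those entries with tuned values is what would turn it into a set of magic numbers.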

| SiT-XL-R256  | Steps | NFE-CFG  | Extra-Parameters | FID  | IS    | Precision | Recall |
|--------------|-------|----------|------------------|------|-------|-----------|--------|
| Heun         | 8     | 16x2     | 0                | 3.68 | /     | /         | /      |
| Heun         | 11    | 22x2     | 0                | 2.79 | /     | /         | /      |
| Heun         | 15    | 30x2     | 0                | 2.42 | /     | /         | /      |
| Adam2        | 6     | 6x2      | 0                | 6.35 | 190   | 0.75      | 0.55   |
| Adam2        | 8     | 8x2      | 0                | 4.16 | 212   | 0.78      | 0.56   |
| Adam2        | 16    | 16x2     | 0                | 2.42 | 237   | 0.80      | 0.60   |
| Adam4        | 16    | 16x2     | 0                | 2.27 | 243   | 0.80      | 0.60   |
| FlowTurbo    | 6     | (7+3)x2  | 30408704(29M)    | 3.93 | 223.6 | 0.79      | 0.56   |
| FlowTurbo    | 8     | (8+2)x2  | 30408704(29M)    | 3.63 | /     | /         | /      |
| FlowTurbo    | 10    | (12+2)x2 | 30408704(29M)    | 2.69 | /     | /         | /      |
| FlowTurbo    | 15    | (17+3)x2 | 30408704(29M)    | 2.22 | 248   | 0.81      | 0.60   |
| NeuralSolver | 6     | 6x2      | 21               | 3.57 | 214   | 0.77      | 0.58   |
| NeuralSolver | 7     | 7x2      | 28               | 2.78 | 229   | 0.79      | 0.60   |
| NeuralSolver | 8     | 8x2      | 36               | 2.65 | 234   | 0.79      | 0.60   |
| NeuralSolver | 10    | 10x2     | 55               | 2.40 | 238   | 0.79      | 0.60   |
| NeuralSolver | 15    | 15x2     | 110              | 2.24 | 244   | 0.80      | 0.60   |

## Citation
```bibtex
@inproceedings{wang2024exploring,
  title={Exploring {DCN}-like architecture for fast image generation with arbitrary resolution},
  author={Shuai Wang and Zexian Li and Tianhui Song and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=e57B7BfA2B}
}
```