wangsssssss committed · Commit 1cab38e · verified · 1 Parent(s): 1008420

Update README.md

  1. README.md +95 -3
README.md CHANGED
@@ -1,3 +1,95 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ ---
4
+
5
+ ## [NeurIPS24] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
6
+
7
+
8
+ ![caps](./figs/viscaption5.png)
9
+
10
+
11
+ ### [NEWS] [9.26] Our FlowDCN is accepted by NeurIPS 2024!
12
+ ### [NEWS] [11.22] 🍺 Our FlowDCN models and code are now available in the official repo!
13
+
14
+ ## Pretrained Models
15
+ Our models consistently achieve state-of-the-art sFID compared to SiT/DiT.
16
+
17
+ ### Metrics
18
+ Our models consistently have fewer parameters and GFLOPs than their Transformer counterparts.
19
+ Our code also supports LogNorm and VAR (Various Aspect Ratio training).
20
+
21
+ | Model-iters | Resolution | Solver | NFE-CFG | FID | sFID | Params |Link|
22
+ |:------------------:|:----------:|:---------------:|:-------:|:----:|:----:|:------:|:--:|
23
+ | FlowDCN-S-400k | 256x256 | EulerSDE-250 | 250x2 | 54.6 | 8.8 | 30.3M |
24
+ | FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 28.5 | 6.09 | 120M |
25
+ | VAR-FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 23.6 | 7.72 | 120M |
26
+ | FlowDCN-L-400k | 256x256 | EulerSDE-250 | 250x2 | 13.8 | 4.69 | 421M |
27
+ | FlowDCN-XL-2M | 256x256 | EulerODE-250 | 250x2 | 2.01 | 4.33 | 618M |
28
+ | FlowDCN-XL-2M | 256x256 | EulerSDE-250 | 250x2 | 2.00 | 4.37 | 618M |
29
+ | FlowDCN-XL-2M | 256x256 | NeuralSolver-10 | 10x2 | 2.35 | 5.07 | 618M |
30
+ | FlowDCN-XL-100k | 512x512 | EulerODE-50 | 50x2 | 2.76 | 5.29 | 618M |
31
+ | FlowDCN-XL-100k | 512x512 | EulerSDE-250 | 250x2 | 2.44 | 4.53 | 618M |
32
+ | FlowDCN-XL-100k | 512x512 | NeuralSolver-10 | 10x2 | 2.77 | 4.68 | 618M |
33
+
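The `NFE-CFG` column can be read as solver evaluations times two, on the assumption that classifier-free guidance costs one conditional and one unconditional forward pass per evaluation. The sketch below illustrates that accounting with a plain Euler ODE step. `toy_velocity`, the guidance formula, the guidance scale, and the integration direction from t=0 to t=1 are illustrative assumptions, not the repository's code.

```python
import numpy as np


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for the network: a simple linear field per branch."""
    target = 1.0 if cond else 0.0
    return target - x


def euler_cfg_sample(x0, steps, cfg_scale, velocity=toy_velocity):
    """Euler ODE sampling with classifier-free guidance; also counts forward passes."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    nfe = 0
    for i in range(steps):
        t = i * dt
        v_cond = velocity(x, t, True)     # conditional pass
        v_uncond = velocity(x, t, False)  # unconditional pass
        nfe += 2
        # Assumed guidance rule: extrapolate from the unconditional prediction.
        v = v_uncond + cfg_scale * (v_cond - v_uncond)
        x = x + dt * v
    return x, nfe


if __name__ == "__main__":
    _, nfe = euler_cfg_sample(np.zeros(4), steps=250, cfg_scale=1.5)
    print("forward passes:", nfe)  # 250 steps x 2 passes
```

Under this reading, a solver that needs two velocity evaluations per step (such as Heun) doubles the first factor, which is why step count and NFE are listed separately in the solver comparison.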
34
+ ### Visualizations
35
+
36
+ ![caps](./figs/vis_ode.png)
37
+
38
+ ### Various Resolution Extension
39
+ | Models | 256x256 FID | sFID | IS | 320x320 FID | sFID | IS | 224x448 FID | sFID | IS | 160x480 FID | sFID | IS |
40
+ |------------------|-------|-------|-------------|-------|--------|-------------|-------|--------|-------------|-------|--------|-------|
41
+ | DiT-B | 44.83 | 8.49 | 32.05 | 95.47 | 108.68 | 18.38 | 109.1 | 110.71 | 14.00 | 143.8 | 122.81 | 8.93 |
42
+ | with EI | 44.83 | 8.49 | 32.05 | 81.48 | 62.25 | 20.97 | 133.2 | 72.53 | 11.11 | 160.4 | 93.91 | 7.30 |
43
+ | with PI | 44.83 | 8.49 | 32.05 | 72.47 | 54.02 | 24.15 | 133.4 | 70.29 | 11.73 | 156.5 | 93.80 | 7.80 |
44
+ | FiT-B (+VAR) | 36.36 | 11.08 | 40.69 | 61.35 | 30.71 | 31.01 | 44.67 | 24.09 | 37.1 | 56.81 | 22.07 | 25.25 |
45
+ | with VisionYaRN | 36.36 | 11.08 | 40.69 | 44.76 | 38.04 | 44.70 | 41.92 | 42.79 | 45.87 | 62.84 | 44.82 | 27.84 |
46
+ | with VisionNTK | 36.36 | 11.08 | 40.69 | 57.31 | 31.31 | 33.97 | 43.84 | 26.25 | 39.22 | 56.76 | 24.18 | 26.40 |
47
+ | FlowDCN-B | 28.5 | 6.09 | 51 | 34.4 | 27.2 | 52.2 | 71.7 | 62.0 | 23.7 | 211 | 111 | 5.83 |
48
+ | FlowDCN-B (+VAR) | 23.6 | 7.72 | 62.8 | 29.1 | 15.8 | 69.5 | 31.4 | 17.0 | 62.4 | 44.7 | 17.8 | 35.8 |
49
+
50
+
51
+ [//]: # ()
52
+ [//]: # (![caps](./figs/var_fid.png))
53
+
54
+ ## Linear-Multi-step Solvers and NeuralSolvers
55
+ We also provide an Adams-like linear multistep solver for rectified flow sampling. The related configs are named with `adam2` or `adam4`. The solver code is in `./src/diffusion/flow_matching/adam_sampling.py`.
56
+
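For orientation, here is a minimal sketch of a two-step Adams-Bashforth update applied to a velocity-field ODE dx/dt = v(x, t). It is the textbook scheme, not a copy of `adam_sampling.py`; the function name `adams2_sample`, the uniform time grid, and the Euler bootstrap for the first step are assumptions made for illustration.

```python
import numpy as np


def adams2_sample(velocity, x0, steps, t0=0.0, t1=1.0):
    """Integrate dx/dt = velocity(x, t) with two-step Adams-Bashforth.

    Each step reuses the previous velocity, so only one new
    velocity evaluation is needed per step.
    """
    x = np.asarray(x0, dtype=float)
    dt = (t1 - t0) / steps
    v_prev = None
    for i in range(steps):
        t = t0 + i * dt
        v = velocity(x, t)
        if v_prev is None:
            # No history yet: fall back to a single Euler step.
            x = x + dt * v
        else:
            # AB2 update: x_{n+1} = x_n + dt * (3/2 v_n - 1/2 v_{n-1})
            x = x + dt * (1.5 * v - 0.5 * v_prev)
        v_prev = v
    return x


if __name__ == "__main__":
    # Toy check on dx/dt = -x, whose exact solution at t=1 is exp(-1).
    out = adams2_sample(lambda x, t: -x, np.array([1.0]), steps=100)
    print(out, np.exp(-1.0))
```

Because the history is reused, a multistep method spends one evaluation per step, whereas Heun spends two. That is consistent with the NFE column in the comparison table, where Adam2 at 8 steps is listed as 8x2 and Heun at 8 steps as 16x2.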
57
+ Compared to Heun/RK4, the linear multistep solver is more stable and faster.
58
+
59
+ In some experiments, we were surprised to find that the linear multistep solver achieves results comparable even to FlowTurbo.
60
+
61
+ Since the two methods are distinct, we believe FlowTurbo could become even more powerful when combined with an Adams solver.
62
+
63
+ We also provide some "magic" solvers for rectified flow sampling. These solvers are strongly inspired by linear multistep methods and consist of just a handful of **Magic Numbers**.
64
+ These solvers are powerful and interesting. The related code is in `./src/diffusion/flow_matching/ns_sampling.py`.
65
+
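The README does not spell out what the magic numbers are. One way to read "inspired by linear multistep methods" is a solver whose update is a fixed weighted sum of all velocities seen so far, with the weights stored in a small table. The sketch below assumes that form. The lower-triangular table `coeffs`, the uniform time grid, and all function names are hypothetical and are not taken from `ns_sampling.py`.

```python
import numpy as np


def table_solver_sample(velocity, x0, coeffs, t0=0.0, t1=1.0):
    """Generic multistep sampler driven by a coefficient table (assumed form).

    Step n applies x_{n+1} = x_n + sum_{j<=n} coeffs[n, j] * v_j,
    where v_j is the velocity evaluated at step j.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    steps = coeffs.shape[0]
    ts = np.linspace(t0, t1, steps + 1)[:-1]  # uniform grid (an assumption)
    x = np.asarray(x0, dtype=float)
    history = []
    for n in range(steps):
        history.append(velocity(x, ts[n]))  # one new evaluation per step
        x = x + sum(coeffs[n, j] * history[j] for j in range(n + 1))
    return x


def euler_table(steps):
    """Coefficient table that reproduces plain Euler on [0, 1]."""
    return np.eye(steps) / steps


def adams2_table(steps):
    """Coefficient table that reproduces two-step Adams-Bashforth on [0, 1]."""
    dt = 1.0 / steps
    table = np.zeros((steps, steps))
    table[0, 0] = dt  # Euler bootstrap for the first step
    for n in range(1, steps):
        table[n, n] = 1.5 * dt
        table[n, n - 1] = -0.5 * dt
    return table


if __name__ == "__main__":
    field = lambda x, t: -x
    print(table_solver_sample(field, np.array([1.0]), euler_table(10)))
    print(table_solver_sample(field, np.array([1.0]), adams2_table(10)))
```

In this formulation Euler and Adams are just particular choices of table, and a tuned solver would replace them with optimized values. A lower-triangular table of this shape has steps*(steps+1)/2 entries, which equals the extra-parameter counts listed in the table below for 6, 7, 8 and 10 steps (21, 28, 36, 55) but not for 15 steps (110 rather than 120), so the actual parameterization may differ.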
66
+ | SiT-XL-R256 | Steps | NFE-CFG | Extra-Parameters | FID | IS | Precision | Recall |
67
+ |--|-------|----------|-----------------|------|-------|------|--------|
68
+ | Heun | 8 | 16x2 | 0 | 3.68 | / | / | / |
69
+ | Heun | 11 | 22x2 | 0 | 2.79 | / | / | / |
70
+ | Heun | 15 | 30x2 | 0 | 2.42 | / | / | / |
71
+ | Adam2 | 6 | 6x2 | 0 | 6.35 | 190 | 0.75 | 0.55 |
72
+ | Adam2 | 8 | 8x2 | 0 | 4.16 | 212 | 0.78 | 0.56 |
73
+ | Adam2 | 16 | 16x2 | 0 | 2.42 | 237 | 0.80 | 0.60 |
74
+ | Adam4 | 16 | 16x2 | 0 | 2.27 | 243 | 0.80 | 0.60 |
75
+ | FlowTurbo | 6 | (7+3)x2 | 30408704(29M) | 3.93 | 223.6 | 0.79 | 0.56 |
76
+ | FlowTurbo | 8 | (8+2)x2 | 30408704(29M) | 3.63 | / | / | / |
77
+ | FlowTurbo | 10 | (12+2)x2 | 30408704(29M) | 2.69 | / | / | / |
78
+ | FlowTurbo | 15 | (17+3)x2 | 30408704(29M) | 2.22 | 248 | 0.81 | 0.60 |
79
+ | NeuralSolver | 6 | 6x2 | 21 | 3.57 | 214 | 0.77 | 0.58 |
80
+ | NeuralSolver | 7 | 7x2 | 28 | 2.78 | 229 | 0.79 | 0.60 |
81
+ | NeuralSolver | 8 | 8x2 | 36 | 2.65 | 234 | 0.79 | 0.60 |
82
+ | NeuralSolver | 10 | 10x2 | 55 | 2.40 | 238 | 0.79 | 0.60 |
83
+ | NeuralSolver | 15 | 15x2 | 110 | 2.24 | 244 | 0.80 | 0.60 |
84
+
85
+ ## Citation
86
+ ```bibtex
87
+ @inproceedings{
88
+ wang2024exploring,
89
+ title={Exploring {DCN}-like architecture for fast image generation with arbitrary resolution},
90
+ author={Shuai Wang and Zexian Li and Tianhui Song and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
91
+ booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
92
+ year={2024},
93
+ url={https://openreview.net/forum?id=e57B7BfA2B}
94
+ }
95
+ ```