Add library name and pipeline tag

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +53 -48
README.md CHANGED
@@ -1,3 +1,8 @@
 # Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression <br><sub>Official PyTorch Implementation</sub>
 
 [![arXiv](https://img.shields.io/badge/arXiv%20paper-2506.09482-b31b1b.svg)](https://arxiv.org/pdf/2506.09482)&nbsp;
@@ -22,10 +27,10 @@ This is a PyTorch/GPU implementation of the paper [Marrying Autoregressive Trans
 
 This repo contains:
 
- * 🪐 A simple PyTorch implementation of [TransDiff Model](models/transdiff.py) and [TransDiff Model with MRAR](models/transdiff_mrar.py)
- * ⚡️ Pre-trained class-conditional TransDiff models trained on ImageNet 256x256 and 512x512
- * 💥 A self-contained [notebook](demo.ipynb) for running various pre-trained TransDiff models
- * 🛸 A TransDiff [training and evaluation script](main.py) using PyTorch DDP
 
 ## Preparation
 
@@ -71,10 +76,10 @@ Given that our data augmentation consists of simple center cropping and random f
 the VAE latents can be pre-computed and saved to `CACHED_PATH` to save computations during TransDiff training:
 
 ```
- torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
- main_cache.py \
- --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 \
- --batch_size 128 \
 --data_path ${IMAGENET_PATH} --cached_path ${CACHED_PATH}
 ```
 
@@ -86,13 +91,13 @@ Run our interactive visualization [demo](demo.ipynb).
 ### Training
 Script for the TransDiff-L 1StepAR setting (Pretrain TransDiff-L with a width of 1024 channels, 800 epochs):
 ```
- torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
- main.py \
- --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
- --model transdiff_large --diffloss_w 1024 \
- --diffusion_batch_mul 4 \
- --epochs 800 --warmup_epochs 100 --blr 1.0e-4 --batch_size 32 \
- --output_dir ${OUTPUT_DIR} --resume ${OUTPUT_DIR} \
 --data_path ${IMAGENET_PATH}
 ```
 - Training time is ~115h on 64 A100 GPUs with `--batch_size 32`.
@@ -103,25 +108,25 @@ main.py \
 
 Script for the TransDiff-L MRAR setting (Finetune TransDiff-L MRAR with a width of 1024 channels, 40 epochs):
 ```
- torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
- main.py \
- --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
- --model transdiff_large --diffloss_w 1024 --mrar --bf16 \
- --diffusion_batch_mul 2 \
- --epochs 40 --warmup_epochs 10 --lr 5.0e-5 --batch_size 16 --gradient_accumulation_steps 2 \
- --output_dir ${OUTPUT_DIR} --resume ${Transdiff-L_1StepAR_DIR} \
 --data_path ${IMAGENET_PATH}
 ```
 Script for the TransDiff-L 512x512 setting (Finetune TransDiff-L 512x512 with a width of 1024 channels, 150 epochs):
 ```
- torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
- main.py \
- --img_size 512 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
- --model transdiff_large --diffloss_w 1024 --ema_rate 0.999 --bf16 \
- --diffusion_batch_mul 4 \
- --epochs 150 --warmup_epochs 10 --lr 1.0e-4 --batch_size 16 --gradient_accumulation_steps 2 \
- --only_train_diff \
- --output_dir ${OUTPUT_DIR} --resume ${Transdiff-L_1StepAR_DIR} \
 --data_path ${IMAGENET_PATH}
 ```
 
@@ -129,34 +134,34 @@ main.py \
 
 Evaluate TransDiff-L 1StepAR with classifier-free guidance:
 ```
- torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
- main.py \
- --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
- --model transdiff_large --diffloss_w 1024 \
- --output_dir ${OUTPUT_DIR} --resume ckpt/transdiff_l/ \
- --evaluate --eval_bsz 256 --num_images 50000 \
 --cfg 1.3 --scale_0 0.89 --scale_1 0.95
 ```
 
 Evaluate TransDiff-L MRAR with classifier-free guidance:
 ```
- torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
- main.py \
- --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
- --model transdiff_large --diffloss_w 1024 \
- --output_dir ${OUTPUT_DIR} --resume ckpt/transdiff_l_mrar/ \
- --evaluate --eval_bsz 256 --num_images 50000 \
 --cfg 1.3 --scale_0 0.91 --scale_1 0.93
 ```
 
 Evaluate TransDiff-L 512x512 with classifier-free guidance:
 ```
- torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
- main.py \
- --img_size 512 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
- --model transdiff_large --diffloss_w 1024 \
- --output_dir ${OUTPUT_DIR} --resume ckpt/transdiff_l_512/ \
- --evaluate --eval_bsz 64 --num_images 50000 \
 --cfg 1.3 --scale_0 0.87 --scale_1 0.87
 ```
 
 
+ ---
+ library_name: diffusers
+ pipeline_tag: unconditional-image-generation
+ ---
+
 # Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression <br><sub>Official PyTorch Implementation</sub>
 
 [![arXiv](https://img.shields.io/badge/arXiv%20paper-2506.09482-b31b1b.svg)](https://arxiv.org/pdf/2506.09482)&nbsp;
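
The added front matter is what the Hub reads for indexing: `library_name: diffusers` attaches the diffusers integration to the model card, and `pipeline_tag: unconditional-image-generation` makes the model discoverable under that task filter.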
 
 This repo contains:
 
+ * 🪐 A simple PyTorch implementation of [TransDiff Model](models/transdiff.py) and [TransDiff Model with MRAR](models/transdiff_mrar.py)
+ * ⚡️ Pre-trained class-conditional TransDiff models trained on ImageNet 256x256 and 512x512
+ * 💥 A self-contained [notebook](demo.ipynb) for running various pre-trained TransDiff models
+ * 🛸 A TransDiff [training and evaluation script](main.py) using PyTorch DDP
 
 ## Preparation
 
 
 the VAE latents can be pre-computed and saved to `CACHED_PATH` to save computations during TransDiff training:
 
 ```
+ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
+ main_cache.py \
+ --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 \
+ --batch_size 128 \
 --data_path ${IMAGENET_PATH} --cached_path ${CACHED_PATH}
 ```
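
Conceptually, this step just runs the frozen VAE encoder once over the dataset and stores the results. A minimal sketch of the idea (the real logic lives in `main_cache.py`; the `vae.encode` interface and file layout below are assumptions, not the repo's actual code):

```python
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def cache_vae_latents(vae, dataset, cached_path, batch_size=128, device="cuda"):
    """Encode every image once so training can skip the VAE forward pass."""
    vae = vae.to(device).eval()
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=8)
    for step, (images, labels) in enumerate(loader):
        # Assumed interface: encode() returns the 16-channel KL-16 latent map.
        latents = vae.encode(images.to(device))
        torch.save({"latents": latents.cpu(), "labels": labels},
                   f"{cached_path}/latents_{step:06d}.pt")
```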
 
 
 ### Training
 Script for the TransDiff-L 1StepAR setting (Pretrain TransDiff-L with a width of 1024 channels, 800 epochs):
 ```
+ torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
+ main.py \
+ --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
+ --model transdiff_large --diffloss_w 1024 \
+ --diffusion_batch_mul 4 \
+ --epochs 800 --warmup_epochs 100 --blr 1.0e-4 --batch_size 32 \
+ --output_dir ${OUTPUT_DIR} --resume ${OUTPUT_DIR} \
 --data_path ${IMAGENET_PATH}
 ```
 - Training time is ~115h on 64 A100 GPUs with `--batch_size 32`.
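 - For reference, the launch spans 8 nodes × 8 GPUs = 64 GPUs, so `--batch_size 32` per GPU gives a global batch of 64 × 32 = 2048 images per step. If `--blr` follows the usual linear-scaling convention (an assumption; check `main.py`), the peak learning rate works out to 1.0e-4 × 2048 / 256 = 8.0e-4.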
 
 Script for the TransDiff-L MRAR setting (Finetune TransDiff-L MRAR with a width of 1024 channels, 40 epochs):
 ```
+ torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
+ main.py \
+ --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
+ --model transdiff_large --diffloss_w 1024 --mrar --bf16 \
+ --diffusion_batch_mul 2 \
+ --epochs 40 --warmup_epochs 10 --lr 5.0e-5 --batch_size 16 --gradient_accumulation_steps 2 \
+ --output_dir ${OUTPUT_DIR} --resume ${Transdiff-L_1StepAR_DIR} \
 --data_path ${IMAGENET_PATH}
 ```
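
The effective global batch is unchanged from pretraining: 64 GPUs × `--batch_size 16` × `--gradient_accumulation_steps 2` = 2048 images per optimizer step; the smaller per-GPU batch is simply accumulated over two steps.
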
 Script for the TransDiff-L 512x512 setting (Finetune TransDiff-L 512x512 with a width of 1024 channels, 150 epochs):
 ```
+ torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
+ main.py \
+ --img_size 512 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
+ --model transdiff_large --diffloss_w 1024 --ema_rate 0.999 --bf16 \
+ --diffusion_batch_mul 4 \
+ --epochs 150 --warmup_epochs 10 --lr 1.0e-4 --batch_size 16 --gradient_accumulation_steps 2 \
+ --only_train_diff \
+ --output_dir ${OUTPUT_DIR} --resume ${Transdiff-L_1StepAR_DIR} \
 --data_path ${IMAGENET_PATH}
 ```
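
Both finetuning stages resume from the 1StepAR checkpoint via `--resume ${Transdiff-L_1StepAR_DIR}`. Judging by its name, `--only_train_diff` in the 512x512 recipe updates only the diffusion part while leaving the rest of the network frozen, but confirm this against the argument's definition in `main.py`.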
 
 
 Evaluate TransDiff-L 1StepAR with classifier-free guidance:
 ```
+ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
+ main.py \
+ --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
+ --model transdiff_large --diffloss_w 1024 \
+ --output_dir ${OUTPUT_DIR} --resume ckpt/transdiff_l/ \
+ --evaluate --eval_bsz 256 --num_images 50000 \
 --cfg 1.3 --scale_0 0.89 --scale_1 0.95
 ```
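
Here `--num_images 50000` is the standard sample count for ImageNet FID, and `--cfg 1.3` is the classifier-free guidance scale (`--scale_0`/`--scale_1` are repo-specific knobs; see `main.py` for their exact role). In the standard CFG formulation, which TransDiff's sampler may refine, the conditional and unconditional predictions are combined as in this generic sketch:

```python
import torch

def cfg_combine(cond: torch.Tensor, uncond: torch.Tensor, cfg: float = 1.3) -> torch.Tensor:
    # Extrapolate from the unconditional prediction toward the conditional one;
    # cfg > 1 trades sample diversity for class fidelity.
    return uncond + cfg * (cond - uncond)
```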
 
 Evaluate TransDiff-L MRAR with classifier-free guidance:
 ```
+ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
+ main.py \
+ --img_size 256 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
+ --model transdiff_large --diffloss_w 1024 \
+ --output_dir ${OUTPUT_DIR} --resume ckpt/transdiff_l_mrar/ \
+ --evaluate --eval_bsz 256 --num_images 50000 \
 --cfg 1.3 --scale_0 0.91 --scale_1 0.93
 ```
 
 Evaluate TransDiff-L 512x512 with classifier-free guidance:
 ```
+ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
+ main.py \
+ --img_size 512 --vae_path ckpt/vae/kl16.ckpt --vae_embed_dim 16 --patch_size 1 \
+ --model transdiff_large --diffloss_w 1024 \
+ --output_dir ${OUTPUT_DIR} --resume ckpt/transdiff_l_512/ \
+ --evaluate --eval_bsz 64 --num_images 50000 \
 --cfg 1.3 --scale_0 0.87 --scale_1 0.87
 ```
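
Note that at 512x512 the per-GPU eval batch drops from 256 to 64, presumably to fit the larger latent grids in memory, so sampling the same 50,000 images takes correspondingly more steps.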