|
run_dir: .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16
can not find a checkpoint, will train from scratch |
|
Train Epoch #1: 0%| | 0/20799 [00:00<?, ?it/s]
Train Epoch #1: 0%| | 38/20799 [00:10<1:32:04, 3.76it/s, shape=torch.Size([32, 32, 16, 16]), global_step=38, grad_norm=0.245, lr=0.0002, loss=0.468]
Train Epoch #1: 0%| | 84/20799 [00:20<1:21:50, 4.22it/s, shape=torch.Size([32, 32, 16, 16]), global_step=84, grad_norm=0.293, lr=0.0002, loss=0.35]
Train Epoch #1: 1%| | 130/20799 [00:30<1:18:55, 4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=130, grad_norm=0.201, lr=0.0002, loss=0.309]
Train Epoch #1: 1%| | 176/20799 [00:40<1:17:30, 4.43it/s, shape=torch.Size([32, 32, 16, 16]), global_step=176, grad_norm=0.209, lr=0.0002, loss=0.289]
Train Epoch #1: 1%| | 222/20799 [00:50<1:16:40, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=222, grad_norm=0.157, lr=0.0002, loss=0.277]
Train Epoch #1: 1%|â | 268/20799 [01:00<1:16:06, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=268, grad_norm=0.17, lr=0.0002, loss=0.268]
Train Epoch #1: 2%|â | 314/20799 [01:10<1:15:42, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=314, grad_norm=0.18, lr=0.0002, loss=0.26]
Train Epoch #1: 2%|â | 360/20799 [01:21<1:15:22, 4.52it/s, shape=torch.Size([32, 32, 16, 16]), global_step=360, grad_norm=0.187, lr=0.0002, loss=0.255]
Train Epoch #1: 2%|â | 406/20799 [01:31<1:15:07, 4.52it/s, shape=torch.Size([32, 32, 16, 16]), global_step=406, grad_norm=0.13, lr=0.0002, loss=0.251]
Train Epoch #1: 2%|â | 452/20799 [01:41<1:14:55, 4.53it/s, shape=torch.Size([32, 32, 16, 16]), global_step=452, grad_norm=0.187, lr=0.0002, loss=0.248]
Train Epoch #1: 2%|â | 462/20799 [01:44<1:16:20, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=462, grad_norm=0.162, lr=0.0002, loss=0.248]
run_dir: .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16
can not find a checkpoint, will train from scratch |
|
Train Epoch #1: 0%| | 0/41598 [00:00<?, ?it/s]
Train Epoch #1: 0%| | 38/41598 [00:10<3:04:37, 3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=38, grad_norm=0.338, lr=0.0002, loss=0.469]
Train Epoch #1: 0%| | 84/41598 [00:20<2:44:39, 4.20it/s, shape=torch.Size([32, 32, 16, 16]), global_step=84, grad_norm=0.398, lr=0.0002, loss=0.357]
Train Epoch #1: 0%| | 130/41598 [00:30<2:39:06, 4.34it/s, shape=torch.Size([32, 32, 16, 16]), global_step=130, grad_norm=0.252, lr=0.0002, loss=0.318]
Train Epoch #1: 0%| | 176/41598 [00:40<2:36:28, 4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=176, grad_norm=0.196, lr=0.0002, loss=0.297]
Train Epoch #1: 1%| | 222/41598 [00:50<2:35:05, 4.45it/s, shape=torch.Size([32, 32, 16, 16]), global_step=222, grad_norm=0.197, lr=0.0002, loss=0.285]
Train Epoch #1: 1%| | 268/41598 [01:01<2:34:09, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=268, grad_norm=0.206, lr=0.0002, loss=0.274]
Train Epoch #1: 1%| | 314/41598 [01:11<2:33:28, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=314, grad_norm=0.183, lr=0.0002, loss=0.265]
Train Epoch #1: 1%| | 360/41598 [01:21<2:32:58, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=360, grad_norm=0.188, lr=0.0002, loss=0.26]
Train Epoch #1: 1%| | 406/41598 [01:31<2:32:37, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=406, grad_norm=0.171, lr=0.0002, loss=0.256]
Train Epoch #1: 1%| | 452/41598 [01:42<2:32:57, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=452, grad_norm=0.166, lr=0.0002, loss=0.252]
Train Epoch #1: 1%| | 497/41598 [01:52<2:32:38, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=497, grad_norm=0.159, lr=0.0002, loss=0.25]
Train Epoch #1: 1%|â | 543/41598 [02:02<2:32:12, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=543, grad_norm=0.195, lr=0.0002, loss=0.249]
Train Epoch #1: 1%|â | 589/41598 [02:12<2:31:50, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=589, grad_norm=0.145, lr=0.0002, loss=0.247]
Train Epoch #1: 2%|â | 635/41598 [02:22<2:31:35, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=635, grad_norm=0.175, lr=0.0002, loss=0.245]
Train Epoch #1: 2%|â | 681/41598 [02:32<2:31:18, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=681, grad_norm=0.166, lr=0.0002, loss=0.243]
Train Epoch #1: 2%|â | 727/41598 [02:42<2:31:02, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=727, grad_norm=0.159, lr=0.0002, loss=0.242]
Train Epoch #1: 2%|â | 773/41598 [02:53<2:30:48, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=773, grad_norm=0.132, lr=0.0002, loss=0.241]
Train Epoch #1: 2%|â | 819/41598 [03:03<2:31:17, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=819, grad_norm=0.169, lr=0.0002, loss=0.239]
Train Epoch #1: 2%|â | 865/41598 [03:13<2:30:53, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=865, grad_norm=0.132, lr=0.0002, loss=0.238]
Train Epoch #1: 2%|â | 911/41598 [03:23<2:30:33, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=911, grad_norm=0.145, lr=0.0002, loss=0.237]
Train Epoch #1: 2%|â | 957/41598 [03:34<2:30:13, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=957, grad_norm=0.144, lr=0.0002, loss=0.236]
Train Epoch #1: 2%|â | 1000/41598 [03:50<2:30:04, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1000, grad_norm=0.13, lr=0.0002, loss=0.235]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 1000 |
|
Train Epoch #1: 2%|â | 1001/41598 [03:56<3:24:48, 3.30it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1001, grad_norm=0.113, lr=0.0002, loss=0.235]
Train Epoch #1: 3%|â | 1047/41598 [04:06<3:07:59, 3.60it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1047, grad_norm=0.138, lr=0.0002, loss=0.234]
Train Epoch #1: 3%|â | 1093/41598 [04:16<2:56:10, 3.83it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1093, grad_norm=0.147, lr=0.0002, loss=0.233]
Train Epoch #1: 3%|â | 1139/41598 [04:26<2:47:55, 4.02it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1139, grad_norm=0.151, lr=0.0002, loss=0.233]
Train Epoch #1: 3%|â | 1185/41598 [04:36<2:42:11, 4.15it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1185, grad_norm=0.131, lr=0.0002, loss=0.231]
Train Epoch #1: 3%|â | 1231/41598 [04:47<2:38:42, 4.24it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1231, grad_norm=0.15, lr=0.0002, loss=0.231]
Train Epoch #1: 3%|â | 1277/41598 [04:57<2:35:43, 4.32it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1277, grad_norm=0.117, lr=0.0002, loss=0.23]
Train Epoch #1: 3%|â | 1323/41598 [05:07<2:33:30, 4.37it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1323, grad_norm=0.112, lr=0.0002, loss=0.229]
Train Epoch #1: 3%|â | 1369/41598 [05:17<2:31:50, 4.42it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1369, grad_norm=0.122, lr=0.0002, loss=0.229]
Train Epoch #1: 3%|â | 1415/41598 [05:27<2:30:41, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1415, grad_norm=0.128, lr=0.0002, loss=0.228]
Train Epoch #1: 4%|â | 1461/41598 [05:38<2:29:49, 4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1461, grad_norm=0.154, lr=0.0002, loss=0.229]
Train Epoch #1: 4%|â | 1507/41598 [05:48<2:29:10, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1507, grad_norm=0.14, lr=0.0002, loss=0.228]
Train Epoch #1: 4%|â | 1553/41598 [05:58<2:28:42, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1553, grad_norm=0.107, lr=0.0002, loss=0.227]
Train Epoch #1: 4%|â | 1599/41598 [06:08<2:28:47, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1599, grad_norm=0.126, lr=0.0002, loss=0.227]
Train Epoch #1: 4%|â | 1645/41598 [06:19<2:28:23, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1645, grad_norm=0.126, lr=0.0002, loss=0.226]
Train Epoch #1: 4%|â | 1691/41598 [06:29<2:28:00, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1691, grad_norm=0.108, lr=0.0002, loss=0.226]
Train Epoch #1: 4%|â | 1737/41598 [06:39<2:27:39, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1737, grad_norm=0.128, lr=0.0002, loss=0.226]
Train Epoch #1: 4%|â | 1783/41598 [06:49<2:27:21, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1783, grad_norm=0.118, lr=0.0002, loss=0.225]
Train Epoch #1: 4%|â | 1829/41598 [06:59<2:27:06, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1829, grad_norm=0.115, lr=0.0002, loss=0.225]
Train Epoch #1: 5%|â | 1875/41598 [07:10<2:26:54, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1875, grad_norm=0.126, lr=0.0002, loss=0.224]
Train Epoch #1: 5%|â | 1920/41598 [07:20<2:26:44, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1920, grad_norm=0.112, lr=0.0002, loss=0.224]
Train Epoch #1: 5%|â | 1921/41598 [07:20<2:26:39, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1921, grad_norm=0.113, lr=0.0002, loss=0.224]
Train Epoch #1: 5%|â | 1967/41598 [07:30<2:26:25, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=1967, grad_norm=0.153, lr=0.0002, loss=0.223]
Train Epoch #1: 5%|â | 2000/41598 [07:50<2:26:18, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2000, grad_norm=0.128, lr=0.0002, loss=0.223]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 2000 |
|
Train Epoch #1: 5%|â | 2001/41598 [07:51<3:29:18, 3.15it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2001, grad_norm=0.114, lr=0.0002, loss=0.223]
Train Epoch #1: 5%|â | 2047/41598 [08:01<3:09:05, 3.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2047, grad_norm=0.106, lr=0.0002, loss=0.223]
Train Epoch #1: 5%|â | 2093/41598 [08:11<2:55:28, 3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2093, grad_norm=0.114, lr=0.0002, loss=0.223]
Train Epoch #1: 5%|â | 2139/41598 [08:22<2:46:20, 3.95it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2139, grad_norm=0.14, lr=0.0002, loss=0.223]
Train Epoch #1: 5%|â | 2185/41598 [08:32<2:39:53, 4.11it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2185, grad_norm=0.119, lr=0.0002, loss=0.222]
Train Epoch #1: 5%|â | 2231/41598 [08:42<2:35:22, 4.22it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2231, grad_norm=0.123, lr=0.0002, loss=0.222]
Train Epoch #1: 5%|â | 2277/41598 [08:52<2:32:11, 4.31it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2277, grad_norm=0.137, lr=0.0002, loss=0.221]
Train Epoch #1: 6%|â | 2323/41598 [09:02<2:29:52, 4.37it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2323, grad_norm=0.157, lr=0.0002, loss=0.221]
Train Epoch #1: 6%|â | 2369/41598 [09:13<2:28:14, 4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2369, grad_norm=0.107, lr=0.0002, loss=0.22]
Train Epoch #1: 6%|â | 2415/41598 [09:23<2:27:02, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2415, grad_norm=0.113, lr=0.0002, loss=0.22]
Train Epoch #1: 6%|â | 2461/41598 [09:33<2:26:45, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2461, grad_norm=0.113, lr=0.0002, loss=0.22]
Train Epoch #1: 6%|â | 2507/41598 [09:43<2:26:00, 4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2507, grad_norm=0.133, lr=0.0002, loss=0.219]
Train Epoch #1: 6%|â | 2553/41598 [09:53<2:25:19, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2553, grad_norm=0.102, lr=0.0002, loss=0.219]
Train Epoch #1: 6%|â | 2599/41598 [10:04<2:24:53, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2599, grad_norm=0.107, lr=0.0002, loss=0.219]
Train Epoch #1: 6%|â | 2645/41598 [10:14<2:24:30, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2645, grad_norm=0.113, lr=0.0002, loss=0.219]
Train Epoch #1: 6%|â | 2691/41598 [10:24<2:24:09, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2691, grad_norm=0.109, lr=0.0002, loss=0.218]
Train Epoch #1: 7%|â | 2737/41598 [10:34<2:23:50, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2737, grad_norm=0.106, lr=0.0002, loss=0.218]
Train Epoch #1: 7%|â | 2783/41598 [10:44<2:23:34, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2783, grad_norm=0.116, lr=0.0002, loss=0.218]
Train Epoch #1: 7%|â | 2829/41598 [10:55<2:23:20, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2829, grad_norm=0.0959, lr=0.0002, loss=0.218]
Train Epoch #1: 7%|â | 2875/41598 [11:05<2:23:40, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2875, grad_norm=0.0972, lr=0.0002, loss=0.218]
Train Epoch #1: 7%|â | 2921/41598 [11:15<2:23:16, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2921, grad_norm=0.115, lr=0.0002, loss=0.217]
Train Epoch #1: 7%|â | 2967/41598 [11:25<2:22:56, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=2967, grad_norm=0.119, lr=0.0002, loss=0.217]
Train Epoch #1: 7%|â | 3000/41598 [11:40<2:22:49, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3000, grad_norm=0.105, lr=0.0002, loss=0.217]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 3000 |
|
Train Epoch #1: 7%|â | 3001/41598 [11:46<3:23:24, 3.16it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3001, grad_norm=0.105, lr=0.0002, loss=0.217]
Train Epoch #1: 7%|â | 3047/41598 [11:56<3:03:55, 3.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3047, grad_norm=0.104, lr=0.0002, loss=0.217]
Train Epoch #1: 7%|â | 3093/41598 [12:07<2:50:46, 3.76it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3093, grad_norm=0.105, lr=0.0002, loss=0.217]
Train Epoch #1: 8%|â | 3139/41598 [12:17<2:41:46, 3.96it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3139, grad_norm=0.0963, lr=0.0002, loss=0.217]
Train Epoch #1: 8%|â | 3185/41598 [12:27<2:35:32, 4.12it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3185, grad_norm=0.125, lr=0.0002, loss=0.217]
Train Epoch #1: 8%|â | 3231/41598 [12:37<2:31:43, 4.21it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3231, grad_norm=0.088, lr=0.0002, loss=0.216]
Train Epoch #1: 8%|â | 3277/41598 [12:47<2:28:28, 4.30it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3277, grad_norm=0.0906, lr=0.0002, loss=0.216]
Train Epoch #1: 8%|â | 3323/41598 [12:58<2:26:15, 4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3323, grad_norm=0.101, lr=0.0002, loss=0.216]
Train Epoch #1: 8%|â | 3369/41598 [13:08<2:24:36, 4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3369, grad_norm=0.116, lr=0.0002, loss=0.216]
Train Epoch #1: 8%|â | 3415/41598 [13:18<2:23:26, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3415, grad_norm=0.106, lr=0.0002, loss=0.216]
Train Epoch #1: 8%|â | 3461/41598 [13:28<2:22:30, 4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3461, grad_norm=0.121, lr=0.0002, loss=0.216]
Train Epoch #1: 8%|â | 3507/41598 [13:38<2:21:52, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3507, grad_norm=0.109, lr=0.0002, loss=0.216]
Train Epoch #1: 9%|â | 3553/41598 [13:49<2:21:21, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3553, grad_norm=0.104, lr=0.0002, loss=0.216]
Train Epoch #1: 9%|â | 3599/41598 [13:59<2:21:19, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3599, grad_norm=0.129, lr=0.0002, loss=0.216]
Train Epoch #1: 9%|â | 3645/41598 [14:09<2:20:51, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3645, grad_norm=0.0977, lr=0.0002, loss=0.216]
Train Epoch #1: 9%|â | 3691/41598 [14:19<2:20:26, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3691, grad_norm=0.0968, lr=0.0002, loss=0.216]
Train Epoch #1: 9%|â | 3737/41598 [14:30<2:20:06, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3737, grad_norm=0.103, lr=0.0002, loss=0.215]
Train Epoch #1: 9%|â | 3782/41598 [14:40<2:19:56, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3782, grad_norm=0.0863, lr=0.0002, loss=0.215]
Train Epoch #1: 9%|â | 3783/41598 [14:40<2:19:48, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3783, grad_norm=0.102, lr=0.0002, loss=0.215]
Train Epoch #1: 9%|â | 3829/41598 [14:50<2:19:32, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3829, grad_norm=0.109, lr=0.0002, loss=0.215]
Train Epoch #1: 9%|â | 3875/41598 [15:00<2:19:17, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3875, grad_norm=0.112, lr=0.0002, loss=0.215]
Train Epoch #1: 9%|â | 3921/41598 [15:10<2:19:09, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3921, grad_norm=0.108, lr=0.0002, loss=0.215]
Train Epoch #1: 10%|â | 3967/41598 [15:20<2:18:57, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=3967, grad_norm=0.0919, lr=0.0002, loss=0.215]
Train Epoch #1: 10%|â | 4000/41598 [15:40<2:18:49, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4000, grad_norm=0.105, lr=0.0002, loss=0.215]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 4000 |
|
Train Epoch #1: 10%|â | 4001/41598 [15:41<3:18:46, 3.15it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4001, grad_norm=0.0955, lr=0.0002, loss=0.215]
Train Epoch #1: 10%|â | 4047/41598 [15:52<2:59:30, 3.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4047, grad_norm=0.0868, lr=0.0002, loss=0.215]
Train Epoch #1: 10%|â | 4093/41598 [16:02<2:46:33, 3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4093, grad_norm=0.098, lr=0.0002, loss=0.214]
Train Epoch #1: 10%|â | 4139/41598 [16:12<2:37:44, 3.96it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4139, grad_norm=0.0889, lr=0.0002, loss=0.214]
Train Epoch #1: 10%|â | 4185/41598 [16:22<2:31:36, 4.11it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4185, grad_norm=0.0853, lr=0.0002, loss=0.214]
Train Epoch #1: 10%|â | 4231/41598 [16:32<2:27:21, 4.23it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4231, grad_norm=0.0948, lr=0.0002, loss=0.214]
Train Epoch #1: 10%|â | 4277/41598 [16:43<2:24:22, 4.31it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4277, grad_norm=0.0864, lr=0.0002, loss=0.214]
Train Epoch #1: 10%|â | 4323/41598 [16:53<2:22:12, 4.37it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4323, grad_norm=0.0959, lr=0.0002, loss=0.214]
Train Epoch #1: 11%|â | 4369/41598 [17:03<2:20:37, 4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4369, grad_norm=0.0887, lr=0.0002, loss=0.214]
Train Epoch #1: 11%|â | 4415/41598 [17:13<2:19:31, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4415, grad_norm=0.0881, lr=0.0002, loss=0.214]
Train Epoch #1: 11%|â | 4461/41598 [17:24<2:19:15, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4461, grad_norm=0.111, lr=0.0002, loss=0.214]
Train Epoch #1: 11%|â | 4507/41598 [17:34<2:18:23, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4507, grad_norm=0.108, lr=0.0002, loss=0.214]
Train Epoch #1: 11%|â | 4553/41598 [17:44<2:17:46, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4553, grad_norm=0.111, lr=0.0002, loss=0.213]
Train Epoch #1: 11%|â | 4599/41598 [17:54<2:17:17, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4599, grad_norm=0.102, lr=0.0002, loss=0.213]
Train Epoch #1: 11%|â | 4645/41598 [18:04<2:16:52, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4645, grad_norm=0.0863, lr=0.0002, loss=0.213]
Train Epoch #1: 11%|ââ | 4691/41598 [18:14<2:16:35, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4691, grad_norm=0.0907, lr=0.0002, loss=0.213]
Train Epoch #1: 11%|ââ | 4737/41598 [18:25<2:16:18, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4737, grad_norm=0.0886, lr=0.0002, loss=0.213]
Train Epoch #1: 11%|ââ | 4783/41598 [18:35<2:16:03, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4783, grad_norm=0.0805, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 4829/41598 [18:45<2:16:21, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4829, grad_norm=0.0818, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 4875/41598 [18:55<2:15:59, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4875, grad_norm=0.0849, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 4921/41598 [19:06<2:15:43, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4921, grad_norm=0.0725, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 4967/41598 [19:16<2:15:25, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=4967, grad_norm=0.0815, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 5000/41598 [19:30<2:15:18, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5000, grad_norm=0.0895, lr=0.0002, loss=0.213]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 5000 |
|
Train Epoch #1: 12%|ââ | 5001/41598 [19:38<3:18:32, 3.07it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5001, grad_norm=0.11, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 5047/41598 [19:48<2:58:14, 3.42it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5047, grad_norm=0.1, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 5093/41598 [19:58<2:44:30, 3.70it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5093, grad_norm=0.0813, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 5139/41598 [20:09<2:35:19, 3.91it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5139, grad_norm=0.099, lr=0.0002, loss=0.213]
Train Epoch #1: 12%|ââ | 5184/41598 [20:19<2:29:30, 4.06it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5184, grad_norm=0.104, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5230/41598 [20:29<2:24:41, 4.19it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5230, grad_norm=0.0829, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5276/41598 [20:39<2:21:26, 4.28it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5276, grad_norm=0.0901, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5322/41598 [20:49<2:19:05, 4.35it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5322, grad_norm=0.0725, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5368/41598 [20:59<2:17:21, 4.40it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5368, grad_norm=0.0764, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5413/41598 [21:10<2:17:11, 4.40it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5413, grad_norm=0.0762, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5414/41598 [21:10<2:16:08, 4.43it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5414, grad_norm=0.0881, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5460/41598 [21:20<2:15:14, 4.45it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5460, grad_norm=0.0783, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5506/41598 [21:30<2:14:31, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5506, grad_norm=0.0985, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5552/41598 [21:40<2:14:29, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5552, grad_norm=0.0746, lr=0.0002, loss=0.212]
Train Epoch #1: 13%|ââ | 5598/41598 [21:51<2:13:53, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5598, grad_norm=0.089, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5644/41598 [22:01<2:13:31, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5644, grad_norm=0.0981, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5690/41598 [22:11<2:13:07, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5690, grad_norm=0.083, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5736/41598 [22:21<2:12:47, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5736, grad_norm=0.0793, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5782/41598 [22:31<2:12:28, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5782, grad_norm=0.0676, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5828/41598 [22:42<2:12:17, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5828, grad_norm=0.073, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5874/41598 [22:52<2:12:06, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5874, grad_norm=0.0691, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5920/41598 [23:02<2:11:51, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5920, grad_norm=0.0707, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 5966/41598 [23:12<2:12:10, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=5966, grad_norm=0.0715, lr=0.0002, loss=0.211]
Train Epoch #1: 14%|ââ | 6000/41598 [23:30<2:12:02, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6000, grad_norm=0.0748, lr=0.0002, loss=0.211]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 6000 |
|
Train Epoch #1: 14%|ââ | 6001/41598 [23:35<3:13:47, 3.06it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6001, grad_norm=0.0764, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6047/41598 [23:45<2:53:53, 3.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6047, grad_norm=0.0847, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6093/41598 [23:55<2:40:26, 3.69it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6093, grad_norm=0.0682, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6139/41598 [24:05<2:31:18, 3.91it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6139, grad_norm=0.0916, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6185/41598 [24:16<2:24:54, 4.07it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6185, grad_norm=0.0722, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6231/41598 [24:26<2:20:27, 4.20it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6231, grad_norm=0.0691, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6277/41598 [24:36<2:17:19, 4.29it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6277, grad_norm=0.0788, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6323/41598 [24:46<2:15:05, 4.35it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6323, grad_norm=0.0717, lr=0.0002, loss=0.211]
Train Epoch #1: 15%|ââ | 6369/41598 [24:57<2:14:05, 4.38it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6369, grad_norm=0.0665, lr=0.0002, loss=0.21]
Train Epoch #1: 15%|ââ | 6415/41598 [25:07<2:12:49, 4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6415, grad_norm=0.0733, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6461/41598 [25:17<2:11:50, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6461, grad_norm=0.0812, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6507/41598 [25:27<2:11:01, 4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6507, grad_norm=0.0779, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6553/41598 [25:37<2:10:27, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6553, grad_norm=0.082, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6599/41598 [25:48<2:09:56, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6599, grad_norm=0.0662, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6645/41598 [25:58<2:09:36, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6645, grad_norm=0.0736, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6691/41598 [26:08<2:09:18, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6691, grad_norm=0.0649, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6737/41598 [26:18<2:09:00, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6737, grad_norm=0.0807, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6783/41598 [26:28<2:08:43, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6783, grad_norm=0.0769, lr=0.0002, loss=0.21]
Train Epoch #1: 16%|ââ | 6829/41598 [26:39<2:09:04, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6829, grad_norm=0.0634, lr=0.0002, loss=0.21]
Train Epoch #1: 17%|ââ | 6875/41598 [26:49<2:08:45, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6875, grad_norm=0.0655, lr=0.0002, loss=0.21]
Train Epoch #1: 17%|ââ | 6921/41598 [26:59<2:08:24, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6921, grad_norm=0.0564, lr=0.0002, loss=0.21]
Train Epoch #1: 17%|ââ | 6967/41598 [27:09<2:08:08, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=6967, grad_norm=0.0604, lr=0.0002, loss=0.21]
Train Epoch #1: 17%|ââ | 7000/41598 [27:20<2:08:01, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7000, grad_norm=0.0555, lr=0.0002, loss=0.21]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 7000 |
|
Train Epoch #1: 17%|ââ | 7001/41598 [27:30<3:02:08, 3.17it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7001, grad_norm=0.0581, lr=0.0002, loss=0.21]
Train Epoch #1: 17%|ââ | 7047/41598 [27:40<2:44:40, 3.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7047, grad_norm=0.0551, lr=0.0002, loss=0.21]
Train Epoch #1: 17%|ââ | 7093/41598 [27:51<2:32:56, 3.76it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7093, grad_norm=0.0599, lr=0.0002, loss=0.209]
Train Epoch #1: 17%|ââ | 7139/41598 [28:01<2:24:54, 3.96it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7139, grad_norm=0.0709, lr=0.0002, loss=0.209]
Train Epoch #1: 17%|ââ | 7185/41598 [28:11<2:19:56, 4.10it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7185, grad_norm=0.0665, lr=0.0002, loss=0.209]
Train Epoch #1: 17%|ââ | 7231/41598 [28:21<2:15:53, 4.21it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7231, grad_norm=0.0825, lr=0.0002, loss=0.209]
Train Epoch #1: 17%|ââ | 7277/41598 [28:31<2:12:58, 4.30it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7277, grad_norm=0.056, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7323/41598 [28:42<2:10:56, 4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7323, grad_norm=0.0661, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7369/41598 [28:52<2:09:25, 4.41it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7369, grad_norm=0.0523, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7415/41598 [29:02<2:08:21, 4.44it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7415, grad_norm=0.0703, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7461/41598 [29:12<2:07:35, 4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7461, grad_norm=0.0575, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7507/41598 [29:22<2:06:59, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7507, grad_norm=0.0621, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7553/41598 [29:33<2:07:00, 4.47it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7553, grad_norm=0.0572, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7599/41598 [29:43<2:06:28, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7599, grad_norm=0.0588, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7645/41598 [29:53<2:06:06, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7645, grad_norm=0.0514, lr=0.0002, loss=0.209]
Train Epoch #1: 18%|ââ | 7691/41598 [30:03<2:05:45, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7691, grad_norm=0.0525, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 7737/41598 [30:14<2:05:26, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7737, grad_norm=0.058, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 7783/41598 [30:24<2:05:12, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7783, grad_norm=0.0552, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 7829/41598 [30:34<2:04:57, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7829, grad_norm=0.053, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 7875/41598 [30:44<2:04:41, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7875, grad_norm=0.0671, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 7921/41598 [30:55<2:04:57, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7921, grad_norm=0.0586, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 7967/41598 [31:05<2:04:36, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=7967, grad_norm=0.0567, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 8000/41598 [31:20<2:04:29, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8000, grad_norm=0.0577, lr=0.0002, loss=0.209]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 8000 |
|
Train Epoch #1: 19%|ââ | 8001/41598 [31:26<2:58:02, 3.14it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8001, grad_norm=0.0592, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 8047/41598 [31:36<2:40:41, 3.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8047, grad_norm=0.0468, lr=0.0002, loss=0.209]
Train Epoch #1: 19%|ââ | 8093/41598 [31:46<2:28:57, 3.75it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8093, grad_norm=0.058, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8139/41598 [31:56<2:21:02, 3.95it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8139, grad_norm=0.0472, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8185/41598 [32:07<2:15:30, 4.11it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8185, grad_norm=0.0571, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8231/41598 [32:17<2:11:39, 4.22it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8231, grad_norm=0.046, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8277/41598 [32:27<2:09:24, 4.29it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8277, grad_norm=0.0581, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8323/41598 [32:37<2:07:17, 4.36it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8323, grad_norm=0.0559, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8369/41598 [32:47<2:05:48, 4.40it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8369, grad_norm=0.0488, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8415/41598 [32:58<2:04:42, 4.43it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8415, grad_norm=0.0548, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8461/41598 [33:08<2:03:51, 4.46it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8461, grad_norm=0.0602, lr=0.0002, loss=0.208]
Train Epoch #1: 20%|ââ | 8507/41598 [33:18<2:03:12, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8507, grad_norm=0.0558, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8553/41598 [33:28<2:02:49, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8553, grad_norm=0.0524, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8599/41598 [33:39<2:02:45, 4.48it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8599, grad_norm=0.0641, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8645/41598 [33:49<2:02:21, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8645, grad_norm=0.055, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8691/41598 [33:59<2:02:02, 4.49it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8691, grad_norm=0.0491, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8737/41598 [34:09<2:01:45, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8737, grad_norm=0.0527, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8783/41598 [34:19<2:01:28, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8783, grad_norm=0.0427, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|ââ | 8829/41598 [34:30<2:01:14, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8829, grad_norm=0.0502, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|âââ | 8874/41598 [34:40<2:01:04, 4.50it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8874, grad_norm=0.0478, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|âââ | 8875/41598 [34:40<2:00:59, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8875, grad_norm=0.0445, lr=0.0002, loss=0.208]
Train Epoch #1: 21%|âââ | 8921/41598 [34:50<2:00:46, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8921, grad_norm=0.0485, lr=0.0002, loss=0.208]
Train Epoch #1: 22%|âââ | 8967/41598 [35:00<2:00:34, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=8967, grad_norm=0.0527, lr=0.0002, loss=0.208]
Train Epoch #1: 22%|âââ | 9000/41598 [35:20<2:00:26, 4.51it/s, shape=torch.Size([32, 32, 16, 16]), global_step=9000, grad_norm=0.0529, lr=0.0002, loss=0.208]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 9000 |
|
Train Epoch #1: 22%|âââ | 9001/41598 [35:22<2:53:55, 3.12it/s, shape=torch.Size([32, 32, 16, 16]), global_step=9001, grad_norm=0.0566, lr=0.0002, loss=0.208]
run_dir: .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16
loading checkpoint .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt |
|
optimizer loaded |
|
scaler loaded |
|
epoch: 0 |
|
global_step=9000 |
|
lr scheduler loaded |
|
train generator state loaded |
|
torch rng state loaded |
|
torch cuda rng state loaded |
|
best_fid=inf |
|
checkpoint .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt loaded |
|
Train Epoch #1: 0%| | 0/10399 [00:00<?, ?it/s]
skipping first 9000 steps
Train Epoch #1: 87%|âââââââââ | 9013/10399 [00:10<00:01, 871.89it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9013, grad_norm=0.025, lr=0.0002, loss=0.205]
Train Epoch #1: 87%|âââââââââ | 9036/10399 [00:25<00:01, 871.89it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9036, grad_norm=0.0208, lr=0.0002, loss=0.205]
Train Epoch #1: 87%|âââââââââ | 9037/10399 [00:25<00:04, 277.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9037, grad_norm=0.0244, lr=0.0002, loss=0.204]
Train Epoch #1: 87%|âââââââââ | 9053/10399 [00:36<00:07, 168.68it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9053, grad_norm=0.0218, lr=0.0002, loss=0.205]
Train Epoch #1: 87%|âââââââââ | 9069/10399 [00:46<00:12, 108.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9069, grad_norm=0.0193, lr=0.0002, loss=0.202]
Train Epoch #1: 87%|âââââââââ | 9085/10399 [00:57<00:18, 72.10it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9085, grad_norm=0.0218, lr=0.0002, loss=0.201]
Train Epoch #1: 88%|âââââââââ | 9101/10399 [01:07<00:26, 49.03it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9101, grad_norm=0.0216, lr=0.0002, loss=0.197]
Train Epoch #1: 88%|âââââââââ | 9117/10399 [01:17<00:37, 33.89it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9117, grad_norm=0.0203, lr=0.0002, loss=0.196]
Train Epoch #1: 88%|âââââââââ | 9133/10399 [01:28<00:53, 23.77it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9133, grad_norm=0.0196, lr=0.0002, loss=0.196]
Train Epoch #1: 88%|âââââââââ | 9149/10399 [01:38<01:13, 16.90it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9149, grad_norm=0.023, lr=0.0002, loss=0.197]
Train Epoch #1: 88%|âââââââââ | 9165/10399 [01:49<01:41, 12.19it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9165, grad_norm=0.0212, lr=0.0002, loss=0.198]
Train Epoch #1: 88%|âââââââââ | 9181/10399 [01:59<02:16, 8.95it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9181, grad_norm=0.0199, lr=0.0002, loss=0.198]
Train Epoch #1: 88%|âââââââââ | 9196/10399 [02:10<02:14, 8.95it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9196, grad_norm=0.0225, lr=0.0002, loss=0.198]
Train Epoch #1: 88%|âââââââââ | 9197/10399 [02:10<02:59, 6.70it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9197, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1: 89%|âââââââââ | 9213/10399 [02:20<03:50, 5.14it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9213, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1: 89%|âââââââââ | 9229/10399 [02:30<04:48, 4.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9229, grad_norm=0.0215, lr=0.0002, loss=0.199]
Train Epoch #1: 89%|âââââââââ | 9245/10399 [02:41<05:50, 3.30it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9245, grad_norm=0.0208, lr=0.0002, loss=0.198]
Train Epoch #1: 89%|âââââââââ | 9261/10399 [02:51<06:51, 2.77it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9261, grad_norm=0.0211, lr=0.0002, loss=0.197]
Train Epoch #1: 89%|âââââââââ | 9277/10399 [03:02<07:48, 2.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9277, grad_norm=0.0231, lr=0.0002, loss=0.197]
Train Epoch #1: 89%|âââââââââ | 9293/10399 [03:12<08:37, 2.14it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9293, grad_norm=0.0251, lr=0.0002, loss=0.197]
Train Epoch #1: 90%|âââââââââ | 9309/10399 [03:23<09:17, 1.95it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9309, grad_norm=0.0227, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9325/10399 [03:33<09:47, 1.83it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9325, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9341/10399 [03:44<10:08, 1.74it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9341, grad_norm=0.0221, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9357/10399 [03:54<10:21, 1.68it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9357, grad_norm=0.0223, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9373/10399 [04:05<10:30, 1.63it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9373, grad_norm=0.0207, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9389/10399 [04:15<10:31, 1.60it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9389, grad_norm=0.0208, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9404/10399 [04:25<10:21, 1.60it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9404, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #1: 90%|âââââââââ | 9405/10399 [04:25<10:29, 1.58it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9405, grad_norm=0.0263, lr=0.0002, loss=0.198]
Train Epoch #1: 91%|âââââââââ | 9421/10399 [04:36<10:24, 1.57it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9421, grad_norm=0.0238, lr=0.0002, loss=0.198]
Train Epoch #1: 91%|âââââââââ | 9437/10399 [04:46<10:18, 1.56it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9437, grad_norm=0.0239, lr=0.0002, loss=0.198]
Train Epoch #1: 91%|âââââââââ | 9453/10399 [04:57<10:10, 1.55it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9453, grad_norm=0.0219, lr=0.0002, loss=0.198]
Train Epoch #1: 91%|âââââââââ | 9469/10399 [05:07<10:02, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9469, grad_norm=0.0208, lr=0.0002, loss=0.198]
Train Epoch #1: 91%|âââââââââ | 9485/10399 [05:18<09:53, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9485, grad_norm=0.0222, lr=0.0002, loss=0.197]
Train Epoch #1: 91%|ââââââââââ| 9501/10399 [05:28<09:43, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9501, grad_norm=0.0239, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9517/10399 [05:38<09:33, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9517, grad_norm=0.0226, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9533/10399 [05:49<09:23, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9533, grad_norm=0.0206, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9549/10399 [05:59<09:13, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9549, grad_norm=0.0228, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9564/10399 [06:10<09:03, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9564, grad_norm=0.0222, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9565/10399 [06:10<09:03, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9565, grad_norm=0.0205, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9581/10399 [06:20<08:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9581, grad_norm=0.02, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9597/10399 [06:31<08:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9597, grad_norm=0.0215, lr=0.0002, loss=0.198]
Train Epoch #1: 92%|ââââââââââ| 9613/10399 [06:41<08:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9613, grad_norm=0.0196, lr=0.0002, loss=0.198]
Train Epoch #1: 93%|ââââââââââ| 9629/10399 [06:51<08:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9629, grad_norm=0.0196, lr=0.0002, loss=0.198]
Train Epoch #1: 93%|ââââââââââ| 9645/10399 [07:02<08:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9645, grad_norm=0.0267, lr=0.0002, loss=0.198]
Train Epoch #1: 93%|ââââââââââ| 9661/10399 [07:12<08:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9661, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1: 93%|ââââââââââ| 9677/10399 [07:23<07:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9677, grad_norm=0.0216, lr=0.0002, loss=0.198]
Train Epoch #1: 93%|ââââââââââ| 9693/10399 [07:33<07:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9693, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #1: 93%|ââââââââââ| 9709/10399 [07:44<07:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9709, grad_norm=0.0246, lr=0.0002, loss=0.198]
Train Epoch #1: 94%|ââââââââââ| 9725/10399 [07:54<07:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9725, grad_norm=0.0205, lr=0.0002, loss=0.198]
Train Epoch #1: 94%|ââââââââââ| 9741/10399 [08:05<07:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9741, grad_norm=0.0241, lr=0.0002, loss=0.198]
Train Epoch #1: 94%|ââââââââââ| 9757/10399 [08:15<06:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9757, grad_norm=0.0201, lr=0.0002, loss=0.198]
Train Epoch #1: 94%|ââââââââââ| 9772/10399 [08:25<06:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9772, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #1: 94%|ââââââââââ| 9773/10399 [08:25<06:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9773, grad_norm=0.0241, lr=0.0002, loss=0.197]
Train Epoch #1: 94%|ââââââââââ| 9789/10399 [08:36<06:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9789, grad_norm=0.0222, lr=0.0002, loss=0.197]
Train Epoch #1: 94%|ââââââââââ| 9805/10399 [08:46<06:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9805, grad_norm=0.025, lr=0.0002, loss=0.197]
Train Epoch #1: 94%|ââââââââââ| 9821/10399 [08:57<06:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9821, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #1: 95%|ââââââââââ| 9837/10399 [09:07<06:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9837, grad_norm=0.0255, lr=0.0002, loss=0.197]
Train Epoch #1: 95%|ââââââââââ| 9853/10399 [09:18<05:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9853, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #1: 95%|ââââââââââ| 9869/10399 [09:28<05:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9869, grad_norm=0.02, lr=0.0002, loss=0.198]
Train Epoch #1: 95%|ââââââââââ| 9885/10399 [09:39<05:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9885, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1: 95%|ââââââââââ| 9901/10399 [09:49<05:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9901, grad_norm=0.0239, lr=0.0002, loss=0.198]
Train Epoch #1: 95%|ââââââââââ| 9917/10399 [09:59<05:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9917, grad_norm=0.0241, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 9932/10399 [10:10<05:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9932, grad_norm=0.0203, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 9933/10399 [10:10<05:03, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9933, grad_norm=0.0207, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 9949/10399 [10:20<04:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9949, grad_norm=0.0219, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 9965/10399 [10:31<04:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9965, grad_norm=0.0222, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 9981/10399 [10:41<04:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9981, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 9997/10399 [10:52<04:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=9997, grad_norm=0.0215, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 10000/10399 [11:05<04:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0221, lr=0.0002, loss=0.198]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 10000 |
|
Train Epoch #1: 96%|ââââââââââ| 10001/10399 [11:08<06:30, 1.02it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0236, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 10017/10399 [11:18<05:30, 1.16it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0199, lr=0.0002, loss=0.198]
Train Epoch #1: 96%|ââââââââââ| 10033/10399 [11:29<04:49, 1.26it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10049/10399 [11:39<04:21, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=1e+4, grad_norm=0.0242, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10064/10399 [11:50<04:10, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10064, grad_norm=0.0209, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10065/10399 [11:50<03:59, 1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10065, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10081/10399 [12:00<03:41, 1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10081, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10097/10399 [12:10<03:26, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10097, grad_norm=0.0224, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10113/10399 [12:21<03:12, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10113, grad_norm=0.0186, lr=0.0002, loss=0.198]
Train Epoch #1: 97%|ââââââââââ| 10129/10399 [12:31<03:00, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10129, grad_norm=0.0244, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10145/10399 [12:42<02:48, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10145, grad_norm=0.0211, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10161/10399 [12:52<02:37, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10161, grad_norm=0.0245, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10177/10399 [13:03<02:26, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10177, grad_norm=0.0256, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10193/10399 [13:13<02:15, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10193, grad_norm=0.0213, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10209/10399 [13:24<02:04, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10209, grad_norm=0.0218, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10225/10399 [13:34<01:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10225, grad_norm=0.0242, lr=0.0002, loss=0.198]
Train Epoch #1: 98%|ââââââââââ| 10241/10399 [13:45<01:43, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10241, grad_norm=0.0226, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10257/10399 [13:55<01:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10257, grad_norm=0.0196, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10272/10399 [14:05<01:23, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10272, grad_norm=0.0201, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10273/10399 [14:05<01:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10273, grad_norm=0.0217, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10289/10399 [14:16<01:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10289, grad_norm=0.0251, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10305/10399 [14:26<01:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10305, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10321/10399 [14:37<00:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10321, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 99%|ââââââââââ| 10337/10399 [14:47<00:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10337, grad_norm=0.022, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|ââââââââââ| 10353/10399 [14:58<00:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10353, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|ââââââââââ| 10369/10399 [15:08<00:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10369, grad_norm=0.021, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|ââââââââââ| 10385/10399 [15:19<00:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10385, grad_norm=0.0207, lr=0.0002, loss=0.198]
Train Epoch #1: 100%|ââââââââââ| 10399/10399 [15:28<00:00, 11.20it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10399, grad_norm=0.0234, lr=0.0002, loss=0.198] |
|
train info dict: {'loss': 0.19794975221157074} |
|
Train Epoch #2: 0%| | 0/10399 [00:00<?, ?it/s]
Train Epoch #2: 0%| | 14/10399 [00:10<2:04:09, 1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10413, grad_norm=0.0194, lr=0.0002, loss=0.198]
Train Epoch #2: 0%| | 30/10399 [00:20<1:56:59, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10429, grad_norm=0.0242, lr=0.0002, loss=0.194]
Train Epoch #2: 0%| | 46/10399 [00:30<1:54:50, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10445, grad_norm=0.0251, lr=0.0002, loss=0.195]
Train Epoch #2: 1%| | 62/10399 [00:41<1:53:46, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10461, grad_norm=0.0221, lr=0.0002, loss=0.197]
Train Epoch #2: 1%| | 77/10399 [00:51<1:53:36, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10476, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #2: 1%| | 78/10399 [00:51<1:53:05, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10477, grad_norm=0.0234, lr=0.0002, loss=0.198]
Train Epoch #2: 1%| | 94/10399 [01:02<1:52:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10493, grad_norm=0.0214, lr=0.0002, loss=0.198]
Train Epoch #2: 1%| | 110/10399 [01:12<1:52:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10509, grad_norm=0.0231, lr=0.0002, loss=0.199]
Train Epoch #2: 1%| | 126/10399 [01:23<1:51:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10525, grad_norm=0.0227, lr=0.0002, loss=0.197]
Train Epoch #2: 1%|â | 142/10399 [01:33<1:51:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10541, grad_norm=0.0235, lr=0.0002, loss=0.198]
Train Epoch #2: 2%|â | 158/10399 [01:43<1:51:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10557, grad_norm=0.0214, lr=0.0002, loss=0.199]
Train Epoch #2: 2%|â | 174/10399 [01:54<1:51:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10573, grad_norm=0.0227, lr=0.0002, loss=0.199]
Train Epoch #2: 2%|â | 190/10399 [02:05<1:51:40, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10589, grad_norm=0.0255, lr=0.0002, loss=0.199]
Train Epoch #2: 2%|â | 206/10399 [02:15<1:51:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10605, grad_norm=0.0216, lr=0.0002, loss=0.198]
Train Epoch #2: 2%|â | 222/10399 [02:25<1:50:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10621, grad_norm=0.0203, lr=0.0002, loss=0.197]
Train Epoch #2: 2%|â | 238/10399 [02:36<1:50:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10637, grad_norm=0.024, lr=0.0002, loss=0.197]
Train Epoch #2: 2%|â | 254/10399 [02:46<1:50:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10653, grad_norm=0.0212, lr=0.0002, loss=0.197]
Train Epoch #2: 3%|â | 269/10399 [02:57<1:50:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10668, grad_norm=0.0204, lr=0.0002, loss=0.197]
Train Epoch #2: 3%|â | 270/10399 [02:57<1:50:18, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10669, grad_norm=0.0216, lr=0.0002, loss=0.197]
Train Epoch #2: 3%|â | 286/10399 [03:07<1:50:05, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10685, grad_norm=0.0223, lr=0.0002, loss=0.197]
Train Epoch #2: 3%|â | 302/10399 [03:18<1:49:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10701, grad_norm=0.0207, lr=0.0002, loss=0.197]
Train Epoch #2: 3%|â | 318/10399 [03:28<1:49:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10717, grad_norm=0.0217, lr=0.0002, loss=0.197]
Train Epoch #2: 3%|â | 334/10399 [03:39<1:49:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10733, grad_norm=0.0198, lr=0.0002, loss=0.196]
Train Epoch #2: 3%|â | 350/10399 [03:49<1:49:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10749, grad_norm=0.0207, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 366/10399 [03:59<1:49:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10765, grad_norm=0.0221, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 382/10399 [04:10<1:49:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10781, grad_norm=0.02, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 398/10399 [04:20<1:48:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10797, grad_norm=0.023, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 414/10399 [04:31<1:48:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10813, grad_norm=0.0215, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 429/10399 [04:41<1:48:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10828, grad_norm=0.0212, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 430/10399 [04:41<1:48:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10829, grad_norm=0.0227, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 446/10399 [04:52<1:48:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10845, grad_norm=0.0214, lr=0.0002, loss=0.196]
Train Epoch #2: 4%|â | 462/10399 [05:02<1:48:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10861, grad_norm=0.0231, lr=0.0002, loss=0.196]
Train Epoch #2: 5%|â | 478/10399 [05:13<1:47:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10877, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2: 5%|â | 494/10399 [05:23<1:47:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10893, grad_norm=0.0192, lr=0.0002, loss=0.196]
Train Epoch #2: 5%|â | 510/10399 [05:33<1:47:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10909, grad_norm=0.0197, lr=0.0002, loss=0.196]
Train Epoch #2: 5%|â | 526/10399 [05:44<1:47:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10925, grad_norm=0.0211, lr=0.0002, loss=0.196]
Train Epoch #2: 5%|â | 542/10399 [05:54<1:47:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10941, grad_norm=0.02, lr=0.0002, loss=0.196]
Train Epoch #2: 5%|â | 558/10399 [06:05<1:47:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10957, grad_norm=0.0204, lr=0.0002, loss=0.196]
Train Epoch #2: 6%|â | 574/10399 [06:15<1:47:10, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10973, grad_norm=0.0207, lr=0.0002, loss=0.196]
Train Epoch #2: 6%|â | 590/10399 [06:26<1:46:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=10989, grad_norm=0.0198, lr=0.0002, loss=0.196]
Train Epoch #2: 6%|â | 601/10399 [06:37<1:46:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11000, grad_norm=0.0216, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 11000 |
|
Train Epoch #2: 6%|â | 602/10399 [06:47<2:32:25, 1.07it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11001, grad_norm=0.023, lr=0.0002, loss=0.196]
Train Epoch #2: 6%|â | 618/10399 [06:58<2:17:34, 1.18it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11017, grad_norm=0.0203, lr=0.0002, loss=0.197]
Train Epoch #2: 6%|â | 634/10399 [07:08<2:07:32, 1.28it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11033, grad_norm=0.0191, lr=0.0002, loss=0.197]
Train Epoch #2: 6%|â | 650/10399 [07:19<2:00:42, 1.35it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11049, grad_norm=0.0249, lr=0.0002, loss=0.196]
Train Epoch #2: 6%|â | 666/10399 [07:29<1:55:59, 1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11065, grad_norm=0.0193, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 682/10399 [07:40<1:52:42, 1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11081, grad_norm=0.0222, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 698/10399 [07:50<1:50:20, 1.47it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11097, grad_norm=0.0194, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 714/10399 [08:00<1:48:39, 1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11113, grad_norm=0.0243, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 730/10399 [08:11<1:47:26, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11129, grad_norm=0.0194, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 745/10399 [08:21<1:47:16, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11144, grad_norm=0.022, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 746/10399 [08:21<1:46:31, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11145, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 762/10399 [08:32<1:45:51, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11161, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #2: 7%|â | 778/10399 [08:42<1:45:21, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11177, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #2: 8%|â | 794/10399 [08:53<1:44:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11193, grad_norm=0.0219, lr=0.0002, loss=0.197]
Train Epoch #2: 8%|â | 810/10399 [09:03<1:44:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11209, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2: 8%|â | 826/10399 [09:13<1:44:23, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11225, grad_norm=0.023, lr=0.0002, loss=0.197]
Train Epoch #2: 8%|â | 842/10399 [09:24<1:44:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11241, grad_norm=0.0216, lr=0.0002, loss=0.197]
Train Epoch #2: 8%|â | 858/10399 [09:34<1:43:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11257, grad_norm=0.0213, lr=0.0002, loss=0.197]
Train Epoch #2: 8%|â | 874/10399 [09:45<1:43:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11273, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 890/10399 [09:55<1:43:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11289, grad_norm=0.0205, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 906/10399 [10:06<1:43:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11305, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 922/10399 [10:16<1:43:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11321, grad_norm=0.0204, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 938/10399 [10:27<1:43:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11337, grad_norm=0.0193, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 953/10399 [10:37<1:42:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11352, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 954/10399 [10:37<1:42:51, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11353, grad_norm=0.017, lr=0.0002, loss=0.197]
Train Epoch #2: 9%|â | 970/10399 [10:48<1:42:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11369, grad_norm=0.0204, lr=0.0002, loss=0.196]
Train Epoch #2: 9%|â | 986/10399 [10:58<1:42:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11385, grad_norm=0.02, lr=0.0002, loss=0.196]
Train Epoch #2: 10%|â | 1002/10399 [11:08<1:42:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11401, grad_norm=0.0227, lr=0.0002, loss=0.197]
Train Epoch #2: 10%|â | 1018/10399 [11:19<1:42:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11417, grad_norm=0.0205, lr=0.0002, loss=0.197]
Train Epoch #2: 10%|â | 1034/10399 [11:29<1:41:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11433, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #2: 10%|â | 1050/10399 [11:40<1:41:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11449, grad_norm=0.019, lr=0.0002, loss=0.197]
Train Epoch #2: 10%|â | 1066/10399 [11:50<1:41:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11465, grad_norm=0.0195, lr=0.0002, loss=0.197]
Train Epoch #2: 10%|â | 1082/10399 [12:01<1:41:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11481, grad_norm=0.0213, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|â | 1098/10399 [12:11<1:41:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11497, grad_norm=0.0185, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|â | 1113/10399 [12:21<1:40:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11512, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|â | 1114/10399 [12:21<1:40:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11513, grad_norm=0.0182, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|â | 1130/10399 [12:32<1:40:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11529, grad_norm=0.0201, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|â | 1146/10399 [12:42<1:40:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11545, grad_norm=0.019, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|â | 1162/10399 [12:53<1:40:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11561, grad_norm=0.0199, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|ââ | 1178/10399 [13:03<1:40:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11577, grad_norm=0.0186, lr=0.0002, loss=0.197]
Train Epoch #2: 11%|ââ | 1194/10399 [13:14<1:40:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11593, grad_norm=0.0201, lr=0.0002, loss=0.197]
Train Epoch #2: 12%|ââ | 1210/10399 [13:24<1:39:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11609, grad_norm=0.0235, lr=0.0002, loss=0.197]
Train Epoch #2: 12%|ââ | 1226/10399 [13:35<1:39:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11625, grad_norm=0.0183, lr=0.0002, loss=0.197]
Train Epoch #2: 12%|ââ | 1242/10399 [13:45<1:39:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11641, grad_norm=0.0214, lr=0.0002, loss=0.197]
Train Epoch #2: 12%|ââ | 1258/10399 [13:55<1:39:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11657, grad_norm=0.0215, lr=0.0002, loss=0.197]
Train Epoch #2: 12%|ââ | 1274/10399 [14:06<1:39:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11673, grad_norm=0.0191, lr=0.0002, loss=0.197]
Train Epoch #2: 12%|ââ | 1290/10399 [14:16<1:39:03, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11689, grad_norm=0.0193, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1305/10399 [14:27<1:38:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11704, grad_norm=0.0163, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1306/10399 [14:27<1:38:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11705, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1322/10399 [14:37<1:38:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11721, grad_norm=0.0194, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1338/10399 [14:48<1:38:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11737, grad_norm=0.0196, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1354/10399 [14:58<1:38:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11753, grad_norm=0.0199, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1370/10399 [15:09<1:38:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11769, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1386/10399 [15:19<1:38:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11785, grad_norm=0.0213, lr=0.0002, loss=0.197]
Train Epoch #2: 13%|ââ | 1402/10399 [15:30<1:37:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11801, grad_norm=0.0197, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1418/10399 [15:40<1:37:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11817, grad_norm=0.0171, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1434/10399 [15:50<1:37:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11833, grad_norm=0.0183, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1450/10399 [16:01<1:37:18, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11849, grad_norm=0.0186, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1465/10399 [16:11<1:37:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11864, grad_norm=0.0199, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1466/10399 [16:11<1:37:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11865, grad_norm=0.0206, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1482/10399 [16:22<1:36:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11881, grad_norm=0.0178, lr=0.0002, loss=0.197]
Train Epoch #2: 14%|ââ | 1498/10399 [16:32<1:36:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11897, grad_norm=0.0175, lr=0.0002, loss=0.197]
Train Epoch #2: 15%|ââ | 1514/10399 [16:43<1:36:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11913, grad_norm=0.0192, lr=0.0002, loss=0.196]
Train Epoch #2: 15%|ââ | 1530/10399 [16:53<1:36:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11929, grad_norm=0.018, lr=0.0002, loss=0.196]
Train Epoch #2: 15%|ââ | 1546/10399 [17:03<1:36:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11945, grad_norm=0.0184, lr=0.0002, loss=0.196]
Train Epoch #2: 15%|ââ | 1562/10399 [17:14<1:36:05, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11961, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2: 15%|ââ | 1578/10399 [17:24<1:35:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11977, grad_norm=0.0181, lr=0.0002, loss=0.196]
Train Epoch #2: 15%|ââ | 1594/10399 [17:35<1:35:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=11993, grad_norm=0.018, lr=0.0002, loss=0.196]
Train Epoch #2: 15%|ââ | 1601/10399 [17:47<1:35:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12000, grad_norm=0.0201, lr=0.0002, loss=0.197]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 12000
Train Epoch #2: 15%|ââ | 1602/10399 [17:54<2:20:08, 1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12001, grad_norm=0.0178, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1618/10399 [18:04<2:04:55, 1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12017, grad_norm=0.0184, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1634/10399 [18:15<1:55:11, 1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12033, grad_norm=0.0152, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1650/10399 [18:25<1:48:42, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12049, grad_norm=0.0186, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1666/10399 [18:36<1:44:19, 1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12065, grad_norm=0.0179, lr=0.0002, loss=0.196]
Train Epoch #2: 16%|ââ | 1682/10399 [18:46<1:41:15, 1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12081, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1698/10399 [18:57<1:39:31, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12097, grad_norm=0.0212, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1713/10399 [19:07<1:39:21, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12112, grad_norm=0.0167, lr=0.0002, loss=0.197]
Train Epoch #2: 16%|ââ | 1714/10399 [19:07<1:37:51, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12113, grad_norm=0.0165, lr=0.0002, loss=0.197]
Train Epoch #2: 17%|ââ | 1730/10399 [19:17<1:36:39, 1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12129, grad_norm=0.0179, lr=0.0002, loss=0.197]
Train Epoch #2: 17%|ââ | 1746/10399 [19:28<1:35:46, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12145, grad_norm=0.0198, lr=0.0002, loss=0.197]
Train Epoch #2: 17%|ââ | 1762/10399 [19:38<1:35:07, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12161, grad_norm=0.021, lr=0.0002, loss=0.197]
Train Epoch #2: 17%|ââ | 1778/10399 [19:49<1:34:35, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12177, grad_norm=0.0166, lr=0.0002, loss=0.197]
Train Epoch #2: 17%|ââ | 1794/10399 [19:59<1:34:10, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12193, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2: 17%|ââ | 1810/10399 [20:10<1:33:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12209, grad_norm=0.0201, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1826/10399 [20:20<1:33:31, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12225, grad_norm=0.02, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1842/10399 [20:31<1:33:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12241, grad_norm=0.0181, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1858/10399 [20:41<1:33:02, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12257, grad_norm=0.0183, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1873/10399 [20:51<1:32:52, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12272, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1874/10399 [20:51<1:32:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12273, grad_norm=0.0172, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1890/10399 [21:02<1:32:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12289, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1906/10399 [21:12<1:32:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12305, grad_norm=0.0172, lr=0.0002, loss=0.197]
Train Epoch #2: 18%|ââ | 1922/10399 [21:23<1:32:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12321, grad_norm=0.0209, lr=0.0002, loss=0.197]
Train Epoch #2: 19%|ââ | 1938/10399 [21:33<1:32:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12337, grad_norm=0.0195, lr=0.0002, loss=0.197]
Train Epoch #2: 19%|ââ | 1954/10399 [21:44<1:31:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12353, grad_norm=0.0172, lr=0.0002, loss=0.197]
Train Epoch #2: 19%|ââ | 1970/10399 [21:54<1:31:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12369, grad_norm=0.0184, lr=0.0002, loss=0.197]
Train Epoch #2: 19%|ââ | 1986/10399 [22:05<1:31:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12385, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2: 19%|ââ | 2002/10399 [22:15<1:31:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12401, grad_norm=0.0189, lr=0.0002, loss=0.197]
Train Epoch #2: 19%|ââ | 2018/10399 [22:25<1:31:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12417, grad_norm=0.0185, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2034/10399 [22:36<1:31:02, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12433, grad_norm=0.02, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2050/10399 [22:46<1:30:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12449, grad_norm=0.0192, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2065/10399 [22:57<1:30:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12464, grad_norm=0.0212, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2066/10399 [22:57<1:31:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12465, grad_norm=0.0152, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2082/10399 [23:07<1:30:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12481, grad_norm=0.0196, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2098/10399 [23:18<1:30:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12497, grad_norm=0.0157, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2114/10399 [23:28<1:30:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12513, grad_norm=0.0178, lr=0.0002, loss=0.197]
Train Epoch #2: 20%|ââ | 2130/10399 [23:39<1:30:02, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12529, grad_norm=0.0173, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|ââ | 2146/10399 [23:49<1:29:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12545, grad_norm=0.0175, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|ââ | 2162/10399 [24:00<1:29:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12561, grad_norm=0.0167, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|ââ | 2178/10399 [24:10<1:29:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12577, grad_norm=0.0173, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|ââ | 2194/10399 [24:20<1:29:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12593, grad_norm=0.0177, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|âââ | 2210/10399 [24:31<1:29:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12609, grad_norm=0.0159, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|âââ | 2225/10399 [24:41<1:29:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12624, grad_norm=0.0187, lr=0.0002, loss=0.197]
Train Epoch #2: 21%|âââ | 2226/10399 [24:41<1:28:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12625, grad_norm=0.0168, lr=0.0002, loss=0.197]
Train Epoch #2: 22%|âââ | 2242/10399 [24:52<1:28:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12641, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 22%|âââ | 2258/10399 [25:02<1:28:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12657, grad_norm=0.0179, lr=0.0002, loss=0.196]
Train Epoch #2: 22%|âââ | 2274/10399 [25:13<1:28:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12673, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2: 22%|âââ | 2290/10399 [25:23<1:28:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12689, grad_norm=0.0182, lr=0.0002, loss=0.196]
Train Epoch #2: 22%|âââ | 2306/10399 [25:34<1:28:06, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12705, grad_norm=0.0172, lr=0.0002, loss=0.196]
Train Epoch #2: 22%|âââ | 2322/10399 [25:44<1:27:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12721, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2: 22%|âââ | 2338/10399 [25:55<1:27:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12737, grad_norm=0.0177, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2354/10399 [26:05<1:27:33, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12753, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2370/10399 [26:15<1:27:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12769, grad_norm=0.0199, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2386/10399 [26:26<1:27:10, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12785, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2402/10399 [26:36<1:26:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12801, grad_norm=0.0216, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2418/10399 [26:47<1:26:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12817, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2433/10399 [26:57<1:26:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12832, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 23%|âââ | 2434/10399 [26:57<1:26:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12833, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2450/10399 [27:08<1:26:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12849, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2466/10399 [27:18<1:26:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12865, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2482/10399 [27:29<1:26:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12881, grad_norm=0.0173, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2498/10399 [27:39<1:26:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12897, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2514/10399 [27:50<1:26:02, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12913, grad_norm=0.017, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2530/10399 [28:00<1:25:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12929, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 24%|âââ | 2546/10399 [28:10<1:25:34, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12945, grad_norm=0.0185, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2562/10399 [28:21<1:25:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12961, grad_norm=0.0168, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2577/10399 [28:31<1:25:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12976, grad_norm=0.0178, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2578/10399 [28:31<1:25:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12977, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2594/10399 [28:42<1:24:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=12993, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2601/10399 [28:57<1:24:52, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13000, grad_norm=0.0178, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 13000
Train Epoch #2: 25%|âââ | 2602/10399 [29:01<2:04:09, 1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13001, grad_norm=0.0171, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2617/10399 [29:11<2:03:54, 1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13016, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2618/10399 [29:11<1:50:40, 1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13017, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2634/10399 [29:22<1:42:00, 1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13033, grad_norm=0.017, lr=0.0002, loss=0.196]
Train Epoch #2: 25%|âââ | 2650/10399 [29:32<1:36:15, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13049, grad_norm=0.0162, lr=0.0002, loss=0.196]
Train Epoch #2: 26%|âââ | 2666/10399 [29:42<1:32:21, 1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13065, grad_norm=0.017, lr=0.0002, loss=0.196]
Train Epoch #2: 26%|âââ | 2682/10399 [29:53<1:29:39, 1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13081, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 26%|âââ | 2698/10399 [30:03<1:27:43, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13097, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2: 26%|âââ | 2714/10399 [30:14<1:26:19, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13113, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 26%|âââ | 2730/10399 [30:24<1:25:20, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13129, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2: 26%|âââ | 2746/10399 [30:35<1:24:35, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13145, grad_norm=0.0159, lr=0.0002, loss=0.196]
Train Epoch #2: 27%|âââ | 2762/10399 [30:45<1:24:00, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13161, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2: 27%|âââ | 2778/10399 [30:56<1:23:34, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13177, grad_norm=0.0172, lr=0.0002, loss=0.196]
Train Epoch #2: 27%|âââ | 2794/10399 [31:06<1:23:12, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13193, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2: 27%|âââ | 2810/10399 [31:16<1:22:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13209, grad_norm=0.0188, lr=0.0002, loss=0.197]
Train Epoch #2: 27%|âââ | 2826/10399 [31:27<1:22:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13225, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 27%|âââ | 2841/10399 [31:37<1:22:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13240, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 27%|âââ | 2842/10399 [31:37<1:22:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13241, grad_norm=0.0176, lr=0.0002, loss=0.196]
Train Epoch #2: 27%|âââ | 2858/10399 [31:48<1:22:10, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13257, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 28%|âââ | 2874/10399 [31:58<1:21:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13273, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2: 28%|âââ | 2890/10399 [32:09<1:22:06, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13289, grad_norm=0.0175, lr=0.0002, loss=0.196]
Train Epoch #2: 28%|âââ | 2906/10399 [32:19<1:21:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13305, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 28%|âââ | 2922/10399 [32:30<1:21:33, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13321, grad_norm=0.0174, lr=0.0002, loss=0.196]
Train Epoch #2: 28%|âââ | 2938/10399 [32:40<1:21:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13337, grad_norm=0.0162, lr=0.0002, loss=0.196]
Train Epoch #2: 28%|âââ | 2954/10399 [32:51<1:21:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13353, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 2969/10399 [33:01<1:20:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13368, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 2970/10399 [33:01<1:20:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13369, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 2986/10399 [33:12<1:20:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13385, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 3002/10399 [33:22<1:20:33, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13401, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 3018/10399 [33:32<1:20:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13417, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 3034/10399 [33:43<1:20:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13433, grad_norm=0.015, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 3050/10399 [33:53<1:19:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13449, grad_norm=0.0163, lr=0.0002, loss=0.196]
Train Epoch #2: 29%|âââ | 3066/10399 [34:04<1:19:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13465, grad_norm=0.0173, lr=0.0002, loss=0.196]
Train Epoch #2: 30%|âââ | 3082/10399 [34:14<1:19:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13481, grad_norm=0.0183, lr=0.0002, loss=0.196]
Train Epoch #2: 30%|âââ | 3098/10399 [34:25<1:19:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13497, grad_norm=0.0174, lr=0.0002, loss=0.196]
Train Epoch #2: 30%|âââ | 3114/10399 [34:35<1:19:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13513, grad_norm=0.0161, lr=0.0002, loss=0.196]
Train Epoch #2: 30%|âââ | 3130/10399 [34:46<1:19:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13529, grad_norm=0.0159, lr=0.0002, loss=0.196]
Train Epoch #2: 30%|âââ | 3146/10399 [34:56<1:18:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13545, grad_norm=0.0164, lr=0.0002, loss=0.196]
Train Epoch #2: 30%|âââ | 3162/10399 [35:06<1:18:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13561, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|âââ | 3178/10399 [35:17<1:18:34, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13577, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|âââ | 3193/10399 [35:27<1:18:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13592, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|âââ | 3194/10399 [35:27<1:18:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13593, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|âââ | 3210/10399 [35:38<1:18:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13609, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|âââ | 3226/10399 [35:48<1:18:05, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13625, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|âââ | 3242/10399 [35:59<1:18:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13641, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|ââââ | 3258/10399 [36:09<1:17:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13657, grad_norm=0.017, lr=0.0002, loss=0.196]
Train Epoch #2: 31%|ââââ | 3274/10399 [36:20<1:17:41, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13673, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3290/10399 [36:30<1:17:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13689, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3306/10399 [36:41<1:17:18, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13705, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3321/10399 [36:51<1:17:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13720, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3322/10399 [36:51<1:17:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13721, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3338/10399 [37:02<1:16:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13737, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3354/10399 [37:12<1:16:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13753, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 32%|ââââ | 3370/10399 [37:22<1:16:33, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13769, grad_norm=0.015, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3386/10399 [37:33<1:16:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13785, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3402/10399 [37:43<1:16:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13801, grad_norm=0.0184, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3418/10399 [37:54<1:15:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13817, grad_norm=0.0178, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3434/10399 [38:04<1:15:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13833, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3450/10399 [38:15<1:15:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13849, grad_norm=0.0181, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3466/10399 [38:25<1:15:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13865, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 33%|ââââ | 3482/10399 [38:36<1:15:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13881, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3498/10399 [38:46<1:15:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13897, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3514/10399 [38:56<1:14:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13913, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3530/10399 [39:07<1:14:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13929, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3545/10399 [39:17<1:14:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13944, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3546/10399 [39:17<1:14:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13945, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3562/10399 [39:28<1:14:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13961, grad_norm=0.015, lr=0.0002, loss=0.196]
Train Epoch #2: 34%|ââââ | 3578/10399 [39:38<1:14:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13977, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3594/10399 [39:49<1:14:02, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=13993, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3601/10399 [40:01<1:13:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14000, grad_norm=0.0145, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 14000
Train Epoch #2: 35%|ââââ | 3602/10399 [40:08<1:48:24, 1.04it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14001, grad_norm=0.0157, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3618/10399 [40:18<1:36:35, 1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14017, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3634/10399 [40:29<1:29:18, 1.26it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14033, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3650/10399 [40:39<1:24:07, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14049, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3666/10399 [40:50<1:20:35, 1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14065, grad_norm=0.015, lr=0.0002, loss=0.196]
Train Epoch #2: 35%|ââââ | 3682/10399 [41:00<1:18:10, 1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14081, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3698/10399 [41:11<1:16:26, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14097, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3714/10399 [41:21<1:15:11, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14113, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3729/10399 [41:31<1:15:01, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14128, grad_norm=0.0163, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3730/10399 [41:31<1:14:16, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14129, grad_norm=0.0173, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3746/10399 [41:42<1:13:35, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14145, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3762/10399 [41:52<1:13:02, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14161, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3778/10399 [42:03<1:12:37, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14177, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2: 36%|ââââ | 3794/10399 [42:13<1:12:16, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14193, grad_norm=0.0163, lr=0.0002, loss=0.196]
Train Epoch #2: 37%|ââââ | 3810/10399 [42:24<1:11:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14209, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 37%|ââââ | 3826/10399 [42:34<1:11:43, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14225, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 37%|ââââ | 3842/10399 [42:45<1:11:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14241, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 37%|ââââ | 3858/10399 [42:55<1:11:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14257, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 37%|ââââ | 3874/10399 [43:05<1:11:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14273, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2: 37%|ââââ | 3890/10399 [43:16<1:10:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14289, grad_norm=0.0157, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3906/10399 [43:26<1:10:41, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14305, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3922/10399 [43:37<1:10:31, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14321, grad_norm=0.0156, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3937/10399 [43:47<1:10:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14336, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3938/10399 [43:47<1:10:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14337, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3954/10399 [43:58<1:10:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14353, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3970/10399 [44:08<1:09:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14369, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 3986/10399 [44:19<1:09:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14385, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 38%|ââââ | 4002/10399 [44:29<1:09:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14401, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4018/10399 [44:39<1:09:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14417, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4034/10399 [44:50<1:09:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14433, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4050/10399 [45:01<1:09:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14449, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4066/10399 [45:11<1:09:06, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14465, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4081/10399 [45:21<1:08:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14480, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4082/10399 [45:21<1:08:52, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14481, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 39%|ââââ | 4098/10399 [45:32<1:08:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14497, grad_norm=0.0149, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4114/10399 [45:42<1:08:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14513, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4130/10399 [45:53<1:08:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14529, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4146/10399 [46:03<1:08:06, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14545, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4162/10399 [46:14<1:07:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14561, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4178/10399 [46:24<1:07:43, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14577, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4194/10399 [46:35<1:07:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14593, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 40%|ââââ | 4210/10399 [46:45<1:07:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14609, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|ââââ | 4226/10399 [46:55<1:07:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14625, grad_norm=0.0172, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|ââââ | 4242/10399 [47:06<1:07:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14641, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|ââââ | 4258/10399 [47:16<1:06:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14657, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|ââââ | 4274/10399 [47:27<1:06:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14673, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|ââââ | 4289/10399 [47:37<1:06:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14688, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|âââââ | 4290/10399 [47:37<1:06:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14689, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 41%|âââââ | 4306/10399 [47:48<1:06:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14705, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4322/10399 [47:58<1:06:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14721, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4338/10399 [48:09<1:05:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14737, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4354/10399 [48:19<1:05:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14753, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4370/10399 [48:29<1:05:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14769, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4386/10399 [48:40<1:05:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14785, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4402/10399 [48:50<1:05:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14801, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 42%|âââââ | 4418/10399 [49:01<1:05:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14817, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4433/10399 [49:11<1:05:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14832, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4434/10399 [49:11<1:05:05, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14833, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4450/10399 [49:22<1:04:52, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14849, grad_norm=0.0174, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4466/10399 [49:32<1:04:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14865, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4482/10399 [49:43<1:04:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14881, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4498/10399 [49:53<1:04:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14897, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 43%|âââââ | 4514/10399 [50:04<1:04:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14913, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4530/10399 [50:14<1:03:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14929, grad_norm=0.015, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4546/10399 [50:25<1:03:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14945, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4562/10399 [50:35<1:03:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14961, grad_norm=0.0167, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4578/10399 [50:45<1:03:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14977, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4594/10399 [50:56<1:03:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=14993, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4601/10399 [51:07<1:03:06, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15000, grad_norm=0.0133, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 15000
Train Epoch #2: 44%|âââââ | 4602/10399 [51:16<1:34:30, 1.02it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15001, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 44%|âââââ | 4618/10399 [51:26<1:23:42, 1.15it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15017, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4634/10399 [51:37<1:16:44, 1.25it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15033, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4649/10399 [51:47<1:16:32, 1.25it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15048, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4650/10399 [51:47<1:12:06, 1.33it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15049, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4666/10399 [51:58<1:08:54, 1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15065, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4682/10399 [52:08<1:06:42, 1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15081, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4698/10399 [52:18<1:05:08, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15097, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4714/10399 [52:29<1:04:01, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15113, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 45%|âââââ | 4730/10399 [52:39<1:03:11, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15129, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4746/10399 [52:50<1:02:33, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15145, grad_norm=0.015, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4762/10399 [53:00<1:02:04, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15161, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4778/10399 [53:11<1:01:40, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15177, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4793/10399 [53:21<1:01:30, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15192, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4794/10399 [53:21<1:01:36, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15193, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4810/10399 [53:32<1:01:14, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15209, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 46%|âââââ | 4826/10399 [53:42<1:00:54, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15225, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4842/10399 [53:53<1:00:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15241, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4858/10399 [54:03<1:00:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15257, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4874/10399 [54:14<1:00:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15273, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4890/10399 [54:24<1:00:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15289, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4906/10399 [54:34<59:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15305, grad_norm=0.0142, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4922/10399 [54:45<59:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15321, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2: 47%|âââââ | 4938/10399 [54:55<59:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15337, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 4954/10399 [55:06<59:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15353, grad_norm=0.0149, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 4970/10399 [55:16<59:03, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15369, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 4986/10399 [55:27<58:52, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15385, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 5002/10399 [55:37<58:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15401, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 5017/10399 [55:47<58:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15416, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 5018/10399 [55:48<58:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15417, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 48%|âââââ | 5034/10399 [55:58<58:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15433, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5050/10399 [56:08<58:10, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15449, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5066/10399 [56:19<58:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15465, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5082/10399 [56:29<57:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15481, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5098/10399 [56:40<57:41, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15497, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5114/10399 [56:50<57:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15513, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5130/10399 [57:01<57:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15529, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 49%|âââââ | 5146/10399 [57:11<57:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15545, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5161/10399 [57:21<56:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15560, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5162/10399 [57:21<56:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15561, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5178/10399 [57:32<56:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15577, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5194/10399 [57:42<56:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15593, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5210/10399 [57:53<56:41, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15609, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5226/10399 [58:03<56:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15625, grad_norm=0.0152, lr=0.0002, loss=0.196]
Train Epoch #2: 50%|âââââ | 5242/10399 [58:14<56:13, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15641, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|âââââ | 5258/10399 [58:24<56:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15657, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|âââââ | 5274/10399 [58:35<55:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15673, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|âââââ | 5290/10399 [58:45<55:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15689, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|âââââ | 5306/10399 [58:56<55:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15705, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|âââââ | 5322/10399 [59:06<55:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15721, grad_norm=0.0169, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|ââââââ | 5338/10399 [59:17<55:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15737, grad_norm=0.0151, lr=0.0002, loss=0.196]
Train Epoch #2: 51%|ââââââ | 5354/10399 [59:27<54:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15753, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5369/10399 [59:37<54:43, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15768, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5370/10399 [59:37<54:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15769, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5386/10399 [59:48<54:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15785, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5402/10399 [59:58<54:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15801, grad_norm=0.0162, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5418/10399 [1:00:09<54:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15817, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5434/10399 [1:00:19<53:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15833, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2: 52%|ââââââ | 5450/10399 [1:00:30<53:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15849, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5466/10399 [1:00:40<53:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15865, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5482/10399 [1:00:51<53:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15881, grad_norm=0.0166, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5498/10399 [1:01:01<53:18, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15897, grad_norm=0.0124, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5513/10399 [1:01:11<53:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15912, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5514/10399 [1:01:11<53:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15913, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5530/10399 [1:01:22<52:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15929, grad_norm=0.0122, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5546/10399 [1:01:32<52:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15945, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2: 53%|ââââââ | 5562/10399 [1:01:43<52:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15961, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5578/10399 [1:01:53<52:41, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15977, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5594/10399 [1:02:04<52:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=15993, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5601/10399 [1:02:17<52:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16000, grad_norm=0.0135, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 16000
Train Epoch #2: 54%|ââââââ | 5602/10399 [1:02:23<1:17:11, 1.04it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16001, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5618/10399 [1:02:34<1:08:31, 1.16it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16017, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5634/10399 [1:02:44<1:02:57, 1.26it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16033, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5650/10399 [1:02:54<59:14, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16049, grad_norm=0.0126, lr=0.0002, loss=0.196]
Train Epoch #2: 54%|ââââââ | 5666/10399 [1:03:05<56:41, 1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16065, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5682/10399 [1:03:15<54:53, 1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16081, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5698/10399 [1:03:26<53:36, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16097, grad_norm=0.0147, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5714/10399 [1:03:36<52:41, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16113, grad_norm=0.0125, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5730/10399 [1:03:47<51:59, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16129, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5746/10399 [1:03:57<51:27, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16145, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5761/10399 [1:04:07<51:17, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16160, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 55%|ââââââ | 5762/10399 [1:04:07<51:00, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16161, grad_norm=0.0146, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5778/10399 [1:04:18<50:40, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16177, grad_norm=0.017, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5794/10399 [1:04:28<50:23, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16193, grad_norm=0.0154, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5810/10399 [1:04:39<50:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16209, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5826/10399 [1:04:49<49:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16225, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5842/10399 [1:05:00<49:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16241, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5858/10399 [1:05:10<49:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16257, grad_norm=0.0165, lr=0.0002, loss=0.196]
Train Epoch #2: 56%|ââââââ | 5874/10399 [1:05:21<49:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16273, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5890/10399 [1:05:31<49:05, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16289, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5905/10399 [1:05:41<48:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16304, grad_norm=0.0148, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5906/10399 [1:05:41<48:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16305, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5922/10399 [1:05:52<48:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16321, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5938/10399 [1:06:03<48:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16337, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5954/10399 [1:06:13<48:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16353, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 57%|ââââââ | 5970/10399 [1:06:23<48:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16369, grad_norm=0.0126, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 5986/10399 [1:06:34<48:03, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16385, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 6002/10399 [1:06:44<47:51, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16401, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 6018/10399 [1:06:55<47:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16417, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 6034/10399 [1:07:05<47:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16433, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 6050/10399 [1:07:16<47:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16449, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 6066/10399 [1:07:26<47:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16465, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 58%|ââââââ | 6082/10399 [1:07:37<46:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16481, grad_norm=0.0122, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6098/10399 [1:07:47<46:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16497, grad_norm=0.0135, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6113/10399 [1:07:57<46:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16512, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6114/10399 [1:07:57<46:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16513, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6130/10399 [1:08:08<46:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16529, grad_norm=0.012, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6146/10399 [1:08:18<46:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16545, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6162/10399 [1:08:29<46:06, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16561, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2: 59%|ââââââ | 6178/10399 [1:08:39<45:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16577, grad_norm=0.0123, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6194/10399 [1:08:50<45:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16593, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6210/10399 [1:09:00<45:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16609, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6226/10399 [1:09:11<45:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16625, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6242/10399 [1:09:21<45:14, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16641, grad_norm=0.0143, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6257/10399 [1:09:31<45:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16656, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6258/10399 [1:09:31<45:04, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16657, grad_norm=0.0134, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6274/10399 [1:09:42<44:53, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16673, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2: 60%|ââââââ | 6290/10399 [1:09:52<44:41, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16689, grad_norm=0.0131, lr=0.0002, loss=0.196]
Train Epoch #2: 61%|ââââââ | 6306/10399 [1:10:03<44:31, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16705, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 61%|ââââââ | 6322/10399 [1:10:13<44:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16721, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2: 61%|ââââââ | 6338/10399 [1:10:24<44:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16737, grad_norm=0.016, lr=0.0002, loss=0.196]
Train Epoch #2: 61%|ââââââ | 6354/10399 [1:10:34<44:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16753, grad_norm=0.0155, lr=0.0002, loss=0.196]
Train Epoch #2: 61%|âââââââ | 6370/10399 [1:10:45<43:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16769, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2: 61%|âââââââ | 6386/10399 [1:10:55<43:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16785, grad_norm=0.0132, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6402/10399 [1:11:06<43:33, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16801, grad_norm=0.0158, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6418/10399 [1:11:16<43:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16817, grad_norm=0.013, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6434/10399 [1:11:26<43:10, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16833, grad_norm=0.0136, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6450/10399 [1:11:37<42:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16849, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6465/10399 [1:11:47<42:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16864, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6466/10399 [1:11:47<42:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16865, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6482/10399 [1:11:58<42:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16881, grad_norm=0.0138, lr=0.0002, loss=0.196]
Train Epoch #2: 62%|âââââââ | 6498/10399 [1:12:08<42:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16897, grad_norm=0.0137, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6514/10399 [1:12:19<42:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16913, grad_norm=0.0122, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6530/10399 [1:12:29<42:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16929, grad_norm=0.0153, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6546/10399 [1:12:40<41:57, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16945, grad_norm=0.0118, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6562/10399 [1:12:50<41:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16961, grad_norm=0.0126, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6578/10399 [1:13:01<41:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16977, grad_norm=0.0121, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6594/10399 [1:13:11<41:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=16993, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2: 63%|âââââââ | 6601/10399 [1:13:21<41:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17000, grad_norm=0.0142, lr=0.0002, loss=0.196]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 17000
Train Epoch #2: 63%|âââââââ | 6602/10399 [1:13:30<1:00:07, 1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17001, grad_norm=0.0128, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6618/10399 [1:13:40<53:34, 1.18it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17017, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6634/10399 [1:13:51<49:18, 1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17033, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6650/10399 [1:14:01<46:28, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17049, grad_norm=0.0139, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6665/10399 [1:14:11<46:17, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17064, grad_norm=0.0129, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6666/10399 [1:14:11<44:31, 1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17065, grad_norm=0.0145, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6682/10399 [1:14:22<43:07, 1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17081, grad_norm=0.0127, lr=0.0002, loss=0.196]
Train Epoch #2: 64%|âââââââ | 6698/10399 [1:14:32<42:07, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17097, grad_norm=0.0133, lr=0.0002, loss=0.196]
Train Epoch #2: 65%|âââââââ | 6714/10399 [1:14:43<41:32, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17113, grad_norm=0.0144, lr=0.0002, loss=0.196]
Train Epoch #2: 65%|âââââââ | 6730/10399 [1:14:53<40:55, 1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17129, grad_norm=0.0141, lr=0.0002, loss=0.196]
Train Epoch #2: 65%|âââââââ | 6746/10399 [1:15:04<40:26, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17145, grad_norm=0.014, lr=0.0002, loss=0.196]
Train Epoch #2: 65%|âââââââ | 6762/10399 [1:15:14<40:03, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17161, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2: 65%|âââââââ | 6778/10399 [1:15:25<39:43, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17177, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 65%|âââââââ | 6794/10399 [1:15:35<39:24, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17193, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 65%|âââââââ | 6810/10399 [1:15:45<39:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17209, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6826/10399 [1:15:56<38:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17225, grad_norm=0.0146, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6842/10399 [1:16:06<38:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17241, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6858/10399 [1:16:17<38:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17257, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6873/10399 [1:16:27<38:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17272, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6874/10399 [1:16:27<38:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17273, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6890/10399 [1:16:38<38:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17289, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 66%|âââââââ | 6906/10399 [1:16:48<38:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17305, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 6922/10399 [1:16:59<37:47, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17321, grad_norm=0.0141, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 6938/10399 [1:17:09<37:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17337, grad_norm=0.011, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 6954/10399 [1:17:19<37:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17353, grad_norm=0.0138, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 6970/10399 [1:17:30<37:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17369, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 6986/10399 [1:17:40<37:05, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17385, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 7002/10399 [1:17:51<36:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17401, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 67%|âââââââ | 7018/10399 [1:18:01<36:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17417, grad_norm=0.0153, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7033/10399 [1:18:11<36:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17432, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7034/10399 [1:18:12<36:33, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17433, grad_norm=0.0182, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7050/10399 [1:18:22<36:23, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17449, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7066/10399 [1:18:32<36:11, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17465, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7082/10399 [1:18:43<36:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17481, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7098/10399 [1:18:53<35:54, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17497, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 68%|âââââââ | 7114/10399 [1:19:04<35:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17513, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7130/10399 [1:19:14<35:30, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17529, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7146/10399 [1:19:25<35:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17545, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7162/10399 [1:19:35<35:08, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17561, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7178/10399 [1:19:45<34:58, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17577, grad_norm=0.0166, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7194/10399 [1:19:56<34:46, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17593, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7210/10399 [1:20:06<34:35, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17609, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 69%|âââââââ | 7226/10399 [1:20:17<34:26, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17625, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7242/10399 [1:20:27<34:14, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17641, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7257/10399 [1:20:37<34:04, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17656, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7258/10399 [1:20:37<34:03, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17657, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7274/10399 [1:20:48<33:53, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17673, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7290/10399 [1:20:58<33:42, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17689, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7306/10399 [1:21:09<33:31, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17705, grad_norm=0.0129, lr=0.0002, loss=0.195]
Train Epoch #2: 70%|âââââââ | 7322/10399 [1:21:19<33:21, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17721, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|âââââââ | 7338/10399 [1:21:30<33:11, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17737, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|âââââââ | 7354/10399 [1:21:40<33:01, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17753, grad_norm=0.0116, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|âââââââ | 7370/10399 [1:21:50<32:51, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17769, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|âââââââ | 7386/10399 [1:22:01<32:41, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17785, grad_norm=0.0114, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|âââââââ | 7401/10399 [1:22:11<32:31, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17800, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|âââââââ | 7402/10399 [1:22:11<32:30, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17801, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|ââââââââ | 7418/10399 [1:22:22<32:19, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17817, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 71%|ââââââââ | 7434/10399 [1:22:32<32:09, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17833, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2: 72%|ââââââââ | 7450/10399 [1:22:42<31:59, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17849, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2: 72%|ââââââââ | 7466/10399 [1:22:53<31:49, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17865, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2: 72%|ââââââââ | 7482/10399 [1:23:03<31:38, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17881, grad_norm=0.0167, lr=0.0002, loss=0.195]
Train Epoch #2: 72%|ââââââââ | 7498/10399 [1:23:14<31:27, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17897, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2: 72%|ââââââââ | 7514/10399 [1:23:24<31:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17913, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2: 72%|ââââââââ | 7530/10399 [1:23:35<31:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17929, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7546/10399 [1:23:45<31:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17945, grad_norm=0.0154, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7562/10399 [1:23:55<30:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17961, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7578/10399 [1:24:06<30:37, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17977, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7594/10399 [1:24:16<30:27, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=17993, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7601/10399 [1:24:27<30:22, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18000, grad_norm=0.0142, lr=0.0002, loss=0.195]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 18000
Train Epoch #2: 73%|ââââââââ | 7602/10399 [1:24:35<44:11, 1.05it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18001, grad_norm=0.0129, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7618/10399 [1:24:45<39:17, 1.18it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18017, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2: 73%|ââââââââ | 7634/10399 [1:24:56<36:07, 1.28it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18033, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7650/10399 [1:25:06<33:58, 1.35it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18049, grad_norm=0.0114, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7666/10399 [1:25:17<32:30, 1.40it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18065, grad_norm=0.0156, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7682/10399 [1:25:27<31:26, 1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18081, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7697/10399 [1:25:37<31:16, 1.44it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18096, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7698/10399 [1:25:37<30:40, 1.47it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18097, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7714/10399 [1:25:48<30:06, 1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18113, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7730/10399 [1:25:58<29:38, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18129, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2: 74%|ââââââââ | 7746/10399 [1:26:09<29:15, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18145, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 75%|ââââââââ | 7762/10399 [1:26:19<28:55, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18161, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 75%|ââââââââ | 7778/10399 [1:26:30<28:40, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18177, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 75%|ââââââââ | 7794/10399 [1:26:40<28:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18193, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 75%|ââââââââ | 7810/10399 [1:26:50<28:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18209, grad_norm=0.0138, lr=0.0002, loss=0.195]
Train Epoch #2: 75%|ââââââââ | 7826/10399 [1:27:01<28:00, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18225, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 75%|ââââââââ | 7842/10399 [1:27:11<27:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18241, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7857/10399 [1:27:21<27:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18256, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7858/10399 [1:27:22<27:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18257, grad_norm=0.0144, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7874/10399 [1:27:32<27:26, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18273, grad_norm=0.012, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7890/10399 [1:27:42<27:15, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18289, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7906/10399 [1:27:53<27:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18305, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7922/10399 [1:28:03<26:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18321, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7938/10399 [1:28:14<26:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18337, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 76%|ââââââââ | 7954/10399 [1:28:24<26:34, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18353, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 77%|ââââââââ | 7970/10399 [1:28:35<26:22, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18369, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 77%|ââââââââ | 7986/10399 [1:28:45<26:12, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18385, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2: 77%|ââââââââ | 8002/10399 [1:28:56<26:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18401, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 77%|ââââââââ | 8018/10399 [1:29:06<25:51, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18417, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 77%|ââââââââ | 8034/10399 [1:29:16<25:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18433, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2: 77%|ââââââââ | 8050/10399 [1:29:27<25:30, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18449, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8065/10399 [1:29:37<25:20, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18464, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8066/10399 [1:29:37<25:20, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18465, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8082/10399 [1:29:48<25:09, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18481, grad_norm=0.0115, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8098/10399 [1:29:58<24:58, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18497, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8114/10399 [1:30:09<24:49, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18513, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8130/10399 [1:30:19<24:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18529, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8146/10399 [1:30:29<24:28, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18545, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 78%|ââââââââ | 8162/10399 [1:30:40<24:17, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18561, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8178/10399 [1:30:50<24:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18577, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8194/10399 [1:31:01<23:56, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18593, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8210/10399 [1:31:11<23:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18609, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8225/10399 [1:31:21<23:36, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18624, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8226/10399 [1:31:21<23:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18625, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8242/10399 [1:31:32<23:24, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18641, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 79%|ââââââââ | 8258/10399 [1:31:42<23:19, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18657, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8274/10399 [1:31:53<23:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18673, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8290/10399 [1:32:03<22:55, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18689, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8306/10399 [1:32:14<22:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18705, grad_norm=0.012, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8322/10399 [1:32:24<22:34, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18721, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8338/10399 [1:32:35<22:23, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18737, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8354/10399 [1:32:45<22:13, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18753, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2: 80%|ââââââââ | 8370/10399 [1:32:56<22:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18769, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|ââââââââ | 8386/10399 [1:33:06<22:02, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18785, grad_norm=0.0158, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|ââââââââ | 8402/10399 [1:33:17<21:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18801, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|ââââââââ | 8418/10399 [1:33:27<21:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18817, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|ââââââââ | 8433/10399 [1:33:37<21:25, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18832, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|ââââââââ | 8434/10399 [1:33:38<21:28, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18833, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|âââââââââ | 8450/10399 [1:33:48<21:22, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18849, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 81%|âââââââââ | 8466/10399 [1:33:59<21:08, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18865, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8482/10399 [1:34:09<20:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18881, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8498/10399 [1:34:19<20:44, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18897, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8514/10399 [1:34:30<20:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18913, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8530/10399 [1:34:40<20:23, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18929, grad_norm=0.0125, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8546/10399 [1:34:51<20:19, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18945, grad_norm=0.0113, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8561/10399 [1:35:01<20:09, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18960, grad_norm=0.0135, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8562/10399 [1:35:02<20:11, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18961, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 82%|âââââââââ | 8578/10399 [1:35:12<19:57, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18977, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8594/10399 [1:35:23<19:44, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=18993, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8601/10399 [1:35:37<19:39, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19000, grad_norm=0.0128, lr=0.0002, loss=0.195]
save model to .exp/diffusion/imagenet_512/dc_ae_f32c32_in_1.0/dit_xl_1/bs_1024_lr_2e-4_fp16/checkpoint.pt at step 19000
Train Epoch #2: 83%|âââââââââ | 8602/10399 [1:35:42<28:45, 1.04it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19001, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8618/10399 [1:35:52<25:26, 1.17it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19017, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8634/10399 [1:36:02<23:15, 1.27it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19033, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8650/10399 [1:36:13<21:45, 1.34it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19049, grad_norm=0.0122, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8666/10399 [1:36:23<20:46, 1.39it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19065, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 83%|âââââââââ | 8682/10399 [1:36:34<19:58, 1.43it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19081, grad_norm=0.0136, lr=0.0002, loss=0.195]
Train Epoch #2: 84%|âââââââââ | 8698/10399 [1:36:44<19:22, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19097, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 84%|âââââââââ | 8714/10399 [1:36:55<18:56, 1.48it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19113, grad_norm=0.0115, lr=0.0002, loss=0.195]
Train Epoch #2: 84%|âââââââââ | 8730/10399 [1:37:05<18:34, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19129, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2: 84%|âââââââââ | 8746/10399 [1:37:16<18:16, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19145, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 84%|âââââââââ | 8762/10399 [1:37:26<18:00, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19161, grad_norm=0.0116, lr=0.0002, loss=0.195]
Train Epoch #2: 84%|âââââââââ | 8778/10399 [1:37:36<17:45, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19177, grad_norm=0.0129, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8794/10399 [1:37:47<17:31, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19193, grad_norm=0.0132, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8809/10399 [1:37:57<17:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19208, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8810/10399 [1:37:57<17:18, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19209, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8826/10399 [1:38:08<17:07, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19225, grad_norm=0.0142, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8842/10399 [1:38:18<16:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19241, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8858/10399 [1:38:29<16:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19257, grad_norm=0.0156, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8874/10399 [1:38:39<16:34, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19273, grad_norm=0.0137, lr=0.0002, loss=0.195]
Train Epoch #2: 85%|âââââââââ | 8890/10399 [1:38:49<16:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19289, grad_norm=0.0138, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8906/10399 [1:39:00<16:13, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19305, grad_norm=0.0134, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8922/10399 [1:39:10<16:02, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19321, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8938/10399 [1:39:21<15:52, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19337, grad_norm=0.014, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8954/10399 [1:39:31<15:42, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19353, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8969/10399 [1:39:41<15:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19368, grad_norm=0.0121, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8970/10399 [1:39:42<15:32, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19369, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 86%|âââââââââ | 8986/10399 [1:39:52<15:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19385, grad_norm=0.0131, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9002/10399 [1:40:03<15:11, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19401, grad_norm=0.0126, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9018/10399 [1:40:13<15:01, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19417, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9034/10399 [1:40:23<14:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19433, grad_norm=0.0117, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9050/10399 [1:40:34<14:43, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19449, grad_norm=0.0123, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9066/10399 [1:40:45<14:37, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19465, grad_norm=0.0145, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9082/10399 [1:40:55<14:24, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19481, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2: 87%|âââââââââ | 9098/10399 [1:41:06<14:21, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19497, grad_norm=0.0143, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9113/10399 [1:41:17<14:38, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19512, grad_norm=0.0145, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9128/10399 [1:41:27<14:27, 1.46it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19527, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9129/10399 [1:41:27<14:14, 1.49it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19528, grad_norm=0.0151, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9145/10399 [1:41:38<13:55, 1.50it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19544, grad_norm=0.0128, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9161/10399 [1:41:48<13:39, 1.51it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19560, grad_norm=0.013, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9177/10399 [1:41:59<13:25, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19576, grad_norm=0.0142, lr=0.0002, loss=0.195]
Train Epoch #2: 88%|âââââââââ | 9193/10399 [1:42:09<13:11, 1.52it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19592, grad_norm=0.015, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9209/10399 [1:42:19<12:58, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19608, grad_norm=0.0117, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9225/10399 [1:42:30<12:46, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19624, grad_norm=0.012, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9241/10399 [1:42:40<12:35, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19640, grad_norm=0.0118, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9257/10399 [1:42:51<12:24, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19656, grad_norm=0.0139, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9273/10399 [1:43:01<12:13, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19672, grad_norm=0.0108, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9289/10399 [1:43:11<12:02, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19688, grad_norm=0.012, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9304/10399 [1:43:21<11:53, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19703, grad_norm=0.015, lr=0.0002, loss=0.195]
Train Epoch #2: 89%|âââââââââ | 9305/10399 [1:43:22<11:52, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19704, grad_norm=0.0124, lr=0.0002, loss=0.195]
Train Epoch #2: 90%|âââââââââ | 9321/10399 [1:43:32<11:42, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19720, grad_norm=0.0133, lr=0.0002, loss=0.195]
Train Epoch #2: 90%|âââââââââ | 9337/10399 [1:43:43<11:31, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19736, grad_norm=0.0155, lr=0.0002, loss=0.195]
Train Epoch #2: 90%|âââââââââ | 9353/10399 [1:43:53<11:20, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19752, grad_norm=0.0127, lr=0.0002, loss=0.195]
Train Epoch #2: 90%|âââââââââ | 9369/10399 [1:44:03<11:10, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19768, grad_norm=0.0113, lr=0.0002, loss=0.195]
Train Epoch #2: 90%|âââââââââ | 9385/10399 [1:44:14<11:00, 1.54it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19784, grad_norm=0.0115, lr=0.0002, loss=0.195]
Train Epoch #2: 90%|âââââââââ | 9401/10399 [1:44:24<10:50, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19800, grad_norm=0.0119, lr=0.0002, loss=0.195]
Train Epoch #2: 91%|âââââââââ | 9417/10399 [1:44:35<10:39, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19816, grad_norm=0.0127, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|âââââââââ | 9433/10399 [1:44:45<10:29, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19832, grad_norm=0.0123, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|âââââââââ | 9449/10399 [1:44:56<10:21, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19848, grad_norm=0.0122, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|âââââââââ | 9465/10399 [1:45:06<10:10, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19864, grad_norm=0.0129, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|âââââââââ | 9481/10399 [1:45:17<09:59, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19880, grad_norm=0.0117, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|ââââââââââ| 9497/10399 [1:45:27<09:48, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19896, grad_norm=0.0136, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|ââââââââââ| 9512/10399 [1:45:37<09:38, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19911, grad_norm=0.0127, lr=0.0002, loss=0.194]
Train Epoch #2: 91%|ââââââââââ| 9513/10399 [1:45:38<09:37, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19912, grad_norm=0.0119, lr=0.0002, loss=0.194]
Train Epoch #2: 92%|ââââââââââ| 9529/10399 [1:45:48<09:27, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19928, grad_norm=0.0146, lr=0.0002, loss=0.194]
Train Epoch #2: 92%|ââââââââââ| 9545/10399 [1:45:58<09:16, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=19944, grad_norm=0.0144, lr=0.0002, loss=0.194]
Train Epoch #2: 92%|ââââââââââ| 9561/10399 [1:46:09<09:06, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0125, lr=0.0002, loss=0.194]
Train Epoch #2: 92%|ââââââââââ| 9577/10399 [1:46:19<08:56, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0136, lr=0.0002, loss=0.194]
Train Epoch #2: 92%|ââââââââââ| 9593/10399 [1:46:30<08:45, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0115, lr=0.0002, loss=0.194]
Valid Step #20000: 0%| | 0/1563 [00:00<?, ?it/s]
Train Epoch #2: 92%|ââââââââââ| 9601/10399 [1:46:41<08:40, 1.53it/s, shape=torch.Size([128, 32, 16, 16]), global_step=2e+4, grad_norm=0.0127, lr=0.0002, loss=0.194]
Valid Step #20000: 0%| | 1/1563 [00:17<7:33:16, 17.41s/it]
Valid Step #20000: 0%| | 2/1563 [00:33<7:07:06, 16.42s/it]