Lora_Trainer_RuntimeError.rtf
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network_wrapper.py", line 9, in <module>
    train(args)
  File "/content/kohya-trainer/train_network.py", line 168, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "/content/kohya-trainer/library/train_util.py", line 3150, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "/content/kohya-trainer/library/train_util.py", line 3116, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path, device)
  File "/content/kohya-trainer/library/model_util.py", line 863, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", "down_blocks.0.attentions.0.proj_in.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.0.proj_out.weight", "down_blocks.0.attentions.0.proj_out.bias", "down_blocks.0.attentions.1.norm.weight", "down_blocks.0.attentions.1.norm.bias", "down_blocks.0.attentions.1.proj_in.weight", "down_blocks.0.attentions.1.proj_in.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.1.proj_out.weight", "down_blocks.0.attentions.1.proj_out.bias", "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias", "down_blocks.3.resnets.0.norm1.weight", "down_blocks.3.resnets.0.norm1.bias", "down_blocks.3.resnets.0.conv1.weight", "down_blocks.3.resnets.0.conv1.bias", "down_blocks.3.resnets.0.time_emb_proj.weight", 
"down_blocks.3.resnets.0.time_emb_proj.bias", "down_blocks.3.resnets.0.norm2.weight", "down_blocks.3.resnets.0.norm2.bias", "down_blocks.3.resnets.0.conv2.weight", "down_blocks.3.resnets.0.conv2.bias", "down_blocks.3.resnets.1.norm1.weight", "down_blocks.3.resnets.1.norm1.bias", "down_blocks.3.resnets.1.conv1.weight", "down_blocks.3.resnets.1.conv1.bias", "down_blocks.3.resnets.1.time_emb_proj.weight", "down_blocks.3.resnets.1.time_emb_proj.bias", "down_blocks.3.resnets.1.norm2.weight", "down_blocks.3.resnets.1.norm2.bias", "down_blocks.3.resnets.1.conv2.weight", "down_blocks.3.resnets.1.conv2.bias", "up_blocks.2.attentions.0.norm.weight", "up_blocks.2.attentions.0.norm.bias", "up_blocks.2.attentions.0.proj_in.weight", "up_blocks.2.attentions.0.proj_in.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.0.proj_out.weight", "up_blocks.2.attentions.0.proj_out.bias", "up_blocks.2.attentions.1.norm.weight", "up_blocks.2.attentions.1.norm.bias", "up_blocks.2.attentions.1.proj_in.weight", "up_blocks.2.attentions.1.proj_in.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias", 
"up_blocks.2.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.1.proj_out.weight", "up_blocks.2.attentions.1.proj_out.bias", "up_blocks.2.attentions.2.norm.weight", "up_blocks.2.attentions.2.norm.bias", "up_blocks.2.attentions.2.proj_in.weight", "up_blocks.2.attentions.2.proj_in.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.2.proj_out.weight", "up_blocks.2.attentions.2.proj_out.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias", "up_blocks.3.attentions.0.norm.weight", "up_blocks.3.attentions.0.norm.bias", "up_blocks.3.attentions.0.proj_in.weight", "up_blocks.3.attentions.0.proj_in.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.0.proj_out.weight", "up_blocks.3.attentions.0.proj_out.bias", "up_blocks.3.attentions.1.norm.weight", "up_blocks.3.attentions.1.norm.bias", "up_blocks.3.attentions.1.proj_in.weight", 
"up_blocks.3.attentions.1.proj_in.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.1.proj_out.weight", "up_blocks.3.attentions.1.proj_out.bias", "up_blocks.3.attentions.2.norm.weight", "up_blocks.3.attentions.2.norm.bias", "up_blocks.3.attentions.2.proj_in.weight", "up_blocks.3.attentions.2.proj_in.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.2.proj_out.weight", "up_blocks.3.attentions.2.proj_out.bias", "up_blocks.3.resnets.0.norm1.weight", "up_blocks.3.resnets.0.norm1.bias", "up_blocks.3.resnets.0.conv1.weight", "up_blocks.3.resnets.0.conv1.bias", "up_blocks.3.resnets.0.time_emb_proj.weight", "up_blocks.3.resnets.0.time_emb_proj.bias", "up_blocks.3.resnets.0.norm2.weight", "up_blocks.3.resnets.0.norm2.bias", "up_blocks.3.resnets.0.conv2.weight", "up_blocks.3.resnets.0.conv2.bias", "up_blocks.3.resnets.0.conv_shortcut.weight", "up_blocks.3.resnets.0.conv_shortcut.bias", "up_blocks.3.resnets.1.norm1.weight", "up_blocks.3.resnets.1.norm1.bias", 
"up_blocks.3.resnets.1.conv1.weight", "up_blocks.3.resnets.1.conv1.bias", "up_blocks.3.resnets.1.time_emb_proj.weight", "up_blocks.3.resnets.1.time_emb_proj.bias", "up_blocks.3.resnets.1.norm2.weight", "up_blocks.3.resnets.1.norm2.bias", "up_blocks.3.resnets.1.conv2.weight", "up_blocks.3.resnets.1.conv2.bias", "up_blocks.3.resnets.1.conv_shortcut.weight", "up_blocks.3.resnets.1.conv_shortcut.bias", "up_blocks.3.resnets.2.norm1.weight", "up_blocks.3.resnets.2.norm1.bias", "up_blocks.3.resnets.2.conv1.weight", "up_blocks.3.resnets.2.conv1.bias", "up_blocks.3.resnets.2.time_emb_proj.weight", "up_blocks.3.resnets.2.time_emb_proj.bias", "up_blocks.3.resnets.2.norm2.weight", "up_blocks.3.resnets.2.norm2.bias", "up_blocks.3.resnets.2.conv2.weight", "up_blocks.3.resnets.2.conv2.bias", "up_blocks.3.resnets.2.conv_shortcut.weight", "up_blocks.3.resnets.2.conv_shortcut.bias". \
Unexpected key(s) in state_dict: "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", 
"down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", 
"down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm1.bias", 
"down_blocks.2.attentions.0.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", 
"down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm3.bias", 
"down_blocks.2.attentions.1.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", 
"down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", 
"down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.0.norm.bias", "up_blocks.0.attentions.0.norm.weight", "up_blocks.0.attentions.0.proj_in.bias", "up_blocks.0.attentions.0.proj_in.weight", "up_blocks.0.attentions.0.proj_out.bias", "up_blocks.0.attentions.0.proj_out.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", 
"up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_v.weight", 
"up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm3.bias", 
"up_blocks.0.attentions.0.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", 
"up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.1.norm.bias", "up_blocks.0.attentions.1.norm.weight", "up_blocks.0.attentions.1.proj_in.bias", "up_blocks.0.attentions.1.proj_in.weight", "up_blocks.0.attentions.1.proj_out.bias", "up_blocks.0.attentions.1.proj_out.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", 
"up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm1.bias", 
"up_blocks.0.attentions.1.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_q.weight", 
"up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", 
"up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.2.norm.bias", "up_blocks.0.attentions.2.norm.weight", "up_blocks.0.attentions.2.proj_in.bias", "up_blocks.0.attentions.2.proj_in.weight", "up_blocks.0.attentions.2.proj_out.bias", "up_blocks.0.attentions.2.proj_out.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.bias", 
"up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.bias", 
"up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_k.weight", 
"up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.weight", 
"up_blocks.0.attentions.2.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", 
"up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.1.norm1.bias", "mid_block.attentions.0.transformer_blocks.1.norm1.weight", "mid_block.attentions.0.transformer_blocks.1.norm2.bias", "mid_block.attentions.0.transformer_blocks.1.norm2.weight", "mid_block.attentions.0.transformer_blocks.1.norm3.bias", "mid_block.attentions.0.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.2.norm1.bias", "mid_block.attentions.0.transformer_blocks.2.norm1.weight", "mid_block.attentions.0.transformer_blocks.2.norm2.bias", "mid_block.attentions.0.transformer_blocks.2.norm2.weight", "mid_block.attentions.0.transformer_blocks.2.norm3.bias", "mid_block.attentions.0.transformer_blocks.2.norm3.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", 
"mid_block.attentions.0.transformer_blocks.3.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.3.norm1.bias", "mid_block.attentions.0.transformer_blocks.3.norm1.weight", "mid_block.attentions.0.transformer_blocks.3.norm2.bias", "mid_block.attentions.0.transformer_blocks.3.norm2.weight", "mid_block.attentions.0.transformer_blocks.3.norm3.bias", "mid_block.attentions.0.transformer_blocks.3.norm3.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.4.norm1.bias", "mid_block.attentions.0.transformer_blocks.4.norm1.weight", "mid_block.attentions.0.transformer_blocks.4.norm2.bias", "mid_block.attentions.0.transformer_blocks.4.norm2.weight", "mid_block.attentions.0.transformer_blocks.4.norm3.bias", "mid_block.attentions.0.transformer_blocks.4.norm3.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.5.norm1.bias", "mid_block.attentions.0.transformer_blocks.5.norm1.weight", "mid_block.attentions.0.transformer_blocks.5.norm2.bias", "mid_block.attentions.0.transformer_blocks.5.norm2.weight", 
"mid_block.attentions.0.transformer_blocks.5.norm3.bias", "mid_block.attentions.0.transformer_blocks.5.norm3.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.6.norm1.bias", "mid_block.attentions.0.transformer_blocks.6.norm1.weight", "mid_block.attentions.0.transformer_blocks.6.norm2.bias", "mid_block.attentions.0.transformer_blocks.6.norm2.weight", "mid_block.attentions.0.transformer_blocks.6.norm3.bias", "mid_block.attentions.0.transformer_blocks.6.norm3.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.7.norm1.bias", "mid_block.attentions.0.transformer_blocks.7.norm1.weight", "mid_block.attentions.0.transformer_blocks.7.norm2.bias", "mid_block.attentions.0.transformer_blocks.7.norm2.weight", "mid_block.attentions.0.transformer_blocks.7.norm3.bias", "mid_block.attentions.0.transformer_blocks.7.norm3.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.2.bias", 
"mid_block.attentions.0.transformer_blocks.8.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.8.norm1.bias", "mid_block.attentions.0.transformer_blocks.8.norm1.weight", "mid_block.attentions.0.transformer_blocks.8.norm2.bias", "mid_block.attentions.0.transformer_blocks.8.norm2.weight", "mid_block.attentions.0.transformer_blocks.8.norm3.bias", "mid_block.attentions.0.transformer_blocks.8.norm3.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.norm1.bias", "mid_block.attentions.0.transformer_blocks.9.norm1.weight", "mid_block.attentions.0.transformer_blocks.9.norm2.bias", "mid_block.attentions.0.transformer_blocks.9.norm2.weight", "mid_block.attentions.0.transformer_blocks.9.norm3.bias", "mid_block.attentions.0.transformer_blocks.9.norm3.weight". \
size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).\
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).\
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).\
size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).\
size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).\
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).\
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]).\
size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).\
size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.0.resnets.2.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).\
size mismatch for up_blocks.0.resnets.2.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).\
size mismatch for up_blocks.0.resnets.2.conv1.weight: copying a param with shape torch.Size([1280, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).\
size mismatch for up_blocks.0.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([1280, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).\
size mismatch for up_blocks.1.attentions.0.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.1.attentions.0.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.1.attentions.0.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.1.attentions.1.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.1.attentions.1.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.1.attentions.2.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for up_blocks.1.attentions.2.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.0.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).\
size mismatch for up_blocks.1.resnets.0.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]).\
size mismatch for up_blocks.1.resnets.0.conv1.weight: copying a param with shape torch.Size([640, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).\
size mismatch for up_blocks.1.resnets.0.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.0.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).\
size mismatch for up_blocks.1.resnets.0.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([640, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).\
size mismatch for up_blocks.1.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.1.norm1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]).\
size mismatch for up_blocks.1.resnets.1.norm1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]).\
size mismatch for up_blocks.1.resnets.1.conv1.weight: copying a param with shape torch.Size([640, 1280, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]).\
size mismatch for up_blocks.1.resnets.1.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.1.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.1.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.1.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).\
size mismatch for up_blocks.1.resnets.1.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([640, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]).\
size mismatch for up_blocks.1.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.2.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).\
size mismatch for up_blocks.1.resnets.2.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).\
size mismatch for up_blocks.1.resnets.2.conv1.weight: copying a param with shape torch.Size([640, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]).\
size mismatch for up_blocks.1.resnets.2.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).\
size mismatch for up_blocks.1.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.2.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.2.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.2.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).\
size mismatch for up_blocks.1.resnets.2.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([640, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]).\
size mismatch for up_blocks.1.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.1.upsamplers.0.conv.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]).\
size mismatch for up_blocks.1.upsamplers.0.conv.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.2.resnets.0.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).\
size mismatch for up_blocks.2.resnets.0.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]).\
size mismatch for up_blocks.2.resnets.0.conv1.weight: copying a param with shape torch.Size([320, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]).\
size mismatch for up_blocks.2.resnets.0.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).\
size mismatch for up_blocks.2.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.0.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.0.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.0.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).\
size mismatch for up_blocks.2.resnets.0.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([320, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]).\
size mismatch for up_blocks.2.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.1.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.2.resnets.1.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]).\
size mismatch for up_blocks.2.resnets.1.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]).\
size mismatch for up_blocks.2.resnets.1.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).\
size mismatch for up_blocks.2.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.1.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.1.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.1.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).\
size mismatch for up_blocks.2.resnets.1.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]).\
size mismatch for up_blocks.2.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.2.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]).\
size mismatch for up_blocks.2.resnets.2.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]).\
size mismatch for up_blocks.2.resnets.2.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]).\
size mismatch for up_blocks.2.resnets.2.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]).\
size mismatch for up_blocks.2.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.2.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.2.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.2.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]).\
size mismatch for up_blocks.2.resnets.2.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for up_blocks.2.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]).\
size mismatch for up_blocks.2.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]).\
size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]).\
size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).\
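
Editor's note: the pattern of these mismatches points at a model-format mix-up rather than a corrupted download. Three signals line up: every cross-attention K/V projection in the checkpoint has input width 2048 (e.g. attn2.to_k.weight: torch.Size([640, 2048])) while the instantiated UNet expects 768; the checkpoint's proj_in/proj_out weights are 2-D Linear matrices where the model expects 4-D 1x1 convolutions; and the channel counts sit one block lower than the model's (640 where 1280 is expected, 320 where 640 is expected). All three are characteristic of an SDXL checkpoint being pushed through the SD1.x branch of load_models_from_stable_diffusion_checkpoint. Pointing the trainer at an SD1.5-family base model, or switching to an SDXL-aware script (newer kohya-ss/sd-scripts ship sdxl_train_network.py for this), should make the error go away.

Below is a minimal diagnostic sketch for checking a checkpoint before launching training. It is not part of kohya-trainer; the helper name and the example path are hypothetical. It infers the Stable Diffusion family from the input width of any cross-attention K projection, which equals the text-encoder embedding size: 768 for SD1.x, 1024 for SD2.x, 2048 for SDXL (two text encoders concatenated).

import torch
from safetensors.torch import load_file

def guess_sd_variant(path: str) -> str:
    # Load the raw state dict from either container format.
    if path.endswith(".safetensors"):
        sd = load_file(path)
    else:
        sd = torch.load(path, map_location="cpu")
        sd = sd.get("state_dict", sd)  # .ckpt files often nest the weights
    # Matches both original (model.diffusion_model.*) and diffusers key layouts.
    width = next(
        (v.shape[1] for k, v in sd.items() if k.endswith("attn2.to_k.weight")),
        None,
    )
    if width == 768:
        return "SD1.x"
    if width == 1024:
        return "SD2.x"
    if width == 2048:
        return "SDXL"
    return "unrecognized (cross-attention width: %s)" % width

print(guess_sd_variant("/content/pretrained_model/model.safetensors"))

Run against the checkpoint that produced this traceback, the sketch should print "SDXL", confirming the mismatch before the trainer ever builds the UNet.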
\
}