About finetuning current SDXL weights with the EQ-SDXL-VAE

#2
by eeyrw - opened

You said in the intro: "You can try to use this VAE to finetune your sdxl model and expect a better final result, but it may require lot of time to achieve it...". I am still very interested in utilizing the existing model weights. So my question is: how much is "a lot"? I have ~500k samples; how many iterations are required to align the UNet of SDXL with the new latent space?

A lot of training time.
Although some reported results say that "a few k steps with a small LoRA works well".
Your setup is definitely OK.
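
For concreteness, here is a minimal sketch of what such a small-LoRA setup could look like with the peft integration in diffusers; the rank, target modules, and learning rate are illustrative assumptions, not values anyone reported:

    import torch
    from diffusers import UNet2DConditionModel
    from peft import LoraConfig

    # load the SDXL UNet that should be adapted to the new latent space
    unet = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
    )
    unet.requires_grad_(False)  # freeze the base weights; only the LoRA trains

    # a small LoRA on the attention projections (rank 8 is a guess)
    unet.add_adapter(LoraConfig(
        r=8,
        lora_alpha=8,
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    ))

    # only the LoRA parameters go to the optimizer
    optimizer = torch.optim.AdamW(
        [p for p in unet.parameters() if p.requires_grad], lr=1e-4
    )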

I just thought a dataset like LAION-400M was needed. It turns out that a scale of a few thousand samples is said to work.

My thought was a dataset on the scale of Danbooru (8M) or CC12M.
And yes, I'm also surprised that a few k, or just a few dozen k, iterations are enough.

I spent a night on a quick try: I finetuned a LoRA for about 48k iterations and got a very poor result, so I suspect something is wrong in my finetuning process. Do I need to modify my training script with respect to the VAE? I ask because I notice there are some parameters not used by the original VAE, such as:

            "shift_factor": 0.8640247167934477,

In my training script, the VAE encoding part goes like this:

    # sample latents from the VAE posterior and apply the scaling factor
    model_input = vae.encode(pixel_values).latent_dist.sample()
    model_input = model_input * vae.config.scaling_factor
    model_input = model_input.to(weight_dtype)

Should I change the code to:

    model_input = model_input * vae.config.scaling_factor + vae.config.shift_factor

?
By the way, I use StableDiffusionXLPipeline from diffusers for inference.
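
For comparison, diffusers pipelines that use a shift_factor (SD3, for example) subtract it before scaling rather than add it after, so I wonder if the encode/decode should instead look like this (just my reading of those pipelines, not confirmed for this VAE):

    # encode: subtract the shift before applying the scale (SD3-style convention)
    model_input = vae.encode(pixel_values).latent_dist.sample()
    model_input = (model_input - vae.config.shift_factor) * vae.config.scaling_factor
    model_input = model_input.to(weight_dtype)

    # decode: invert both steps before calling the VAE decoder
    latents = latents / vae.config.scaling_factor + vae.config.shift_factor
    image = vae.decode(latents).sample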
