Support for variant fp16?

#11
by handsomesteve - opened

Hey guys, is there a VAE for this model that supports the variant="fp16" option when loading? I think it could reduce load and inference time by a fair amount.

Cagliostro Research Lab org

The VAE is already baked into the model, and the Animagine XL 3.1 VAE is already fp16.
We use https://huggingface.co/madebyollin/sdxl-vae-fp16-fix

I see. Is it possible to add ".fp16" to the file names? Currently it isn't possible to call from_pretrained() with variant="fp16" in diffusers, because files with those names don't exist.

variant (`str`, *optional*):
                Load weights from a specified variant filename such as `"fp16"` or `"ema"`.

So trying to load it with variant="fp16" throws the error "You are trying to load the model files of the variant=fp16, but no such modeling files are available. The default model files: ... will be loaded instead".

We're running a remote server that fetches the model from this Hugging Face repo. I did try renaming the files to add "fp16" and uploading everything to my own repo. While I could load the models this way, I always end up with the error:

encoded_inputs["attention_mask"] = encoded_inputs["attention_mask"] + [0] * difference
OverflowError: cannot fit 'int' into an index-sized integer

I guess I am looking for advice on how to load Animagine XL 3.1 with DiffusionPipeline.from_pretrained() and variant="fp16" enabled. By default it will load as float32 and then convert to float16, whereas loading with variant="fp16" will load as float16 directly and save some time.
Much appreciated!

handsomesteve changed discussion title from Support for fp16 vae? to Support for variant fp16?
Cagliostro Research Lab org

The Animagine XL 3.1 model is already fp16.
You don't need variant="fp16", because variant only selects weights by file name, e.g. sdxl_vae.fp16.safetensors.
If you still want to make sure, try torch_dtype = torch.float16 (you can check the implementation at https://huggingface.co/cagliostrolab/animagine-xl-3.1#%F0%9F%A7%A8-diffusers-installation).
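A minimal sketch of the loading call discussed here, assuming `diffusers` and `torch` are installed. `DiffusionPipeline.from_pretrained`, `torch_dtype`, and `variant` come from the thread; the helper names `pick_variant` and `load_pipeline` are hypothetical and only for illustration. The idea is to pass `variant="fp16"` only when the repo really contains `.fp16.` weight files, and otherwise rely on `torch_dtype` alone.

```python
from typing import Iterable, Optional


def pick_variant(filenames: Iterable[str], variant: str = "fp16") -> Optional[str]:
    """Return `variant` if any weight file carries it in its name, else None."""
    marker = f".{variant}."
    for name in filenames:
        # Only weight files count; a stray "notes.fp16.md" should not match.
        if marker in name and name.endswith((".safetensors", ".bin")):
            return variant
    return None


def load_pipeline(repo_id: str, filenames: Iterable[str]):
    """Load a pipeline in float16, adding variant="fp16" only if such files exist."""
    # Imported lazily so pick_variant() works without these packages installed.
    import torch
    from diffusers import DiffusionPipeline

    kwargs = {"torch_dtype": torch.float16}
    variant = pick_variant(filenames)
    if variant is not None:
        kwargs["variant"] = variant
    return DiffusionPipeline.from_pretrained(repo_id, **kwargs)
```

The file list could come from `huggingface_hub.list_repo_files(repo_id)`, for example. Whether adding `variant` changes load time on a given setup is something to measure rather than assume.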

I did try torch_dtype = torch.float16. I believe diffusers loads the model as float32 and then converts it to float16 afterwards for inference. This is slower than loading directly as float16.

Model loading is noticeably faster when variant="fp16" is used (like in the SDXL base example https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0#%F0%9F%A7%A8-diffusers) than when it is left off.

Cagliostro Research Lab org
edited Apr 2, 2024

Why do you still believe the Animagine XL model loads as float32?

You can see it in my image. Would diffusers load the model as float32 out of nothing?
image.png

The SDXL base repo has 2 variants; you can see the precision of each model file below.
model.fp16.safetensors

image.png

model.safetensors

image.png

In SDXL base you may need to add variant="fp16" because there are 2 variants, fp32 and fp16.
The Animagine XL 3.1 repo has only 1 variant, fp16, and its file is named model.safetensors. If you compare it with model.safetensors in SDXL base, the difference in precision is clear.

That's why you believe Animagine XL 3.1 is loaded as fp32 first, but in fact this repo only contains the model in one precision, fp16.
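The precision shown in those screenshots can also be checked locally. Below is a rough sketch that reads the JSON header of a `.safetensors` file and counts tensors per stored dtype, assuming the standard layout (an 8-byte little-endian header length followed by a JSON header). The function name `safetensors_dtypes` is made up for this example.

```python
import json
import struct
from collections import Counter


def safetensors_dtypes(path: str) -> Counter:
    """Count tensors per stored dtype (e.g. "F16", "F32") in a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )
```

Only the header is read, so this stays fast even on multi-gigabyte checkpoints. A result dominated by `"F16"` would match what the screenshots show for this repo.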

Yeah, I agree with you that Animagine XL 3.1 is in fp16. If it were fp32, it would be over 10GB.

My problem is that the diffusers library is dumb as hell and can't load variant="fp16" unless "fp16" is in the file name. And if I don't specify variant="fp16", diffusers will still try to load it as fp32 even though it is fp16 (no error, just inefficient and slow). I'll try cloning this model repo, renaming everything to "fp16", and loading it with the parameter variant="fp16".

It works. I'm seeing a 50% reduction in model load time on my setup :)
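For anyone repeating this, here is a hedged sketch of the renaming step on a local clone. It assumes the convention seen in SDXL base (`model.safetensors` becomes `model.fp16.safetensors`) and only touches files ending in `.safetensors` or `.bin`; the function name `add_variant_to_names` is hypothetical, and sharded checkpoints with an index file are not handled.

```python
from pathlib import Path

WEIGHT_SUFFIXES = (".safetensors", ".bin")


def add_variant_to_names(root: str, variant: str = "fp16", dry_run: bool = True):
    """Rename weight files under `root` from name.ext to name.<variant>.ext.

    Returns a list of (old_path, new_path) pairs. With dry_run=True nothing
    is renamed, so the plan can be inspected first.
    """
    plan = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in WEIGHT_SUFFIXES:
            continue  # leave configs, tokenizer files, etc. untouched
        if path.stem.endswith(f".{variant}"):
            continue  # already carries the variant
        new_path = path.with_name(f"{path.stem}.{variant}{path.suffix}")
        plan.append((path, new_path))
    if not dry_run:
        for old, new in plan:
            old.rename(new)
    return plan
```

Running it once with the default `dry_run=True` and reading the returned pairs before renaming for real is probably the safer order.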

handsomesteve changed discussion status to closed
