Diffusers?
Wow, does the current Diffusers support the SkyReels checkpoint? And in int4?
If it does, maybe the code on the readme could be updated accordingly?
No, it's not supported in diffusers yet, but I'm trying.
Oh, I hope I wasn't the reason for the deleted checkpoints.
Only duplicate files were deleted, to save space.
Sky T2V and I2V have an int4-quantized transformer.
Hunyuan has an int4-quantized text encoder; the rest is untouched.
So:
- load the transformer from Sky
- load the rest from hunyuan-int4
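Roughly, that mix would look like this once diffusers supports the model (a sketch only: the Hunyuan int4 repo id below is a placeholder, and how the int4 transformer loads in practice depends on how it was saved):

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

# int4-quantized transformer from the SkyReels T2V repo
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "newgenai79/SkyReels-V1-Hunyuan-T2V-int4",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# everything else (int4 text encoder, VAE, tokenizers, scheduler) from the Hunyuan int4 repo
pipe = HunyuanVideoPipeline.from_pretrained(
    "newgenai79/HunyuanVideo-int4",  # placeholder id -- substitute the actual hunyuan-int4 repo
    transformer=transformer,
    torch_dtype=torch.float16,
)
```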
It is working fine (Diffusers still doesn't support it), but the memory consumption is very high, so it's basically unusable on low VRAM.
The model is trained on one specific resolution / frame count only :(
960×544 px, 97 frames
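On the memory side, once the diffusers route works, these are the usual memory-saving switches to try first (a sketch; `pipe` is a pipeline built like the one above, and whether offloading plays nicely with the int4 weights still needs to be verified):

```python
# Standard diffusers memory-saving options
pipe.vae.enable_tiling()          # decode the 97-frame video in tiles instead of all at once
pipe.enable_model_cpu_offload()   # keep only the currently active component on the GPU
# If that is still too much, sequential offload trades a lot of speed for VRAM:
# pipe.enable_sequential_cpu_offload()
```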
Support in diffusers has been added:
https://github.com/huggingface/diffusers/pull/10837
You need to install diffusers from source, and you can use the int4 checkpoints from here.
Yeah, I tried the Diffusers implementation, but it eats up some 34 GB VRAM on my RTX 4090, and then nothing happens.
I've tried a lot of stuff to use your int4, but I'm basically not understanding what I'm doing. Could you help me out with a snippet that shows how this is supposed to be used with Diffusers?
I actually used the original code and modified it for my needs.
Give me some time, I will create code using diffusers and post it here:
https://github.com/huggingface/diffusers/discussions
I tried to mix and merge my Hunyuan loading code with your transformer; however, I got this super weird error:
```
python\Lib\site-packages\diffusers\configuration_utils.py", line 414, in load_config
    raise EnvironmentError(
OSError: stable-diffusion-v1-5/stable-diffusion-v1-5 does not appear to have a file named config.json.
```
My code:
```python
import torch  # needed for the torch dtypes below
# import torch._dynamo.config
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
# the text encoders are transformers models, so their BitsAndBytesConfig comes from transformers
from transformers import BitsAndBytesConfig, LlamaModel, CLIPTextModel
# from diffusers import GGUFQuantizationConfig

# torch._dynamo.config.inline_inbuilt_nn_modules = True

model_id = "hunyuanvideo-community/HunyuanVideo"

# quantize both text encoders to 4-bit (nf4) on the fly
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder = LlamaModel.from_pretrained(
    model_id,
    subfolder="text_encoder",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)
text_encoder_2 = CLIPTextModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)

# SkyReels int4 transformer, loaded from a single safetensors file on the Hub
transformer_path = "https://huggingface.co/newgenai79/SkyReels-V1-Hunyuan-T2V-int4/blob/main/transformer/diffusion_pytorch_model.safetensors"
transformer = HunyuanVideoTransformer3DModel.from_single_file(
    transformer_path,
    torch_dtype=torch.bfloat16,
)

# assemble the pipeline: quantized text encoders + SkyReels transformer, rest from the base repo
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion",
    height=544,
    width=960,
    num_frames=97,
    guidance_scale=1.0,
    true_cfg_scale=6.0,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
```
Saw your code at Diffusers, thank you!
The main difference is that the dynamo function isn't in torch 2.3.1+cu121, and I need that version or CUDA will not be found here.
Install 2.5.1+cu124. It is better than earlier versions, as it saves VRAM (I read that somewhere).
Also install CUDA 12.6.
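A quick way to check which torch build is actually active and whether it sees CUDA (standard torch calls only):

```python
import torch

print(torch.__version__)          # e.g. "2.5.1+cu124"
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # should be True before calling pipe.to("cuda")
```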
I gave it a shot, but Diffusers fails if I go above 2.3.1+cu121, and e.g. installing diffusers with [torch] will install 2.6.0, but then CUDA can't be found. Super annoying. The trouble is that I'm also trying to keep this big whale of messy dependencies alive: https://github.com/tin2tin/Pallaidium
https://github.com/pytorch/pytorch/issues/130840#issuecomment-2238589486
2.4.0+cu121 both supports dynamo and finds CUDA on my computer, so now it seems to run your int4 transformer. So maybe you could include your code on the model card? Does the dynamo trick also work for I2V?
So, as we now know, dynamo is not required. :)
I have seen you announce your project (Pallaidium) on Reddit. Keep up the good work, and you can use these models as you wish.
> So, as we now know, dynamo is not required. :)
Yeah. I have a really hard time getting it to generate videos that don't dissolve a few frames in. Are you able to produce working shots most of the time? If so, what are your settings?
> I have seen you announce your project (Pallaidium) on Reddit. Keep up the good work, and you can use these models as you wish.
I think only one or two people are using Pallaidium, so it's not going to overturn ComfyUI any day soon. :-) Reddit is mostly ComfyUI users, so the most likely outcome of posting there is that some would want to try your transformer in Comfy. Thank you!
I also didn't get any good output. However, looking at their GitHub, several users posted good output, but all of them were running in the cloud. Any sort of quantization reduces quality. :(
Don't worry, it takes time for users to adopt something new. Keep users informed on different channels about the new features you add. Good luck.