Diffusers?
Wow, does the current Diffusers support the SkyReels checkpoint? And in int4?
If it does, maybe the code on the readme could be updated accordingly?
No, it's not supported in diffusers yet, but I'm trying.
Oh, I hope I wasn't the reason for the deleted checkpoints.
Only duplicate files were deleted, to save space.
Sky T2V and I2V have an int4-quantized transformer.
Hunyuan has an int4-quantized text encoder; the rest is untouched.
So:
- load the transformer from Sky
- load the rest from hunyuan-int4
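Roughly, that mix would look like this once diffusers supports the model (a sketch only: the Hunyuan int4 repo id below is a placeholder, and how the int4 transformer loads in practice depends on how it was saved):

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

# int4-quantized transformer from the SkyReels T2V repo
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "newgenai79/SkyReels-V1-Hunyuan-T2V-int4",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# everything else (int4 text encoder, VAE, tokenizers, scheduler) from the Hunyuan int4 repo
pipe = HunyuanVideoPipeline.from_pretrained(
    "newgenai79/HunyuanVideo-int4",  # placeholder id -- substitute the actual hunyuan-int4 repo
    transformer=transformer,
    torch_dtype=torch.float16,
)
```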
It is working fine (Diffusers still doesn't support it), but the memory consumption is very high, so it's basically unusable on low VRAM.
The model is trained on one specific resolution / frame count only :(
960×544 px, 97 frames
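On the memory side, once the diffusers route works, these are the usual memory-saving switches to try first (a sketch; `pipe` is a pipeline built like the one above, and whether offloading plays nicely with the int4 weights still needs to be verified):

```python
# Standard diffusers memory-saving options
pipe.vae.enable_tiling()          # decode the 97-frame video in tiles instead of all at once
pipe.enable_model_cpu_offload()   # keep only the currently active component on the GPU
# If that is still too much, sequential offload trades a lot of speed for VRAM:
# pipe.enable_sequential_cpu_offload()
```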
Support in diffusers has been added:
https://github.com/huggingface/diffusers/pull/10837
You need to install diffusers from source, and you can use the int4 checkpoints from here.
Yeah, I tried the Diffusers implementation, but it eats up some 34 GB VRAM on my RTX 4090, and then nothing happens.
I've tried a lot of stuff to use your int4, but I'm basically not understanding what I'm doing. Could you help me out with a snippet that shows how this is supposed to be used with Diffusers?
I actually used the original code and modified it for my needs.
Give me some time, I will create code using diffusers and post it here:
https://github.com/huggingface/diffusers/discussions
I tried to mix and merge my Hunyuan loading code with your transformer; however, I got this super weird error:
```
python\Lib\site-packages\diffusers\configuration_utils.py", line 414, in load_config
    raise EnvironmentError(
OSError: stable-diffusion-v1-5/stable-diffusion-v1-5 does not appear to have a file named config.json.
```
My code:
```python
import torch  # needed for the torch dtypes below
# import torch._dynamo.config
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
# the text encoders are transformers models, so their BitsAndBytesConfig comes from transformers
from transformers import BitsAndBytesConfig, LlamaModel, CLIPTextModel
# from diffusers import GGUFQuantizationConfig

# torch._dynamo.config.inline_inbuilt_nn_modules = True

model_id = "hunyuanvideo-community/HunyuanVideo"

# quantize both text encoders to 4-bit (nf4) on the fly
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder = LlamaModel.from_pretrained(
    model_id,
    subfolder="text_encoder",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)
text_encoder_2 = CLIPTextModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)

# SkyReels int4 transformer, loaded from a single safetensors file on the Hub
transformer_path = "https://huggingface.co/newgenai79/SkyReels-V1-Hunyuan-T2V-int4/blob/main/transformer/diffusion_pytorch_model.safetensors"
transformer = HunyuanVideoTransformer3DModel.from_single_file(
    transformer_path,
    torch_dtype=torch.bfloat16,
)

# assemble the pipeline: quantized text encoders + SkyReels transformer, rest from the base repo
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion",
    height=544,
    width=960,
    num_frames=97,
    guidance_scale=1.0,
    true_cfg_scale=6.0,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
```
Saw your code at Diffusers, thank you!
The main difference is that the dynamo function isn't in torch 2.3.1+cu121, and I need that version or CUDA will not be found here.
Install 2.5.1+cu124. It is better than earlier versions, as it saves VRAM (I read that somewhere).
Also install CUDA 12.6.
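A quick way to check which torch build is actually active and whether it sees CUDA (standard torch calls only):

```python
import torch

print(torch.__version__)          # e.g. "2.5.1+cu124"
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # should be True before calling pipe.to("cuda")
```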
I gave it a shot, but Diffusers fails if I go above 2.3.1+cu121, and e.g. installing diffusers with [torch] will install 2.6.0, but then CUDA can't be found. Super annoying. The trouble is that I'm also trying to keep this big whale of messy dependencies alive: https://github.com/tin2tin/Pallaidium
https://github.com/pytorch/pytorch/issues/130840#issuecomment-2238589486
2.4.0+cu121 both supports dynamo and finds CUDA on my computer, so now it seems to run your int4 transformer. So maybe you could include your code on the model card? Does the dynamo trick also work for I2V?
So, as we now know, dynamo is not required. :)
I have seen you announce your project (Pallaidium) on Reddit. Keep up the good work, and you can use these models as you wish.
> So, as we now know, dynamo is not required. :)
Yeah. I have a really hard time getting it to generate videos that don't dissolve a few frames in. Are you able to produce working shots most of the time? If so, what are your settings?
> I have seen you announce your project (Pallaidium) on Reddit. Keep up the good work, and you can use these models as you wish.
I think only one or two people are using Pallaidium, so it's not going to overturn ComfyUI any day soon. :-) Reddit is mostly ComfyUI users, so the most likely outcome of posting there is that some would want to try your transformer in Comfy. Thank you!
I also didn't get any good output. However, looking at their GitHub, several users posted good output, but all of them were running in the cloud. Any sort of quantization reduces quality. :(
Don't worry, it takes time for users to adopt something new. Keep users informed on different channels about the new features you add. Good luck.