If you're looking for a way to streamline video creation and editing, especially on low-VRAM machines, the CogVideoX1.5-5B-I2V model is a solid option. It can generate high-resolution videos of up to 10 seconds (161 frames), which makes it well suited to creative projects. If you're into Android game development and need efficient tools for generating videos or animations, this model could be very useful. For more information, you can check out this link: https://huggingface.co/datasets/CohereForAI/Global-MMLU/discussions/2
nick
nick76876
Recent Activity
replied to MonsterMMORPG's post 1 day ago
CogVideoX1.5-5B-I2V, the best open source image-to-video model, is pretty decent and optimized for low-VRAM machines at high resolution. Native resolution is 1360px, with up to 10 seconds (161 frames). Audio was generated with a new open source audio model.
Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE
1-Click Windows, RunPod and Massed Compute installers (install into a Python 3.11 VENV) : https://www.patreon.com/posts/112848192
Official Hugging Face repo of CogVideoX1.5-5B-I2V : https://huggingface.co/THUDM/CogVideoX1.5-5B-I2V
Official GitHub repo : https://github.com/THUDM/CogVideo
Prompts used to generate the videos (txt file) : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05
Demo images shared in : https://www.patreon.com/posts/112848192
I used 1360x768px images at 16 FPS with 81 frames: 80 frames = 5 seconds, plus 1 frame coming from the initial image.
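The frame counts quoted in this post follow a simple pattern: seconds times FPS, plus one frame for the initial image. A small sketch of that arithmetic, assuming the 16 FPS used here (the function names are made up for illustration):

```python
def total_frames(seconds, fps=16):
    """Frames for a clip: seconds * fps generated frames + 1 initial-image frame."""
    return seconds * fps + 1


def duration_seconds(frames, fps=16):
    """Clip length in seconds, not counting the initial-image frame."""
    return (frames - 1) / fps


print(total_frames(5))       # 81
print(total_frames(10))      # 161
print(duration_seconds(41))  # 2.5
```

So the 41-frame settings in the VRAM measurements correspond to clips of about 2.5 seconds.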
I also enabled all the optimizations shared on Hugging Face:
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
quantization = int8_weight_only. You need TorchAO and DeepSpeed; it works great on Windows with a Python 3.11 VENV
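For reference, here is a minimal sketch of how the settings listed above might be wired together with the diffusers library. It is not the installer's actual script. The `build_pipeline` helper, its `quantize` flag, and the choice to quantize only the text encoder and transformer are assumptions for illustration. Whether int8 quantization combines cleanly with sequential CPU offload may depend on the installed diffusers, accelerate and TorchAO versions, so quantization is off by default in the sketch.

```python
def build_pipeline(model_id="THUDM/CogVideoX1.5-5B-I2V", quantize=False):
    """Load the image-to-video pipeline with the memory optimizations above.

    Hypothetical helper: the heavy libraries are imported lazily, so they
    are only required when the function is actually called.
    """
    import torch
    from diffusers import CogVideoXImageToVideoPipeline

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )

    if quantize:
        # Assumption: int8 weight-only quantization through TorchAO,
        # applied in place to the text encoder and the transformer.
        from torchao.quantization import quantize_, int8_weight_only

        quantize_(pipe.text_encoder, int8_weight_only())
        quantize_(pipe.transformer, int8_weight_only())

    # The three calls quoted in the post.
    pipe.enable_sequential_cpu_offload()  # keep weights on CPU, move to GPU per layer
    pipe.vae.enable_slicing()             # decode the batch one sample at a time
    pipe.vae.enable_tiling()              # decode each frame in tiles
    return pipe
```

A typical run would then call the returned pipeline with an input image, a prompt and `num_frames=81`, and save the result with `diffusers.utils.export_to_video` at `fps=16`.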
Used audio model : https://github.com/hkchengrex/MMAudio
1-Click Windows, RunPod and Massed Compute installers for MMAudio (install into a Python 3.10 VENV) : https://www.patreon.com/posts/117990364
I used very simple prompts. It fails when there is a human in the input video, so use text-to-audio in such cases.
I also measured VRAM usage for CogVideoX1.5-5B-I2V.
Resolutions and their VRAM requirements (may work on lower-VRAM GPUs too, but slower):
512x288 - 41 frames : 7700 MB
576x320 - 41 frames : 7900 MB
576x320 - 81 frames : 8850 MB
704x384 - 81 frames : 8950 MB
768x432 - 81 frames : 10600 MB
896x496 - 81 frames : 12050 MB
960x528 - 81 frames : 12850 MB
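To pick a setting for a given GPU, the measurements above can be put in a small lookup. This is only a sketch: the numbers are copied from this post, the `largest_setting_within` helper is hypothetical, and since lower-VRAM GPUs may still work (just slower), the result is a guide rather than a hard limit.

```python
# Measurements from the post: (width, height, frames) -> VRAM in MB.
VRAM_MB = {
    (512, 288, 41): 7700,
    (576, 320, 41): 7900,
    (576, 320, 81): 8850,
    (704, 384, 81): 8950,
    (768, 432, 81): 10600,
    (896, 496, 81): 12050,
    (960, 528, 81): 12850,
}


def largest_setting_within(budget_mb, frames=81):
    """Return ((width, height, frames), vram_mb) for the highest-resolution
    measured setting that fits in budget_mb, or None if nothing fits."""
    candidates = [
        (w * h, (w, h, f), mb)
        for (w, h, f), mb in VRAM_MB.items()
        if f == frames and mb <= budget_mb
    ]
    if not candidates:
        return None
    _, setting, mb = max(candidates)  # largest pixel count wins
    return setting, mb


print(largest_setting_within(12000))           # ((768, 432, 81), 10600)
print(largest_setting_within(8000, frames=41)) # ((576, 320, 41), 7900)
print(largest_setting_within(7000))            # None
```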