Furkan Gözükara

MonsterMMORPG

AI & ML interests

Check out my YouTube channel, SECourses, for Stable Diffusion tutorials. They will help you tremendously on every topic.

Organizations

Social Post Explorers, Hugging Face Discord Community

Posts 51

CogVideoX1.5-5B-I2V, the best open-source image-to-video model, is pretty decent and can be optimized for low-VRAM machines at high resolution. Its native resolution is 1360px and it generates up to 10 seconds (161 frames). The audio was generated with a new open-source audio model.

Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE

1-Click Windows, RunPod and Massed Compute installers (they install into a Python 3.11 venv) : https://www.patreon.com/posts/112848192

Official Hugging Face repo of CogVideoX1.5-5B-I2V : https://huggingface.co/THUDM/CogVideoX1.5-5B-I2V

Official github repo : https://github.com/THUDM/CogVideo

Prompts used to generate the videos (txt file) : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05

Demo images are shared at : https://www.patreon.com/posts/112848192

I used 1360x768px images at 16 FPS with 81 frames: 80 generated frames = 5 seconds, plus 1 frame coming from the initial image.
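The frame arithmetic above (16 FPS, plus one frame for the initial image) can be sketched as a pair of small helpers. The function names are hypothetical and only illustrate the calculation; they are not part of any library.

```python
def clip_frames(seconds: float, fps: int = 16) -> int:
    """Frame count for a clip: fps * seconds generated frames + 1 initial-image frame."""
    return round(seconds * fps) + 1


def clip_seconds(num_frames: int, fps: int = 16) -> float:
    """Clip duration in seconds, ignoring the frame that comes from the initial image."""
    return (num_frames - 1) / fps


print(clip_frames(5))     # frame count for a 5 second clip
print(clip_frames(10))    # frame count for the 10 second maximum
print(clip_seconds(41))   # duration of the 41-frame runs in the VRAM tests
```

This is consistent with the numbers in the post: 81 frames for 5 seconds and 161 frames for 10 seconds.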

I also enabled all the optimizations shared on Hugging Face:

pipe.enable_sequential_cpu_offload()

pipe.vae.enable_slicing()

pipe.vae.enable_tiling()

quantization = int8_weight_only - you need TorchAO and DeepSpeed; this works great on Windows in a Python 3.11 venv
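Put together, the optimizations listed above might look roughly like the sketch below. It assumes the diffusers, transformers and torchao packages, and that quantizing the text encoder and transformer before building the pipeline is acceptable; which components to quantize, the bfloat16 dtype and the function name `build_low_vram_pipeline` are my assumptions, not necessarily what the installers do. Whether sequential CPU offload and int8 weights combine cleanly may depend on library versions.

```python
MODEL_ID = "THUDM/CogVideoX1.5-5B-I2V"


def build_low_vram_pipeline(model_id: str = MODEL_ID):
    """Build an image-to-video pipeline with the memory optimizations from the post."""
    # Heavy imports are kept inside the function so that defining it is cheap.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel
    from torchao.quantization import int8_weight_only, quantize_
    from transformers import T5EncoderModel

    # Assumption: quantize the two largest components to int8 weights.
    text_encoder = T5EncoderModel.from_pretrained(
        model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16
    )
    quantize_(text_encoder, int8_weight_only())

    transformer = CogVideoXTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    quantize_(transformer, int8_weight_only())

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        model_id,
        text_encoder=text_encoder,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )

    # The three calls listed in the post.
    pipe.enable_sequential_cpu_offload()  # stream weights to the GPU module by module
    pipe.vae.enable_slicing()             # decode the batch one sample at a time
    pipe.vae.enable_tiling()              # decode each frame in spatial tiles
    return pipe
```

A generation call would then pass the input image, a prompt and settings such as `num_frames=81`, `height=768` and `width=1360` to the returned pipeline.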

Audio model used : https://github.com/hkchengrex/MMAudio

1-Click Windows, RunPod and Massed Compute installers for MMAudio (they install into a Python 3.10 venv) : https://www.patreon.com/posts/117990364

I used very simple prompts. Video-to-audio fails when there is a human in the input video, so use text-to-audio in such cases.

I also measured VRAM usage for CogVideoX1.5-5B-I2V at several resolutions. Here are the requirements; lower-VRAM GPUs may work too, but slower.

512x288 - 41 frames : 7700 MB

576x320 - 41 frames : 7900 MB

576x320 - 81 frames : 8850 MB

704x384 - 81 frames : 8950 MB

768x432 - 81 frames : 10600 MB

896x496 - 81 frames : 12050 MB

960x528 - 81 frames : 12850 MB
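As a small illustration of how these measurements could be used, the sketch below picks the largest tested resolution that fits a given VRAM budget. The table values are copied from the post; the helper name `largest_fitting` is hypothetical, and the lookup only covers the tested settings, so it says nothing about untested resolutions or slower low-VRAM fallbacks.

```python
# (width, height, frames, measured VRAM in MB), copied from the measurements above.
MEASURED_VRAM_MB = [
    (512, 288, 41, 7700),
    (576, 320, 41, 7900),
    (576, 320, 81, 8850),
    (704, 384, 81, 8950),
    (768, 432, 81, 10600),
    (896, 496, 81, 12050),
    (960, 528, 81, 12850),
]


def largest_fitting(budget_mb: int, frames: int = 81):
    """Return the tested setting with the most pixels that fits the budget, or None."""
    candidates = [
        row for row in MEASURED_VRAM_MB
        if row[2] == frames and row[3] <= budget_mb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[0] * row[1])


print(largest_fitting(12 * 1024))            # a 12 GB card, 81 frames
print(largest_fitting(8 * 1024, frames=41))  # an 8 GB card, 41 frames
```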

FLUX Fine-Tuning / DreamBooth images from a simple prompt with 2x latent upscale - can be trained on GPUs with as little as 6 GB of VRAM - each image is 2048x2048 pixels

AI Photos of Yourself - Workflow Guide
Step 1: Initial Setup
Follow any standard FLUX Fine-Tuning / DreamBooth tutorial of your choice

You can also follow mine step by step : https://youtu.be/FvpWy1x5etM

Step 2: Data Collection
Gather high-quality photos of yourself

I used a Poco X6 Pro (mid-tier phone) with good results

Ensure good variety in poses and lighting

Step 3: Training
Use "ohwx man" as the only caption for all images

Keep it simple - no complex descriptions needed
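If your trainer reads captions from a text file next to each image (a common convention, but an assumption here, so check your trainer's documentation), giving every image the same "ohwx man" caption could be sketched like this. The function name and the list of image extensions are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_captions(image_dir, caption: str = "ohwx man"):
    """Write the same caption into a .txt file beside every image in image_dir."""
    written = []
    # sorted() materializes the listing first, so new .txt files are not revisited.
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            caption_file = image.with_suffix(".txt")
            caption_file.write_text(caption, encoding="utf-8")
            written.append(caption_file)
    return written
```

Calling `write_captions("path/to/training_images")` would create one caption file per image and return their paths.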

Step 4: Testing & Optimization
Use SwarmUI grid to find the optimal checkpoint

Test different variations to find what works best

Step 5: Generation Settings
Upscale Parameters:

Scale: 2x

Refiner Control: 0.6

Model: RealESRGAN_x4plus.pth

Prompt Used:

photograph of ohwx man wearing an amazing ultra expensive suit on a luxury studio<segment:yolo-face_yolov9c.pt-1,0.7,0.5>photograph of ohwx man

Note: The model naturally generated smiling expressions since the training dataset included many smiling photos.

Note: yolo-face_yolov9c.pt was used to mask the face and automatically inpaint it, which improves face quality in distant shots.
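The prompt above has three parts: the main prompt, a `<segment:...>` tag, and the prompt applied to the segmented face. A minimal sketch of assembling it is shown below. The helper name is hypothetical, and reading the two numbers as "creativity" and "threshold" is my understanding of SwarmUI's segment syntax rather than something stated in the post, so treat those parameter names as assumptions.

```python
def build_segment_prompt(
    main_prompt: str,
    segment_model: str,
    creativity: float,   # assumed meaning of the first number in the tag
    threshold: float,    # assumed meaning of the second number in the tag
    segment_prompt: str,
) -> str:
    """Join the main prompt, the segment tag and the face prompt into one string."""
    tag = f"<segment:{segment_model},{creativity},{threshold}>"
    return f"{main_prompt}{tag}{segment_prompt}"


prompt = build_segment_prompt(
    "photograph of ohwx man wearing an amazing ultra expensive suit on a luxury studio",
    "yolo-face_yolov9c.pt-1",
    0.7,
    0.5,
    "photograph of ohwx man",
)
print(prompt)
```

With these arguments the helper reproduces the prompt used in the post character for character.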