---
pipeline_tag: text-to-video
---
![model example](https://i.imgur.com/ze1DGOJ.png)
[example outputs](https://www.youtube.com/watch?v=HO3APT_0UA4) (courtesy of [dotsimulate](https://www.instagram.com/dotsimulate/))
# zeroscope_v2 XL
A watermark-free, ModelScope-based video model that generates high-quality video at 1024x576. It was trained with offset noise on 9,923 clips and 29,769 tagged frames, using 24-frame clips at 1024x576 resolution.<br />
zeroscope_v2_XL is designed specifically for upscaling content made with [zeroscope_v2_576w](https://huggingface.co/cerspense/zeroscope_v2_576w), using vid2vid in the [1111 text2video](https://github.com/kabachuha/sd-webui-text2video) extension by [kabachuha](https://github.com/kabachuha). Using this model as an upscaler produces better overall compositions at higher resolutions, and lets you explore ideas quickly at 576x320 (or 448x256) before committing to a high-resolution render.<br />
zeroscope_v2_XL uses 15.3 GB of VRAM when rendering 30 frames at 1024x576.
### Using it with the 1111 text2video extension
1. Download the files in the `zs2_XL` folder.
2. Use them to replace the files of the same name in the `stable-diffusion-webui\models\ModelScope\t2v` directory.
### Upscaling recommendations
For upscaling, use the 1111 extension. It works best at 1024x576 with a denoise strength between 0.66 and 0.85. Use the same prompt that was used to generate the original clip.
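The upscaling recommendations above can be captured as a small pre-flight check. This is a hypothetical sketch, not a feature of the 1111 extension: `UpscaleSettings` and `check_upscale_settings` are illustrative names, and the thresholds come straight from the recommendations (1024x576, denoise strength 0.66 to 0.85, unchanged prompt). It returns warnings rather than raising, since these are recommended settings, not hard limits.

```python
from dataclasses import dataclass

# Recommended values from the model card; guidance only, not enforced by the extension.
RECOMMENDED_SIZE = (1024, 576)
STRENGTH_RANGE = (0.66, 0.85)


@dataclass
class UpscaleSettings:
    width: int
    height: int
    denoise_strength: float
    prompt: str           # prompt for the vid2vid upscale pass
    original_prompt: str  # prompt used to generate the low-resolution clip


def check_upscale_settings(s: UpscaleSettings) -> list[str]:
    """Return human-readable warnings; an empty list means the settings match the recommendations."""
    warnings = []
    if (s.width, s.height) != RECOMMENDED_SIZE:
        w, h = RECOMMENDED_SIZE
        warnings.append(f"resolution {s.width}x{s.height} differs from the recommended {w}x{h}")
    lo, hi = STRENGTH_RANGE
    if not lo <= s.denoise_strength <= hi:
        warnings.append(f"denoise strength {s.denoise_strength} is outside {lo} to {hi}")
    # Compare prompts ignoring leading/trailing whitespace only.
    if s.prompt.strip() != s.original_prompt.strip():
        warnings.append("prompt differs from the one used for the original clip")
    return warnings
```

For instance, `check_upscale_settings(UpscaleSettings(1024, 576, 0.75, "a cat surfing", "a cat surfing"))` returns an empty list, while a 576x320 target with strength 0.5 and a changed prompt produces three warnings.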
### Known issues
Rendering at lower resolutions or with fewer than 24 frames may produce suboptimal output.<br />
Thanks to [camenduru](https://github.com/camenduru), [kabachuha](https://github.com/kabachuha), [ExponentialML](https://github.com/ExponentialML), [dotsimulate](https://www.instagram.com/dotsimulate/), [VANYA](https://twitter.com/veryVANYA), [polyware](https://twitter.com/polyware_ai), [tin2tin](https://github.com/tin2tin)<br />