{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Video generation with ZeroScope and OpenVINO\n", "\n", "#### Table of contents:\n", "\n", "- [Install and import required packages](#Install-and-import-required-packages)\n", "- [Load the model](#Load-the-model)\n", "- [Convert the model](#Convert-the-model)\n", " - [Define the conversion function](#Define-the-conversion-function)\n", " - [UNet](#UNet)\n", " - [VAE](#VAE)\n", " - [Text encoder](#Text-encoder)\n", "- [Build a pipeline](#Build-a-pipeline)\n", "- [Inference with OpenVINO](#Inference-with-OpenVINO)\n", " - [Select inference device](#Select-inference-device)\n", " - [Define a prompt](#Define-a-prompt)\n", " - [Video generation](#Video-generation)\n", "- [Interactive demo](#Interactive-demo)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The ZeroScope model is a free and open-source text-to-video model that generates videos from text descriptions. It is based on the [ModelScope](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) model, but it has been improved to produce higher-quality videos with a 16:9 aspect ratio and no Shutterstock watermark. The ZeroScope model is available in two versions: ZeroScope_v2 576w, which is optimized for rapid content creation at a resolution of 576x320 pixels, and ZeroScope_v2 XL, which upscales videos to a high-definition resolution of 1024x576.\n", "\n", "The ZeroScope model is trained on a dataset of over 9,000 videos and 29,000 tagged frames. It uses a diffusion model to generate videos, which means that it starts from random noise and iteratively removes that noise, guided by the text description, until a coherent video emerges. The ZeroScope model is still under development, but it has already been used to create some impressive videos. 
For example, it has been used to create videos of people dancing, playing sports, and even driving cars.\n", "\n", "The ZeroScope model can be used to create a wide variety of videos, from simple animations to complex scenes, and it has the potential to change how video content is created and consumed.\n", "\n", "Both versions of the ZeroScope model are available on Hugging Face:\n", " - [ZeroScope_v2 576w](https://huggingface.co/cerspense/zeroscope_v2_576w)\n", " - [ZeroScope_v2 XL](https://huggingface.co/cerspense/zeroscope_v2_XL)\n", "\n", "This notebook uses the first one, ZeroScope_v2 576w." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "