{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Stable Diffusion v2.1 using OpenVINO TorchDynamo backend\n", "\n", "Stable Diffusion v2 is the next generation of the Stable Diffusion model, a Text-to-Image latent diffusion model created by researchers and engineers from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).\n", "\n", "General diffusion models are machine learning systems trained to denoise random Gaussian noise step by step until they arrive at a sample of interest, such as an image.\n", "Diffusion models have been shown to achieve state-of-the-art results for generating image data. One downside of diffusion models is that the reverse denoising process is slow. In addition, these models consume a lot of memory because they operate in pixel space, which becomes unreasonably expensive when generating high-resolution images. Therefore, it is challenging to train these models and to use them for inference. OpenVINO brings capabilities to run model inference on Intel hardware and opens the door to the fantastic world of diffusion models for everyone!\n", "\n", "This notebook demonstrates how to run the Stable Diffusion model using the [Diffusers](https://huggingface.co/docs/diffusers/index) library and the [OpenVINO `TorchDynamo` backend](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html) for Text-to-Image and Image-to-Image generation tasks.\n", "\n", "The notebook contains the following steps:\n", "\n", "1. Create a pipeline with PyTorch models.\n", "2. Add OpenVINO optimization using the OpenVINO TorchDynamo backend.\n", "3. 
Run the Stable Diffusion pipeline with OpenVINO.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Table of contents:\n", "\n", "- [Prerequisites](#Prerequisites)\n", "- [Stable Diffusion with Diffusers library](#Stable-Diffusion-with-Diffusers-library)\n", "- [OpenVINO TorchDynamo backend](#OpenVINO-TorchDynamo-backend)\n", "    - [Run Image generation](#Run-Image-generation)\n", "- [Interactive demo](#Interactive-demo)\n", "- [Support for Automatic1111 Stable Diffusion WebUI](#Support-for-Automatic1111-Stable-Diffusion-WebUI)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q \"torch>=2.2\" transformers diffusers \"gradio>=4.19\" ipywidgets --extra-index-url https://download.pytorch.org/whl/cpu\n", "%pip install -q \"openvino>=2024.1.0\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import gradio as gr\n", "import random\n", "import torch\n", "import time\n", "\n", "from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline\n", "import ipywidgets as widgets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stable Diffusion with Diffusers library\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "To work with Stable Diffusion v2.1, we will use the Hugging Face Diffusers library. 
To experiment with Stable Diffusion models, Diffusers exposes the [StableDiffusionPipeline](https://huggingface.co/docs/diffusers/using-diffusers/conditional_image_generation) and [StableDiffusionImg2ImgPipeline](https://huggingface.co/docs/diffusers/using-diffusers/img2img), similar to the other [Diffusers pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview). The code below demonstrates how to create the `StableDiffusionPipeline` using the `stable-diffusion-2-1-base` model:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "659b2d3b0b7e47fca037f65bcf3a9bcc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading pipeline components...: 0%| | 0/6 [00:00 **Note**: Read more about available [OpenVINO backends](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html#how-to-use)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run Image generation\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1b0c3b6e259f44d885a8bd7bfe346378", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/50 [00:00" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prompt = \"a photo of an astronaut riding a horse on mars\"\n", "image = pipe(prompt).images[0]\n", "image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactive demo\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Now you can start the demo, choose the inference mode, define the prompts (and an input image for Image-to-Image generation), and run the inference pipeline.\n", "Optionally, you can also change some input parameters." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on local URL: http://127.0.0.1:7861\n", "\n", "To create a public link, set `share=True` in `launch()`.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Keyboard interruption in main thread... closing server.\n" ] } ], "source": [ "time_stamps = []\n", "\n", "\n", "def callback(iter, t, latents):\n", " time_stamps.append(time.time())\n", "\n", "\n", "def error_str(error, title=\"Error\"):\n", " return (\n", " f\"\"\"#### {title}\n", " {error}\"\"\"\n", " if error\n", " else \"\"\n", " )\n", "\n", "\n", "def on_mode_change(mode):\n", " return gr.update(visible=mode == modes[\"img2img\"]), gr.update(visible=mode == modes[\"txt2img\"])\n", "\n", "\n", "def inference(\n", " inf_mode,\n", " prompt,\n", " guidance=7.5,\n", " steps=25,\n", " width=768,\n", " height=768,\n", " seed=-1,\n", " img=None,\n", " strength=0.5,\n", " neg_prompt=\"\",\n", "):\n", " if seed == -1:\n", " seed = random.randint(0, 10000000)\n", " generator = torch.Generator().manual_seed(seed)\n", " res = None\n", "\n", " global time_stamps, pipe\n", " time_stamps = []\n", " try:\n", " if inf_mode == modes[\"txt2img\"]:\n", " if type(pipe).__name__ != \"StableDiffusionPipeline\":\n", " pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n", " pipe.unet = torch.compile(pipe.unet, backend=\"openvino\")\n", " res = pipe(\n", " prompt,\n", " negative_prompt=neg_prompt,\n", " num_inference_steps=int(steps),\n", " guidance_scale=guidance,\n", " width=width,\n", " height=height,\n", " generator=generator,\n", " callback=callback,\n", " callback_steps=1,\n", " ).images\n", " elif inf_mode == modes[\"img2img\"]:\n", " if img is None:\n", " return (\n", " None,\n", " None,\n", " gr.update(\n", " visible=True,\n", " value=error_str(\"Image is required for Image to Image mode\"),\n", " ),\n", " )\n", " if type(pipe).__name__ != \"StableDiffusionImg2ImgPipeline\":\n", " pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n", " pipe.unet = torch.compile(pipe.unet, 
backend=\"openvino\")\n", "            res = pipe(\n", "                prompt,\n", "                negative_prompt=neg_prompt,\n", "                image=img,\n", "                num_inference_steps=int(steps),\n", "                strength=strength,\n", "                guidance_scale=guidance,\n", "                generator=generator,\n", "                callback=callback,\n", "                callback_steps=1,\n", "            ).images\n", "    except Exception as e:\n", "        return None, None, gr.update(visible=True, value=error_str(e))\n", "\n", "    # Image-to-Image runs fewer denoising steps than requested, so count the timed steps from the callback.\n", "    if len(time_stamps) < 3:\n", "        return res, gr.update(visible=True, value=\"Too few steps to measure performance\"), gr.update(visible=False, value=None)\n", "\n", "    warmup_duration = time_stamps[1] - time_stamps[0]\n", "    generation_rate = (len(time_stamps) - 2) / (time_stamps[-1] - time_stamps[1])\n", "    res_info = \"Warm up time: \" + str(round(warmup_duration, 2)) + \" secs \"\n", "    if generation_rate >= 1.0:\n", "        res_info = res_info + \", Performance: \" + str(round(generation_rate, 2)) + \" it/s \"\n", "    else:\n", "        res_info = res_info + \", Performance: \" + str(round(1 / generation_rate, 2)) + \" s/it \"\n", "\n", "    return (\n", "        res,\n", "        gr.update(visible=True, value=res_info),\n", "        gr.update(visible=False, value=None),\n", "    )\n", "\n", "\n", "modes = {\n", "    \"txt2img\": \"Text to Image\",\n", "    \"img2img\": \"Image to Image\",\n", "}\n", "\n", "with gr.Blocks(css=\"style.css\") as demo:\n", "    gr.HTML(\n", "        f\"\"\"\n", " Model used: {model_id} \n", " \"\"\"\n", "    )\n", "    with gr.Row():\n", "        with gr.Column(scale=60):\n", "            with gr.Group():\n", "                prompt = gr.Textbox(\n", "                    \"a photograph of an astronaut riding a horse\",\n", "                    label=\"Prompt\",\n", "                    max_lines=2,\n", "                )\n", "                neg_prompt = gr.Textbox(\n", "                    \"frames, borderline, text, character, duplicate, error, out of frame, watermark, low quality, ugly, deformed, blur\",\n", "                    label=\"Negative prompt\",\n", "                )\n", "                res_img = gr.Gallery(label=\"Generated images\", show_label=False)\n", "            error_output = gr.Markdown(visible=False)\n", "\n", "        with gr.Column(scale=40):\n", "            generate = gr.Button(value=\"Generate\")\n", "\n", "            with gr.Group():\n", "                inf_mode = gr.Dropdown(list(modes.values()), label=\"Inference Mode\", value=modes[\"txt2img\"])\n", "\n", "                with 
gr.Column(visible=False) as i2i:\n", "                    image = gr.Image(label=\"Image\", height=128, type=\"pil\")\n", "                    strength = gr.Slider(\n", "                        label=\"Transformation strength\",\n", "                        minimum=0,\n", "                        maximum=1,\n", "                        step=0.01,\n", "                        value=0.5,\n", "                    )\n", "\n", "            with gr.Group():\n", "                with gr.Row() as txt2i:\n", "                    width = gr.Slider(label=\"Width\", value=512, minimum=64, maximum=1024, step=8)\n", "                    height = gr.Slider(label=\"Height\", value=512, minimum=64, maximum=1024, step=8)\n", "\n", "            with gr.Group():\n", "                with gr.Row():\n", "                    steps = gr.Slider(label=\"Steps\", value=20, minimum=1, maximum=50, step=1)\n", "                    guidance = gr.Slider(label=\"Guidance scale\", value=7.5, maximum=15)\n", "\n", "                seed = gr.Slider(-1, 10000000, label=\"Seed (-1 = random)\", value=-1, step=1)\n", "\n", "            res_info = gr.Markdown(visible=False)\n", "\n", "    inf_mode.change(on_mode_change, inputs=[inf_mode], outputs=[i2i, txt2i], queue=False)\n", "\n", "    inputs = [\n", "        inf_mode,\n", "        prompt,\n", "        guidance,\n", "        steps,\n", "        width,\n", "        height,\n", "        seed,\n", "        image,\n", "        strength,\n", "        neg_prompt,\n", "    ]\n", "\n", "    outputs = [res_img, res_info, error_output]\n", "    prompt.submit(inference, inputs=inputs, outputs=outputs)\n", "    generate.click(inference, inputs=inputs, outputs=outputs)\n", "\n", "try:\n", "    demo.queue().launch(debug=True)\n", "except Exception:\n", "    demo.queue().launch(share=True, debug=True)\n", "\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Support for Automatic1111 Stable Diffusion WebUI\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Automatic1111 Stable Diffusion WebUI is an open-source repository that hosts a browser-based interface for Stable Diffusion based image generation. 
It allows users to create realistic and creative images from text prompts. Stable Diffusion WebUI is supported on Intel CPUs, Intel integrated GPUs, and Intel discrete GPUs by leveraging the OpenVINO `torch.compile` capability. Detailed instructions are available in the [Stable Diffusion WebUI repository](https://github.com/openvinotoolkit/stable-diffusion-webui/wiki/Installation-on-Intel-Silicon)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/stable-diffusion-torchdynamo-backend/stable-diffusion-torchdynamo-backend.png?raw=true", "tags": { "categories": [ "Model Demos" ], "libraries": [], "other": [ "Stable Diffusion" ], "tasks": [ "Text-to-Image" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }