{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "00af7d21-9b28-4cc4-8103-bb46ba1264f0", "metadata": {}, "source": [ "# Image generation with Stable Diffusion XL and OpenVINO\n", "\n", "Stable Diffusion XL or SDXL is the latest image generation model that is tailored towards more photorealistic outputs with more detailed imagery and composition compared to previous Stable Diffusion models, including Stable Diffusion 2.1.\n", "\n", "With Stable Diffusion XL you can now make more realistic images with improved face generation, produce legible text within images, and create more aesthetically pleasing art using shorter prompts.\n", "\n", "![pipeline](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/pipeline.png)\n", "\n", "[SDXL](https://arxiv.org/abs/2307.01952) consists of an [ensemble of experts](https://arxiv.org/abs/2211.01324) pipeline for latent diffusion: In the first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model specialized for the final denoising steps. Note that the base model can be used as a standalone module or in a two-stage pipeline as follows: First, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called [SDEdit](https://arxiv.org/abs/2108.01073)( also known as \"image to image\") to the latents generated in the first step, using the same prompt. \n", "\n", "Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. The authors design multiple novel conditioning schemes and train SDXL on multiple aspect ratios and also introduce a refinement model that is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. The testing of SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.\n", "\n", "In this tutorial, we consider how to run the SDXL model using OpenVINO.\n", "\n", "We will use a pre-trained model from the [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index) library. To simplify the user experience, the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library is used to convert the models to OpenVINO™ IR format.\n", "\n", "The tutorial consists of the following steps:\n", "\n", "- Install prerequisites\n", "- Download the Stable Diffusion XL Base model from a public source using the [OpenVINO integration with Hugging Face Optimum](https://huggingface.co/blog/openvino).\n", "- Run Text2Image generation pipeline using Stable Diffusion XL base\n", "- Run Image2Image generation pipeline using Stable Diffusion XL base\n", "- Download and convert the Stable Diffusion XL Refiner model from a public source using the [OpenVINO integration with Hugging Face Optimum](https://huggingface.co/blog/openvino).\n", "- Run 2-stages Stable Diffusion XL pipeline\n", "\n", ">**Note**: Some demonstrated models can require at least 64GB RAM for conversion and running." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "786314ec-65e4-4251-8c5a-c62efb2a5769", "metadata": {}, "source": [ "\n", "#### Table of contents:\n", "\n", "- [Install prerequisites](#Install-prerequisites)\n", "- [SDXL Base model](#SDXL-Base-model)\n", " - [Select inference device SDXL Base model](#Select-inference-device-SDXL-Base-model)\n", " - [Run Text2Image generation pipeline](#Run-Text2Image-generation-pipeline)\n", " - [Text2image Generation Interactive Demo](#Text2image-Generation-Interactive-Demo)\n", " - [Run Image2Image generation pipeline](#Run-Image2Image-generation-pipeline)\n", " - [Select inference device SDXL Refiner model](#Select-inference-device-SDXL-Refiner-model)\n", " - [Image2Image Generation Interactive Demo](#Image2Image-Generation-Interactive-Demo)\n", "- [SDXL Refiner model](#SDXL-Refiner-model)\n", " - [Select inference device](#Select-inference-device)\n", " - [Run Text2Image generation with Refinement](#Run-Text2Image-generation-with-Refinement)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ee62ee05-0388-4b6f-8565-5b8b57f72a09", "metadata": {}, "source": [ "## Install prerequisites\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2ecf3e6d-cbc1-4b57-be08-2ded40f182ce", "metadata": { "tags": [] }, "outputs": [], "source": [ "%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu \"torch>=2.1\" \"diffusers>=0.18.0\" \"invisible-watermark>=0.2.0\" \"transformers>=4.33.0\" \"accelerate\" \"onnx\" \"peft==0.6.2\"\n", "%pip install -q \"git+https://github.com/huggingface/optimum-intel.git\"\n", "%pip install -q \"openvino>=2023.1.0\" \"gradio>=4.19\" \"nncf>=2.9.0\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ed9dfe55-8ae7-4b31-a102-b53b1d2d4941", "metadata": {}, "source": [ "## SDXL Base model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "We will start with the base model part, which is responsible for the generation of images of the desired output size. \n", "[stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is available for downloading via the [HuggingFace hub](https://huggingface.co/models). It already provides a ready-to-use model in OpenVINO format compatible with [Optimum Intel](https://huggingface.co/docs/optimum/intel/index).\n", "\n", "To load an OpenVINO model and run an inference with OpenVINO Runtime, you need to replace diffusers `StableDiffusionXLPipeline` with Optimum `OVStableDiffusionXLPipeline`. In case you want to load a PyTorch model and convert it to the OpenVINO format on the fly, you can set `export=True`. \n", "\n", "You can save the model on disk using the `save_pretrained` method." 
] }, { "cell_type": "code", "execution_count": null, "id": "e16d2760-85bd-4a5f-be1b-a7313d960c56", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from optimum.intel.openvino import OVStableDiffusionXLPipeline\n", "import gc\n", "\n", "model_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\n", "model_dir = Path(\"openvino-sd-xl-base-1.0\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "867f589e-919c-455a-8b60-6c7fc5565ebf", "metadata": {}, "source": [ "### Select inference device SDXL Base model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "select device from dropdown list for running inference using OpenVINO" ] }, { "cell_type": "code", "execution_count": 3, "id": "6350dca3-65d4-46ac-ae71-9692ac578899", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "078aaf84b3c34bae857c58a6aaea6244", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets as widgets\n", "import openvino as ov\n", "\n", "core = ov.Core()\n", "\n", "device = widgets.Dropdown(\n", " options=core.available_devices + [\"AUTO\"],\n", " value=\"AUTO\",\n", " description=\"Device:\",\n", " disabled=False,\n", ")\n", "\n", "device" ] }, { "cell_type": "markdown", "id": "318de1b2", "metadata": {}, "source": [ "Please select below whether you would like to use weight compression to reduce memory footprint. [Optimum Intel](https://huggingface.co/docs/optimum/en/intel/optimization_ov#weight-only-quantization) supports weight compression via NNCF out of the box. For 8-bit compression we provide `quantization_config=OVWeightQuantizationConfig(bits=8, ...)` argument to `from_pretrained()` method containing number of bits and other compression parameters." 
] }, { "cell_type": "code", "execution_count": 22, "id": "6c6cbc44", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ee1b1540dced43f583af17e2ec584a90", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Apply weight compression')" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compress_weights = widgets.Checkbox(\n", " description=\"Apply weight compression\",\n", " value=True,\n", ")\n", "\n", "compress_weights" ] }, { "cell_type": "code", "execution_count": 24, "id": "534feee4", "metadata": {}, "outputs": [], "source": [ "def get_quantization_config(compress_weights):\n", " quantization_config = None\n", " if compress_weights.value:\n", " from optimum.intel import OVWeightQuantizationConfig\n", "\n", " quantization_config = OVWeightQuantizationConfig(bits=8)\n", " return quantization_config\n", "\n", "\n", "quantization_config = get_quantization_config(compress_weights)" ] }, { "cell_type": "code", "execution_count": 26, "id": "a4e9bd80-88e7-4f97-a5b3-6274f91a7165", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:Statistics of the bitwidth distribution:\n", "+--------------+---------------------------+-----------------------------------+\n", "| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |\n", "| | | (layers) |\n", "+==============+===========================+===================================+\n", "| 8 | 100% (794 / 794) | 100% (794 / 794) |\n", "+--------------+---------------------------+-----------------------------------+\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cc757b6789764ee3acf9e7596dc31acc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:Statistics of the bitwidth distribution:\n", "+--------------+---------------------------+-----------------------------------+\n", "| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |\n", "| | | (layers) |\n", "+==============+===========================+===================================+\n", "| 8 | 100% (32 / 32) | 100% (32 / 32) |\n", "+--------------+---------------------------+-----------------------------------+\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "47050c8337c042ef88d9e699f83c038d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:Statistics of the bitwidth distribution:\n", "+--------------+---------------------------+-----------------------------------+\n", "| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |\n", "| | | (layers) |\n", "+==============+===========================+===================================+\n", "| 8 | 100% (40 / 40) | 100% (40 / 40) |\n", "+--------------+---------------------------+-----------------------------------+\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3b39f6a0573d48ef8ebef899dd6e176d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:Statistics of the bitwidth distribution:\n", "+--------------+---------------------------+-----------------------------------+\n", "| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |\n", "| | | (layers) |\n", "+==============+===========================+===================================+\n", "| 8 | 100% (74 / 74) | 100% (74 / 74) |\n", "+--------------+---------------------------+-----------------------------------+\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "59e52e4c1d1849e8af882e53fc9a278b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:Statistics of the bitwidth distribution:\n", "+--------------+---------------------------+-----------------------------------+\n", "| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |\n", "| | | (layers) |\n", "+==============+===========================+===================================+\n", "| 8 | 100% (195 / 195) | 100% (195 / 195) |\n", "+--------------+---------------------------+-----------------------------------+\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ffb7e1e954494539af454dec1dd9a2cc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n" ] } ], "source": [ "if not model_dir.exists():\n", " text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_id, compile=False, device=device.value, quantization_config=quantization_config)\n", " text2image_pipe.half()\n", " text2image_pipe.save_pretrained(model_dir)\n", " text2image_pipe.compile()\n", "else:\n", " text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3417085c-e1da-40b7-bff9-acbfd17b3c02", "metadata": {}, "source": [ "### Run Text2Image generation pipeline\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Now, we can run the model for the generation of images using text prompts. To speed up evaluation and reduce the required memory we decrease `num_inference_steps` and image size (using `height` and `width`). You can modify them to suit your needs and depend on the target hardware. We also specified a `generator` parameter based on a numpy random state with a specific seed for results reproducibility." ] }, { "cell_type": "code", "execution_count": 27, "id": "cf168ab0-8bba-4bb6-8da5-0937b5762ef8", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0bd701da84414b7cb6ab8d529d12b293", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/15 [00:00" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "prompt = \"cute cat 4k, high-res, masterpiece, best quality, soft lighting, dynamic angle\"\n", "image = text2image_pipe(\n", " prompt,\n", " num_inference_steps=15,\n", " height=512,\n", " width=512,\n", " generator=np.random.RandomState(314),\n", ").images[0]\n", "image.save(\"cat.png\")\n", "image" ] }, { "attachments": {}, "cell_type": "markdown", "id": "399ebaaa-74ad-4ef2-a197-bbedb143d1ec", "metadata": {}, "source": [ "### Text2image Generation Interactive Demo\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "0cea2149-f327-4dc4-9083-29ee343f2045", "metadata": {}, "outputs": [], "source": [ "import gradio as gr\n", "\n", "if text2image_pipe is None:\n", " text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", "\n", "prompt = \"cute cat 4k, high-res, masterpiece, best quality, soft lighting, dynamic angle\"\n", "\n", "\n", "def generate_from_text(text, seed, num_steps):\n", " result = text2image_pipe(\n", " text,\n", " num_inference_steps=num_steps,\n", " generator=np.random.RandomState(seed),\n", " height=512,\n", " width=512,\n", " ).images[0]\n", " return result\n", "\n", "\n", "with gr.Blocks() as demo:\n", " with gr.Column():\n", " positive_input = gr.Textbox(label=\"Text prompt\")\n", " with gr.Row():\n", " seed_input = gr.Number(precision=0, label=\"Seed\", value=42, minimum=0)\n", " steps_input = gr.Slider(label=\"Steps\", value=10)\n", " btn = gr.Button()\n", " out = gr.Image(label=\"Result\", type=\"pil\", width=512)\n", " btn.click(generate_from_text, [positive_input, seed_input, steps_input], out)\n", " gr.Examples(\n", " [\n", " [prompt, 999, 20],\n", " [\n", " 
\"underwater world coral reef, colorful jellyfish, 35mm, cinematic lighting, shallow depth of field, ultra quality, masterpiece, realistic\",\n", " 89,\n", " 20,\n", " ],\n", " [\n", " \"a photo realistic happy white poodle dog ​​playing in the grass, extremely detailed, high res, 8k, masterpiece, dynamic angle\",\n", " 1569,\n", " 15,\n", " ],\n", " [\n", " \"Astronaut on Mars watching sunset, best quality, cinematic effects,\",\n", " 65245,\n", " 12,\n", " ],\n", " [\n", " \"Black and white street photography of a rainy night in New York, reflections on wet pavement\",\n", " 48199,\n", " 10,\n", " ],\n", " ],\n", " [positive_input, seed_input, steps_input],\n", " )\n", "\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/\n", "# if you want create public link for sharing demo, please add share=True\n", "demo.launch()" ] }, { "cell_type": "code", "execution_count": null, "id": "fce224cb-0ccb-4aeb-b3c4-346dd7036015", "metadata": {}, "outputs": [], "source": [ "demo.close()\n", "text2image_pipe = None\n", "gc.collect();" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0e9a929d-694e-44a9-9f35-e1beca449ad7", "metadata": {}, "source": [ "### Run Image2Image generation pipeline\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "We can reuse the already converted model for running the Image2Image generation pipeline. For that, we should replace `OVStableDiffusionXLPipeline` with `OVStableDiffusionXLImage2ImagePipeline`." ] }, { "attachments": {}, "cell_type": "markdown", "id": "3993c958-b7ea-47f1-ad10-9d883e9c1860", "metadata": {}, "source": [ "#### Select inference device SDXL Refiner model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "select device from dropdown list for running inference using OpenVINO" ] }, { "cell_type": "code", "execution_count": 8, "id": "27666906-1318-4e7a-afe5-85144a170c9b", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "078aaf84b3c34bae857c58a6aaea6244", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "device" ] }, { "cell_type": "code", "execution_count": 9, "id": "35926f53-ffe8-4386-beac-f5ab4e78130a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n" ] } ], "source": [ "from optimum.intel import OVStableDiffusionXLImg2ImgPipeline\n", "\n", "image2image_pipe = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir, device=device.value)" ] }, { "cell_type": "code", "execution_count": 10, "id": "48892114-de29-4289-8c0c-1199f912ee01", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c0009ea460d04610aa75bbafafd07963", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/7 [00:00" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "photo_prompt = \"professional photo of a cat, extremely detailed, hyper realistic, best quality, full hd\"\n", "photo_image = image2image_pipe(\n", " 
photo_prompt,\n", " image=image,\n", " num_inference_steps=25,\n", " generator=np.random.RandomState(356),\n", ").images[0]\n", "photo_image.save(\"photo_cat.png\")\n", "photo_image" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d163ee59-1228-4f2d-b78f-925a41fffcb8", "metadata": {}, "source": [ "### Image2Image Generation Interactive Demo\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "875fc6b1-88a9-4203-9df4-168a70fb89dc", "metadata": {}, "outputs": [], "source": [ "import gradio as gr\n", "from diffusers.utils import load_image\n", "import numpy as np\n", "\n", "\n", "load_image(\"https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png\").resize((512, 512)).save(\n", " \"castle_friedrich.png\"\n", ")\n", "\n", "\n", "if image2image_pipe is None:\n", " image2image_pipe = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir)\n", "\n", "\n", "def generate_from_image(text, image, seed, num_steps):\n", " result = image2image_pipe(\n", " text,\n", " image=image,\n", " num_inference_steps=num_steps,\n", " generator=np.random.RandomState(seed),\n", " ).images[0]\n", " return result\n", "\n", "\n", "with gr.Blocks() as demo:\n", " with gr.Column():\n", " positive_input = gr.Textbox(label=\"Text prompt\")\n", " with gr.Row():\n", " seed_input = gr.Number(precision=0, label=\"Seed\", value=42, minimum=0)\n", " steps_input = gr.Slider(label=\"Steps\", value=10)\n", " btn = gr.Button()\n", " with gr.Row():\n", " i2i_input = gr.Image(label=\"Input image\", type=\"pil\")\n", " out = gr.Image(label=\"Result\", type=\"pil\", width=512)\n", " btn.click(\n", " generate_from_image,\n", " [positive_input, i2i_input, seed_input, steps_input],\n", " out,\n", " )\n", " gr.Examples(\n", " [\n", " [\"amazing landscape from legends\", \"castle_friedrich.png\", 971, 60],\n", " [\n", " \"Masterpiece of watercolor painting in Van Gogh style\",\n", " \"cat.png\",\n", " 37890,\n", " 40,\n", " ],\n", " ],\n", " [positive_input, i2i_input, seed_input, steps_input],\n", " )\n", "\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/\n", "# if you want create public link for sharing demo, please add share=True\n", "demo.launch()" ] }, { "cell_type": "code", "execution_count": null, "id": "3cc2a9d6-4a39-4690-8089-fd47aecffea0", "metadata": {}, "outputs": [], "source": [ "demo.close()\n", "del image2image_pipe\n", "gc.collect()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1eca2245-c403-41cb-bf5b-0cc4acfe397e", "metadata": {}, "source": [ "## SDXL Refiner model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "As we discussed above, Stable Diffusion XL can be used in a 2-stages approach: first, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model for the refinement of latents generated in the first step, using the same prompt. \n", "The Stable Diffusion XL Refiner model is designed to transform regular images into stunning masterpieces with the help of user-specified prompt text. It can be used to improve the quality of image generation after the Stable Diffusion XL Base. The refiner model accepts latents produced by the SDXL base model and text prompt for improving generated image." 
] }, { "cell_type": "markdown", "id": "dd1d6821", "metadata": {}, "source": [ "select whether you would like to use weight compression to reduce memory footprint" ] }, { "cell_type": "code", "execution_count": null, "id": "aa09681b", "metadata": {}, "outputs": [], "source": [ "compress_weights" ] }, { "cell_type": "code", "execution_count": null, "id": "cbdf5c54", "metadata": {}, "outputs": [], "source": [ "quantization_config = get_quantization_config(compress_weights)" ] }, { "cell_type": "code", "execution_count": null, "id": "c8b95e61-d266-4491-8dfc-d2c8f56093a8", "metadata": {}, "outputs": [], "source": [ "from optimum.intel import (\n", " OVStableDiffusionXLImg2ImgPipeline,\n", " OVStableDiffusionXLPipeline,\n", ")\n", "from pathlib import Path\n", "\n", "refiner_model_id = \"stabilityai/stable-diffusion-xl-refiner-1.0\"\n", "refiner_model_dir = Path(\"openvino-sd-xl-refiner-1.0\")\n", "\n", "\n", "if not refiner_model_dir.exists():\n", " refiner = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(refiner_model_id, export=True, compile=False, quantization_config=quantization_config)\n", " refiner.half()\n", " refiner.save_pretrained(refiner_model_dir)\n", " del refiner\n", " gc.collect()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "378664aa-b41c-4ecb-854a-9b2ebb0964e7", "metadata": {}, "source": [ "### Select inference device\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "select device from dropdown list for running inference using OpenVINO" ] }, { "cell_type": "code", "execution_count": 14, "id": "7c672d74-b566-42dc-8508-df399d1e5a3a", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "078aaf84b3c34bae857c58a6aaea6244", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "device" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0d347c7a-ac71-461b-a9ce-5f9471cb5c97", "metadata": {}, "source": [ "### Run Text2Image generation with Refinement\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "0048e46b-201c-4f16-88b3-fa621d1b6e14", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "01302fc565244fa9968df4697cc3817a", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/15 [00:00" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image = refiner(\n", " prompt=prompt,\n", " image=np.transpose(latents[None, :], (0, 2, 3, 1)),\n", " num_inference_steps=15,\n", " generator=np.random.RandomState(314),\n", ").images[0]\n", "image.save(\"cat_refined.png\")\n", "\n", "image" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" }, "openvino_notebooks": { "imageUrl": 
"https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/stable-diffusion-xl/stable-diffusion-xl.png?raw=true", "tags": { "categories": [ "Model Demos", "AI Trends" ], "libraries": [], "other": [ "Stable Diffusion" ], "tasks": [ "Text-to-Image" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }