{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "dbf2a579-a1bb-4293-980d-74b7b8b43c60", "metadata": {}, "source": [ "# Single step image generation using SDXL-turbo and OpenVINO\n", "\n", "SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. SDXL-Turbo is a distilled version of [SDXL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), trained for real-time synthesis. \n", "SDXL Turbo is based on a novel distillation technique called Adversarial Diffusion Distillation (ADD), which enables the model to synthesize image outputs in a single step and generate real-time text-to-image outputs while maintaining high sampling fidelity. More details about this distillation approach can be found in [technical report](https://stability.ai/research/adversarial-diffusion-distillation). More details about model can be found in [Stability AI blog post](https://stability.ai/news/stability-ai-sdxl-turbo).\n", "\n", "Previously, we already discussed how to launch Stable Diffusion XL model using OpenVINO in the following [notebook](../stable-diffusion-xl), in this tutorial we will focus on the [SDXL-turbo](https://huggingface.co/stabilityai/sdxl-turbo) version. Additionally, to improve image decoding speed, we will use [Tiny Autoencoder](https://github.com/madebyollin/taesd), which is useful for real-time previewing of the SDXL generation process.\n", "\n", "We will use a pre-trained model from the [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index) library. To simplify the user experience, the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library is used to convert the models to OpenVINO™ IR format.\n", "\n", "#### Table of contents:\n", "\n", "- [Prerequisites](#Prerequisites)\n", "- [Convert model to OpenVINO format](#Convert-model-to-OpenVINO-format)\n", "- [Text-to-image generation](#Text-to-image-generation)\n", " - [Select inference device for text-to-image generation](#Select-inference-device-for-text-to-image-generation)\n", "- [Image-to-Image generation](#Image-to-Image-generation)\n", "- [Quantization](#Quantization)\n", " - [Prepare calibration dataset](#Prepare-calibration-dataset)\n", " - [Run quantization](#Run-quantization)\n", " - [Compare UNet file size](#Compare-UNet-file-size)\n", " - [Compare inference time of the FP16 and INT8 models](#Compare-inference-time-of-the-FP16-and-INT8-models)\n", "- [Interactive Demo](#Interactive-Demo)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7a6f6632-e595-4b3e-8bc9-fa2596e7626a", "metadata": {}, "source": [ "## Prerequisites\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "935d70a8-1f2d-4c0d-8933-cbd074712ce5", "metadata": {}, "outputs": [], "source": [ "%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu\\\n", "\"torch>=2.1\" transformers diffusers \"git+https://github.com/huggingface/optimum-intel.git\" \"gradio>=4.19\" \"peft==0.6.2\" \"openvino>=2023.3.0\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e0e0a0ab-f69a-4f98-8821-d112f4cb042c", "metadata": {}, "source": [ "## Convert model to OpenVINO format\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "[sdxl-turbo](https://huggingface.co/stabilityai/sdxl-turbo) is available for downloading via the [HuggingFace hub](https://huggingface.co/models). 
We will use the optimum-cli interface to export it into OpenVINO Intermediate Representation (IR) format.\n", "\n", "The Optimum CLI interface for converting models supports export to OpenVINO (supported starting from optimum-intel version 1.12).\n", "The general command format is:\n", "\n", "```bash\n", "optimum-cli export openvino --model <model_id_or_path> --task <task> <output_dir>\n", "```\n", "\n", "where `--task` is the task to export the model for; if not specified, the task will be auto-inferred based on the model. Available tasks depend on the model; for SDXL, the `stable-diffusion-xl` task should be selected.\n", "\n", "You can find a mapping between tasks and model classes in the Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).\n", "\n", "Additionally, you can specify weight compression: `--fp16` to compress the model to FP16 and `--int8` to compress it to INT8. Please note that for INT8 it is necessary to install nncf.\n", "\n", "The full list of supported arguments is available via `--help`.\n", "For more details and examples of usage, please check the [optimum documentation](https://huggingface.co/docs/optimum/intel/inference#export).\n", "\n", "For the Tiny Autoencoder, we will use the `ov.convert_model` function to obtain an `ov.Model` and save it using `ov.save_model`. The model consists of 2 parts that are used in the pipeline separately:\n", "`vae_encoder`, which encodes the input image into latent space for the image-to-image generation task, and `vae_decoder`, which is responsible for decoding the diffusion result back to image format." ] }, { "cell_type": "code", "execution_count": 2, "id": "fb8d69a7-56ca-4acd-8aa9-9a9f31b58496", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "model_dir = Path(\"./model\")\n", "sdxl_model_id = \"stabilityai/sdxl-turbo\"\n", "tae_id = \"madebyollin/taesdxl\"\n", "skip_convert_model = model_dir.exists()" ] }, { "cell_type": "code", "execution_count": 3, "id": "e19f90d9-55d1-4e99-91c0-9f72e0240cf2", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import openvino as ov\n", "from diffusers import AutoencoderTiny\n", "import gc\n", "\n", "\n", "class VAEEncoder(torch.nn.Module):\n", " def __init__(self, vae):\n", " super().__init__()\n", " self.vae = vae\n", "\n", " def forward(self, sample):\n", " return self.vae.encode(sample)\n", "\n", "\n", "class VAEDecoder(torch.nn.Module):\n", " def __init__(self, vae):\n", " super().__init__()\n", " self.vae = vae\n", "\n", " def forward(self, latent_sample):\n", " return self.vae.decode(latent_sample)\n", "\n", "\n", "def convert_tiny_vae(model_id, output_path):\n", " tiny_vae = AutoencoderTiny.from_pretrained(model_id)\n", " tiny_vae.eval()\n", " vae_encoder = VAEEncoder(tiny_vae)\n", " ov_model = ov.convert_model(vae_encoder, example_input=torch.zeros((1, 3, 512, 512)))\n", " ov.save_model(ov_model, output_path / \"vae_encoder/openvino_model.xml\")\n", " tiny_vae.save_config(output_path / \"vae_encoder\")\n", " vae_decoder = VAEDecoder(tiny_vae)\n", " ov_model = ov.convert_model(vae_decoder, example_input=torch.zeros((1, 4, 64, 64)))\n", " ov.save_model(ov_model, output_path / \"vae_decoder/openvino_model.xml\")\n", " tiny_vae.save_config(output_path / \"vae_decoder\")\n", "\n", "\n", "if not skip_convert_model:\n", " !optimum-cli export openvino --model $sdxl_model_id --task stable-diffusion-xl $model_dir --fp16\n", " convert_tiny_vae(tae_id, model_dir)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b1c932ef-9cb7-49ec-b08d-33d9d74492c3", "metadata": {}, "source": [ "## Text-to-image generation\n", 
"[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Text-to-image generation lets you create images using text description. To start generating images, we need to load models first.\n", "To load an OpenVINO model and run an inference with Optimum and OpenVINO Runtime, you need to replace diffusers `StableDiffusionXLPipeline` with Optimum `OVStableDiffusionXLPipeline`. Pipeline initialization starts with using `from_pretrained` method, where a directory with OpenVINO models should be passed. Additionally, you can specify an inference device." ] }, { "attachments": {}, "cell_type": "markdown", "id": "149cd0d5-97ce-474b-a07e-4637f7ef0508", "metadata": {}, "source": [ "### Select inference device for text-to-image generation\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "2fe98f06-2183-446a-8e38-c475073ded26", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9021bf1c50bb4f81a357e543ce148425", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets as widgets\n", "\n", "core = ov.Core()\n", "\n", "device = widgets.Dropdown(\n", " options=core.available_devices + [\"AUTO\"],\n", " value=\"AUTO\",\n", " description=\"Device:\",\n", " disabled=False,\n", ")\n", "\n", "device" ] }, { "cell_type": "code", "execution_count": 5, "id": "0bc47bd6-6571-4ff2-b111-b68af66777c3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/ea/work/genai_env/lib/python3.8/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)\n", " return torch._C._cuda_getDeviceCount() > 0\n", "No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'\n", "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n" ] } ], "source": [ "from optimum.intel.openvino import OVStableDiffusionXLPipeline\n", "\n", "text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "67a6df12-966d-49f6-8987-776b1d451e20", "metadata": {}, "source": [ "The pipeline interface is similar to original `StableDiffusionXLPipeline`. We should provide text prompt. The default number of steps is 50, while sdxl-turbo required only 1 step. 
According to the information provided in the model card, the model does not use a negative prompt or guidance scale, and these parameters should be disabled by setting `guidance_scale=0.0`." ] }, { "cell_type": "code", "execution_count": 6, "id": "607474d1-2b7b-42be-a784-100685f055f2", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8046848989094cd991e1cd04bf00a063", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "prompt = \"cute cat\"\n", "image = text2image_pipe(\n", " prompt,\n", " num_inference_steps=1,\n", " height=512,\n", " width=512,\n", " guidance_scale=0.0,\n", " generator=np.random.RandomState(987),\n", ").images[0]\n", "image.save(\"cat.png\")\n", "image" ] }, { "cell_type": "code", "execution_count": 7, "id": "0436c4c7-fca0-4a19-b3b9-c2d6eaac3ea6", "metadata": {}, "outputs": [], "source": [ "del text2image_pipe\n", "gc.collect();" ] }, { "attachments": {}, "cell_type": "markdown", "id": "789676a9-8468-4dde-952c-748c9d35abe7", "metadata": {}, "source": [ "## Image-to-Image generation\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Image-to-image generation lets you transform images to match the characteristics provided in the text description. We can reuse the already converted model for running the Image2Image generation pipeline. For that, we should replace `OVStableDiffusionXLPipeline` with `OVStableDiffusionXLImg2ImgPipeline`." ] }, { "cell_type": "code", "execution_count": 8, "id": "75071ab3-ffc3-4de2-8edd-c8bcbd50f5b4", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n" ] } ], "source": [ "from optimum.intel import OVStableDiffusionXLImg2ImgPipeline\n", "\n", "image2image_pipe = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir, device=device.value)" ] }, { "cell_type": "code", "execution_count": 9, "id": "4b9269e1-5bd4-4d26-8ee8-0df35c4e53bc", "metadata": {}, "outputs": [], "source": [ "photo_prompt = \"a cute cat with bow tie\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d2bd1605-82fd-4be7-9384-db69641dcf0b", "metadata": {}, "source": [ "The `strength` parameter is important for the image-to-image generation pipeline. It is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values approaching 1.0 allow lots of variation but also produce images that are not semantically consistent with the input, while values close to 0.0 add less noise, so the generated image preserves the source image content. `strength` affects not only the amount of noise but also the number of generation steps: the number of denoising iterations in the image-to-image generation pipeline is calculated as `int(num_inference_steps * strength)`. With sdxl-turbo, we should select `num_inference_steps` and `strength` carefully to make sure that the number of steps used in the pipeline is >= 1 after applying the strength multiplication. For example, below we use `num_inference_steps=2` and `strength=0.5`, which gives `int(2 * 0.5) = 1` step in the pipeline."
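, "\n", "\n", "This relationship can be sanity-checked with a tiny snippet (illustrative only; it mirrors the formula above rather than calling the pipeline):\n", "\n", "```python\n", "num_inference_steps = 2\n", "strength = 0.5\n", "effective_steps = int(num_inference_steps * strength) # int(2 * 0.5) = 1\n", "assert effective_steps >= 1, 'sdxl-turbo needs at least one denoising step'\n", "```"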
] }, { "cell_type": "code", "execution_count": 10, "id": "9eace4a5-cdd1-44d2-aced-f21a944802eb", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "920e24ae4253462d92e0f5ba67c3dac8", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "photo_image = image2image_pipe(\n", " photo_prompt,\n", " image=image,\n", " num_inference_steps=2,\n", " generator=np.random.RandomState(511),\n", " guidance_scale=0.0,\n", " strength=0.5,\n", ").images[0]\n", "photo_image.save(\"cat_tie.png\")\n", "photo_image" ] }, { "cell_type": "code", "execution_count": 11, "id": "4d090210-f663-4a37-8819-f2f2b5c2534b", "metadata": {}, "outputs": [], "source": [ "del image2image_pipe\n", "gc.collect();" ] }, { "attachments": {}, "cell_type": "markdown", "id": "de0e80c7", "metadata": {}, "source": [ "## Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16` making model inference faster.\n", "\n", "According to `SDXL-Turbo Model` structure, the UNet model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the SDXL pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy.\n", "\n", "The optimization process contains the following steps:\n", "\n", "1. Create a calibration dataset for quantization.\n", "2. Run `nncf.quantize()` to obtain quantized model.\n", "3. Save the `INT8` model using `openvino.save_model()` function.\n", "\n", "Please select below whether you would like to run quantization to improve model inference speed." 
] }, { "cell_type": "code", "execution_count": 12, "id": "b29be9c3", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "70eb01a9c0f04c22a45f5a596d312e2a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Quantization')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "skip_for_device = \"GPU\" in device.value\n", "to_quantize = widgets.Checkbox(value=not skip_for_device, description=\"Quantization\", disabled=skip_for_device)\n", "to_quantize" ] }, { "cell_type": "code", "execution_count": 13, "id": "e6fd26e3", "metadata": {}, "outputs": [], "source": [ "# Fetch `skip_kernel_extension` module\n", "import requests\n", "\n", "r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py\",\n", ")\n", "open(\"skip_kernel_extension.py\", \"w\").write(r.text)\n", "\n", "int8_pipe = None\n", "\n", "%load_ext skip_kernel_extension" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f1624f75", "metadata": {}, "source": [ "### Prepare calibration dataset\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "We use a portion of [conceptual_captions](https://huggingface.co/datasets/conceptual_captions) dataset from Hugging Face as calibration data.\n", "To collect intermediate model inputs for calibration we should customize `CompiledModel`." ] }, { "cell_type": "code", "execution_count": 14, "id": "5b82d439", "metadata": {}, "outputs": [], "source": [ "UNET_INT8_OV_PATH = model_dir / \"optimized_unet\" / \"openvino_model.xml\"\n", "\n", "\n", "def disable_progress_bar(pipeline, disable=True):\n", " if not hasattr(pipeline, \"_progress_bar_config\"):\n", " pipeline._progress_bar_config = {\"disable\": disable}\n", " else:\n", " pipeline._progress_bar_config[\"disable\"] = disable" ] }, { "cell_type": "code", "execution_count": 15, "id": "22471a37", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import datasets\n", "import numpy as np\n", "from tqdm.notebook import tqdm\n", "from transformers import set_seed\n", "from typing import Any, Dict, List\n", "\n", "set_seed(1)\n", "\n", "class CompiledModelDecorator(ov.CompiledModel):\n", " def __init__(self, compiled_model: ov.CompiledModel, data_cache: List[Any] = None):\n", " super().__init__(compiled_model)\n", " self.data_cache = data_cache if data_cache else []\n", "\n", " def __call__(self, *args, **kwargs):\n", " self.data_cache.append(*args)\n", " return super().__call__(*args, **kwargs)\n", "\n", "def collect_calibration_data(pipe, subset_size: int) -> List[Dict]:\n", " original_unet = pipe.unet.request\n", " pipe.unet.request = CompiledModelDecorator(original_unet)\n", "\n", " dataset = datasets.load_dataset(\"conceptual_captions\", split=\"train\").shuffle(seed=42)\n", " disable_progress_bar(pipe)\n", "\n", " # Run inference for data collection\n", " pbar = tqdm(total=subset_size)\n", " diff = 0\n", " for batch in dataset:\n", " prompt = batch[\"caption\"]\n", " if len(prompt) > pipe.tokenizer.model_max_length:\n", " continue\n", " _ = pipe(\n", " prompt,\n", " num_inference_steps=1,\n", " height=512,\n", " width=512,\n", " guidance_scale=0.0,\n", " generator=np.random.RandomState(987)\n", " )\n", " collected_subset_size = len(pipe.unet.request.data_cache)\n", " if collected_subset_size >= subset_size:\n", " pbar.update(subset_size - pbar.n)\n", " break\n", " 
pbar.update(collected_subset_size - diff)\n", " diff = collected_subset_size\n", "\n", " calibration_dataset = pipe.unet.request.data_cache\n", " disable_progress_bar(pipe, disable=False)\n", " pipe.unet.request = original_unet\n", " return calibration_dataset" ] }, { "cell_type": "code", "execution_count": 16, "id": "6b62f498", "metadata": { "test_replace": { "subset_size=200": "subset_size=10" } }, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "if not UNET_INT8_OV_PATH.exists():\n", " text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", " unet_calibration_data = collect_calibration_data(text2image_pipe, subset_size=200)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cb73621d", "metadata": {}, "source": [ "### Run quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Create a quantized model from the pre-trained converted OpenVINO model. Quantization of the first and last `Convolution` layers impacts the generation results. We recommend using `IgnoredScope` to keep accuracy-sensitive `Convolution` layers in FP16 precision.\n", "\n", "> **NOTE**: Quantization is a time- and memory-consuming operation. Running the quantization code below may take some time." ] }, { "cell_type": "code", "execution_count": 17, "id": "b112e91c", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import nncf\n", "from nncf.scopes import IgnoredScope\n", "\n", "UNET_OV_PATH = model_dir / \"unet\" / \"openvino_model.xml\"\n", "if not UNET_INT8_OV_PATH.exists():\n", " unet = core.read_model(UNET_OV_PATH)\n", " quantized_unet = nncf.quantize(\n", " model=unet,\n", " model_type=nncf.ModelType.TRANSFORMER,\n", " calibration_dataset=nncf.Dataset(unet_calibration_data),\n", " ignored_scope=IgnoredScope(\n", " names=[\n", " \"__module.model.conv_in/aten::_convolution/Convolution\",\n", " \"__module.model.up_blocks.2.resnets.2.conv_shortcut/aten::_convolution/Convolution\",\n", " \"__module.model.conv_out/aten::_convolution/Convolution\"\n", " ],\n", " ),\n", " )\n", " ov.save_model(quantized_unet, UNET_INT8_OV_PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4697d611", "metadata": {}, "source": [ "Let us check predictions with the quantized UNet using the same input data." 
] }, { "cell_type": "code", "execution_count": 18, "id": "4381e145", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "844b0bf58330483ba9159d4d602ed8de", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "from IPython.display import display\n", "\n", "int8_text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value, compile=False)\n", "int8_text2image_pipe.unet.model = core.read_model(UNET_INT8_OV_PATH)\n", "int8_text2image_pipe.unet.request = None\n", "\n", "prompt = \"cute cat\"\n", "image = int8_text2image_pipe(prompt, num_inference_steps=1, height=512, width=512, guidance_scale=0.0, generator=np.random.RandomState(987)).images[0]\n", "display(image)" ] }, { "cell_type": "code", "execution_count": 19, "id": "3d46b9e9", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2f73c7f23fd5475fad040c797296a119", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "int8_image2image_pipe = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir, device=device.value, compile=False)\n", "int8_image2image_pipe.unet.model = core.read_model(UNET_INT8_OV_PATH)\n", "int8_image2image_pipe.unet.request = None\n", "\n", "photo_prompt = \"a cute cat with bow tie\"\n", "photo_image = int8_image2image_pipe(photo_prompt, image=image, num_inference_steps=2, generator=np.random.RandomState(511), guidance_scale=0.0, strength=0.5).images[0]\n", "display(photo_image)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f64423ef", "metadata": {}, "source": [ "#### Compare UNet file size\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 20, "id": "63fc61a9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FP16 model size: 5014578.27 KB\n", "INT8 model size: 2513541.44 KB\n", "Model compression rate: 1.995\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "fp16_ir_model_size = UNET_OV_PATH.with_suffix(\".bin\").stat().st_size / 1024\n", "quantized_model_size = UNET_INT8_OV_PATH.with_suffix(\".bin\").stat().st_size / 1024\n", "\n", "print(f\"FP16 model size: {fp16_ir_model_size:.2f} KB\")\n", "print(f\"INT8 model size: {quantized_model_size:.2f} KB\")\n", "print(f\"Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "848a816b", "metadata": {}, "source": [ "### Compare inference time of the FP16 and INT8 models\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "To measure the inference performance of the `FP16` and `INT8` pipelines, we use median inference time on calibration subset.\n", "\n", "> **NOTE**: For the most accurate performance estimation, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications." 
] }, { "cell_type": "code", "execution_count": 21, "id": "914fcb4d", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import time\n", "\n", "validation_size = 7\n", "calibration_dataset = datasets.load_dataset(\"conceptual_captions\", split=\"train\")\n", "validation_data = []\n", "for batch in calibration_dataset:\n", " prompt = batch[\"caption\"]\n", " validation_data.append(prompt)\n", "\n", "def calculate_inference_time(pipe, dataset):\n", " inference_time = []\n", " disable_progress_bar(pipe)\n", "\n", " for idx, prompt in enumerate(dataset):\n", " start = time.perf_counter()\n", " image = pipe(\n", " prompt,\n", " num_inference_steps=1,\n", " guidance_scale=0.0,\n", " generator=np.random.RandomState(23)\n", " ).images[0]\n", " end = time.perf_counter()\n", " delta = end - start\n", " inference_time.append(delta)\n", " if idx >= validation_size:\n", " break\n", " disable_progress_bar(pipe, disable=False)\n", " return np.median(inference_time)" ] }, { "cell_type": "code", "execution_count": 22, "id": "e46cddac", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "FP16 pipeline latency: 1.391\n", "INT8 pipeline latency: 0.781\n", "Text-to-Image generation speed up: 1.780\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "int8_latency = calculate_inference_time(int8_text2image_pipe, validation_data)\n", "text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", "fp_latency = calculate_inference_time(text2image_pipe, validation_data)\n", "print(f\"FP16 pipeline latency: {fp_latency:.3f}\")\n", "print(f\"INT8 pipeline latency: {int8_latency:.3f}\")\n", "print(f\"Text-to-Image generation speed up: {fp_latency / int8_latency:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2e034411-d526-4b6a-b177-9c633947c76f", "metadata": {}, "source": [ "## Interactive Demo\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Now, you can check model work using own text descriptions. Provide text prompt in the text box and launch generation using Run button. Additionally you can control generation with additional parameters:\n", "* Seed - random seed for initialization\n", "* Steps - number of generation steps\n", "* Height and Width - size of generated image\n", "\n", "> Please note that increasing image size may require to increasing number of steps for accurate result. We recommend running 104x1024 resolution image generation using 4 steps.\n", "\n", "Please select below whether you would like to use the quantized model to launch the interactive demo." 
] }, { "cell_type": "code", "execution_count": 23, "id": "bb0d3675", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "bcfccb0efbb64a8aa5d3b9cadcda11be", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Use quantized model')" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantized_model_present = UNET_INT8_OV_PATH.exists()\n", "\n", "use_quantized_model = widgets.Checkbox(\n", " value=True if quantized_model_present else False,\n", " description=\"Use quantized model\",\n", " disabled=False,\n", ")\n", "\n", "use_quantized_model" ] }, { "cell_type": "code", "execution_count": null, "id": "8ae26e5e-c1e8-4ce1-9c6b-12f9ad2a49a0", "metadata": {}, "outputs": [], "source": [ "import gradio as gr\n", "\n", "text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", "if use_quantized_model.value:\n", " if not quantized_model_present:\n", " raise RuntimeError(\"Quantized model not found.\")\n", " text2image_pipe.unet.model = core.read_model(UNET_INT8_OV_PATH)\n", " text2image_pipe.unet.request = core.compile_model(text2image_pipe.unet.model, device.value)\n", "\n", "\n", "def generate_from_text(text, seed, num_steps, height, width):\n", " result = text2image_pipe(\n", " text,\n", " num_inference_steps=num_steps,\n", " guidance_scale=0.0,\n", " generator=np.random.RandomState(seed),\n", " height=height,\n", " width=width,\n", " ).images[0]\n", " return result\n", "\n", "\n", "with gr.Blocks() as demo:\n", " with gr.Column():\n", " positive_input = gr.Textbox(label=\"Text prompt\")\n", " with gr.Row():\n", " seed_input = gr.Number(precision=0, label=\"Seed\", value=42, minimum=0)\n", " steps_input = gr.Slider(label=\"Steps\", value=1, minimum=1, maximum=4, step=1)\n", " height_input = gr.Slider(label=\"Height\", value=512, minimum=256, maximum=1024, step=32)\n", " width_input = gr.Slider(label=\"Width\", value=512, minimum=256, maximum=1024, step=32)\n", " btn = gr.Button()\n", " out = gr.Image(\n", " label=(\"Result (Quantized)\" if use_quantized_model.value else \"Result (Original)\"),\n", " type=\"pil\",\n", " width=512,\n", " )\n", " btn.click(\n", " generate_from_text,\n", " [positive_input, seed_input, steps_input, height_input, width_input],\n", " out,\n", " )\n", " gr.Examples(\n", " [\n", " [\"cute cat\", 999],\n", " [\n", " \"underwater world coral reef, colorful jellyfish, 35mm, cinematic lighting, shallow depth of field, ultra quality, masterpiece, realistic\",\n", " 89,\n", " ],\n", " [\n", " \"a photo realistic happy white poodle dog ​​playing in the grass, extremely detailed, high res, 8k, masterpiece, dynamic angle\",\n", " 1569,\n", " ],\n", " [\n", " \"Astronaut on Mars watching sunset, best quality, cinematic effects,\",\n", " 65245,\n", " ],\n", " [\n", " \"Black and white street photography of a rainy night in New York, reflections on wet pavement\",\n", " 48199,\n", " ],\n", " ],\n", " [positive_input, seed_input],\n", " )\n", "\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/\n", "# if you want create public link for sharing demo, please add share=True\n", "try:\n", " demo.launch(debug=True)\n", "except Exception:\n", " demo.launch(share=True, debug=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", 
"language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/sdxl-turbo/sdxl-turbo.png?raw=true", "tags": { "categories": [ "Model Demos", "AI Trends" ], "libraries": [], "other": [ "Stable Diffusion" ], "tasks": [ "Text-to-Image", "Image-to-Image" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }