{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "65afd43d-5505-4821-9e31-cb847a66bb0f", "metadata": {}, "source": [ "# High-resolution image generation with Segmind-VegaRT and OpenVINO\n", "\n", "The [Segmind Vega](https://huggingface.co/segmind/Segmind-Vega) Model is a distilled version of the [Stable Diffusion XL (SDXL)](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), offering a remarkable 70% reduction in size and an impressive speedup while retaining high-quality text-to-image generation capabilities. Segmind Vega marks a significant milestone in the realm of text-to-image models, setting new standards for efficiency and speed. Engineered with a compact yet powerful design, it boasts only 745 million parameters. This streamlined architecture not only makes it the smallest in its class but also ensures lightning-fast performance, surpassing the capabilities of its predecessors. Vega represents a breakthrough in model optimization. Its compact size, compared to the 859 million parameters of the SD 1.5 and the hefty 2.6 billion parameters of SDXL, maintains a commendable balance between size and performance. Vega's ability to deliver high-quality images rapidly makes it a game-changer in the field, offering an unparalleled blend of speed, efficiency, and precision.\n", "\n", "Segmind Vega is a symmetrical, distilled version of the SDXL model; it is over 70% smaller and ~100% faster. The Down Block contains 247 million parameters, the Mid Block has 31 million, and the Up Block has 460 million. Apart from the size difference, the architecture is virtually identical to that of SDXL, ensuring compatibility with existing interfaces requiring no or minimal adjustments. Although smaller than the SD1.5 Model, Vega supports higher-resolution generation due to the SDXL architecture, making it an ideal replacement for [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)\n", "\n", "Segmind VegaRT is a distilled LCM-LoRA adapter for the Vega model, that allowed us to reduce the number of inference steps required to generate a good quality image to somewhere between 2 - 8 steps. Latent Consistency Model (LCM) LoRA was proposed in [LCM-LoRA: A universal Stable-Diffusion Acceleration Module](https://arxiv.org/abs/2311.05556) by Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu et al.\n", "\n", "More details about models can be found in [Segmind blog post](https://blog.segmind.com/segmind-vega/)\n", "\n", "In this tutorial, we explore how to run and optimize Segmind-VegaRT with OpenVINO. We will use a pre-trained model from the [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index) library. To simplify the user experience, the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library is used to convert the models to OpenVINO™ IR format. 
Additionally, we demonstrate how to improve pipeline latency with the quantization UNet model using [NNCF](https://github.com/openvinotoolkit/nncf).\n", "\n", "\n", "\n", "\n", "#### Table of contents:\n", "\n", "- [Prerequisites](#Prerequisites)\n", "- [Prepare PyTorch model](#Prepare-PyTorch-model)\n", "- [Convert model to OpenVINO format](#Convert-model-to-OpenVINO-format)\n", "- [Text-to-image generation](#Text-to-image-generation)\n", " - [Select inference device for text-to-image generation](#Select-inference-device-for-text-to-image-generation)\n", "- [Quantization](#Quantization)\n", " - [Prepare calibration dataset](#Prepare-calibration-dataset)\n", " - [Run quantization](#Run-quantization)\n", " - [Compare UNet file size](#Compare-UNet-file-size)\n", " - [Compare the inference time of the FP16 and INT8 models](#Compare-the-inference-time-of-the-FP16-and-INT8-models)\n", "- [Interactive Demo](#Interactive-Demo)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "764bf0d9-bce1-42fa-9aca-480bb5e4ac56", "metadata": {}, "source": [ "## Prerequisites\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": 1, "id": "543b410c-e216-477a-a68e-25c041f867f3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mWARNING: Skipping openvino-dev as it is not installed.\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Skipping openvino as it is not installed.\u001b[0m\u001b[33m\n", "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu\\\n", "\"torch>=2.1\" transformers diffusers \"git+https://github.com/huggingface/optimum-intel.git\" \"gradio>=4.19\" \"openvino>=2023.3.0\" \"peft==0.6.2\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "08490c11-6028-4a29-9261-a80ad6ab8e51", "metadata": {}, "source": [ "## Prepare PyTorch model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "For preparing Segmind-VegaRT model for inference, we should create Segmind-Vega pipeline first. After that, for enabling Latent Consistency Model capability, we should integrate VegaRT LCM adapter using `add_lora_weights` method and replace scheduler with LCMScheduler.\n", "For simplification of these steps for next notebook running, we save created pipeline on disk." ] }, { "cell_type": "code", "execution_count": 2, "id": "53a9b6c4-9f29-441c-982f-f12a64329fcd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-01-24 14:12:38.551058: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-01-24 14:12:38.591203: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2024-01-24 14:12:39.344351: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" ] } ], "source": [ "import torch\n", "from diffusers import LCMScheduler, AutoPipelineForText2Image\n", "import gc\n", "from pathlib import Path\n", "\n", "model_id = \"segmind/Segmind-Vega\"\n", "adapter_id = \"segmind/Segmind-VegaRT\"\n", "pt_model_dir = Path(\"segmind-vegart\")\n", "\n", "if not pt_model_dir.exists():\n", " pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16, variant=\"fp16\")\n", " pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)\n", " pipe.load_lora_weights(adapter_id)\n", " pipe.fuse_lora()\n", "\n", " pipe.save_pretrained(\"segmind-vegart\")\n", " del pipe\n", " gc.collect()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "039c47b4-22bf-4f2f-b07b-c23b416f777c", "metadata": {}, "source": [ "## Convert model to OpenVINO format\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "We will use optimum-cli interface for exporting it into OpenVINO Intermediate Representation (IR) format.\n", "\n", " Optimum CLI interface for converting models supports export to OpenVINO (supported starting optimum-intel 1.12 version).\n", "General command format:\n", "\n", "```bash\n", "optimum-cli export openvino --model --task \n", "```\n", "\n", "where task is task to export the model for, if not specified, the task will be auto-inferred based on the model. Available tasks depend on the model, as Segmind-Vega uses interface compatible with SDXL, we should be selected `stable-diffusion-xl` \n", "\n", "You can find a mapping between tasks and model classes in Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).\n", "\n", "Additionally, you can specify weights compression `--fp16` for the compression model to FP16 and `--int8` for the compression model to INT8. Please note, that for INT8, it is necessary to install nncf.\n", "\n", "Full list of supported arguments available via `--help`\n", "For more details and examples of usage, please check [optimum documentation](https://huggingface.co/docs/optimum/intel/inference#export).\n", "\n", "For Tiny Autoencoder, we will use `ov.convert_model` function for obtaining `ov.Model` and save it using `ov.save_model`. Model consists of 2 parts that used in pipeline separately:\n", "`vae_encoder` for encoding input image in latent space in image-to-image generation task and `vae_decoder` that responsible for decoding diffusion result back to image format." 
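, "\n", "\n", "Once converted, the Tiny Autoencoder decoder can also be sanity-checked on its own with the OpenVINO runtime. The following is a minimal, optional sketch; it assumes the conversion cells below have already been run (so that `openvino-segmind-vegart/vae_decoder/openvino_model.xml` exists), reuses the latent shape from the conversion example input, and picks CPU purely for illustration:\n", "\n", "```python\n", "import numpy as np\n", "import openvino as ov\n", "\n", "core = ov.Core()\n", "# compile the converted Tiny Autoencoder decoder (CPU chosen here for simplicity)\n", "decoder = core.compile_model(\"openvino-segmind-vegart/vae_decoder/openvino_model.xml\", \"CPU\")\n", "latent = np.random.randn(1, 4, 64, 64).astype(np.float32) # dummy 64x64 latent\n", "image_tensor = decoder(latent)[0] # NCHW tensor decoded from the latent\n", "print(image_tensor.shape)\n", "```"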
] }, { "cell_type": "code", "execution_count": 3, "id": "ccc14074-13d2-4b93-bfc0-aae7961dde6b", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "model_dir = Path(\"openvino-segmind-vegart\")\n", "sdxl_model_id = \"./segmind-vegart\"\n", "tae_id = \"madebyollin/taesdxl\"\n", "skip_convert_model = model_dir.exists()" ] }, { "cell_type": "code", "execution_count": 4, "id": "d8c0e641-aae9-41c0-adbb-140db323d122", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import openvino as ov\n", "from diffusers import AutoencoderTiny\n", "import gc\n", "\n", "\n", "class VAEEncoder(torch.nn.Module):\n", " def __init__(self, vae):\n", " super().__init__()\n", " self.vae = vae\n", "\n", " def forward(self, sample):\n", " return self.vae.encode(sample)\n", "\n", "\n", "class VAEDecoder(torch.nn.Module):\n", " def __init__(self, vae):\n", " super().__init__()\n", " self.vae = vae\n", "\n", " def forward(self, latent_sample):\n", " return self.vae.decode(latent_sample)\n", "\n", "\n", "def convert_tiny_vae(model_id, output_path):\n", " tiny_vae = AutoencoderTiny.from_pretrained(model_id)\n", " tiny_vae.eval()\n", " vae_encoder = VAEEncoder(tiny_vae)\n", " ov_model = ov.convert_model(vae_encoder, example_input=torch.zeros((1, 3, 512, 512)))\n", " ov.save_model(ov_model, output_path / \"vae_encoder/openvino_model.xml\")\n", " tiny_vae.save_config(output_path / \"vae_encoder\")\n", " vae_decoder = VAEDecoder(tiny_vae)\n", " ov_model = ov.convert_model(vae_decoder, example_input=torch.zeros((1, 4, 64, 64)))\n", " ov.save_model(ov_model, output_path / \"vae_decoder/openvino_model.xml\")\n", " tiny_vae.save_config(output_path / \"vae_decoder\")\n", " del tiny_vae\n", " del ov_model\n", " gc.collect()\n", "\n", "\n", "if not skip_convert_model:\n", " !optimum-cli export openvino --model $sdxl_model_id --task stable-diffusion-xl $model_dir --fp16\n", " convert_tiny_vae(tae_id, model_dir)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c42628f4-683c-4621-bebd-8bc062351530", "metadata": {}, "source": [ "## Text-to-image generation\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Text-to-image generation lets you create images using text description. To start generating images, we need to load models first.\n", "To load an OpenVINO model and run an inference with Optimum and OpenVINO Runtime, you need to replace diffusers `StableDiffusionXLPipeline` with Optimum `OVStableDiffusionXLPipeline`. Pipeline initialization starts with using `from_pretrained` method, where a directory with OpenVINO models should be passed. Additionally, you can specify an inference device. \n", "\n", "For saving time, we will not cover image-to-image generation in this notebook. As we already mentioned, Segmind-Vega is compatible with Stable Diffusion XL pipeline, the steps required to run Stable Diffusion XL inference for image-to-image task were discussed in this [notebook](stable-dffision-xl.ipynb)." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "a668dcfe-46cb-424d-85da-bd7911669eda", "metadata": {}, "source": [ "### Select inference device for text-to-image generation\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": 5, "id": "87be2787-e707-4404-a87a-3d145e9bf535", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "906e4b78c0d5439c90052c0b2dc8ec43", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=3, options=('CPU', 'GPU.0', 'GPU.1', 'AUTO'), value='AUTO')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets as widgets\n", "\n", "core = ov.Core()\n", "\n", "device = widgets.Dropdown(\n", " options=core.available_devices + [\"AUTO\"],\n", " value=\"AUTO\",\n", " description=\"Device:\",\n", " disabled=False,\n", ")\n", "\n", "device" ] }, { "cell_type": "code", "execution_count": 6, "id": "da199a03-9ef8-4ff7-916f-b6ca8e637f02", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "The config attributes {'interpolation_type': 'linear', 'skip_prk_steps': True, 'use_karras_sigmas': False} were passed to LCMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.\n", "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n" ] } ], "source": [ "from optimum.intel.openvino import OVStableDiffusionXLPipeline\n", "\n", "text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)" ] }, { "cell_type": "code", "execution_count": 7, "id": "f453e068-589a-44fa-9ec4-8d425ae6fbd3", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c2ba05a1535f41349d6fa9df543e4128", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from transformers import set_seed\n", "\n", "set_seed(23)\n", "\n", "prompt = \"A cinematic highly detailed shot of a baby Yorkshire terrier wearing an intricate Italian priest robe, with crown\"\n", "image = text2image_pipe(prompt, num_inference_steps=4, height=512, width=512, guidance_scale=0.5).images[0]\n", "image.save(\"dog.png\")\n", "image" ] }, { "cell_type": "code", "execution_count": 8, "id": "ecc85cf4-88c8-49ba-86ca-c3189c16ffa9", "metadata": {}, "outputs": [], "source": [ "del text2image_pipe\n", "gc.collect();" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2f9e3f44-ac65-4ed9-ab60-3b530c836471", "metadata": {}, "source": [ "## Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. 
Quantized operations are executed in `INT8` instead of `FP32`/`FP16`, making model inference faster.\n", "\n", "According to the Segmind-Vega model structure, the UNet model takes up a significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the SDXL pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy.\n", "\n", "The optimization process contains the following steps:\n", "\n", "1. Create a calibration dataset for quantization.\n", "2. Run `nncf.quantize()` to obtain a quantized model.\n", "3. Save the `INT8` model using the `openvino.save_model()` function.\n", "\n", "Please select below whether you would like to run quantization to improve model inference speed." ] }, { "cell_type": "code", "execution_count": 9, "id": "4395e9b7-3b22-4c5c-b3aa-4f87854554c8", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a04e1f9584db4559b261251a0b747feb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Quantization')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "to_quantize = widgets.Checkbox(\n", " value=True,\n", " description=\"Quantization\",\n", " disabled=False,\n", ")\n", "\n", "to_quantize" ] }, { "cell_type": "code", "execution_count": 10, "id": "7d539669-d2a5-4cf0-8a26-a77278b69cf4", "metadata": {}, "outputs": [], "source": [ "# Fetch `skip_kernel_extension` module\n", "import requests\n", "\n", "r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py\",\n", ")\n", "open(\"skip_kernel_extension.py\", \"w\").write(r.text)\n", "\n", "int8_pipe = None\n", "\n", "if to_quantize.value and \"GPU\" in device.value:\n", " to_quantize.value = False\n", "\n", "%load_ext skip_kernel_extension" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2f2eff61-6e68-4d02-9d64-afc4e635482c", "metadata": {}, "source": [ "### Prepare calibration dataset\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "We use a portion of the [conceptual_captions](https://huggingface.co/datasets/conceptual_captions) dataset from Hugging Face as calibration data.\n", "To collect intermediate model inputs for calibration, we should customize `CompiledModel`."
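, "\n", "\n", "As an optional sanity check, the snippet below peeks at a few calibration prompts; it mirrors the dataset call used by the collection code that follows, so it downloads the same data:\n", "\n", "```python\n", "import datasets\n", "\n", "# same dataset, split, and shuffle seed as in collect_calibration_data below\n", "calibration_prompts = datasets.load_dataset(\"conceptual_captions\", split=\"train\").shuffle(seed=42)\n", "for idx in range(3):\n", " print(calibration_prompts[idx][\"caption\"])\n", "```"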
] }, { "cell_type": "code", "execution_count": 11, "id": "b5a68751-5887-4040-9ea5-cf464af0e5e6", "metadata": {}, "outputs": [], "source": [ "UNET_INT8_OV_PATH = model_dir / \"optimized_unet\" / \"openvino_model.xml\"\n", "\n", "\n", "def disable_progress_bar(pipeline, disable=True):\n", " if not hasattr(pipeline, \"_progress_bar_config\"):\n", " pipeline._progress_bar_config = {\"disable\": disable}\n", " else:\n", " pipeline._progress_bar_config[\"disable\"] = disable" ] }, { "cell_type": "code", "execution_count": 12, "id": "1bdced6a-3030-45ef-801a-42b26fed7504", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import datasets\n", "import numpy as np\n", "from tqdm.notebook import tqdm\n", "from transformers import set_seed\n", "from typing import Any, Dict, List\n", "\n", "set_seed(1)\n", "\n", "class CompiledModelDecorator(ov.CompiledModel):\n", " def __init__(self, compiled_model: ov.CompiledModel, data_cache: List[Any] = None):\n", " super().__init__(compiled_model)\n", " self.data_cache = data_cache if data_cache else []\n", "\n", " def __call__(self, *args, **kwargs):\n", " self.data_cache.append(*args)\n", " return super().__call__(*args, **kwargs)\n", "\n", "def collect_calibration_data(pipe, subset_size: int) -> List[Dict]:\n", " original_unet = pipe.unet.request\n", " pipe.unet.request = CompiledModelDecorator(original_unet)\n", "\n", " dataset = datasets.load_dataset(\"conceptual_captions\", split=\"train\").shuffle(seed=42)\n", " disable_progress_bar(pipe)\n", "\n", " # Run inference for data collection\n", " pbar = tqdm(total=subset_size)\n", " diff = 0\n", " for batch in dataset:\n", " prompt = batch[\"caption\"]\n", " if len(prompt) > pipe.tokenizer.model_max_length:\n", " continue\n", " _ = pipe(\n", " prompt,\n", " num_inference_steps=1,\n", " height=512,\n", " width=512,\n", " guidance_scale=0.0,\n", " generator=np.random.RandomState(987)\n", " )\n", " collected_subset_size = len(pipe.unet.request.data_cache)\n", " if collected_subset_size >= subset_size:\n", " pbar.update(subset_size - pbar.n)\n", " break\n", " pbar.update(collected_subset_size - diff)\n", " diff = collected_subset_size\n", "\n", " calibration_dataset = pipe.unet.request.data_cache\n", " disable_progress_bar(pipe, disable=False)\n", " pipe.unet.request = original_unet\n", " return calibration_dataset" ] }, { "cell_type": "code", "execution_count": 13, "id": "029cec2e-492c-4682-b9c4-07d772bb5985", "metadata": {"test_replace": {"subset_size=200": "subset_size=10"}}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "if not UNET_INT8_OV_PATH.exists():\n", " text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", " unet_calibration_data = collect_calibration_data(text2image_pipe, subset_size=200)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6af8ab45-8a59-49b9-b132-534140637ceb", "metadata": {}, "source": [ "### Run quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Create a quantized model from the pre-trained converted OpenVINO model. Quantization of the first and last `Convolution` layers impacts the generation results. We recommend using `IgnoredScope` to keep accuracy sensitive `Convolution` layers in FP16 precision.\n", "\n", "> **NOTE**: Quantization is time and memory consuming operation. Running quantization code below may take some time." 
] }, { "cell_type": "code", "execution_count": 14, "id": "e78f2837-4912-4ed4-a93f-6d278fd0a14a", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import nncf\n", "from nncf.scopes import IgnoredScope\n", "\n", "UNET_OV_PATH = model_dir / \"unet\" / \"openvino_model.xml\"\n", "if not UNET_INT8_OV_PATH.exists():\n", " unet = core.read_model(UNET_OV_PATH)\n", " quantized_unet = nncf.quantize(\n", " model=unet,\n", " model_type=nncf.ModelType.TRANSFORMER,\n", " calibration_dataset=nncf.Dataset(unet_calibration_data),\n", " ignored_scope=IgnoredScope(\n", " names=[\n", " \"__module.model.conv_in/aten::_convolution/Convolution\",\n", " \"__module.model.up_blocks.2.resnets.2.conv_shortcut/aten::_convolution/Convolution\",\n", " \"__module.model.conv_out/aten::_convolution/Convolution\"\n", " ],\n", " ),\n", " )\n", " ov.save_model(quantized_unet, UNET_INT8_OV_PATH)" ] }, { "cell_type": "code", "execution_count": 15, "id": "50abd836-5fab-4ffd-850a-fdd693455f02", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The config attributes {'interpolation_type': 'linear', 'skip_prk_steps': True, 'use_karras_sigmas': False} were passed to LCMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.\n", "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c0c8c922d60d4be9954b4162ca873039", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "def create_int8_pipe(model_dir, unet_int8_path, device, core, unet_device='CPU'):\n", " int8_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device, compile=True)\n", " del int8_pipe.unet.request\n", " del int8_pipe.unet.model\n", " gc.collect()\n", " int8_pipe.unet.model = core.read_model(unet_int8_path)\n", " int8_pipe.unet.request = core.compile_model(int8_pipe.unet.model, unet_device or device)\n", " return int8_pipe\n", "\n", "int8_text2image_pipe = create_int8_pipe(model_dir, UNET_INT8_OV_PATH, device.value, core)\n", "\n", "\n", "set_seed(23)\n", " \n", "image = int8_text2image_pipe(prompt, num_inference_steps=4, height=512, width=512, guidance_scale=0.5).images[0]\n", "display(image)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e637f62c-8bef-4e04-9c8b-fad2acc83c9f", "metadata": {}, "source": [ "#### Compare UNet file size\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": 16, "id": "6c1fc2a1-e375-484e-9935-7dea040088fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FP16 model size: 1455519.49 KB\n", "INT8 model size: 729448.00 KB\n", "Model compression rate: 1.995\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "fp16_ir_model_size = UNET_OV_PATH.with_suffix(\".bin\").stat().st_size / 1024\n", "quantized_model_size = UNET_INT8_OV_PATH.with_suffix(\".bin\").stat().st_size / 1024\n", "\n", "print(f\"FP16 model size: {fp16_ir_model_size:.2f} KB\")\n", "print(f\"INT8 model size: {quantized_model_size:.2f} KB\")\n", "print(f\"Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", 
"id": "b971215a-6b81-4a90-9a23-1514c6649e27", "metadata": {}, "source": [ "### Compare the inference time of the FP16 and INT8 models\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "To measure the inference performance of the `FP16` and `INT8` pipelines, we use median inference time on the calibration subset.\n", "\n", "> **NOTE**: For the most accurate performance estimation, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications." ] }, { "cell_type": "code", "execution_count": 17, "id": "c606f77b-fce4-4383-98fe-e1b5e6d1c99e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.\n", " table = cls._concat_blocks(blocks, axis=0)\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "import time\n", "\n", "validation_size = 7\n", "calibration_dataset = datasets.load_dataset(\"conceptual_captions\", split=\"train\")\n", "validation_data = []\n", "for idx, batch in enumerate(calibration_dataset):\n", " if idx >= validation_size:\n", " break\n", " prompt = batch[\"caption\"]\n", " validation_data.append(prompt)\n", "\n", "def calculate_inference_time(pipe, dataset):\n", " inference_time = []\n", " disable_progress_bar(pipe)\n", "\n", " for prompt in dataset:\n", " start = time.perf_counter()\n", " image = pipe(\n", " prompt,\n", " num_inference_steps=4,\n", " guidance_scale=1.0,\n", " generator=np.random.RandomState(23)\n", " ).images[0]\n", " end = time.perf_counter()\n", " delta = end - start\n", " inference_time.append(delta)\n", " disable_progress_bar(pipe, disable=False)\n", " return np.median(inference_time)" ] }, { "cell_type": "code", "execution_count": 18, "id": "0eefebde-e549-4859-961d-9e13f5d1193f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The config attributes {'interpolation_type': 'linear', 'skip_prk_steps': True, 'use_karras_sigmas': False} were passed to LCMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.\n", "Compiling the vae_decoder to AUTO ...\n", "Compiling the unet to AUTO ...\n", "Compiling the text_encoder to AUTO ...\n", "Compiling the text_encoder_2 to AUTO ...\n", "Compiling the vae_encoder to AUTO ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "FP16 pipeline latency: 11.029\n", "INT8 pipeline latency: 5.967\n", "Text-to-Image generation speed up: 1.849\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "int8_latency = calculate_inference_time(int8_text2image_pipe, validation_data)\n", "\n", "del int8_text2image_pipe\n", "gc.collect()\n", "\n", "text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", "fp_latency = calculate_inference_time(text2image_pipe, validation_data)\n", "\n", "del text2image_pipe\n", "gc.collect()\n", "print(f\"FP16 pipeline latency: {fp_latency:.3f}\")\n", "print(f\"INT8 pipeline latency: {int8_latency:.3f}\")\n", "print(f\"Text-to-Image generation speed up: {fp_latency / int8_latency:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9fa56c98-dfc0-4586-b43a-1a2af46fc344", "metadata": {}, "source": [ "## Interactive Demo\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "\n", "Now, you can check model work using own text descriptions. 
Provide text prompt in the text box and launch generation using Run button. Additionally you can control generation with additional parameters:\n", "* Seed - random seed for initialization\n", "* Steps - number of generation steps\n", "* Height and Width - size of generated image\n", "\n", "Please select below whether you would like to use the quantized model to launch the interactive demo." ] }, { "cell_type": "code", "execution_count": 19, "id": "c906fc2a-3ff0-49c4-8dec-fd8b6d552aaa", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a704580a2b1b422da26add535f908585", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Use quantized model')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantized_model_present = UNET_INT8_OV_PATH.exists()\n", "\n", "use_quantized_model = widgets.Checkbox(\n", " value=quantized_model_present,\n", " description=\"Use quantized model\",\n", " disabled=not quantized_model_present,\n", ")\n", "\n", "use_quantized_model" ] }, { "cell_type": "code", "execution_count": null, "id": "b6133399-4a47-4590-b6cd-26374c533402", "metadata": {}, "outputs": [], "source": [ "import gradio as gr\n", "\n", "if use_quantized_model.value:\n", " if not quantized_model_present:\n", " raise RuntimeError(\"Quantized model not found.\")\n", " text2image_pipe = create_int8_pipe(model_dir, UNET_INT8_OV_PATH, device.value, core)\n", "\n", "else:\n", " text2image_pipe = OVStableDiffusionXLPipeline.from_pretrained(model_dir, device=device.value)\n", "\n", "\n", "def generate_from_text(text, seed, num_steps, height, width):\n", " set_seed(seed)\n", " result = text2image_pipe(\n", " text,\n", " num_inference_steps=num_steps,\n", " guidance_scale=1.0,\n", " height=height,\n", " width=width,\n", " ).images[0]\n", " return result\n", "\n", "\n", "with gr.Blocks() as demo:\n", " with gr.Column():\n", " positive_input = gr.Textbox(label=\"Text prompt\")\n", " with gr.Row():\n", " seed_input = gr.Number(precision=0, label=\"Seed\", value=42, minimum=0)\n", " steps_input = gr.Slider(label=\"Steps\", value=4, minimum=2, maximum=8, step=1)\n", " height_input = gr.Slider(label=\"Height\", value=512, minimum=256, maximum=1024, step=32)\n", " width_input = gr.Slider(label=\"Width\", value=512, minimum=256, maximum=1024, step=32)\n", " btn = gr.Button()\n", " out = gr.Image(\n", " label=(\"Result (Quantized)\" if use_quantized_model.value else \"Result (Original)\"),\n", " type=\"pil\",\n", " width=512,\n", " )\n", " btn.click(\n", " generate_from_text,\n", " [positive_input, seed_input, steps_input, height_input, width_input],\n", " out,\n", " )\n", " gr.Examples(\n", " [\n", " [\"cute cat\", 999],\n", " [\n", " \"underwater world coral reef, colorful jellyfish, 35mm, cinematic lighting, shallow depth of field, ultra quality, masterpiece, realistic\",\n", " 89,\n", " ],\n", " [\n", " \"a photo realistic happy white poodle dog ​​playing in the grass, extremely detailed, high res, 8k, masterpiece, dynamic angle\",\n", " 1569,\n", " ],\n", " [\n", " \"Astronaut on Mars watching sunset, best quality, cinematic effects,\",\n", " 65245,\n", " ],\n", " [\n", " \"Black and white street photography of a rainy night in New York, reflections on wet pavement\",\n", " 48199,\n", " ],\n", " [\n", " \"cinematic photo detailed closeup portraid of a Beautiful cyberpunk woman, robotic parts, cables, lights, text; , high quality photography, 3 point lighting, flash with softbox, 4k, 
Canon EOS R3, hdr, smooth, sharp focus, high resolution, award winning photo, 80mm, f2.8, bokeh . 35mm photograph, film, bokeh, professional, 4k, highly detailed, high quality photography, 3 point lighting, flash with softbox, 4k, Canon EOS R3, hdr, smooth, sharp focus, high resolution, award winning photo, 80mm, f2.8, bokeh\",\n", " 48199,\n", " ],\n", " ],\n", " [positive_input, seed_input],\n", " )\n", "\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/\n", "# if you want create public link for sharing demo, please add share=True\n", "try:\n", " demo.launch(debug=True)\n", "except Exception:\n", " demo.launch(share=True, debug=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/stable-diffusion-xl/stable-diffusion-xl.png?raw=true", "tags": { "categories": [ "Model Demos", "AI Trends" ], "libraries": [], "other": [ "Stable Diffusion" ], "tasks": [ "Text-to-Image" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }