{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "82044cea-a95d-4812-89a2-bbb055ea1661", "metadata": {}, "source": [ "# Universal Segmentation with OneFormer and OpenVINO" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b6dbb8aa-c070-4e7d-8b7a-64588e7f0da4", "metadata": {}, "source": [ "This tutorial demonstrates how to use the [OneFormer](https://arxiv.org/abs/2211.06220) model from Hugging Face with OpenVINO. It shows how to download the weights and create a PyTorch model using the Hugging Face `transformers` library, convert the model to OpenVINO Intermediate Representation (IR) format using the OpenVINO model conversion API (`openvino.convert_model`), and run model inference. Additionally, [NNCF](https://github.com/openvinotoolkit/nncf/) quantization is applied to speed up OneFormer segmentation.\n", "\n", "![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/oneformer_architecture.png)\n", "\n", "OneFormer is a follow-up to [Mask2Former](https://arxiv.org/abs/2112.01527). Mask2Former still has to be trained separately on instance, semantic, and panoptic datasets to reach state-of-the-art results.\n", "\n", "OneFormer adds a text module to the Mask2Former framework to condition the model on the respective subtask (instance, semantic, or panoptic). This yields more accurate results, at the cost of increased latency." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "39d6e067", "metadata": {}, "source": [ "\n", "#### Table of contents:\n", "\n", "- [Install required libraries](#Install-required-libraries)\n", "- [Prepare the environment](#Prepare-the-environment)\n", "- [Load OneFormer fine-tuned on COCO for universal segmentation](#Load-OneFormer-fine-tuned-on-COCO-for-universal-segmentation)\n", "- [Convert the model to OpenVINO IR format](#Convert-the-model-to-OpenVINO-IR-format)\n", "- [Select inference device](#Select-inference-device)\n", "- [Choose a segmentation task](#Choose-a-segmentation-task)\n", "- [Inference](#Inference)\n", "- [Quantization](#Quantization)\n", " - [Preparing calibration dataset](#Preparing-calibration-dataset)\n", " - [Run quantization](#Run-quantization)\n", " - [Compare model size and performance](#Compare-model-size-and-performance)\n", "- [Interactive Demo](#Interactive-Demo)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3dc9e61e-e3d9-486b-a652-13083530fbc9", "metadata": { "tags": [] }, "source": [ "## Install required libraries\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "84e1b9f3-faf1-4260-aadf-c9edd53e53b6", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:49:55.246601Z", "start_time": "2023-10-06T09:49:54.106715300Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "import platform\n", "\n", "%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu \"transformers>=4.26.0\" \"openvino>=2023.1.0\" \"nncf>=2.7.0\" \"gradio>=4.19\" \"torch>=2.1\" scipy ipywidgets Pillow tqdm\n", "\n", "if platform.system() != \"Windows\":\n", " %pip install -q \"matplotlib>=3.4\"\n", "else:\n", " %pip install -q \"matplotlib>=3.4,<3.7\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c5fc745f-0960-4b17-8559-dd8daeac8318", 
"metadata": {}, "source": [ "## Prepare the environment\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Import all required packages and set paths for models and constant variables." ] }, { "cell_type": "code", "execution_count": 2, "id": "1099fec8-3e7b-4699-b949-a166547a1081", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:49:57.719482900Z", "start_time": "2023-10-06T09:49:54.246727800Z" } }, "outputs": [], "source": [ "import warnings\n", "from collections import defaultdict\n", "from pathlib import Path\n", "\n", "from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation\n", "from transformers.models.oneformer.modeling_oneformer import (\n", " OneFormerForUniversalSegmentationOutput,\n", ")\n", "import torch\n", "import matplotlib.pyplot as plt\n", "import matplotlib.patches as mpatches\n", "from PIL import Image\n", "from PIL import ImageOps\n", "\n", "import openvino\n", "\n", "# Fetch `notebook_utils` module\n", "import requests\n", "\n", "r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", ")\n", "\n", "open(\"notebook_utils.py\", \"w\").write(r.text)\n", "from notebook_utils import download_file" ] }, { "cell_type": "code", "execution_count": 3, "id": "d3a38c4a-433f-4fed-bf5a-410be7160f78", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:49:57.739657400Z", "start_time": "2023-10-06T09:49:57.719482900Z" } }, "outputs": [], "source": [ "IR_PATH = Path(\"oneformer.xml\")\n", "OUTPUT_NAMES = [\"class_queries_logits\", \"masks_queries_logits\"]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "14aa6aa4-5730-469b-8fbe-2bece6ef1641", "metadata": { "tags": [] }, "source": [ "## Load OneFormer fine-tuned on COCO for universal segmentation\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Here we use the `from_pretrained` method of `OneFormerForUniversalSegmentation` to load the [HuggingFace OneFormer 
model](https://huggingface.co/docs/transformers/model_doc/oneformer) based on the Swin-L backbone and trained on the [COCO](https://cocodataset.org/) dataset.\n", "\n", "We also use the HuggingFace processor to prepare the model inputs from images and to post-process the model outputs for visualization." ] }, { "cell_type": "code", "execution_count": 4, "id": "ca18e5a3-34dd-466b-b08f-5cf0d6069e2a", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:00.557805400Z", "start_time": "2023-10-06T09:49:57.720612300Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-10-06 14:00:53.306851: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2023-10-06 14:00:53.342792: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2023-10-06 14:00:53.913248: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n", "/home/nsavel/venvs/ov_notebooks_tmp/lib/python3.8/site-packages/transformers/models/oneformer/image_processing_oneformer.py:427: FutureWarning: The `reduce_labels` argument is deprecated and will be removed in v4.27. 
Please use `do_reduce_labels` instead.\n", " warnings.warn(\n" ] } ], "source": [ "processor = OneFormerProcessor.from_pretrained(\"shi-labs/oneformer_coco_swin_large\")\n", "model = OneFormerForUniversalSegmentation.from_pretrained(\n", "    \"shi-labs/oneformer_coco_swin_large\",\n", ")\n", "id2label = model.config.id2label" ] }, { "cell_type": "code", "execution_count": 5, "id": "95fb1caf-397a-4149-9dfe-641a47b3a68c", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:00.578488Z", "start_time": "2023-10-06T09:50:00.554796700Z" } }, "outputs": [], "source": [ "task_seq_length = processor.task_seq_length\n", "shape = (800, 800)\n", "dummy_input = {\n", "    \"pixel_values\": torch.randn(1, 3, *shape),\n", "    \"task_inputs\": torch.randn(1, task_seq_length),\n", "}" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9e4b09fd-dfd8-45bd-be93-992778ca8343", "metadata": { "tags": [] }, "source": [ "## Convert the model to OpenVINO IR format\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Convert the PyTorch model to IR format to take advantage of OpenVINO optimization tools and features. The `openvino.convert_model` Python function converts the model and returns an instance of the OpenVINO `Model` class, which is ready to use in the Python interface. The model can also be serialized to OpenVINO IR format for future execution using the `openvino.save_model` function.\n", "PyTorch to OpenVINO conversion is based on TorchScript tracing. HuggingFace models have a dedicated configuration parameter, `torchscript`, which makes the model more suitable for tracing. To convert the model, we pass the PyTorch model instance and an example input to `openvino.convert_model`." 
] }, { "cell_type": "code", "execution_count": 6, "id": "d16648a3-fa3c-46d5-b2dd-53646c5f9b82", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:00.578488Z", "start_time": "2023-10-06T09:50:00.565873Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s.\n" ] } ], "source": [ "model.config.torchscript = True\n", "\n", "if not IR_PATH.exists():\n", "    with warnings.catch_warnings():\n", "        warnings.simplefilter(\"ignore\")\n", "        model = openvino.convert_model(model, example_input=dummy_input)\n", "        openvino.save_model(model, IR_PATH, compress_to_fp16=False)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d55781cd", "metadata": {}, "source": [ "## Select inference device\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Select a device from the dropdown list to run inference with OpenVINO." ] }, { "cell_type": "code", "execution_count": 7, "id": "34283a4d", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:00.697540200Z", "start_time": "2023-10-06T09:50:00.570554700Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2b79d1df8d424db7896fe5be817bdf56", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets as widgets\n", "\n", "core = openvino.Core()\n", "\n", "device = widgets.Dropdown(\n", "    options=core.available_devices + [\"AUTO\"],\n", "    value=\"AUTO\",\n", "    description=\"Device:\",\n", "    disabled=False,\n", 
")\n", "\n", "device" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f2472a04-a13b-465c-9f42-9f961d7d7907", "metadata": { "tags": [] }, "source": [ "We can prepare the image using the HuggingFace processor. OneFormer leverages a processor which internally consists of an image processor (for the image modality) and a tokenizer (for the text modality). OneFormer is actually a multimodal model, since it incorporates both images and text to solve image segmentation." ] }, { "cell_type": "code", "execution_count": 8, "id": "b3beb94c-03b5-4f91-b44b-350ace719c64", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:00.760038500Z", "start_time": "2023-10-06T09:50:00.686303500Z" } }, "outputs": [], "source": [ "def prepare_inputs(image: Image.Image, task: str):\n", " \"\"\"Convert image to model input\"\"\"\n", " image = ImageOps.pad(image, shape)\n", " inputs = processor(image, [task], return_tensors=\"pt\")\n", " converted = {\n", " \"pixel_values\": inputs[\"pixel_values\"],\n", " \"task_inputs\": inputs[\"task_inputs\"],\n", " }\n", " return converted" ] }, { "cell_type": "code", "execution_count": 9, "id": "63c52ac9-021d-4cbe-8a73-61c1f555220e", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:00.760038500Z", "start_time": "2023-10-06T09:50:00.734454800Z" } }, "outputs": [], "source": [ "def process_output(d):\n", " \"\"\"Convert OpenVINO model output to HuggingFace representation for visualization\"\"\"\n", " hf_kwargs = {output_name: torch.tensor(d[output_name]) for output_name in OUTPUT_NAMES}\n", "\n", " return OneFormerForUniversalSegmentationOutput(**hf_kwargs)" ] }, { "cell_type": "code", "execution_count": 10, "id": "a7248344-3579-4016-a07c-027251c749c3", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:36.298383400Z", "start_time": "2023-10-06T09:50:00.734454800Z" } }, "outputs": [], "source": [ "# Read the model from files.\n", "model = core.read_model(model=IR_PATH)\n", "# Compile the model.\n", "compiled_model = 
core.compile_model(model=model, device_name=device.value)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1661c9b8-e08c-48a9-ab22-4d6fddab572a", "metadata": {}, "source": [ "The model predicts `class_queries_logits` of shape `(batch_size, num_queries, num_labels + 1)`\n", "and `masks_queries_logits` of shape `(batch_size, num_queries, height, width)`." ] }, { "attachments": {}, "cell_type": "markdown", "id": "58b3ab54-4f57-40ee-b32e-b2fff72b77af", "metadata": {}, "source": [ "Here we define functions that visualize the network outputs to show the inference results." ] }, { "cell_type": "code", "execution_count": 11, "id": "1dc3ec39-83ed-4fd8-8ff8-8975b027d18a", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:36.301895Z", "start_time": "2023-10-06T09:50:36.298887600Z" } }, "outputs": [], "source": [ "class Visualizer:\n", "    @staticmethod\n", "    def extract_legend(handles):\n", "        fig = plt.figure()\n", "        fig.legend(handles=handles, ncol=len(handles) // 20 + 1, loc=\"center\")\n", "        fig.tight_layout()\n", "        return fig\n", "\n", "    @staticmethod\n", "    def predicted_semantic_map_to_figure(predicted_map):\n", "        segmentation = predicted_map[0]\n", "        # get the used color map\n", "        viridis = plt.get_cmap(\"viridis\", max(1, torch.max(segmentation)))\n", "        # get all the unique numbers\n", "        labels_ids = torch.unique(segmentation).tolist()\n", "        fig, ax = plt.subplots()\n", "        ax.imshow(segmentation)\n", "        ax.set_axis_off()\n", "        handles = []\n", "        for label_id in labels_ids:\n", "            label = id2label[label_id]\n", "            color = viridis(label_id)\n", "            handles.append(mpatches.Patch(color=color, label=label))\n", "        fig_legend = Visualizer.extract_legend(handles=handles)\n", "        fig.tight_layout()\n", "        return fig, fig_legend\n", "\n", "    @staticmethod\n", "    def predicted_instance_map_to_figure(predicted_map):\n", "        segmentation = predicted_map[0][\"segmentation\"]\n", "        segments_info = predicted_map[0][\"segments_info\"]\n", "        # get the used color map\n", "        viridis = 
plt.get_cmap(\"viridis\", max(torch.max(segmentation), 1))\n", "        fig, ax = plt.subplots()\n", "        ax.imshow(segmentation)\n", "        ax.set_axis_off()\n", "        instances_counter = defaultdict(int)\n", "        handles = []\n", "        # for each segment, draw its legend\n", "        for segment in segments_info:\n", "            segment_id = segment[\"id\"]\n", "            segment_label_id = segment[\"label_id\"]\n", "            segment_label = id2label[segment_label_id]\n", "            label = f\"{segment_label}-{instances_counter[segment_label_id]}\"\n", "            instances_counter[segment_label_id] += 1\n", "            color = viridis(segment_id)\n", "            handles.append(mpatches.Patch(color=color, label=label))\n", "\n", "        fig_legend = Visualizer.extract_legend(handles)\n", "        fig.tight_layout()\n", "        return fig, fig_legend\n", "\n", "    @staticmethod\n", "    def predicted_panoptic_map_to_figure(predicted_map):\n", "        segmentation = predicted_map[0][\"segmentation\"]\n", "        segments_info = predicted_map[0][\"segments_info\"]\n", "        # get the used color map\n", "        viridis = plt.get_cmap(\"viridis\", max(torch.max(segmentation), 1))\n", "        fig, ax = plt.subplots()\n", "        ax.imshow(segmentation)\n", "        ax.set_axis_off()\n", "        instances_counter = defaultdict(int)\n", "        handles = []\n", "        # for each segment, draw its legend\n", "        for segment in segments_info:\n", "            segment_id = segment[\"id\"]\n", "            segment_label_id = segment[\"label_id\"]\n", "            segment_label = id2label[segment_label_id]\n", "            label = f\"{segment_label}-{instances_counter[segment_label_id]}\"\n", "            instances_counter[segment_label_id] += 1\n", "            color = viridis(segment_id)\n", "            handles.append(mpatches.Patch(color=color, label=label))\n", "\n", "        fig_legend = Visualizer.extract_legend(handles)\n", "        fig.tight_layout()\n", "        return fig, fig_legend\n", "\n", "    @staticmethod\n", "    def figures_to_images(fig, fig_legend, name_suffix=\"\"):\n", "        seg_filename, leg_filename = (\n", "            f\"segmentation{name_suffix}.png\",\n", "            f\"legend{name_suffix}.png\",\n", "        )\n", "        fig.savefig(seg_filename, 
bbox_inches=\"tight\")\n", "        fig_legend.savefig(leg_filename, bbox_inches=\"tight\")\n", "        segmentation = Image.open(seg_filename)\n", "        legend = Image.open(leg_filename)\n", "        return segmentation, legend" ] }, { "cell_type": "code", "execution_count": 12, "id": "702813df-3229-44c1-afbe-58cef6dff28d", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:36.301895Z", "start_time": "2023-10-06T09:50:36.298887600Z" } }, "outputs": [], "source": [ "def segment(model, img: Image.Image, task: str):\n", "    \"\"\"\n", "    Apply segmentation on an image.\n", "\n", "    Args:\n", "        model: Compiled OpenVINO model used for inference.\n", "        img: Input image. It will be resized and padded to 800x800.\n", "        task: String describing the segmentation task. Supported values are: \"semantic\", \"instance\" and \"panoptic\".\n", "    Returns:\n", "        Tuple[Figure, Figure]: Segmentation map and legend charts.\n", "    \"\"\"\n", "    if img is None:\n", "        raise gr.Error(\"Please load the image or use one from the examples list\")\n", "    inputs = prepare_inputs(img, task)\n", "    outputs = model(inputs)\n", "    hf_output = process_output(outputs)\n", "    predicted_map = getattr(processor, f\"post_process_{task}_segmentation\")(hf_output, target_sizes=[img.size[::-1]])\n", "    return getattr(Visualizer, f\"predicted_{task}_map_to_figure\")(predicted_map)" ] }, { "cell_type": "code", "execution_count": 13, "id": "ed98c5da-d67a-4c94-8df1-c760098b827a", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:36.898279100Z", "start_time": "2023-10-06T09:50:36.298887600Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "223ed197a12f45d290b3ee6cfe2465a0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "sample.jpg: 0%| | 0.00/194k [00:00" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image = download_file(\"http://images.cocodataset.org/val2017/000000439180.jpg\", \"sample.jpg\")\n", "image = Image.open(\"sample.jpg\")\n", "image" ] }, { "attachments": {}, "cell_type": 
"markdown", "id": "a4bb371e-3573-407b-80f8-5f49a92eecf7", "metadata": {}, "source": [ "## Choose a segmentation task\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "0685b34c-2ec7-44d7-9238-12783107bdec", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:36.899298100Z", "start_time": "2023-10-06T09:50:36.878775400Z" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b1462b3ffc2840e4a649bb32ebd4117c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(options=('semantic', 'instance', 'panoptic'), value='semantic')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from ipywidgets import Dropdown\n", "\n", "task = Dropdown(options=[\"semantic\", \"instance\", \"panoptic\"], value=\"semantic\")\n", "task" ] }, { "attachments": {}, "cell_type": "markdown", "id": "53dff9a4-a0ba-4455-a01f-bc7d65662e32", "metadata": {}, "source": [ "## Inference\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "a9a6df8c-8f10-4502-a888-8c0c56bcd667", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:39.040662700Z", "start_time": "2023-10-06T09:50:36.879762200Z" } }, "outputs": [ { "data": { "image/jpeg": "", "image/png": "", "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import matplotlib\n", "\n", "matplotlib.use(\"Agg\")  # disable showing figures\n", "\n", "\n", "def stack_images_horizontally(img1: Image.Image, img2: Image.Image):\n", "    res = Image.new(\"RGB\", (img1.width + img2.width, max(img1.height, img2.height)), (255, 255, 255))\n", "    res.paste(img1, (0, 0))\n", "    res.paste(img2, (img1.width, 0))\n", "    return res\n", "\n", "\n", "segmentation_fig, legend_fig = segment(compiled_model, image, task.value)\n", "segmentation_image, legend_image = Visualizer.figures_to_images(segmentation_fig, legend_fig)\n", 
"plt.close(\"all\")\n", "prediction = stack_images_horizontally(segmentation_image, legend_image)\n", "prediction" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c8cac1f9", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "## Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fb768314-e377-424a-98fa-1ab5f2dc55f9", "metadata": {}, "source": [ "[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers to the model graph and then using a subset of the training dataset to initialize the parameters of these additional layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16`, which makes model inference faster.\n", "\n", "The optimization process consists of the following steps:\n", "1. Create a calibration dataset for quantization.\n", "2. Run `nncf.quantize()` to obtain a quantized model.\n", "3. Serialize the `INT8` model using the `openvino.save_model()` function.\n", "\n", "> Note: Quantization is a time- and memory-consuming operation. Running the quantization code below may take some time." ] }, { "attachments": {}, "cell_type": "markdown", "id": "fc4937a0-1b7d-4631-adee-3dff9c23b4fb", "metadata": {}, "source": [ "Select below whether you would like to run quantization to improve model inference speed." 
] }, { "cell_type": "code", "execution_count": 16, "id": "b275b069", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:39.040662700Z", "start_time": "2023-10-06T09:50:39.040662700Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false }, "test_replace": { "value=False": "value=True" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3ca7c0e009414b26ac95211abe84a5b2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Quantization')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compiled_quantized_model = None\n", "\n", "to_quantize = widgets.Checkbox(\n", "    value=False,\n", "    description=\"Quantization\",\n", "    disabled=False,\n", ")\n", "\n", "to_quantize" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9b4afdfd-3061-4456-9882-1fbd09f6f067", "metadata": {}, "source": [ "Load the `skip_kernel_extension` magic so that the quantization cells are skipped when `to_quantize` is not selected." ] }, { "cell_type": "code", "execution_count": 17, "id": "77fcac65", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:39.040662700Z", "start_time": "2023-10-06T09:50:39.040662700Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Fetch `skip_kernel_extension` module\n", "r = requests.get(\n", "    url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py\",\n", ")\n", "open(\"skip_kernel_extension.py\", \"w\").write(r.text)\n", "\n", "%load_ext skip_kernel_extension" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9f9209c2-412d-4edc-b860-a69cfcf2af4f", "metadata": {}, "source": [ "### Preparing calibration dataset\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "We use images from the [COCO128](https://www.kaggle.com/datasets/ultralytics/coco128) dataset as calibration samples." 
] }, { "cell_type": "code", "execution_count": 18, "id": "7f9e1897", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:50:40.699943100Z", "start_time": "2023-10-06T09:50:39.040662700Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ebdf72abea024a8b9b882dc74235b651", "version_major": 2, "version_minor": 0 }, "text/plain": [ "coco128.zip: 0%| | 0.00/6.66M [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Quantized model prediction:\n" ] }, { "data": { "image/jpeg": "", "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "from IPython.display import display\n", "\n", "image = Image.open(\"sample.jpg\")\n", "segmentation_fig, legend_fig = segment(compiled_quantized_model, image, task.value)\n", "segmentation_image, legend_image = Visualizer.figures_to_images(segmentation_fig, legend_fig, name_suffix=\"_int8\")\n", "plt.close(\"all\")\n", "prediction_int8 = stack_images_horizontally(segmentation_image, legend_image)\n", "print(\"Original model prediction:\")\n", "display(prediction)\n", "print(\"Quantized model prediction:\")\n", "display(prediction_int8)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ce70995f", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "### Compare model size and performance\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Below we compare original and quantized model footprint and inference speed." 
] }, { "cell_type": "code", "execution_count": 21, "id": "9b851a5d-d437-4f22-a6a2-8049b5c91f72", "metadata": { "ExecuteTime": { "end_time": "2023-10-06T09:52:30.197035100Z", "start_time": "2023-10-06T09:51:24.972294800Z" }, "test_replace": { "INFERENCE_TIME_DATASET_SIZE = 30": "INFERENCE_TIME_DATASET_SIZE = 1" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "505c39db58984de19e40d38248476693", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Measuring performance: 0%| | 0/30 [00:00" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import time\n", "import gradio as gr\n", "\n", "quantized_model_present = compiled_quantized_model is not None\n", "\n", "\n", "def compile_model(device):\n", "    global compiled_model\n", "    global compiled_quantized_model\n", "    compiled_model = core.compile_model(model=model, device_name=device)\n", "    if quantized_model_present:\n", "        compiled_quantized_model = core.compile_model(model=quantized_model, device_name=device)\n", "\n", "\n", "def segment_wrapper(image, task, run_quantized=False):\n", "    current_model = compiled_quantized_model if run_quantized else compiled_model\n", "\n", "    start_time = time.perf_counter()\n", "    segmentation_fig, legend_fig = segment(current_model, image, task)\n", "    end_time = time.perf_counter()\n", "\n", "    name_suffix = \"\" if not quantized_model_present else \"_int8\" if run_quantized else \"_fp32\"\n", "    segmentation_image, legend_image = Visualizer.figures_to_images(segmentation_fig, legend_fig, name_suffix=name_suffix)\n", "    plt.close(\"all\")\n", "    result = stack_images_horizontally(segmentation_image, legend_image)\n", "    return result, f\"{end_time - start_time:.2f}\"\n", "\n", "\n", "with gr.Blocks() as demo:\n", "    with gr.Row():\n", "        with gr.Column():\n", "            inp_img = gr.Image(label=\"Image\", type=\"pil\")\n", "            inp_task = gr.Radio([\"semantic\", \"instance\", \"panoptic\"], label=\"Task\", 
value=\"semantic\")\n", "            inp_device = gr.Dropdown(label=\"Device\", choices=core.available_devices + [\"AUTO\"], value=\"AUTO\")\n", "        with gr.Column():\n", "            out_result = gr.Image(label=\"Result (Original)\" if quantized_model_present else \"Result\")\n", "            inference_time = gr.Textbox(label=\"Time (seconds)\")\n", "            out_result_quantized = gr.Image(label=\"Result (Quantized)\", visible=quantized_model_present)\n", "            inference_time_quantized = gr.Textbox(label=\"Time (seconds)\", visible=quantized_model_present)\n", "    run_button = gr.Button(value=\"Run\")\n", "    run_button.click(\n", "        segment_wrapper,\n", "        [inp_img, inp_task, gr.Number(0, visible=False)],\n", "        [out_result, inference_time],\n", "    )\n", "    run_quantized_button = gr.Button(value=\"Run quantized\", visible=quantized_model_present)\n", "    run_quantized_button.click(\n", "        segment_wrapper,\n", "        [inp_img, inp_task, gr.Number(1, visible=False)],\n", "        [out_result_quantized, inference_time_quantized],\n", "    )\n", "    gr.Examples(examples=[[\"sample.jpg\", \"semantic\"]], inputs=[inp_img, inp_task])\n", "\n", "    # `gr.update` is used because component-level `.update()` methods were removed in Gradio 4\n", "    def on_device_change_begin():\n", "        return (\n", "            gr.update(value=\"Changing device...\", interactive=False),\n", "            gr.update(value=\"Changing device...\", interactive=False),\n", "            gr.update(interactive=False),\n", "        )\n", "\n", "    def on_device_change_end():\n", "        return (\n", "            gr.update(value=\"Run\", interactive=True),\n", "            gr.update(value=\"Run quantized\", interactive=True),\n", "            gr.update(interactive=True),\n", "        )\n", "\n", "    inp_device.change(on_device_change_begin, outputs=[run_button, run_quantized_button, inp_device]).then(compile_model, inp_device).then(\n", "        on_device_change_end, outputs=[run_button, run_quantized_button, inp_device]\n", "    )\n", "\n", "try:\n", "    demo.launch(debug=True)\n", "except Exception:\n", "    demo.launch(share=True, debug=True)\n", "# if you are launching remotely, specify server_name and 
server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/oneformer-segmentation/oneformer-segmentation.png?raw=true", "tags": { "categories": [ "Model Demos" ], "libraries": [], "other": [], "tasks": [ "Image Segmentation" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }