{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "git68adWeq4l" }, "source": [ "# Quantization Aware Training with NNCF, using PyTorch framework\n", "\n", "This notebook is based on [ImageNet training in PyTorch](https://github.com/pytorch/examples/blob/master/imagenet/main.py).\n", "\n", "The goal of this notebook is to demonstrate how to use the Neural Network Compression Framework [NNCF](https://github.com/openvinotoolkit/nncf) 8-bit quantization to optimize a PyTorch model for inference with the OpenVINO Toolkit. The optimization process consists of the following steps:\n", "\n", "* Transforming the original `FP32` model to `INT8`.\n", "* Using fine-tuning to improve the accuracy.\n", "* Exporting the optimized and original models to OpenVINO IR.\n", "* Measuring and comparing the performance of the models.\n", "\n", "For more advanced usage, refer to these [examples](https://github.com/openvinotoolkit/nncf/tree/develop/examples).\n", "\n", "This tutorial uses the ResNet-18 model with the Tiny ImageNet-200 dataset. ResNet-18 is the ResNet variant with the fewest layers (18). Tiny ImageNet-200 is a subset of the larger ImageNet dataset with smaller images. The dataset will be downloaded in the notebook. Using the smaller model and dataset speeds up training and reduces download time. 
To see other ResNet models, visit [PyTorch hub](https://pytorch.org/hub/pytorch_vision_resnet/).\n", "\n", "> **NOTE**: This notebook requires a C++ compiler for compiling PyTorch custom operations for quantization.\n", "> On Windows, we recommend installing Visual Studio with C++ support; the installation instructions are [here](https://learn.microsoft.com/en-us/cpp/build/vscpp-step-0-installation?view=msvc-170).\n", "> On macOS, the `xcode-select --install` command installs many developer tools, including a C++ compiler.\n", "> On Linux, you can install gcc with your distribution's package manager.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Table of contents:\n", "\n", "- [Imports and Settings](#Imports-and-Settings)\n", "- [Pre-train Floating-Point Model](#Pre-train-Floating-Point-Model)\n", " - [Train Function](#Train-Function)\n", " - [Validate Function](#Validate-Function)\n", " - [Helpers](#Helpers)\n", " - [Get a Pre-trained FP32 Model](#Get-a-Pre-trained-FP32-Model)\n", "- [Create and Initialize Quantization](#Create-and-Initialize-Quantization)\n", "- [Fine-tune the Compressed Model](#Fine-tune-the-Compressed-Model)\n", "- [Export INT8 Model to OpenVINO IR](#Export-INT8-Model-to-OpenVINO-IR)\n", "- [Benchmark Model Performance by Computing Inference Time](#Benchmark-Model-Performance-by-Computing-Inference-Time)\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu \"openvino>=2024.0.0\" \"torch\" \"torchvision\" \"tqdm\"\n", "%pip install -q \"nncf>=2.9.0\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "6M1xndNu-z_2" }, "source": [ "## Imports and Settings\n", "[back to top 
⬆️](#Table-of-contents:)\n", "\n", "On Windows, add the required C++ directories to the system PATH.\n", "\n", "Import NNCF and all auxiliary packages from your Python code.\n", "Set a name for the model, and the image width and height that will be used for the network. Also define paths where PyTorch and OpenVINO IR versions of the models will be stored. \n", "\n", "> **NOTE**: All NNCF logging messages below ERROR level (INFO and WARNING) are disabled to simplify the tutorial. For production use, it is recommended to enable logging by removing ```set_log_level(logging.ERROR)```." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "BtaM_i2mEB0z", "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using cuda device\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "aaac9ef8ffe44d138d226367829c94c4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "model/resnet18_fp32.pth: 0%| | 0.00/43.1M [00:00 best_acc1\n", " best_acc1 = max(acc1, best_acc1)\n", "\n", " if is_best:\n", " checkpoint = {\"state_dict\": model.state_dict(), \"acc1\": acc1}\n", " torch.save(checkpoint, fp32_pth_path)\n", " acc1_fp32 = best_acc1\n", "\n", "print(f\"Accuracy of FP32 model: {acc1_fp32:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "pt_xNDDrJKsy", "outputId": "0925c801-0585-4431-98c9-de0decc4ad27", "pycharm": { "name": "#%% md\n" } }, "source": [ "Export the `FP32` model to OpenVINO™ Intermediate Representation, to benchmark it in comparison with the `INT8` model." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "9d8LOmKut36x", "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FP32 model was exported to model/resnet18_fp32.xml.\n" ] } ], "source": [ "dummy_input = torch.randn(1, 3, image_size, image_size).to(device)\n", "\n", "ov_model = ov.convert_model(model, example_input=dummy_input, input=[1, 3, image_size, image_size])\n", "ov.save_model(ov_model, fp32_ir_path, compress_to_fp16=False)\n", "print(f\"FP32 model was exported to {fp32_ir_path}.\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "pobVoHEoKcYp" }, "source": [ "## Create and Initialize Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "NNCF enables compression-aware training by integrating into regular training pipelines. The framework is designed so that modifications to your original training code are minor.\n", "Quantization requires only two modifications." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ENAbqFpdWSlE", "outputId": "cd2701e3-e4a2-4a19-86cd-ae37f45cd64a", "pycharm": { "name": "#%% md\n" } }, "source": [ "1. Create a quantization data loader with a batch size of one and wrap it in `nncf.Dataset`, specifying a transformation function that prepares the input data for the model during quantization. In this case, the function picks the input tensor from each (input tensor, label) pair." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "_I_G-g9TPWkl", "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:NNCF initialized successfully. 
Supported frameworks detected: torch, tensorflow, onnx, openvino\n" ] } ], "source": [ "import nncf\n", "\n", "\n", "def transform_fn(data_item):\n", "    return data_item[0]\n", "\n", "\n", "# Create a separate dataloader with batch size = 1,\n", "# as dataloaders with batch size > 1 are not supported yet.\n", "quantization_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False, num_workers=0, pin_memory=True)\n", "\n", "quantization_dataset = nncf.Dataset(quantization_loader, transform_fn)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "2. Run `nncf.quantize` to get an optimized model.\n", "\n", "The `nncf.quantize` function accepts a model and a prepared quantization dataset to perform basic quantization. Optionally, additional parameters such as `subset_size`, `preset`, and `ignored_scope` can be provided to improve the quantization result, if applicable. More details about the supported parameters can be found on this [page](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/quantizing-models-post-training/basic-quantization-flow.html#tune-quantization-parameters)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-01-17 15:43:43.543878: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-01-17 15:43:43.579576: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2024-01-17 15:43:44.170538: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:nncf:NNCF provides best results with torch==2.1.0, while current torch version is 1.13.0+cu117. If you encounter issues, consider switching to torch==2.1.0\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "efcf6cd485e745bb920296275e772aab", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:Compiling and loading torch extension: quantized_functions_cuda...\n", "INFO:nncf:Finished loading torch extension: quantized_functions_cuda\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0f4d0f4cbf774827aaa7123ebd6b9bbb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantized_model = nncf.quantize(model, quantization_dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate the new model on the validation set after initialization of quantization. The accuracy should be close to the accuracy of the floating-point `FP32` model for a simple case like the one being demonstrated here." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test: [ 0/79]\tTime 0.110 (0.110)\tLoss 0.992 (0.992)\tAcc@1 78.12 (78.12)\tAcc@5 89.06 (89.06)\n", "Test: [10/79]\tTime 0.069 (0.074)\tLoss 1.990 (1.623)\tAcc@1 44.53 (60.37)\tAcc@5 79.69 (83.95)\n", "Test: [20/79]\tTime 0.068 (0.072)\tLoss 1.814 (1.704)\tAcc@1 60.16 (58.26)\tAcc@5 80.47 (82.63)\n", "Test: [30/79]\tTime 0.068 (0.071)\tLoss 2.284 (1.794)\tAcc@1 52.34 (56.75)\tAcc@5 67.97 (80.90)\n", "Test: [40/79]\tTime 0.070 (0.072)\tLoss 1.618 (1.831)\tAcc@1 61.72 (55.64)\tAcc@5 82.03 (80.37)\n", "Test: [50/79]\tTime 0.069 (0.071)\tLoss 1.951 (1.832)\tAcc@1 57.81 (55.70)\tAcc@5 75.00 (80.06)\n", "Test: [60/79]\tTime 0.070 (0.071)\tLoss 1.795 (1.855)\tAcc@1 56.25 (55.28)\tAcc@5 84.38 (79.75)\n", "Test: [70/79]\tTime 0.069 (0.071)\tLoss 2.359 (1.888)\tAcc@1 47.66 (54.79)\tAcc@5 74.22 (79.08)\n", " * Acc@1 55.130 Acc@5 79.680\n", "Accuracy of initialized INT8 model: 55.130\n" ] } ], "source": [ "acc1 = validate(val_loader, quantized_model, criterion)\n", "print(f\"Accuracy of initialized INT8 model: {acc1:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Fine-tune the Compressed Model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "At this step, a regular fine-tuning process is applied to further improve quantized model accuracy. 
Normally, several epochs of tuning are required with a small learning rate, the same that is usually used at the end of the training of the original model. No other changes in the training pipeline are required. Here is a simple example." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch:[0][ 0/782]\tTime 0.284 (0.284)\tLoss 0.876 (0.876)\tAcc@1 78.12 (78.12)\tAcc@5 92.97 (92.97)\n", "Epoch:[0][ 50/782]\tTime 0.112 (0.116)\tLoss 0.796 (0.808)\tAcc@1 80.47 (79.96)\tAcc@5 94.53 (94.27)\n", "Epoch:[0][100/782]\tTime 0.111 (0.114)\tLoss 0.785 (0.788)\tAcc@1 82.81 (80.52)\tAcc@5 92.19 (94.56)\n", "Epoch:[0][150/782]\tTime 0.114 (0.113)\tLoss 0.653 (0.785)\tAcc@1 84.38 (80.69)\tAcc@5 95.31 (94.45)\n", "Epoch:[0][200/782]\tTime 0.109 (0.113)\tLoss 0.804 (0.780)\tAcc@1 80.47 (80.92)\tAcc@5 94.53 (94.45)\n", "Epoch:[0][250/782]\tTime 0.111 (0.113)\tLoss 0.756 (0.777)\tAcc@1 83.59 (80.98)\tAcc@5 94.53 (94.47)\n", "Epoch:[0][300/782]\tTime 0.112 (0.112)\tLoss 0.665 (0.772)\tAcc@1 82.03 (81.07)\tAcc@5 96.88 (94.53)\n", "Epoch:[0][350/782]\tTime 0.115 (0.112)\tLoss 0.661 (0.767)\tAcc@1 82.81 (81.14)\tAcc@5 97.66 (94.57)\n", "Epoch:[0][400/782]\tTime 0.111 (0.113)\tLoss 0.661 (0.764)\tAcc@1 78.91 (81.24)\tAcc@5 96.09 (94.60)\n", "Epoch:[0][450/782]\tTime 0.119 (0.113)\tLoss 0.904 (0.762)\tAcc@1 79.69 (81.27)\tAcc@5 89.06 (94.60)\n", "Epoch:[0][500/782]\tTime 0.113 (0.113)\tLoss 0.609 (0.757)\tAcc@1 84.38 (81.46)\tAcc@5 96.88 (94.62)\n", "Epoch:[0][550/782]\tTime 0.112 (0.113)\tLoss 0.833 (0.753)\tAcc@1 76.56 (81.59)\tAcc@5 95.31 (94.69)\n", "Epoch:[0][600/782]\tTime 0.112 (0.113)\tLoss 0.768 (0.751)\tAcc@1 82.81 (81.63)\tAcc@5 95.31 (94.69)\n", "Epoch:[0][650/782]\tTime 0.112 (0.113)\tLoss 0.750 (0.751)\tAcc@1 82.81 (81.61)\tAcc@5 93.75 (94.71)\n", "Epoch:[0][700/782]\tTime 0.110 (0.113)\tLoss 0.654 (0.749)\tAcc@1 84.38 (81.62)\tAcc@5 96.09 (94.71)\n", 
"Epoch:[0][750/782]\tTime 0.110 (0.113)\tLoss 0.575 (0.748)\tAcc@1 86.72 (81.67)\tAcc@5 97.66 (94.73)\n", "Test: [ 0/79]\tTime 0.070 (0.070)\tLoss 1.028 (1.028)\tAcc@1 78.91 (78.91)\tAcc@5 86.72 (86.72)\n", "Test: [10/79]\tTime 0.070 (0.070)\tLoss 1.827 (1.514)\tAcc@1 46.88 (63.35)\tAcc@5 79.69 (84.02)\n", "Test: [20/79]\tTime 0.073 (0.070)\tLoss 1.628 (1.594)\tAcc@1 64.06 (60.97)\tAcc@5 82.03 (83.78)\n", "Test: [30/79]\tTime 0.069 (0.070)\tLoss 2.061 (1.688)\tAcc@1 57.03 (59.25)\tAcc@5 71.88 (82.26)\n", "Test: [40/79]\tTime 0.070 (0.070)\tLoss 1.495 (1.738)\tAcc@1 66.41 (57.93)\tAcc@5 85.16 (81.59)\n", "Test: [50/79]\tTime 0.069 (0.070)\tLoss 1.863 (1.741)\tAcc@1 58.59 (57.83)\tAcc@5 76.56 (81.31)\n", "Test: [60/79]\tTime 0.069 (0.070)\tLoss 1.571 (1.779)\tAcc@1 65.62 (57.21)\tAcc@5 84.38 (80.74)\n", "Test: [70/79]\tTime 0.069 (0.070)\tLoss 2.505 (1.809)\tAcc@1 46.09 (56.78)\tAcc@5 75.00 (80.22)\n", " * Acc@1 57.200 Acc@5 80.880\n", "Accuracy of tuned INT8 model: 57.200\n", "Accuracy drop of tuned INT8 model over pre-trained FP32 model: -1.680\n" ] } ], "source": [ "compression_lr = init_lr / 10\n", "optimizer = torch.optim.Adam(quantized_model.parameters(), lr=compression_lr)\n", "\n", "# Train for one epoch with NNCF.\n", "train(train_loader, quantized_model, criterion, optimizer, epoch=0)\n", "\n", "# Evaluate on validation set after Quantization-Aware Training (QAT case).\n", "acc1_int8 = validate(val_loader, quantized_model, criterion)\n", "\n", "print(f\"Accuracy of tuned INT8 model: {acc1_int8:.3f}\")\n", "print(f\"Accuracy drop of tuned INT8 model over pre-trained FP32 model: {acc1_fp32 - acc1_int8:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Export INT8 Model to OpenVINO IR\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:Please 
fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11.\n", "INT8 model exported to model/resnet18_int8.xml.\n" ] } ], "source": [ "if not int8_ir_path.exists():\n", "    warnings.filterwarnings(\"ignore\", category=TracerWarning)\n", "    warnings.filterwarnings(\"ignore\", category=UserWarning)\n", "    # Export INT8 model to OpenVINO™ IR\n", "    ov_model = ov.convert_model(quantized_model, example_input=dummy_input, input=[1, 3, image_size, image_size])\n", "    ov.save_model(ov_model, int8_ir_path)\n", "    print(f\"INT8 model exported to {int8_ir_path}.\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Benchmark Model Performance by Computing Inference Time\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Finally, measure the inference performance of the `FP32` and `INT8` models using [Benchmark Tool](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/benchmark-tool.html), the inference performance measurement tool in OpenVINO. By default, Benchmark Tool runs inference for 60 seconds in asynchronous mode on CPU. It returns inference speed as latency (milliseconds per image) and throughput (frames per second) values.\n", "\n", "> **NOTE**: This notebook runs `benchmark_app` for 15 seconds to give a quick indication of performance. For more accurate results, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications. Run `benchmark_app -m model.xml -d CPU` to benchmark async inference on CPU for one minute. Change CPU to GPU to benchmark on GPU. Run `benchmark_app --help` to see an overview of all command-line options." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ipywidgets as widgets\n", "\n", "# Initialize OpenVINO runtime\n", "core = ov.Core()\n", "device = widgets.Dropdown(\n", " options=core.available_devices,\n", " value=\"CPU\",\n", " description=\"Device:\",\n", " disabled=False,\n", ")\n", "\n", "device" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Benchmark FP32 model (IR)\n", "[ INFO ] Throughput: 3755.92 FPS\n", "Benchmark INT8 model (IR)\n", "[ INFO ] Throughput: 15141.53 FPS\n" ] } ], "source": [ "def parse_benchmark_output(benchmark_output):\n", " parsed_output = [line for line in benchmark_output if \"FPS\" in line]\n", " print(*parsed_output, sep=\"\\n\")\n", "\n", "\n", "print(\"Benchmark FP32 model (IR)\")\n", "benchmark_output = ! benchmark_app -m $fp32_ir_path -d $device.value -api async -t 15\n", "parse_benchmark_output(benchmark_output)\n", "\n", "print(\"Benchmark INT8 model (IR)\")\n", "benchmark_output = ! benchmark_app -m $int8_ir_path -d $device.value -api async -t 15\n", "parse_benchmark_output(benchmark_output)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Show Device Information for reference." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "'Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "core.get_property(device.value, \"FULL_DEVICE_NAME\")" ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [ "K5HPrY_d-7cV", "E01dMaR2_AFL", "qMnYsGo9_MA8", "L0tH9KdwtHhV" ], "name": "NNCF Quantization PyTorch Demo (tiny-imagenet/resnet-18)", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "", "tags": { "categories": [ "Model Training", "Optimize" ], "libraries": [], "other": [], "tasks": [ "Image Classification" ] } }, "vscode": { "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }