{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Named entity recognition with OpenVINO™\n",
    "\n",
    "\n",
    "The Named Entity Recognition(NER) is a natural language processing method that involves the detecting of key information in the unstructured text and categorizing it into pre-defined categories. These categories or named entities refer to the key subjects of text, such as names, locations, companies and etc.\n",
    "\n",
    "NER is a good method for the situations when a high-level overview of a large amount of text is needed. NER can be helpful with such task as analyzing key information in unstructured text or automates the information extraction of large amounts of data.\n",
    "\n",
    "\n",
    "This tutorial shows how to perform named entity recognition using OpenVINO. We will use the pre-trained model [`elastic/distilbert-base-cased-finetuned-conll03-english`](https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english). It is DistilBERT based model, trained on [`conll03 english dataset`](https://huggingface.co/datasets/conll2003). The model can recognize four named entities in text: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. The model is sensitive to capital letters.\n",
    "\n",
    "To simplify the user experience, the [Hugging Face Optimum](https://huggingface.co/docs/optimum) library is used to convert the model to OpenVINO™ IR format and quantize it.\n",
    "\n",
    "\n",
    "#### Table of contents:\n",
    "\n",
    "- [Prerequisites](#Prerequisites)\n",
    "- [Download the NER model](#Download-the-NER-model)\n",
    "- [Quantize the model, using Hugging Face Optimum API](#Quantize-the-model,-using-Hugging-Face-Optimum-API)\n",
    "- [Compare the Original and Quantized Models](#Compare-the-Original-and-Quantized-Models)\n",
    "    - [Compare performance](#Compare-performance)\n",
    "    - [Compare size of the models](#Compare-size-of-the-models)\n",
    "- [Prepare demo for Named Entity Recognition OpenVINO Runtime](#Prepare-demo-for-Named-Entity-Recognition-OpenVINO-Runtime)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "[back to top ⬆️](#Table-of-contents:)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "%pip install -q \"diffusers>=0.17.1\" \"openvino>=2023.1.0\" \"nncf>=2.5.0\" \"gradio>=4.19\" \"onnx>=1.11.0\" \"transformers>=4.33.0\" \"torch>=2.1\" --extra-index-url https://download.pytorch.org/whl/cpu\n",
    "%pip install -q \"git+https://github.com/huggingface/optimum-intel.git\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the NER model\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "We load the [`distilbert-base-cased-finetuned-conll03-english`](https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english) model from the [Hugging Face Hub](https://huggingface.co/models) with [Hugging Face Transformers library](https://huggingface.co/docs/transformers/index)and Optimum Intel with OpenVINO integration.\n",
    "\n",
    "`OVModelForTokenClassification` is represent model class for Named Entity Recognition task in Optimum Intel. Model class initialization starts with calling `from_pretrained` method. For conversion original PyTorch model to OpenVINO format on the fly, `export=True` parameter should be used. To easily save the model, you can use the `save_pretrained()` method. After saving the model on disk, we can use pre-converted model for next usage, and speedup deployment process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'\n",
      "2024-04-05 18:35:04.594311: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "2024-04-05 18:35:04.596755: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2024-04-05 18:35:04.628293: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-04-05 18:35:04.628326: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-04-05 18:35:04.628349: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2024-04-05 18:35:04.634704: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2024-04-05 18:35:04.635314: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2024-04-05 18:35:05.607762: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
      "/home/ea/miniconda3/lib/python3.11/site-packages/transformers/utils/import_utils.py:519: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.\n",
      "  warnings.warn(\n",
      "Framework not specified. Using pt to export the model.\n",
      "Using the export variant default. Available variants are:\n",
      "    - default: The default ONNX variant.\n",
      "Using framework PyTorch: 2.1.2+cpu\n",
      "/home/ea/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py:4225: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead\n",
      "  warnings.warn(\n",
      "/home/ea/miniconda3/lib/python3.11/site-packages/nncf/torch/dynamic_graph/wrappers.py:80: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
      "  op1 = operator(*args, **kwargs)\n",
      "Compiling the model to CPU ...\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "from transformers import AutoTokenizer\n",
    "from optimum.intel import OVModelForTokenClassification\n",
    "\n",
    "original_ner_model_dir = Path(\"original_ner_model\")\n",
    "\n",
    "model_id = \"elastic/distilbert-base-cased-finetuned-conll03-english\"\n",
    "if not original_ner_model_dir.exists():\n",
    "    model = OVModelForTokenClassification.from_pretrained(model_id, export=True)\n",
    "\n",
    "    model.save_pretrained(original_ner_model_dir)\n",
    "else:\n",
    "    model = OVModelForTokenClassification.from_pretrained(model_id, export=True)\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quantize the model, using Hugging Face Optimum API\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Post-training static quantization introduces an additional calibration step where data is fed through the network in order to compute the activations quantization parameters. For quantization it will be used [Hugging Face Optimum Intel API](https://huggingface.co/docs/optimum/intel/index).\n",
    "\n",
    "To handle the NNCF quantization process we use class [`OVQuantizer`](https://huggingface.co/docs/optimum/intel/reference_ov#optimum.intel.OVQuantizer). The quantization with Hugging Face Optimum Intel API contains the next steps:\n",
    "* Model class initialization starts with calling `from_pretrained()` method.\n",
    "* Next we create calibration dataset with `get_calibration_dataset()` to use for the post-training static quantization calibration step. \n",
    "* After we quantize a model and save the resulting model in the OpenVINO IR format to save_directory with `quantize()` method. \n",
    "* Then we load the quantized model. The Optimum Inference models are API compatible with Hugging Face Transformers models and we can just replace `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. So we use `OVModelForTokenClassification` to load the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
   "test_replace": {"num_samples=100,": "num_samples=10,"}
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/ea/miniconda3/lib/python3.11/site-packages/datasets/load.py:2516: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.\n",
      "You can remove this warning by passing 'token=<use_auth_token>' instead.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e166ccbed15a4f738160a2ecd9b1fa59",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c8f6e0c512fc4c96ab7701bd7423db0a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:nncf:18 ignored nodes were found by name in the NNCFGraph\n",
      "INFO:nncf:25 ignored nodes were found by name in the NNCFGraph\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "cc2867963a7341a28aa9f08847e439ff",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5a1342cad34e467dbe7585841b225338",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from functools import partial\n",
    "from optimum.intel import OVQuantizer, OVConfig, OVQuantizationConfig\n",
    "\n",
    "from optimum.intel import OVModelForTokenClassification\n",
    "\n",
    "\n",
    "def preprocess_fn(data, tokenizer):\n",
    "    examples = []\n",
    "    for data_chunk in data[\"tokens\"]:\n",
    "        examples.append(\" \".join(data_chunk))\n",
    "\n",
    "    return tokenizer(examples, padding=True, truncation=True, max_length=128)\n",
    "\n",
    "\n",
    "quantizer = OVQuantizer.from_pretrained(model)\n",
    "calibration_dataset = quantizer.get_calibration_dataset(\n",
    "    \"conll2003\",\n",
    "    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),\n",
    "    num_samples=100,\n",
    "    dataset_split=\"train\",\n",
    "    preprocess_batch=True,\n",
    ")\n",
    "\n",
    "# The directory where the quantized model will be saved\n",
    "quantized_ner_model_dir = \"quantized_ner_model\"\n",
    "\n",
    "# Apply static quantization and save the resulting model in the OpenVINO IR format\n",
    "ov_config = OVConfig(quantization_config=OVQuantizationConfig(num_samples=len(calibration_dataset)))\n",
    "quantizer.quantize(\n",
    "    calibration_dataset=calibration_dataset,\n",
    "    save_directory=quantized_ner_model_dir,\n",
    "    ov_config=ov_config,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "12e9426554484eafacbd27508006015b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Dropdown(description='Device:', index=3, options=('CPU', 'GPU.0', 'GPU.1', 'AUTO'), value='AUTO')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import ipywidgets as widgets\n",
    "import openvino as ov\n",
    "\n",
    "core = ov.Core()\n",
    "device = widgets.Dropdown(\n",
    "    options=core.available_devices + [\"AUTO\"],\n",
    "    value=\"AUTO\",\n",
    "    description=\"Device:\",\n",
    "    disabled=False,\n",
    ")\n",
    "\n",
    "device"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Compiling the model to AUTO ...\n"
     ]
    }
   ],
   "source": [
    "# Load the quantized model\n",
    "optimized_model = OVModelForTokenClassification.from_pretrained(quantized_ner_model_dir, device=device.value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compare the Original and Quantized Models\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Compare the original [`distilbert-base-cased-finetuned-conll03-english`](https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english) model with quantized and converted to OpenVINO IR format models to see the difference."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare performance\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "As the Optimum Inference models are API compatible with Hugging Face Transformers models, we can just use `pipleine()` from [Hugging Face Transformers API](https://huggingface.co/docs/transformers/index) for inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "ner_pipeline_optimized = pipeline(\"token-classification\", model=optimized_model, tokenizer=tokenizer)\n",
    "\n",
    "ner_pipeline_original = pipeline(\"token-classification\", model=model, tokenizer=tokenizer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Median inference time of quantized model: 0.0063508255407214165 \n",
      "Median inference time of original model: 0.007429798366501927 \n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def calc_perf(ner_pipeline):\n",
    "    inference_times = []\n",
    "\n",
    "    for data in calibration_dataset:\n",
    "        text = \" \".join(data[\"tokens\"])\n",
    "        start = time.perf_counter()\n",
    "        ner_pipeline(text)\n",
    "        end = time.perf_counter()\n",
    "        inference_times.append(end - start)\n",
    "\n",
    "    return np.median(inference_times)\n",
    "\n",
    "\n",
    "print(f\"Median inference time of quantized model: {calc_perf(ner_pipeline_optimized)} \")\n",
    "\n",
    "print(f\"Median inference time of original model: {calc_perf(ner_pipeline_original)} \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare size of the models\n",
    "[back to top ⬆️](#Table-of-contents:)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of original model in Bytes is 260795516\n",
      "Size of quantized model in Bytes is 65802712\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "fp_model_file = Path(original_ner_model_dir) / \"openvino_model.bin\"\n",
    "print(f\"Size of original model in Bytes is {fp_model_file.stat().st_size}\")\n",
    "print(f'Size of quantized model in Bytes is {Path(quantized_ner_model_dir, \"openvino_model.bin\").stat().st_size}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare demo for Named Entity Recognition OpenVINO Runtime\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Now, you can try NER model on own text. Put your sentence to input text box, click Submit button, the model label the recognized entities in the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [],
    "test_replace": {
     "demo.launch(debug=True)": "demo.launch()",
     "demo.launch(share=True, debug=True)": "demo.launch(share=True)"
    }
   },
   "outputs": [],
   "source": [
    "import gradio as gr\n",
    "\n",
    "examples = [\n",
    "    \"My name is Wolfgang and I live in Berlin.\",\n",
    "]\n",
    "\n",
    "\n",
    "def run_ner(text):\n",
    "    output = ner_pipeline_optimized(text)\n",
    "    return {\"text\": text, \"entities\": output}\n",
    "\n",
    "\n",
    "demo = gr.Interface(\n",
    "    run_ner,\n",
    "    gr.Textbox(placeholder=\"Enter sentence here...\", label=\"Input Text\"),\n",
    "    gr.HighlightedText(label=\"Output Text\"),\n",
    "    examples=examples,\n",
    "    allow_flagging=\"never\",\n",
    ")\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    try:\n",
    "        demo.launch(debug=True)\n",
    "    except Exception:\n",
    "        demo.launch(share=True, debug=True)\n",
    "# if you are launching remotely, specify server_name and server_port\n",
    "# demo.launch(server_name='your server name', server_port='server port in int')\n",
    "# Read more in the docs: https://gradio.app/docs/"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "openvino_notebooks": {
   "imageUrl": "",
   "tags": {
    "categories": [
     "Model Demos"
    ],
    "libraries": [],
    "other": [],
    "tasks": [
     "Named Entity Recognition",
     "Token Classification"
    ]
   }
  },
  "vscode": {
   "interpreter": {
    "hash": "1c707170576399eaaed0c4f2e01a2d1b61ba791ba1842c47e5b3e4f6f79b82ab"
   }
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}