{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "01412caf", "metadata": {}, "source": [ "# Hello Model Server\n", "\n", "Introduction to OpenVINO™ Model Server (OVMS).\n", "\n", "## What is Model Serving?\n", "A model server hosts models and makes them accessible to software components over standard network protocols. A client sends a request to the model server, which performs inference and sends a response back to the client. Model serving offers many advantages for efficient model deployment:\n", "\n", "- Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.\n", "- Applications are independent of the model framework, hardware device, and infrastructure.\n", "- Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.\n", "- Clients require fewer updates since client libraries change very rarely.\n", "- Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.\n", "- Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.\n", "- Efficient resource utilization with horizontal and vertical inference scaling.\n", " \n", "\n", "\n", "\n", "#### Table of contents:\n", "\n", "- [Serving with OpenVINO Model Server](#Serving-with-OpenVINO-Model-Server)\n", "- [Step 1: Prepare Docker](#Step-1:-Prepare-Docker)\n", "- [Step 2: Preparing a Model Repository](#Step-2:-Preparing-a-Model-Repository)\n", "- [Step 3: Start the Model Server Container](#Step-3:-Start-the-Model-Server-Container)\n", "- [Step 4: Prepare the Example Client Components](#Step-4:-Prepare-the-Example-Client-Components)\n", " - [Prerequisites](#Prerequisites)\n", " - [Imports](#Imports)\n", " - [Request Model Status](#Request-Model-Status)\n", " - [Request Model Metadata](#Request-Model-Metadata)\n", " - [Load input image](#Load-input-image)\n", " - [Request Prediction on a Numpy Array](#Request-Prediction-on-a-Numpy-Array)\n", " - [Visualization](#Visualization)\n", "- [References](#References)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "efce7a1c", "metadata": {}, "source": [ "## Serving with OpenVINO Model Server\n", "[back to top ⬆️](#Table-of-contents:)\n", "OpenVINO Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures, the model server uses the same architecture and API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.\n", "\n", "\n", "\n", "To quickly start using OpenVINO™ Model Server, follow these steps:" ] }, { "attachments": {}, "cell_type": "markdown", "id": "740bfdd8", "metadata": {}, "source": [ "## Step 1: Prepare Docker\n", "[back to top ⬆️](#Table-of-contents:)\n", "Install [Docker Engine](https://docs.docker.com/engine/install/), including its [post-installation](https://docs.docker.com/engine/install/linux-postinstall/) steps, on your development system. To verify installation, test it, using the following command. When it is ready, it will display a test image and a message." 
] }, { "cell_type": "code", "execution_count": 1, "id": "73d7aedb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Hello from Docker!\n", "This message shows that your installation appears to be working correctly.\n", "\n", "To generate this message, Docker took the following steps:\n", " 1. The Docker client contacted the Docker daemon.\n", " 2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n", " (amd64)\n", " 3. The Docker daemon created a new container from that image which runs the\n", " executable that produces the output you are currently reading.\n", " 4. The Docker daemon streamed that output to the Docker client, which sent it\n", " to your terminal.\n", "\n", "To try something more ambitious, you can run an Ubuntu container with:\n", " $ docker run -it ubuntu bash\n", "\n", "Share images, automate workflows, and more with a free Docker ID:\n", " https://hub.docker.com/\n", "\n", "For more examples and ideas, visit:\n", " https://docs.docker.com/get-started/\n", "\n" ] } ], "source": [ "!docker run hello-world" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c8052a30", "metadata": {}, "source": [ "## Step 2: Preparing a Model Repository\n", "[back to top ⬆️](#Table-of-contents:)\n", "The models need to be placed and mounted in a particular directory structure and according to the following rules:\n", "```\n", "tree models/\n", "models/\n", "├── model1\n", "│ ├── 1\n", "│ │ ├── ir_model.bin\n", "│ │ └── ir_model.xml\n", "│ └── 2\n", "│ ├── ir_model.bin\n", "│ └── ir_model.xml\n", "├── model2\n", "│ └── 1\n", "│ ├── ir_model.bin\n", "│ ├── ir_model.xml\n", "│ └── mapping_config.json\n", "├── model3\n", "│ └── 1\n", "│ └── model.onnx\n", "├── model4\n", "│ └── 1\n", "│ ├── model.pdiparams\n", "│ └── model.pdmodel\n", "└── model5\n", " └── 1\n", " └── TF_fronzen_model.pb\n", "```\n", "\n", "\n", "* Each model should be stored in a dedicated directory, for example, model1 and model2.\n", "\n", "* Each model directory should include a sub-folder for each of its versions (1,2, etc). The versions and their folder names should be positive integer values.\n", "\n", "* Note that in execution, the versions are enabled according to a pre-defined version policy. If the client does not specify the version number in parameters, by default, the latest version is served.\n", "\n", "* Every version folder must include model files, that is, `.bin` and `.xml` for OpenVINO IR, `.onnx` for ONNX, `.pdiparams` and `.pdmodel` for Paddle Paddle, and `.pb` for TensorFlow. 
] }, { "cell_type": "code", "execution_count": null, "id": "4fe8d873", "metadata": {}, "outputs": [], "source": [ "import platform\n", "\n", "%pip install -q \"openvino>=2023.1.0\" opencv-python tqdm\n", "\n", "if platform.system() != \"Windows\":\n", "    %pip install -q \"matplotlib>=3.4\"\n", "else:\n", "    %pip install -q \"matplotlib>=3.4,<3.7\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "9230a63a", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "db2e263d9a434b669974d0408f24a2cc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "models/detection/1/horizontal-text-detection-0001.xml:   0%|          | 0.00/680k [00:00<?, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f273c7cd3da84e589a0844796633039c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "models/detection/1/horizontal-text-detection-0001.bin:   0%|          | 0.00/7.39M [00:00<?, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "PosixPath('/home/ethan/intel/openvino_notebooks/notebooks/model-server/models/detection/1/horizontal-text-detection-0001.bin')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "\n", "# Fetch `notebook_utils` module\n", "import requests\n", "\n", "r = requests.get(\n", "    url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", ")\n", "\n", "open(\"notebook_utils.py\", \"w\").write(r.text)\n", "from notebook_utils import download_file\n", "\n", "dedicated_dir = \"models\"\n", "model_name = \"detection\"\n", "model_version = \"1\"\n", "\n", "MODEL_DIR = f\"{dedicated_dir}/{model_name}/{model_version}\"\n", "XML_PATH = \"horizontal-text-detection-0001.xml\"\n", "BIN_PATH = \"horizontal-text-detection-0001.bin\"\n", "os.makedirs(MODEL_DIR, exist_ok=True)\n", "model_xml_url = (\n", "    \"https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.3/models_bin/1/horizontal-text-detection-0001/FP32/horizontal-text-detection-0001.xml\"\n", ")\n", "model_bin_url = (\n", "    \"https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.3/models_bin/1/horizontal-text-detection-0001/FP32/horizontal-text-detection-0001.bin\"\n", ")\n", "\n", "download_file(model_xml_url, XML_PATH, MODEL_DIR)\n", "download_file(model_bin_url, BIN_PATH, MODEL_DIR)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b66674df", "metadata": {}, "source": [ "## Step 3: Start the Model Server Container\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Pull and start the container:" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2f086b77-e1e8-4809-8061-dc9b4a57bc7f", "metadata": {}, "source": [ "Search for an available serving port on the local machine."
] }, { "cell_type": "code", "execution_count": 22, "id": "b8521e00-3860-4865-85be-2c32332ad8ab", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Port 39801 is available\n" ] } ], "source": [ "import socket\n", "\n", "sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n", "sock.bind((\"localhost\", 0))\n", "sock.listen(1)\n", "port = sock.getsockname()[1]\n", "sock.close()\n", "print(f\"Port {port} is available\")\n", "\n", "os.environ[\"port\"] = str(port)" ] }, { "cell_type": "code", "execution_count": 7, "id": "4fc2e171", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "64aa9391ba019b3ef26ae3010e5605e38d0a12e3f93bf74b3afb938f39b86ad2\n" ] } ], "source": [ "!docker run -d --rm --name=\"ovms\" -v $(pwd)/models:/models -p $port:9000 openvino/model_server:latest --model_path /models/detection/ --model_name detection --port 9000" ] }, { "attachments": {}, "cell_type": "markdown", "id": "21744b6c", "metadata": {}, "source": [ "Check whether the OVMS container is running normally:" ] }, { "cell_type": "code", "execution_count": 8, "id": "d066734e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "64aa9391ba01 openvino/model_server:latest \"/ovms/bin/ovms --mo…\" 29 seconds ago Up 28 seconds 0.0.0.0:37581->9000/tcp, :::37581->9000/tcp ovms\n" ] } ], "source": [ "!docker ps | grep ovms" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e8ab7f4c", "metadata": {}, "source": [ "The required Model Server parameters are listed below. For additional configuration options, see the [Model Server Parameters section](https://docs.openvino.ai/2024/ovms_docs_parameters.html).\n", "\n", "
| Parameter | Description |\n",
"| --- | --- |\n",
"| `--rm` | remove the container when exiting the Docker container |\n",
"| `-d` | runs the container in the background |\n",
"| `-v` | defines how to mount the model folder in the Docker container |\n",
"| `-p` | exposes the model serving port outside the Docker container |\n",
"| `openvino/model_server:latest` | represents the image name; the OVMS binary is the Docker entry point. It varies by tag and build process - see https://hub.docker.com/r/openvino/model_server/tags/ for a full tag list. |\n",
"| `--model_path` | model location, which can be: a Docker container path that is mounted during start-up, a Google Cloud Storage path `gs://<bucket>/<model_path>`, an AWS S3 path `s3://<bucket>/<model_path>`, or an Azure blob path `az://<container>/<model_path>` |\n",
"| `--model_name` | the name of the model in the model_path |\n",
"| `--port` | the gRPC server port |\n",
"| `--rest_port` | the REST server port |\n",