# Serving Intel® Geti™ models with OpenVINO Model Server
This notebook shows how to set up an OpenVINO model server to serve the models trained
in your Intel® Geti™ project. It also shows how to use the Geti SDK as a client to
make inference requests to the model server.

# Contents

1. **OpenVINO Model Server**
   1. Requirements
   2. Generating the model server configuration
   3. Launching the model server

2. **OVMS inference with Geti SDK**
   1. Loading inference model and sample image
   2. Requesting inference
   3. Inspecting the results

3. **Conclusion**
   1. Cleaning up

> **NOTE**: This notebook will set up a model server on the same machine that will be
> used as a client to request inference. In a real scenario you'd most likely
> want the server and the client to be different physical machines. The steps to set up
> OVMS on a remote server are the same as for the local server outlined in this
> notebook, but additional network configuration and security measures are most likely
> required.

# OpenVINO Model Server
## Requirements
We will run the OpenVINO Model Server (OVMS) with Docker. Please make sure that
Docker is available on your system. You can install it by following the
[Docker installation instructions](https://docs.docker.com/get-docker/).

## Generating the model server configuration
The `deployment` that was downloaded from the Intel® Geti™ platform can be used to create
the configuration files that are needed to set up an OpenVINO model server for your project.

The cell below shows how to create the configuration. Running this cell should create
a folder called `ovms_models` in a temporary directory. The `ovms_models` folder
contains the models and the configuration files required to run OVMS for the Intel®
Geti™ project.

In [None]:
import os
import tempfile

from geti_sdk.deployment import Deployment

deployment_path = os.path.join("..", "deployment")

# Load the Geti deployment
deployment = Deployment.from_folder(deployment_path)

# Creating the OVMS configuration for the deployment
# First, we'll create a temporary directory to store the config files
ovms_config_path = os.path.join(tempfile.mkdtemp(), "ovms_models")

# Next, we generate the OVMS configuration and save it
deployment.generate_ovms_config(output_folder=ovms_config_path)

print(f"Configuration for OpenVINO Model Server was created at '{ovms_config_path}'")
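
To check what was generated, it can help to list the models that the configuration file declares. The sketch below is a hypothetical helper (`summarize_ovms_config` is not part of the Geti SDK). It assumes that the file is named `ovms_model_config.json`, as in the `docker run` command used later in this notebook, and that it follows the standard OVMS multi-model layout with a top-level `model_config_list`.

```python
import json
from pathlib import Path


def summarize_ovms_config(config_folder, config_name="ovms_model_config.json"):
    """Return (name, base_path) pairs for every model listed in an OVMS config file.

    Hypothetical helper, not part of the Geti SDK. Assumes the standard OVMS
    layout: {"model_config_list": [{"config": {"name": ..., "base_path": ...}}]}
    """
    config_file = Path(config_folder) / config_name
    with config_file.open("r", encoding="utf-8") as f:
        config = json.load(f)
    return [
        (entry["config"]["name"], entry["config"]["base_path"])
        for entry in config.get("model_config_list", [])
    ]
```

In this notebook it could be called as `summarize_ovms_config(ovms_config_path)` once the configuration has been generated.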

## Launching the model server
As mentioned before, we will run OVMS in a Docker container. First, we need to make sure
that we have the latest OVMS image on our system. Run the cell below to pull the image.

In [None]:
! docker pull openvino/model_server:latest

Next, we have to start the container with the configuration that we just generated. This
is done in the cell below.

> **NOTE**: The cell below starts the OVMS container and sets it up to listen for inference
> requests on port 9000 on your system. If this port is already occupied, the `docker run`
> command will fail and you will need to use a different port number, both in the
> `docker run` command and when connecting to the server later in this notebook.
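
Before launching the container, you may want to check whether port 9000 is already in use. The following is a minimal sketch using only the Python standard library. The helper names `is_port_free` and `find_free_port` are made up for this example, and the check only looks for a listener on the local loopback interface.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection
        return sock.connect_ex((host, port)) != 0


def find_free_port(preferred=9000, attempts=20):
    """Return the first free port at or above `preferred` (hypothetical helper)."""
    for port in range(preferred, min(preferred + attempts, 65536)):
        if is_port_free(port):
            return port
    raise RuntimeError(f"No free port found starting from {preferred}")
```

Note that a port reported as free here can still be claimed by another process before the container starts, so the `docker run` result should be checked in any case.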

In [None]:
# Launch the OVMS container
result = ! docker run -d --rm -v {ovms_config_path}:/models -p 9000:9000 --name ovms_demo openvino/model_server:latest --port 9000 --config_path /models/ovms_model_config.json

# Check that the container was created successfully
if len(result) == 1:
    container_id = result[0]
    print(f"OVMS container with ID '{container_id}' created.")
else:
    # Anything other than 1 result indicates that something went wrong
    raise RuntimeError(result)

# Check that the container is running properly
container_info = ! docker container inspect {container_id}
# `grep` returns the matching output lines; join them into a single string
container_status = " ".join(container_info.grep("Status"))

if "running" not in container_status:
    raise RuntimeError(
        f"Invalid OVMS Docker container status found: '{container_status}'. Most "
        "likely the container has not started properly."
    )
print("OVMS container is up and running.")

That's it! If all went well, the cell above should print the ID of the container that
was created. You can use this ID to identify your container if you have many Docker
containers running on your system.
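
The status check above searches the text output of `docker container inspect` for the word "running". Since that command prints a JSON array, the status can also be read from the parsed structure. The sketch below is one possible way to do that; `container_status_from_inspect` is a hypothetical helper name.

```python
import json


def container_status_from_inspect(inspect_output):
    """Extract State.Status from the JSON text printed by `docker container inspect`."""
    containers = json.loads(inspect_output)
    if not containers:
        raise ValueError("docker inspect returned no containers")
    # The first (and here only) entry describes the inspected container
    return containers[0]["State"]["Status"]
```

In the notebook, the captured output lines could be joined first, for example `container_status_from_inspect("\n".join(container_info))`.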

# OVMS inference with Geti SDK
Now that the OVMS container is running, we can use the Geti SDK to talk to it and make an
inference request. The remaining part of this notebook shows how to do so.

## Loading inference model and sample image
In the first part of this notebook we created configuration files for OVMS, using the
`deployment` that was generated for your Intel® Geti™ project. To do inference, we need
to connect the deployment to the OVMS container that is now running. This is done in the
cell below.

In [None]:
# Load the inference models by connecting to OVMS on port 9000
deployment.load_inference_models(device="http://localhost:9000")

print("Connected to OpenVINO Model Server.")

You should see some output indicating that the connection to OVMS was made successfully.
If you see any errors at this stage, make sure your OVMS container is running and that the
port number is correct.
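
If the connection fails right after starting the container, the server may simply not be accepting connections yet. The sketch below polls the port until it opens or a timeout expires. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the models have finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means a server is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the setup in this notebook, a call such as `wait_for_port("localhost", 9000)` could be made before loading the inference models.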

Next up, we'll load a sample image from the project to run inference on.

In [None]:
import cv2

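# Load the sample image
image_path = "../sample_image.jpg"
image = cv2.imread(image_path)
# cv2.imread returns None instead of raising when the file cannot be read
if image is None:
    raise FileNotFoundError(f"Could not read image at '{image_path}'")
# OpenCV loads images in BGR channel order, so convert to RGB
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)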

# Show the image in the notebook
from IPython.display import display
from PIL import Image

display(Image.fromarray(image_rgb))

## Requesting inference
Now that everything is set up, making an inference request is very simple:

In [None]:
import time

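# perf_counter is a monotonic, high-resolution clock suited to timing short intervals
t_start = time.perf_counter()
prediction = deployment.infer(image_rgb)
t_end = time.perf_counter()

print(
    f"OVMS inference on sample image completed in {(t_end - t_start) * 1000:.1f} milliseconds."
)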
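
A single timed request can be misleading, because the first call often includes one-off connection and warm-up costs. The sketch below shows one way to time several calls and summarize them. `time_calls` is a hypothetical helper, and the number of runs and warm-up calls are arbitrary choices.

```python
import statistics
import time


def time_calls(fn, n_runs=10, n_warmup=2):
    """Call `fn` repeatedly and return latency statistics in milliseconds."""
    # Warm-up calls are executed but not measured
    for _ in range(n_warmup):
        fn()
    durations_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "min_ms": min(durations_ms),
        "max_ms": max(durations_ms),
    }
```

With the objects defined in this notebook, it could be used as `time_calls(lambda: deployment.infer(image_rgb))`.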

## Inspecting the results
Note that the code to request inference is exactly the same as when the model is loaded
on the CPU (see `demo_notebook.ipynb`). As in that case, the `prediction` can be shown
using the Geti SDK visualization utility function.

In [None]:
from geti_sdk.utils import show_image_with_annotation_scene

show_image_with_annotation_scene(image_rgb, prediction, show_in_notebook=True);

# Conclusion
That's all there is to it! In practice, the client would usually request inference
from an OpenVINO model server on a different physical machine, whereas in this
example the client and server run on the same machine.

The steps outlined in this notebook can be used as a basis to set up a remote
client/server combination, but please note that additional network configuration will
be required (along with necessary security measures).

## Cleaning up
To clean up, we'll stop the OVMS docker container that we started. This will
automatically remove the container. After that, we'll delete the temporary directory
we created to store the config files.

In [None]:
# Stop the container
result = ! docker stop {container_id}

# Check if stopping the container worked correctly
if result and result[0] == container_id:
    print(f"OVMS container '{container_id}' stopped and removed successfully.")
else:
    print(
        "An error occurred while removing the OVMS Docker container. Most likely the "
        "container was already removed."
    )
    print(f"The docker daemon responded with the following error: \n{result}")

# Remove the temporary directory with the OVMS configuration
import shutil

temp_dir = os.path.dirname(ovms_config_path)
try:
    shutil.rmtree(temp_dir)
    print("Temporary configuration directory removed successfully.")
except FileNotFoundError:
    print(
        f"Temporary directory with OVMS configuration '{temp_dir}' was "
        "not found on the system. Most likely it was already removed."
    )
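
When the same steps run in a standalone script rather than across notebook cells, the temporary directory can be removed automatically with a context manager. The sketch below assumes that all work with the configuration happens inside the `with` block. `temporary_ovms_config_dir` is a hypothetical helper name.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_ovms_config_dir(folder_name="ovms_models"):
    """Yield a config path inside a temporary directory that is deleted on exit."""
    with tempfile.TemporaryDirectory() as temp_dir:
        # The config folder itself is not created here, only its parent directory
        yield os.path.join(temp_dir, folder_name)
```

The yielded path plays the same role as `ovms_config_path` in this notebook. The directory is removed when the block exits, even if an exception is raised inside it.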