Visual-language assistant with nanoLLaVA and OpenVINO

nanoLLaVA is a "small but mighty" 1B vision-language model designed to run efficiently on edge devices. It uses SigLIP-400m as the image encoder and Qwen1.5-0.5B as the LLM. In this tutorial, we consider how to convert and run the nanoLLaVA model using OpenVINO. Additionally, we will optimize the model using NNCF.
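To give a feel for the convert-and-compress flow used later in the notebook, here is a minimal sketch. It is only an illustration: the actual notebook exports the nanoLLaVA image encoder and language model separately, and the toy module, input shape, output path, and INT4 compression settings below are assumptions rather than the notebook's exact values.

```python
import torch
import openvino as ov
import nncf

# Toy stand-in for one exported sub-model; the real notebook converts the
# nanoLLaVA image encoder and language model separately.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(128, 128)
        self.fc2 = torch.nn.Linear(128, 128)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

pytorch_model = TinyModel().eval()
example_input = torch.zeros(1, 128)

# Convert the PyTorch model to an in-memory OpenVINO model.
ov_model = ov.convert_model(pytorch_model, example_input=example_input)

# Data-free weight compression with NNCF; mode, ratio, and group_size here
# are illustrative, not necessarily what the notebook uses.
compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    ratio=0.8,
    group_size=64,
)

# Serialize the compressed model to IR files (.xml + .bin); path is illustrative.
ov.save_model(compressed_model, "compressed_model.xml")
```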

Notebook contents

The tutorial consists of the following steps:

  • Install requirements
  • Download PyTorch model
  • Convert model to OpenVINO Intermediate Representation (IR)
  • Compress model weights using NNCF
  • Prepare Inference Pipeline
  • Run OpenVINO model inference (see the sketch after this list)
  • Launch Interactive demo
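
Once an IR has been produced, running it follows the usual OpenVINO pattern: compile the model for a device and call it. The sketch below is again illustrative; the path and input shape match the toy example above, and the real nanoLLaVA pipeline wraps calls like this in a token-by-token generation loop that also feeds the image embeddings.

```python
import numpy as np
import openvino as ov

core = ov.Core()

# Compile the compressed IR for a device; "CPU" could also be "GPU" or "AUTO".
compiled_model = core.compile_model("compressed_model.xml", device_name="CPU")

# Single forward pass with a dummy input matching the toy model above.
input_tensor = np.zeros((1, 128), dtype=np.float32)
result = compiled_model(input_tensor)
print(result[compiled_model.output(0)].shape)
```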

In this demonstration, you'll create an interactive chatbot that can answer questions about the content of a provided image.
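A minimal Gradio skeleton for such a demo might look like the sketch below. The `answer_question` function is a hypothetical placeholder standing in for the OpenVINO inference pipeline built in the previous steps.

```python
import gradio as gr

def answer_question(image, question):
    # Placeholder: in the notebook this would call the OpenVINO pipeline
    # (image encoder + language model) to generate an answer.
    if image is None or not question:
        return "Please upload an image and type a question."
    return "Model answer would appear here."

demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.Image(type="pil", label="Image"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="nanoLLaVA with OpenVINO",
)

if __name__ == "__main__":
    demo.launch()
```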

Installation instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.