
Azure Machine Learning Deployment Guide

This guide provides step-by-step instructions for deploying the Image Description application to Azure Machine Learning.

Prerequisites

  • Azure subscription
  • Azure CLI installed and configured, with the ml extension added (az extension add -n ml); a quick connectivity check follows this list
  • Azure Machine Learning workspace
  • The source code from this repository
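
If your workspace already exists, a short Python check confirms that your credentials and the azure-ai-ml package (used again in Step 7) are wired up correctly; the subscription ID below is a placeholder:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate with whatever credential is available (CLI login, managed identity, ...)
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group_name="image-descriptor-rg",
    workspace_name="image-descriptor-ws",
)

# Fails fast if the workspace or your permissions are not in place
print(ml_client.workspaces.get("image-descriptor-ws").location)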

Step 1: Set Up Azure Machine Learning

  1. Create a resource group (if you don't have one):
az group create --name image-descriptor-rg --location eastus
  2. Create an Azure Machine Learning workspace:
az ml workspace create --name image-descriptor-ws \
    --resource-group image-descriptor-rg \
    --location eastus

Step 2: Create a Compute Cluster

Create a GPU-enabled compute cluster for training and inference:

az ml compute create --name gpu-cluster \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg \
    --type AmlCompute \
    --min-instances 0 \
    --max-instances 1 \
    --size Standard_NC6s_v3
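
If you prefer Python over the CLI, the equivalent cluster can be created with the SDK v2 (a sketch; assumes the ml_client handle from the prerequisites check above):

from azure.ai.ml.entities import AmlCompute

# Same settings as the CLI call: GPU SKU, scales down to zero nodes when idle
gpu_cluster = AmlCompute(
    name="gpu-cluster",
    size="Standard_NC6s_v3",
    min_instances=0,
    max_instances=1,
)
ml_client.compute.begin_create_or_update(gpu_cluster).result()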

Step 3: Prepare Environment Configuration

Create an environment.yml file to define dependencies:

name: image_descriptor_env
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - pip=23.0
  - pytorch=2.0.0
  - torchvision=0.15.0
  - pip:
    - transformers>=4.36.0
    - accelerate>=0.25.0
    - bitsandbytes>=0.41.0
    - safetensors>=0.4.0
    - flask>=2.3.2
    - flask-cors>=4.0.0
    - gunicorn>=21.2.0
    - pillow>=10.0.0
    - matplotlib>=3.7.0
    - python-dotenv>=1.0.0
    - azureml-core>=1.48.0
    - azureml-defaults>=1.48.0
    - inference-schema>=1.4.1
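
The deployment in Step 6 references this conda file directly alongside a base image, but you can also register the pair once as a reusable Azure ML environment from Python (a sketch; the environment name is illustrative):

from azure.ai.ml.entities import Environment

# Pair the conda file with a CUDA-enabled base image
env = Environment(
    name="image-descriptor-env",
    conda_file="environment.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04:latest",
)
ml_client.environments.create_or_update(env)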

Step 4: Create a Model Entry Script

Create a file called score.py to handle Azure ML model inference:

import json
import os
import io
import base64
import logging
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor, BitsAndBytesConfig

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Global variables
model = None
processor = None
tokenizer = None

def init():
    """Initialize the model when the service starts"""
    global model, processor, tokenizer
    
    logger.info("Loading model...")
    model_id = "Qwen/Qwen2-VL-7B"
    
    # Load the processor and tokenizer
    processor = AutoProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    
    # Load the model with 4-bit quantization to reduce memory requirements
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto"
    )
    logger.info("Model loaded successfully")

def run(raw_data):
    """Process an image and generate descriptions
    
    Args:
        raw_data: A JSON string containing the image as base64 encoded data
        
    Returns:
        A JSON string containing the descriptions
    """
    global model, processor, tokenizer
    
    try:
        # Parse input
        data = json.loads(raw_data)
        
        # Get the image data (from base64 or URL)
        if 'image_data' in data:
            image_bytes = base64.b64decode(data['image_data'])
            image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
            logger.info("Loaded image from base64 data")
        elif 'image_url' in data:
            # Handle image URLs (for Azure Storage or public URLs)
            from urllib.request import urlopen
            with urlopen(data['image_url']) as response:
                image_bytes = response.read()
            image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
            logger.info(f"Loaded image from URL: {data['image_url']}")
        else:
            return json.dumps({"error": "No image data or URL provided"})
        
        # Process the image once; the pixel values are reused for every prompt
        inputs = processor(
            images=image, 
            return_tensors="pt"
        ).to(model.device)
        
        # The prompts to run, each with its own generation budget
        prompts = {
            "basic_description": ("Describe this image briefly.", 150),
            "detailed_description": ("Analyze this image in detail. Describe the main elements, any text visible, the colors, and the overall composition.", 300),
            "technical_analysis": ("What can you tell me about the technical aspects of this image?", 200),
        }
        
        # Generate an answer for each prompt, pairing the text with the image features
        results = {"success": True}
        for key, (prompt, max_tokens) in prompts.items():
            input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
            with torch.no_grad():
                output = model.generate(
                    **inputs,
                    input_ids=input_ids,
                    max_new_tokens=max_tokens,
                    do_sample=False
                )
            # Decode only the newly generated tokens, dropping the echoed prompt
            generated_tokens = output[0][input_ids.shape[1]:]
            results[key] = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
        
        # Return the results
        return json.dumps(results)
        
    except Exception as e:
        logger.error(f"Error processing image: {str(e)}", exc_info=True)
        return json.dumps({"error": f"Error generating description: {str(e)}"})
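
Before registering anything, it is worth exercising the entry script locally. A minimal smoke test, assuming a GPU machine with the environment.yml dependencies installed and a test image at data_temp/page_2.png:

import base64
import json

import score  # the entry script above

# Load the model once, then send a single base64-encoded image through run()
score.init()
with open("data_temp/page_2.png", "rb") as f:
    request = json.dumps({"image_data": base64.b64encode(f.read()).decode("utf-8")})
print(json.loads(score.run(request)))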

Step 5: Register the Model

  1. Create a model.yml file:
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen-vl-image-descriptor
version: 1
description: Qwen2-VL-7B model for image description
path: .
  2. Register the model:
az ml model create --file model.yml \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
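
Equivalently, the model can be registered from Python with the SDK v2 (a sketch; assumes the ml_client handle from the prerequisites check):

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Register the local folder as a custom model asset
model = Model(
    name="qwen-vl-image-descriptor",
    path=".",
    type=AssetTypes.CUSTOM_MODEL,
    description="Qwen2-VL-7B model for image description",
)
ml_client.models.create_or_update(model)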

Step 6: Deploy as an Online Endpoint

  1. Create an endpoint.yml file:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: image-descriptor-endpoint
description: Endpoint for image description
auth_mode: key
  2. Create a deployment.yml file:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: qwen-vl-deployment
endpoint_name: image-descriptor-endpoint
model: azureml:qwen-vl-image-descriptor:1
environment:
  conda_file: environment.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04:latest
instance_type: Standard_NC6s_v3
instance_count: 1
request_settings:
  max_concurrent_requests_per_instance: 1
  request_timeout_ms: 120000
  3. Create the endpoint:
az ml online-endpoint create --file endpoint.yml \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
  4. Create the deployment:
az ml online-deployment create --file deployment.yml \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
  5. Allocate 100% of traffic to the deployment:
az ml online-endpoint update --name image-descriptor-endpoint \
    --traffic "qwen-vl-deployment=100" \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
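
Provisioning can take a while because the container must install the environment and load the model weights. You can follow progress from Python by pulling the deployment logs (a sketch; assumes the ml_client handle from the prerequisites check):

# Tail the container logs while the deployment starts up
logs = ml_client.online_deployments.get_logs(
    name="qwen-vl-deployment",
    endpoint_name="image-descriptor-endpoint",
    lines=100,
)
print(logs)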

Step 7: Test the Endpoint

You can test the endpoint using the Azure ML Python SDK (v2):

import json
import base64
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Get a handle to the workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group_name="image-descriptor-rg",
    workspace_name="image-descriptor-ws"
)

# Get endpoint
endpoint = ml_client.online_endpoints.get("image-descriptor-endpoint")

# Load and encode the image
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')

# Create the request payload and save it to a file
# (online_endpoints.invoke expects the path to a JSON request file)
payload = {
    'image_data': image_b64
}
with open('request.json', 'w') as f:
    json.dump(payload, f)

# Invoke the endpoint
response = ml_client.online_endpoints.invoke(
    endpoint_name="image-descriptor-endpoint",
    request_file="request.json",
    deployment_name="qwen-vl-deployment"
)

# Parse the response
result = json.loads(response)
print(json.dumps(result, indent=2))
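
Because the endpoint uses key authentication, you can also call the scoring URI directly over REST from clients that do not have the SDK installed. A short sketch with the requests package, reusing the endpoint handle and payload from above:

import requests

# Fetch the endpoint's auth key through the SDK, then POST the JSON payload
keys = ml_client.online_endpoints.get_keys(name="image-descriptor-endpoint")
response = requests.post(
    endpoint.scoring_uri,
    headers={
        "Authorization": f"Bearer {keys.primary_key}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=120,
)
print(response.json())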

Cost Optimization

To optimize costs:

  1. Use a smaller compute size if the model fits in its memory
  2. Scale down or delete the deployment when not in use (managed online endpoints cannot scale to zero, so an idle deployment still bills for its instances)
  3. Set up autoscaling rules
  4. Consider reserved instances for long-term deployments

Monitoring

Monitor your endpoint using:

  1. Azure Monitor
  2. Application Insights
  3. The Azure ML metrics dashboard
  4. Alerts configured for anomalies

Cleanup

To avoid ongoing charges, delete resources when not in use:

# Delete the endpoint
az ml online-endpoint delete --name image-descriptor-endpoint \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg -y

# Delete compute cluster
az ml compute delete --name gpu-cluster \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg -y

# Delete workspace (optional)
az ml workspace delete --name image-descriptor-ws \
    --resource-group image-descriptor-rg -y

# Delete resource group (optional)
az group delete --name image-descriptor-rg -y
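
The endpoint and compute deletions can also be scripted from Python (a sketch mirroring the CLI calls above; assumes the ml_client handle from the prerequisites check):

# Delete the endpoint (removes its deployments too), then the compute cluster
ml_client.online_endpoints.begin_delete(name="image-descriptor-endpoint").result()
ml_client.compute.begin_delete(name="gpu-cluster").result()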