AWS SageMaker Deployment Guide

This guide provides step-by-step instructions for deploying the Image Description application to AWS SageMaker.

Prerequisites

AWS account with SageMaker permissions
AWS CLI installed and configured
Docker installed on your local machine
The source code from this repository

Step 1: Create an Amazon ECR Repository

aws ecr create-repository --repository-name image-descriptor

Note the repository URI returned by this command. You'll use it in the next step.
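
The repository URI follows a fixed pattern in the standard AWS partition, so it can also be derived from your account ID and Region instead of being copied by hand. The sketch below is illustrative only: ecr_registry and ecr_image_uri are hypothetical helpers (not part of any AWS SDK), and the account ID 123456789012 and Region us-east-1 are placeholder assumptions.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the ECR registry host for an account and Region (standard partition)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the full image URI used by docker tag, docker push, and SageMaker."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


if __name__ == "__main__":
    # Placeholder values (assumptions); substitute your own account ID and Region.
    print(ecr_image_uri("123456789012", "us-east-1", "image-descriptor"))
```

The same string is what the docker tag and docker push commands, and later the model definition, expect in place of the your-account-id and your-region placeholders.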

Step 2: Build and Push the Docker Image

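Authenticate Docker with your ECR registry, replacing your-region and your-account-id with your own values:

aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com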

Build the Docker image:

docker build -t image-descriptor .

Tag and push the image:

docker tag image-descriptor:latest your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest

Step 3: Create a SageMaker Model

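Create a model.json file, replacing your-account-id and your-region with your own values and pointing ExecutionRoleArn at an IAM role that SageMaker can assume. SageMaker sends requests to the container on port 8080 (GET /ping for health checks, POST /invocations for inference), so the application must serve both routes on that port: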

{
    "ModelName": "QwenVLImageDescriptor",
    "PrimaryContainer": {
        "Image": "your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest",
        "Environment": {
            "PORT": "8080"
        }
    },
    "ExecutionRoleArn": "arn:aws:iam::your-account-id:role/service-role/AmazonSageMaker-ExecutionRole"
}

Create the SageMaker model:

aws sagemaker create-model --cli-input-json file://model.json
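
To avoid hand-editing the placeholders, the model definition can be generated from a few inputs. This is a minimal sketch, not the project's own tooling: build_model_definition is a hypothetical helper, the account ID and Region are placeholder assumptions, and the role ARN simply follows the service-role pattern used in the JSON above.

```python
import json


def build_model_definition(account_id, region, role_name,
                           model_name="QwenVLImageDescriptor",
                           repository="image-descriptor", tag="latest"):
    """Build the dictionary that `aws sagemaker create-model --cli-input-json` expects."""
    image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image,
            # Same environment variable as in the hand-written model.json.
            "Environment": {"PORT": "8080"},
        },
        # Assumes the role lives under the service-role/ path, as in this guide.
        "ExecutionRoleArn": f"arn:aws:iam::{account_id}:role/service-role/{role_name}",
    }


if __name__ == "__main__":
    # Placeholder values (assumptions); substitute your own.
    definition = build_model_definition(
        "123456789012", "us-east-1", "AmazonSageMaker-ExecutionRole"
    )
    print(json.dumps(definition, indent=4))
```

Redirecting the script's output to model.json produces a file that can be passed to the create-model command unchanged.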

Step 4: Create an Endpoint Configuration

Create a config.json file:

{
    "EndpointConfigName": "QwenVLImageDescriptorConfig",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "QwenVLImageDescriptor",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1
        }
    ]
}

Create the endpoint configuration:

aws sagemaker create-endpoint-config --cli-input-json file://config.json

Step 5: Create the Endpoint

aws sagemaker create-endpoint --endpoint-name qwen-vl-image-descriptor --endpoint-config-name QwenVLImageDescriptorConfig

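Deployment takes several minutes. The endpoint can serve requests once its status, as reported by aws sagemaker describe-endpoint, changes from Creating to InService.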
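
Rather than checking the status by hand, a script can poll until the endpoint is ready. The sketch below assumes a boto3 SageMaker client is passed in; wait_until_in_service is a hypothetical helper, and the 30-second poll interval and 30-minute timeout are arbitrary assumptions. boto3 also ships a built-in endpoint_in_service waiter that serves the same purpose.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30,
                          timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint reports InService.

    Raises RuntimeError if the endpoint enters the Failed state and
    TimeoutError if it is still not ready after timeout_seconds.
    """
    waited = 0
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   wait_until_in_service(boto3.client("sagemaker"), "qwen-vl-image-descriptor")
```

Passing the client and the sleep function as arguments keeps the helper easy to exercise without touching AWS.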

Step 6: Invoke the Endpoint

You can invoke the endpoint with the AWS SDK or the AWS CLI.

Using the AWS SDK for Python (boto3), send the image as a base64-encoded string in the image_data field:

import base64
import json

import boto3

# Initialize the SageMaker runtime client
runtime = boto3.client('sagemaker-runtime')

# Load and encode the image
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')

# Create the request payload
payload = {
    'image_data': image_b64
}

# Invoke the endpoint
response = runtime.invoke_endpoint(
    EndpointName='qwen-vl-image-descriptor',
    ContentType='application/json',
    Body=json.dumps(payload)
)

# Parse the response
result = json.loads(response['Body'].read().decode())
print(json.dumps(result, indent=2))

Step 7: Set Up API Gateway (Optional)

For public HTTP access, set up an API Gateway:

Create a new REST API in API Gateway
Create a new resource and POST method
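Configure the method's integration to call the SageMaker endpoint, either directly through an AWS service integration with SageMaker Runtime or through a Lambda function that calls InvokeEndpoint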
Deploy the API to a stage
Note the API Gateway URL for client use
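
One common way to wire the POST method to the endpoint is a Lambda proxy integration. The sketch below is an assumption about how that function might look, not part of this repository: make_handler is a hypothetical factory, the request is assumed to carry the same image_data field used in Step 6, and the endpoint's response body is passed through unchanged.

```python
import json

ENDPOINT_NAME = "qwen-vl-image-descriptor"


def _response(status_code, body_text):
    """Build a response in the shape API Gateway's Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": body_text,
    }


def make_handler(runtime_client, endpoint_name=ENDPOINT_NAME):
    """Return a Lambda handler that forwards image_data to the SageMaker endpoint."""

    def handler(event, context):
        try:
            payload = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return _response(400, json.dumps({"error": "Body must be valid JSON"}))
        if not isinstance(payload, dict) or "image_data" not in payload:
            return _response(400, json.dumps({"error": "Missing 'image_data' field"}))

        result = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps({"image_data": payload["image_data"]}),
        )
        # Pass the model's JSON response through to the caller unchanged.
        return _response(200, result["Body"].read().decode("utf-8"))

    return handler


# In the Lambda module (requires boto3, which the Lambda Python runtime provides):
#   import boto3
#   handler = make_handler(boto3.client("sagemaker-runtime"))
```

The function's execution role needs permission for sagemaker:InvokeEndpoint on the endpoint; creating the client once at module level lets warm invocations reuse it.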

Cost Optimization

To optimize costs:

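Use SageMaker Serverless Inference instead of a dedicated endpoint if the model can run on CPU (Serverless Inference does not offer GPU instances such as ml.g5.2xlarge)
Implement auto-scaling for your endpoint
For non-critical workloads, consider Asynchronous Inference, which can scale to zero instances when idle (Spot Instances are available for SageMaker training jobs, not for real-time endpoints)
Delete the endpoint outside business hours and recreate it from the saved endpoint configuration when it is needed again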
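
Endpoint auto-scaling is configured through Application Auto Scaling rather than SageMaker itself. The sketch below builds the two requests involved, under assumptions that are illustrative only: autoscaling_requests is a hypothetical helper, and the capacity bounds, cooldowns, and target of 10 invocations per instance per minute are arbitrary starting points to tune for your workload.

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                         min_instances=1, max_instances=2,
                         invocations_per_instance=10.0):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target average invocations per instance per minute (assumed value).
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing capacity
            "ScaleOutCooldown": 60,   # seconds to wait before adding more capacity
        },
    }
    return target, policy


# Applying the requests (requires boto3 and AWS credentials):
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   target, policy = autoscaling_requests("qwen-vl-image-descriptor")
#   aas.register_scalable_target(**target)
#   aas.put_scaling_policy(**policy)
```

The variant name must match the VariantName in the endpoint configuration, which is AllTraffic in this guide.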

Monitoring

Set up CloudWatch Alarms to monitor:

Endpoint invocation metrics
Error rates
Latency
Instance utilization
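
As one example of such an alarm, the sketch below builds a CloudWatch alarm on the endpoint's ModelLatency metric. latency_alarm is a hypothetical helper, and the 5-second threshold, 5-minute period, and two evaluation periods are assumptions chosen for illustration; ModelLatency is reported in microseconds, hence the conversion.

```python
def latency_alarm(endpoint_name, variant_name="AllTraffic",
                  threshold_ms=5000, sns_topic_arn=None):
    """Build kwargs for CloudWatch put_metric_alarm on average model latency."""
    kwargs = {
        "AlarmName": f"{endpoint_name}-high-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 300,              # evaluate in 5-minute windows
        "EvaluationPeriods": 2,     # alarm after two consecutive breaches
        # ModelLatency is reported in microseconds, so convert from milliseconds.
        "Threshold": float(threshold_ms) * 1000,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Optional notification target; the ARN is supplied by the caller.
        kwargs["AlarmActions"] = [sns_topic_arn]
    return kwargs


# Creating the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**latency_alarm("qwen-vl-image-descriptor"))
```

The same pattern applies to error-rate alarms by switching MetricName to Invocation5XXErrors and Statistic to Sum.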

Cleanup

To avoid ongoing charges, delete resources when not in use:

aws sagemaker delete-endpoint --endpoint-name qwen-vl-image-descriptor
aws sagemaker delete-endpoint-config --endpoint-config-name QwenVLImageDescriptorConfig
aws sagemaker delete-model --model-name QwenVLImageDescriptor

To also remove the container image and its storage charges, delete the ECR repository created in Step 1:

aws ecr delete-repository --repository-name image-descriptor --force