AWS SageMaker Deployment Guide
This guide provides step-by-step instructions for deploying the Image Description application to AWS SageMaker.
Prerequisites
- AWS account with SageMaker permissions
- AWS CLI installed and configured
- Docker installed on your local machine
- The source code from this repository
Step 1: Create an Amazon ECR Repository
aws ecr create-repository --repository-name image-descriptor
Note the repository URI returned by this command (the repositoryUri field in the JSON output). You'll use it in Steps 2 and 3.
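The repository URI follows a fixed pattern, so it can also be derived rather than copied. The sketch below assumes a standard commercial AWS region (other partitions use a different domain suffix); the helper names and the account ID and region in the example call are hypothetical.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the private ECR registry host for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repository: str,
                  tag: str = "latest") -> str:
    """Return the image URI used by `docker tag`, `docker push`, and model.json."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


# Hypothetical account ID and region, for illustration only.
print(ecr_image_uri("123456789012", "us-east-1", "image-descriptor"))
```

The registry host is the value passed to docker login in Step 2, and the full image URI is the value used for the tag, the push, and the "Image" field in Step 3.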
Step 2: Build and Push the Docker Image
- Log in to ECR:
aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com
- Build the Docker image:
docker build -t image-descriptor .
- Tag and push the image:
docker tag image-descriptor:latest your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
Step 3: Create a SageMaker Model
- Create a model.json file:
{
  "ModelName": "QwenVLImageDescriptor",
  "PrimaryContainer": {
    "Image": "your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest",
    "Environment": {
      "PORT": "8080"
    }
  },
  "ExecutionRoleArn": "arn:aws:iam::your-account-id:role/service-role/AmazonSageMaker-ExecutionRole"
}
- Create the SageMaker model:
aws sagemaker create-model --cli-input-json file://model.json
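The same model can be registered from Python instead of the CLI. This is a minimal sketch, not the guide's prescribed method: the function names are hypothetical, and the client is passed in so the request can be built and inspected without AWS credentials.

```python
from typing import Any


def build_model_request(image_uri: str, role_arn: str,
                        model_name: str = "QwenVLImageDescriptor") -> dict[str, Any]:
    """Build the same request body that model.json contains."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,
            "Environment": {"PORT": "8080"},
        },
        "ExecutionRoleArn": role_arn,
    }


def create_model(sagemaker_client: Any, request: dict[str, Any]) -> str:
    """Call CreateModel and return the ARN of the new model."""
    response = sagemaker_client.create_model(**request)
    return response["ModelArn"]


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   request = build_model_request(
#       "your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest",
#       "arn:aws:iam::your-account-id:role/service-role/AmazonSageMaker-ExecutionRole",
#   )
#   print(create_model(boto3.client("sagemaker"), request))
```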
Step 4: Create an Endpoint Configuration
- Create a config.json file:
{
  "EndpointConfigName": "QwenVLImageDescriptorConfig",
  "ProductionVariants": [
    {
      "VariantName": "AllTraffic",
      "ModelName": "QwenVLImageDescriptor",
      "InstanceType": "ml.g5.2xlarge",
      "InitialInstanceCount": 1
    }
  ]
}
- Create the endpoint configuration:
aws sagemaker create-endpoint-config --cli-input-json file://config.json
Step 5: Create the Endpoint
aws sagemaker create-endpoint --endpoint-name qwen-vl-image-descriptor --endpoint-config-name QwenVLImageDescriptorConfig
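Deployment takes several minutes. The endpoint is ready to receive requests once its status changes from Creating to InService. To check the current status:
aws sagemaker describe-endpoint --endpoint-name qwen-vl-image-descriptor --query EndpointStatus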
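Rather than checking by hand, a script can block until the endpoint is ready. The sketch below polls DescribeEndpoint until the status is InService; the function name, polling interval, and timeout are assumptions chosen for illustration. boto3 also ships a built-in `endpoint_in_service` waiter, and the CLI offers `aws sagemaker wait endpoint-in-service`, either of which may be simpler in practice.

```python
import time
from typing import Any, Callable


def wait_for_endpoint(sagemaker_client: Any, endpoint_name: str,
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 1800.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll DescribeEndpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError
    if it is still not InService when the timeout expires.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} is still {status}")
        sleep(poll_seconds)


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   wait_for_endpoint(boto3.client("sagemaker"), "qwen-vl-image-descriptor")
```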
Step 6: Invoke the Endpoint
You can invoke the endpoint using the AWS SDK or AWS CLI.
Using the Python SDK (boto3):
import base64
import json
import boto3
# Initialize the SageMaker runtime client
runtime = boto3.client('sagemaker-runtime')
# Load the image and encode it as base64 text
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')
# Create the request payload
payload = {
    'image_data': image_b64
}
# Invoke the endpoint
response = runtime.invoke_endpoint(
    EndpointName='qwen-vl-image-descriptor',
    ContentType='application/json',
    Body=json.dumps(payload)
)
# Parse the response
result = json.loads(response['Body'].read().decode())
print(json.dumps(result, indent=2))
Step 7: Set Up API Gateway (Optional)
For public HTTP access, set up an API Gateway:
- Create a new REST API in API Gateway
- Create a new resource and POST method
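- Configure the method's integration to call the SageMaker Runtime InvokeEndpoint action for the endpoint, either directly as an AWS service integration or through a Lambda function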
- Deploy the API to a stage
- Note the API Gateway URL for client use
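One common way to wire the POST method to the endpoint is a Lambda proxy integration. The sketch below is an assumption about how that could look, not the guide's required setup: the `ENDPOINT_NAME` environment variable and the `handle_request` helper are hypothetical, and the request validation assumes the `image_data` payload shape used in Step 6.

```python
import base64
import json
import os
from typing import Any

# Hypothetical environment variable; falls back to the name used in this guide.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "qwen-vl-image-descriptor")


def _error(status: int, message: str) -> dict[str, Any]:
    """Build an API Gateway proxy response carrying an error message."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": message}),
    }


def handle_request(event: dict[str, Any], runtime_client: Any) -> dict[str, Any]:
    """Validate the API Gateway proxy event and forward it to the endpoint."""
    try:
        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        payload = json.loads(body)
    except ValueError:  # covers invalid base64, invalid UTF-8, and invalid JSON
        return _error(400, "request body must be valid JSON")
    if not isinstance(payload, dict) or "image_data" not in payload:
        return _error(400, "missing 'image_data' field")

    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": response["Body"].read().decode("utf-8"),
    }


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Lambda entry point; boto3 is provided by the Lambda Python runtime."""
    import boto3
    return handle_request(event, boto3.client("sagemaker-runtime"))
```

With this design the Lambda execution role needs permission for sagemaker:InvokeEndpoint on the endpoint. A public API would normally also need authentication and throttling configured in API Gateway.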
Cost Optimization
To optimize costs:
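- Consider SageMaker Serverless Inference instead of a dedicated endpoint if the model can run on CPU; Serverless Inference does not offer GPU instances such as the ml.g5.2xlarge used above
- Implement auto-scaling for your endpoint
- For non-critical workloads, consider Asynchronous Inference (which can scale to zero instances when idle) or Batch Transform; Spot Instances are available for SageMaker training jobs but not for real-time endpoints
- Schedule endpoints to be active only during business hours by deleting the endpoint after hours and recreating it from the saved endpoint configuration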
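Auto-scaling for an endpoint variant is configured through Application Auto Scaling. The following is a minimal sketch of a target-tracking policy on invocations per instance; the function name, policy name, capacity limits, target value, and cooldowns are illustrative assumptions that would need tuning for a real workload.

```python
from typing import Any


def configure_autoscaling(autoscaling_client: Any, endpoint_name: str,
                          variant_name: str = "AllTraffic",
                          min_instances: int = 1, max_instances: int = 2,
                          invocations_per_instance: float = 10.0) -> str:
    """Register the variant as a scalable target and attach a target-tracking policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Average invocations per instance per minute to aim for.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # seconds to wait before adding more instances
        },
    )
    return resource_id


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   configure_autoscaling(boto3.client("application-autoscaling"),
#                         "qwen-vl-image-descriptor")
```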
Monitoring
Set up CloudWatch Alarms to monitor:
- Endpoint invocation metrics
- Error rates
- Latency
- Instance utilization
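As an illustration of the error-rate and latency items, the sketch below defines two CloudWatch alarms on the endpoint's metrics in the AWS/SageMaker namespace. The alarm names, thresholds, and evaluation periods are assumptions for illustration, and no alarm actions (such as an SNS topic) are attached, so these alarms only change state until actions are added.

```python
from typing import Any


def build_alarms(endpoint_name: str, variant_name: str = "AllTraffic",
                 latency_threshold_ms: float = 5000.0) -> list[dict[str, Any]]:
    """Return PutMetricAlarm arguments for 5XX errors and model latency."""
    common: dict[str, Any] = {
        "Namespace": "AWS/SageMaker",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Period": 300,             # evaluate over 5-minute windows
        "EvaluationPeriods": 1,
        "TreatMissingData": "notBreaching",
    }
    return [
        {
            **common,
            "AlarmName": f"{endpoint_name}-5xx-errors",
            "MetricName": "Invocation5XXErrors",
            "Statistic": "Sum",
            "Threshold": 1.0,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        },
        {
            **common,
            "AlarmName": f"{endpoint_name}-model-latency",
            "MetricName": "ModelLatency",
            "Statistic": "Average",
            # ModelLatency is reported in microseconds.
            "Threshold": latency_threshold_ms * 1000.0,
            "ComparisonOperator": "GreaterThanThreshold",
        },
    ]


def create_alarms(cloudwatch_client: Any, alarms: list[dict[str, Any]]) -> None:
    """Create or update each alarm with PutMetricAlarm."""
    for alarm in alarms:
        cloudwatch_client.put_metric_alarm(**alarm)


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   create_alarms(boto3.client("cloudwatch"),
#                 build_alarms("qwen-vl-image-descriptor"))
```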
Cleanup
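To avoid ongoing charges, delete the resources when they are no longer needed. Delete the endpoint first, since its running instances are what incur hourly charges: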
aws sagemaker delete-endpoint --endpoint-name qwen-vl-image-descriptor
aws sagemaker delete-endpoint-config --endpoint-config-name QwenVLImageDescriptorConfig
aws sagemaker delete-model --model-name QwenVLImageDescriptor
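The same teardown can be scripted, for example as the shutdown half of a scheduled start/stop job. This is a minimal sketch with a hypothetical function name; it deletes the endpoint first, then the endpoint configuration and the model, and does not handle the case where a resource is already gone.

```python
from typing import Any


def delete_deployment(sagemaker_client: Any, endpoint_name: str,
                      endpoint_config_name: str, model_name: str) -> list[str]:
    """Delete the endpoint, its configuration, and the model, in that order."""
    deleted = []
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    deleted.append(endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    deleted.append(endpoint_config_name)
    sagemaker_client.delete_model(ModelName=model_name)
    deleted.append(model_name)
    return deleted


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   delete_deployment(boto3.client("sagemaker"),
#                     "qwen-vl-image-descriptor",
#                     "QwenVLImageDescriptorConfig",
#                     "QwenVLImageDescriptor")
```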