# AWS SageMaker Deployment Guide

This guide provides step-by-step instructions for deploying the Image Description application to AWS SageMaker.

## Prerequisites

- AWS account with SageMaker permissions
- AWS CLI installed and configured
- Docker installed on your local machine
- The source code from this repository

## Step 1: Create an Amazon ECR Repository

```bash
aws ecr create-repository --repository-name image-descriptor
```

Note the repository URI returned by this command. You'll use it in the next step.

## Step 2: Build and Push the Docker Image

1. Log in to ECR:

   ```bash
   aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com
   ```

2. Build the Docker image:

   ```bash
   docker build -t image-descriptor .
   ```

3. Tag and push the image:

   ```bash
   docker tag image-descriptor:latest your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
   docker push your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
   ```

## Step 3: Create a SageMaker Model

1. Create a model.json file:

   ```json
   {
     "ModelName": "QwenVLImageDescriptor",
     "PrimaryContainer": {
       "Image": "your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest",
       "Environment": {
         "PORT": "8080"
       }
     },
     "ExecutionRoleArn": "arn:aws:iam::your-account-id:role/service-role/AmazonSageMaker-ExecutionRole"
   }
   ```

2. Create the SageMaker model:

   ```bash
   aws sagemaker create-model --cli-input-json file://model.json
   ```

## Step 4: Create an Endpoint Configuration

1. Create a config.json file:

   ```json
   {
     "EndpointConfigName": "QwenVLImageDescriptorConfig",
     "ProductionVariants": [
       {
         "VariantName": "AllTraffic",
         "ModelName": "QwenVLImageDescriptor",
         "InstanceType": "ml.g5.2xlarge",
         "InitialInstanceCount": 1
       }
     ]
   }
   ```

2. Create the endpoint configuration:

   ```bash
   aws sagemaker create-endpoint-config --cli-input-json file://config.json
   ```

## Step 5: Create the Endpoint

```bash
aws sagemaker create-endpoint --endpoint-name qwen-vl-image-descriptor --endpoint-config-name QwenVLImageDescriptorConfig
```

Deployment takes several minutes. The endpoint cannot serve requests until its status is `InService`.

## Step 6: Invoke the Endpoint

You can invoke the endpoint using the AWS SDK or the AWS CLI.

Using the Python SDK (boto3):

```python
import base64
import json

import boto3

# Initialize the SageMaker runtime client
runtime = boto3.client('sagemaker-runtime')

# Load the image and encode it as base64 text so it can travel inside JSON
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')

# Create the request payload
payload = {
    'image_data': image_b64
}

# Invoke the endpoint
response = runtime.invoke_endpoint(
    EndpointName='qwen-vl-image-descriptor',
    ContentType='application/json',
    Body=json.dumps(payload)
)

# Parse the response
result = json.loads(response['Body'].read().decode())
print(json.dumps(result, indent=2))
```

## Step 7: Set Up API Gateway (Optional)

For public HTTP access, set up an API Gateway:

1. Create a new REST API in API Gateway
2. Create a new resource and POST method
3. Configure the integration to use the SageMaker endpoint
4. Deploy the API to a stage
5. Note the API Gateway URL for client use

## Cost Optimization

To optimize costs:

1. Use SageMaker Serverless Inference instead of a dedicated endpoint where the model can run without a GPU (Serverless Inference does not offer GPU instances)
2. Implement auto-scaling for your endpoint
3. Use Spot Instances for non-critical workloads where SageMaker supports them (Managed Spot applies to training jobs, not real-time endpoints)
4. Keep endpoints active only during business hours, deleting and recreating them on a schedule

## Monitoring

Set up CloudWatch Alarms to monitor:

1. Endpoint invocation metrics
2. Error rates
3. Latency
4. Instance utilization

## Cleanup

To avoid ongoing charges, delete resources when not in use:

```bash
aws sagemaker delete-endpoint --endpoint-name qwen-vl-image-descriptor
aws sagemaker delete-endpoint-config --endpoint-config-name QwenVLImageDescriptorConfig
aws sagemaker delete-model --model-name QwenVLImageDescriptor
```
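
Step 5 notes that endpoint creation takes several minutes, so a script that invokes the endpoint right after creating it will usually fail. The following is a minimal sketch of a polling helper, not part of the original repository. The function name `wait_for_endpoint`, the default poll interval, and the timeout are assumptions chosen for illustration. It relies only on the SageMaker `describe_endpoint` call and its `EndpointStatus` field.

```python
import time
from typing import Any, Callable

# Statuses from which the endpoint will not become usable on its own.
FAILED_STATUSES = {"Failed", "OutOfService", "UpdateRollbackFailed"}


def wait_for_endpoint(
    client: Any,
    endpoint_name: str,
    poll_seconds: int = 30,       # assumed default, tune as needed
    timeout_seconds: int = 1800,  # assumed default, tune as needed
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe_endpoint until the endpoint reports InService.

    `client` is expected to be a boto3 SageMaker client, or any object with a
    compatible describe_endpoint method. Raises RuntimeError on a failed
    status and TimeoutError once the time budget is used up.
    """
    waited = 0
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status in FAILED_STATUSES:
            reason = description.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} is {status}: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {waited}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

A typical call would be `wait_for_endpoint(boto3.client("sagemaker"), "qwen-vl-image-descriptor")`. boto3 also ships a built-in waiter, `client.get_waiter("endpoint_in_service")`, which may be preferable when custom failure handling is not needed.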
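
The Cost Optimization section recommends auto-scaling without showing how to set it up. One possible approach, sketched below, uses Application Auto Scaling with a target-tracking policy on invocations per instance. The helper names, the policy name, the instance limits, and the target value of 10 invocations per instance are all illustrative assumptions, not values from this repository. The request fields themselves follow the `register_scalable_target` and `put_scaling_policy` APIs.

```python
from typing import Any, Dict


def build_autoscaling_requests(
    endpoint_name: str,
    variant_name: str,
    min_instances: int = 1,                  # assumed lower bound
    max_instances: int = 2,                  # assumed upper bound
    invocations_per_instance: float = 10.0,  # assumed target, tune from load tests
) -> Dict[str, Dict[str, Any]]:
    """Build keyword arguments for the two Application Auto Scaling calls."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    scalable_target = {
        **common,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return {"scalable_target": scalable_target, "scaling_policy": scaling_policy}


def apply_autoscaling(autoscaling_client: Any, requests: Dict[str, Dict[str, Any]]) -> None:
    """Register the variant as scalable, then attach the tracking policy."""
    autoscaling_client.register_scalable_target(**requests["scalable_target"])
    autoscaling_client.put_scaling_policy(**requests["scaling_policy"])
```

With the names used in this guide, the call would be `apply_autoscaling(boto3.client("application-autoscaling"), build_autoscaling_requests("qwen-vl-image-descriptor", "AllTraffic"))`. Building the requests separately from sending them makes it easy to inspect the values before anything changes in the account.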
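
The Monitoring section lists what to watch but not how to define an alarm. As a hedged sketch, the helper below builds the arguments for a CloudWatch alarm on the endpoint's `Invocation5XXErrors` metric in the `AWS/SageMaker` namespace. The function name, the alarm name, the threshold, and the five-minute period are assumptions for illustration. The same pattern should carry over to other endpoint metrics such as `ModelLatency`.

```python
from typing import Any, Dict


def build_error_alarm(
    endpoint_name: str,
    variant_name: str,
    threshold: float = 1.0,     # assumed: alarm on any server-side error
    period_seconds: int = 300,  # assumed five-minute evaluation window
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{endpoint_name}-5xx-errors",  # hypothetical name
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # An idle endpoint emits no data points; treat that as healthy.
        "TreatMissingData": "notBreaching",
    }
```

The alarm could then be created with `boto3.client("cloudwatch").put_metric_alarm(**build_error_alarm("qwen-vl-image-descriptor", "AllTraffic"))`. No notification target is set here; adding `AlarmActions` with an SNS topic ARN would be the usual next step.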