# AWS SageMaker Deployment Guide

This guide provides step-by-step instructions for deploying the Image Description application to AWS SageMaker.

## Prerequisites

- AWS account with SageMaker permissions
- AWS CLI installed and configured
- Docker installed on your local machine
- The source code from this repository

## Step 1: Create an Amazon ECR Repository

```bash
aws ecr create-repository --repository-name image-descriptor
```

Note the `repositoryUri` value in the command's output. You'll use it in the next step.

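The registry host and image URI used in the following steps follow a fixed pattern, so it can help to build them once from the account ID and Region instead of editing each command by hand. The sketch below is a minimal illustration that assumes a standard commercial Region; the helper names `ecr_registry` and `ecr_image_uri` and the sample account ID are hypothetical and not part of this repository.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the ECR registry host, as passed to `docker login`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str,
                  repository: str = "image-descriptor",
                  tag: str = "latest") -> str:
    """Return the full image URI, as passed to `docker tag` and `docker push`."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


# Placeholder account ID and Region; substitute your own values.
print(ecr_image_uri("123456789012", "us-east-1"))
```

The printed value should match the `repositoryUri` from the command output with `:latest` appended.
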
## Step 2: Build and Push the Docker Image

In the commands below, replace `your-account-id` and `your-region` with your AWS account ID and Region.

1. Log in to ECR:

```bash
aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com
```

2. Build the Docker image from the directory that contains the Dockerfile:

```bash
docker build -t image-descriptor .
```

3. Tag and push the image:

```bash
docker tag image-descriptor:latest your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
```

## Step 3: Create a SageMaker Model

1. Create a `model.json` file:

```json
{
  "ModelName": "QwenVLImageDescriptor",
  "PrimaryContainer": {
    "Image": "your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest",
    "Environment": {
      "PORT": "8080"
    }
  },
  "ExecutionRoleArn": "arn:aws:iam::your-account-id:role/service-role/AmazonSageMaker-ExecutionRole"
}
```

2. Create the SageMaker model:

```bash
aws sagemaker create-model --cli-input-json file://model.json
```

## Step 4: Create an Endpoint Configuration

1. Create a `config.json` file:

```json
{
  "EndpointConfigName": "QwenVLImageDescriptorConfig",
  "ProductionVariants": [
    {
      "VariantName": "AllTraffic",
      "ModelName": "QwenVLImageDescriptor",
      "InstanceType": "ml.g5.2xlarge",
      "InitialInstanceCount": 1
    }
  ]
}
```

2. Create the endpoint configuration:

```bash
aws sagemaker create-endpoint-config --cli-input-json file://config.json
```

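`model.json` and `config.json` must agree on the model name, and the image URI and role ARN both contain the account ID. One option is to generate both files from a single set of values so the placeholders cannot drift apart. This is a minimal sketch using the names from this guide; the function names and the `role_path` default are illustrative assumptions, and your execution role may be named differently.

```python
import json
import os


def build_model_request(account_id, region,
                        model_name="QwenVLImageDescriptor",
                        role_path="service-role/AmazonSageMaker-ExecutionRole"):
    """Build the body of model.json from one account ID and Region."""
    image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/image-descriptor:latest"
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image,
            "Environment": {"PORT": "8080"},
        },
        "ExecutionRoleArn": f"arn:aws:iam::{account_id}:role/{role_path}",
    }


def build_endpoint_config_request(model_name,
                                  config_name="QwenVLImageDescriptorConfig",
                                  instance_type="ml.g5.2xlarge",
                                  instance_count=1):
    """Build the body of config.json, reusing the model name from model.json."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }


def write_request_files(account_id, region, directory="."):
    """Write model.json and config.json into `directory`; return both dicts."""
    model = build_model_request(account_id, region)
    config = build_endpoint_config_request(model["ModelName"])
    for filename, body in (("model.json", model), ("config.json", config)):
        with open(os.path.join(directory, filename), "w") as f:
            json.dump(body, f, indent=2)
    return model, config
```

Calling `write_request_files("123456789012", "us-east-1")` with your own account ID and Region produces files that can be passed to the `--cli-input-json` commands in Steps 3 and 4.
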
## Step 5: Create the Endpoint

```bash
aws sagemaker create-endpoint --endpoint-name qwen-vl-image-descriptor --endpoint-config-name QwenVLImageDescriptorConfig
```

Deployment takes several minutes. The endpoint is ready to serve requests once its status changes from `Creating` to `InService`.

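Endpoint creation is asynchronous, so a script that continues to the invocation step needs to wait until the endpoint is ready. The sketch below polls `describe_endpoint` and inspects the `EndpointStatus` field; the function name `wait_for_endpoint` and the polling interval and timeout defaults are illustrative assumptions.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService.

    `client` is expected to be a boto3 SageMaker client, for example
    boto3.client("sagemaker"). Raises RuntimeError if creation fails and
    TimeoutError if the endpoint is not ready within `timeout_seconds`.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after timeout")
        time.sleep(poll_seconds)
```

A typical call would be `wait_for_endpoint(boto3.client("sagemaker"), "qwen-vl-image-descriptor")`. boto3 also ships a built-in waiter for the same purpose, `client.get_waiter("endpoint_in_service").wait(EndpointName=...)`, which may be preferable when custom failure handling is not needed.
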
## Step 6: Invoke the Endpoint

You can invoke the endpoint using the AWS SDK or the AWS CLI.

Using the Python SDK (boto3):

```python
import base64
import json

import boto3

# Initialize the SageMaker runtime client
runtime = boto3.client('sagemaker-runtime')

# Load and encode the image
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')

# Create the request payload
payload = {
    'image_data': image_b64
}

# Invoke the endpoint
response = runtime.invoke_endpoint(
    EndpointName='qwen-vl-image-descriptor',
    ContentType='application/json',
    Body=json.dumps(payload)
)

# Parse the response
result = json.loads(response['Body'].read().decode())
print(json.dumps(result, indent=2))
```

## Step 7: Set Up API Gateway (Optional)

For public HTTP access, put Amazon API Gateway in front of the endpoint:

1. Create a new REST API in API Gateway
2. Create a new resource and a POST method on it
3. Configure the method's integration to call the SageMaker endpoint
4. Deploy the API to a stage
5. Note the stage's invoke URL for clients to use

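The integration can call SageMaker directly as an AWS service integration. A common alternative, which is not necessarily what this project uses, is to route the POST method through a small AWS Lambda function with Lambda proxy integration. The sketch below shows that alternative under stated assumptions: the handler names are hypothetical, the request body is assumed to carry the same `image_data` field as the Python example in Step 6, and the Lambda role is assumed to allow `sagemaker:InvokeEndpoint`.

```python
import base64
import json

ENDPOINT_NAME = "qwen-vl-image-descriptor"  # endpoint created in Step 5


def _error(status_code, message):
    """Build a Lambda proxy response that carries a JSON error message."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": message}),
    }


def handle_request(event, runtime_client, endpoint_name=ENDPOINT_NAME):
    """Validate the proxy event and forward its JSON body to the endpoint."""
    raw_body = event.get("body") or ""
    try:
        if event.get("isBase64Encoded"):
            raw_body = base64.b64decode(raw_body).decode("utf-8")
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid base64, invalid UTF-8, and invalid JSON
        return _error(400, "Request body must be valid JSON")
    if not isinstance(payload, dict) or "image_data" not in payload:
        return _error(400, "Missing 'image_data' field")

    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": response["Body"].read().decode("utf-8"),
    }


def lambda_handler(event, context):
    """Lambda entry point; boto3 is provided by the Lambda Python runtime."""
    import boto3
    return handle_request(event, boto3.client("sagemaker-runtime"))
```

Keeping the request handling in `handle_request` and passing the client in as an argument lets the logic be exercised locally with a stub client before anything is deployed.
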
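## Cost Optimization

To optimize costs:

1. Consider SageMaker Serverless Inference instead of a dedicated endpoint if the model can run on CPU; Serverless Inference does not offer GPU instances such as the `ml.g5.2xlarge` used above
2. Implement auto-scaling for your endpoint
3. Use Spot capacity for non-critical workloads where SageMaker supports it, such as Managed Spot Training; real-time endpoints run on On-Demand instances
4. Keep the endpoint active only during business hours by deleting it when idle and recreating it from the saved endpoint configuration
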
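Endpoint auto-scaling is configured through Application Auto Scaling, not through the SageMaker API itself. The sketch below builds the two requests involved, a scalable target and a target-tracking policy on the predefined `SageMakerVariantInvocationsPerInstance` metric. The function names, the capacity limits, and the target of 10 invocations per instance per minute are illustrative assumptions to tune for your workload.

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                         min_capacity=1, max_capacity=2,
                         invocations_per_instance=10.0):
    """Return (scalable_target, scaling_policy) keyword arguments."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Illustrative target: average invocations per instance per minute.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return scalable_target, scaling_policy


def apply_autoscaling(client, endpoint_name, **options):
    """Register the variant and attach the policy.

    `client` is expected to be boto3.client("application-autoscaling").
    """
    scalable_target, scaling_policy = autoscaling_requests(endpoint_name, **options)
    client.register_scalable_target(**scalable_target)
    client.put_scaling_policy(**scaling_policy)
```

For the endpoint in this guide, the call would be `apply_autoscaling(boto3.client("application-autoscaling"), "qwen-vl-image-descriptor")`, run after the endpoint is `InService`.
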
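## Monitoring

Set up CloudWatch alarms to monitor:

1. Endpoint invocation metrics (`Invocations`)
2. Error rates (`Invocation4XXErrors`, `Invocation5XXErrors`)
3. Latency (`ModelLatency`, `OverheadLatency`)
4. Instance utilization (`CPUUtilization`, `MemoryUtilization`, `GPUUtilization`)
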
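As one way to set up such alarms, the sketch below builds request bodies for CloudWatch's `put_metric_alarm` call, covering model latency and server-side errors in the `AWS/SageMaker` namespace. The function names, alarm names, thresholds, and evaluation windows are illustrative assumptions. Note that `ModelLatency` is reported in microseconds, so the helper converts from milliseconds.

```python
def _endpoint_dimensions(endpoint_name, variant_name):
    """Dimensions that identify one production variant of an endpoint."""
    return [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ]


def latency_alarm_request(endpoint_name, variant_name="AllTraffic", threshold_ms=5000.0):
    """Alarm when average ModelLatency exceeds `threshold_ms` for 3 periods."""
    return {
        "AlarmName": f"{endpoint_name}-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": _endpoint_dimensions(endpoint_name, variant_name),
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold_ms * 1000.0,  # ModelLatency is in microseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def error_alarm_request(endpoint_name, variant_name="AllTraffic", max_errors=5):
    """Alarm when more than `max_errors` 5XX responses occur in 5 minutes."""
    return {
        "AlarmName": f"{endpoint_name}-5xx-errors",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": _endpoint_dimensions(endpoint_name, variant_name),
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(max_errors),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarms(client, endpoint_name):
    """Create both alarms; `client` is expected to be boto3.client("cloudwatch")."""
    for request in (latency_alarm_request(endpoint_name),
                    error_alarm_request(endpoint_name)):
        client.put_metric_alarm(**request)
```

To receive notifications, an `AlarmActions` list with an SNS topic ARN can be added to each request before it is sent.
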
## Cleanup

To avoid ongoing charges, delete resources when not in use:

```bash
aws sagemaker delete-endpoint --endpoint-name qwen-vl-image-descriptor
aws sagemaker delete-endpoint-config --endpoint-config-name QwenVLImageDescriptorConfig
aws sagemaker delete-model --model-name QwenVLImageDescriptor
# Optional: also remove the ECR repository from Step 1 and the images it holds
aws ecr delete-repository --repository-name image-descriptor --force
```