# Azure Machine Learning Deployment Guide

This guide provides step-by-step instructions for deploying the Image Description application to Azure Machine Learning.

## Prerequisites

- Azure subscription
- Azure CLI installed and configured
- Azure Machine Learning workspace
- The source code from this repository

## Step 1: Set Up Azure Machine Learning

1. Create a Resource Group (if you don't have one):

```bash
az group create --name image-descriptor-rg --location eastus
```

2. Create an Azure Machine Learning workspace:

```bash
az ml workspace create --name image-descriptor-ws \
    --resource-group image-descriptor-rg \
    --location eastus
```

## Step 2: Create a Compute Cluster

Create a GPU-enabled compute cluster for training and experimentation (note that the managed online deployment in Step 6 provisions its own compute, so this cluster is not used for serving):

```bash
az ml compute create --name gpu-cluster \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg \
    --type AmlCompute \
    --min-instances 0 \
    --max-instances 1 \
    --size Standard_NC6s_v3
```
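Equivalently, the cluster can be declared in a YAML file and created with `az ml compute create --file compute.yml --workspace-name image-descriptor-ws --resource-group image-descriptor-rg`. A sketch, assuming the standard `amlCompute` schema:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: gpu-cluster
type: amlcompute
size: Standard_NC6s_v3
min_instances: 0
max_instances: 1
idle_time_before_scale_down: 1800
```

Keeping `min_instances: 0` lets the cluster scale to zero when idle, which matters for expensive GPU SKUs.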

## Step 3: Prepare Environment Configuration

Create an environment.yml file to define dependencies:

```yaml
name: image_descriptor_env
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - pip=23.0
  - pytorch=2.0.0
  - torchvision=0.15.0
  - pip:
    - transformers>=4.36.0
    - accelerate>=0.25.0
    - bitsandbytes>=0.41.0
    - safetensors>=0.4.0
    - flask>=2.3.2
    - flask-cors>=4.0.0
    - gunicorn>=21.2.0
    - pillow>=10.0.0
    - matplotlib>=3.7.0
    - python-dotenv>=1.0.0
    - azureml-core>=1.48.0
    - azureml-defaults>=1.48.0
    - inference-schema>=1.4.1
```

## Step 4: Create a Model Entry Script

Create a file called `score.py` to handle Azure ML model inference:

```python
import json
import io
import base64
import logging
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Globals populated by init()
model = None
processor = None

def init():
    """Initialize the model when the service starts"""
    global model, processor

    logger.info("Loading model...")
    model_id = "Qwen/Qwen2-VL-7B"

    # The processor bundles the image preprocessor and the tokenizer
    processor = AutoProcessor.from_pretrained(model_id)

    # Load the model with 4-bit quantization to reduce memory requirements.
    # Passing load_in_4bit directly to from_pretrained is deprecated in
    # recent transformers releases; use a BitsAndBytesConfig instead.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        quantization_config=quant_config,
        device_map="auto",
    )
    logger.info("Model loaded successfully")

def _generate(image, prompt, max_new_tokens):
    """Run a single prompt against the image and return the decoded text."""
    # Build a chat-formatted multimodal input; the processor inserts the
    # image placeholder tokens the model expects.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Strip the prompt tokens; keep only the newly generated ones
    generated = output[0][inputs["input_ids"].shape[1]:]
    return processor.decode(generated, skip_special_tokens=True).strip()

def run(raw_data):
    """Process an image and generate descriptions

    Args:
        raw_data: A JSON string containing the image as base64 encoded data

    Returns:
        A JSON string containing the descriptions
    """
    try:
        # Parse input
        data = json.loads(raw_data)

        # Get the image data (from base64 or URL)
        if 'image_data' in data:
            image_bytes = base64.b64decode(data['image_data'])
            image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
            logger.info("Loaded image from base64 data")
        elif 'image_url' in data:
            # Handle image URLs (for Azure Storage or public URLs)
            from urllib.request import urlopen
            with urlopen(data['image_url']) as response:
                image_bytes = response.read()
            image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
            logger.info(f"Loaded image from URL: {data['image_url']}")
        else:
            return json.dumps({"error": "No image data or URL provided"})

        # Run the three description prompts against the same image
        prompts = {
            "basic_description": (
                "Describe this image briefly.", 150),
            "detailed_description": (
                "Analyze this image in detail. Describe the main elements, "
                "any text visible, the colors, and the overall composition.", 300),
            "technical_analysis": (
                "What can you tell me about the technical aspects of this image?", 200),
        }
        results = {key: _generate(image, prompt, max_tokens)
                   for key, (prompt, max_tokens) in prompts.items()}

        # Return the results
        return json.dumps({"success": True, **results})

    except Exception as e:
        logger.error(f"Error processing image: {str(e)}", exc_info=True)
        return json.dumps({"error": f"Error generating description: {str(e)}"})
```
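The `run()` entry point expects a JSON string with a base64-encoded `image_data` field. A minimal local sketch of building and decoding that payload, using only the standard library (no Azure dependencies), so the wire format can be sanity-checked before deploying:

```python
import base64
import json

def build_request(image_bytes: bytes) -> str:
    """Encode raw image bytes into the JSON payload run() expects."""
    return json.dumps({"image_data": base64.b64encode(image_bytes).decode("utf-8")})

def decode_request(raw_data: str) -> bytes:
    """Mirror of the decoding step inside run()."""
    data = json.loads(raw_data)
    return base64.b64decode(data["image_data"])

# Round-trip sanity check with stand-in bytes
payload = build_request(b"\x89PNG\r\n\x1a\nfake-image-bytes")
assert decode_request(payload) == b"\x89PNG\r\n\x1a\nfake-image-bytes"
```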

## Step 5: Register the Model

1. Create a model.yml file:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen-vl-image-descriptor
version: 1
description: Qwen2-VL-7B model for image description
path: .
```

2. Register the model:

```bash
az ml model create --file model.yml \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
```

## Step 6: Deploy as an Online Endpoint

1. Create an endpoint.yml file:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: image-descriptor-endpoint
description: Endpoint for image description
auth_mode: key
```

2. Create a deployment.yml file:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: qwen-vl-deployment
endpoint_name: image-descriptor-endpoint
model: azureml:qwen-vl-image-descriptor:1
environment:
  conda_file: environment.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04:latest
instance_type: Standard_NC6s_v3
instance_count: 1
request_settings:
  max_concurrent_requests_per_instance: 1
  request_timeout_ms: 120000
```

3. Create the endpoint:

```bash
az ml online-endpoint create --file endpoint.yml \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
```

4. Create the deployment:

```bash
az ml online-deployment create --file deployment.yml \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
```

5. Allocate 100% traffic to the deployment:

```bash
az ml online-endpoint update --name image-descriptor-endpoint \
    --traffic "qwen-vl-deployment=100" \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg
```

## Step 7: Test the Endpoint

You can test the endpoint using the Azure ML SDK:

```python
import json
import base64
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Get a handle to the workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group_name="image-descriptor-rg",
    workspace_name="image-descriptor-ws"
)

# Get endpoint
endpoint = ml_client.online_endpoints.get("image-descriptor-endpoint")

# Load and encode the image
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')

# Create the request payload and save it to a file
# (invoke expects a path to a request file, not a raw JSON string)
payload = {
    'image_data': image_b64
}
with open('request.json', 'w') as f:
    json.dump(payload, f)

# Invoke the endpoint
response = ml_client.online_endpoints.invoke(
    endpoint_name="image-descriptor-endpoint",
    request_file="request.json",
    deployment_name="qwen-vl-deployment"
)

# Parse the response
result = json.loads(response)
print(json.dumps(result, indent=2))
```
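The response body is the JSON string produced by `score.py`. A small sketch of defensive parsing (the field names follow the `run()` function above; the sample payload here is illustrative, not real model output):

```python
import json

def parse_description_response(response_text: str) -> dict:
    """Extract the three description fields, raising on an error response."""
    result = json.loads(response_text)
    if "error" in result:
        raise RuntimeError(result["error"])
    return {key: result[key] for key in
            ("basic_description", "detailed_description", "technical_analysis")}

# Illustrative sample of a successful response
sample = json.dumps({
    "success": True,
    "basic_description": "A cat.",
    "detailed_description": "A cat sitting on a mat.",
    "technical_analysis": "Sharp focus, natural lighting.",
})
print(parse_description_response(sample)["basic_description"])  # A cat.
```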

## Cost Optimization

To optimize costs:

1. Use a smaller compute size if possible
2. Keep the compute cluster's minimum instance count at 0 so it scales to zero when idle (managed online deployments cannot scale to zero)
3. Set up autoscaling rules
4. Consider reserved instances for long-term deployments

## Monitoring

Monitor your endpoint using:

1. Azure Monitor
2. Application Insights
3. Azure ML metrics dashboard
4. Alerts configured for anomalous metrics

## Cleanup

To avoid ongoing charges, delete resources when not in use:

```bash
# Delete the endpoint
az ml online-endpoint delete --name image-descriptor-endpoint \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg -y

# Delete compute cluster
az ml compute delete --name gpu-cluster \
    --workspace-name image-descriptor-ws \
    --resource-group image-descriptor-rg -y

# Delete workspace (optional)
az ml workspace delete --name image-descriptor-ws \
    --resource-group image-descriptor-rg -y

# Delete resource group (optional)
az group delete --name image-descriptor-rg -y
```