Update README.md
README.md
CHANGED
@@ -4,19 +4,21 @@ emoji: 🎨
|
|
4 |
colorFrom: blue
|
5 |
colorTo: purple
|
6 |
sdk: gradio
|
7 |
-
sdk_version: 5.
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
---
|
11 |
|
12 |
# Marketing Image Generator with Agent Review
|
13 |
|
14 |
-
A sophisticated AI-powered image generation system that creates high-quality marketing images with automated quality review and refinement. Built on modern AI technologies including Google's
|
15 |
|
16 |
## Features
|
17 |
|
18 |
- **AI-Powered Image Generation**: Create stunning marketing images from text prompts using Google's Imagen4 via MCP server
|
19 |
-
- **Automated Quality Review**: Intelligent Gemini agent
|
20 |
- **Marketing-Focused**: Optimized for marketing materials, social media, and promotional content
|
21 |
- **Real-time Feedback**: Get instant quality scores and improvement suggestions
|
22 |
- **Professional Workflow**: Streamlined process from concept to final image
|
@@ -56,11 +58,11 @@ A sophisticated AI-powered image generation system that creates high-quality mar
|
|
56 |
|
57 |
### Core Components
|
58 |
|
59 |
-
- **Agent 1 (Image Generator)**: Creates images using Google's
|
60 |
- **Agent 2 (Marketing Reviewer)**: Analyzes image quality and provides marketing-focused feedback using Gemini Vision
|
61 |
- **Orchestrator**: Manages workflow between agents and handles handover
|
62 |
- **Web Interface**: Gradio-based user interface optimized for Hugging Face
|
63 |
-
- **MCP Server Integration**: Model Context Protocol for seamless
|
64 |
|
65 |
### System Architecture and Workflow
|
66 |
|
@@ -73,18 +75,18 @@ A sophisticated AI-powered image generation system that creates high-quality mar
|
|
73 |
│Reviewer     │───▶│             │───▶│ Agent 2 (Gemini) Marketing  │
|
74 |
│Prompt       │    │             │    │ Reviewer                    │
|
75 |
│             │    │             │    │                             │
|
76 |
-
│             │    │             │    │
|
77 |
-
│             │    │             │    │ │
|
78 |
-
│             │    │             │    │ │
|
79 |
-
│             │    │             │    │ │ Draft Image Creation
|
80 |
-
│             │    │             │    │
|
81 |
│             │    │             │    │                             │
|
82 |
-
│             │    │             │    │
|
83 |
-
│             │    │             │    │ │
|
84 |
-
│             │    │             │    │ │ & Changes Suggested
|
85 |
-
│             │    │             │    │
|
86 |
│             │    │             │    │                             │
|
87 |
-
│ Image       │◀───│             │◀───│ Final Image Response
|
88 |
│ Response    │    │             │    │                             │
|
89 |
└─────────────┘    └─────────────┘    └─────────────────────────────┘
|
90 |
```
|
@@ -104,12 +106,12 @@ A sophisticated AI-powered image generation system that creates high-quality mar
|
|
104 |
|
105 |
3. **Image Generation and Drafting (Top Right)**:
|
106 |
- **Agent 1 (Gemini) Drafter**: Receives Image Prompt, orchestrates image generation
|
107 |
-
- **
|
108 |
|
109 |
4. **Marketing Review and Refinement (Bottom Right)**:
|
110 |
- **Agent 2 (Gemini) Marketing Reviewer**: Receives Reviewer Prompt, evaluates generated image against marketing criteria
|
111 |
- **Draft Image Reviewed and Changes Suggested**: Agent 2's review process output
|
112 |
-
- **Iterative Refinement Loop**: Bidirectional feedback between Agent 2 and
|
113 |
- Final **Image Response** sent back to Gradio UI
|
114 |
|
115 |
### Summary of Flow:
|
@@ -117,12 +119,12 @@ User provides prompts → Gradio UI → Agent 1 drafts image with Imagen4 → Ag
|
|
117 |
|
118 |
### Technology Stack
|
119 |
|
120 |
-
- **AI Models**: Google Imagen4 (via MCP), Gemini Vision
|
121 |
- **Framework**: Gradio (Web Interface)
|
122 |
- **Orchestration**: Custom agent handover system
|
123 |
- **Deployment**: Hugging Face Spaces
|
124 |
- **Authentication**: Google Cloud API Keys
|
125 |
-
- **Protocol**: MCP (Model Context Protocol) for
|
126 |
|
127 |
### Why A2A Was Not Applied
|
128 |
|
@@ -179,7 +181,7 @@ quality_score = result["data"]["review"]["quality_score"]
|
|
179 |
- **Quality Threshold**: Minimum quality score for auto-approval
|
180 |
- **Max Iterations**: Maximum refinement attempts
|
181 |
- **Review Settings**: Customize review criteria
|
182 |
-
- **MCP Configuration**:
|
183 |
|
184 |
## Development
|
185 |
|
@@ -268,12 +270,52 @@ Access monitoring dashboards:
|
|
268 |
1. **API Key Errors**: Ensure your Google API keys are valid and configured as HF secrets
|
269 |
2. **Image Generation Fails**: Check your internet connection and API quotas
|
270 |
3. **Review Not Working**: Verify the Gemini agent is running and configured correctly
|
271 |
-
4. **MCP Connection Issues**: Check
|
272 |
|
273 |
### Debug Mode
|
274 |
|
275 |
Enable debug logging by setting `LOG_LEVEL=DEBUG` in your environment variables.
|
276 |
|
277 |
### Support
|
278 |
|
279 |
For issues and questions:
|
@@ -287,7 +329,7 @@ This project is licensed under the MIT License - see the LICENSE file for detail
|
|
287 |
|
288 |
## Acknowledgments
|
289 |
|
290 |
-
- Google AI for Imagen4 and Gemini technologies
|
291 |
- Hugging Face for the deployment platform
|
292 |
- Gradio for the web interface framework
|
293 |
-
- The open-source community for various dependencies
|
|
|
4 |
colorFrom: blue
|
5 |
colorTo: purple
|
6 |
sdk: gradio
|
7 |
+
sdk_version: 5.38.2
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
+
license: mit
|
11 |
+
short_description: AI marketing image generator with GCP Imagen4 + Gemini 2.5
|
12 |
---
|
13 |
|
14 |
# Marketing Image Generator with Agent Review
|
15 |
|
16 |
+
A sophisticated AI-powered image generation system that creates high-quality marketing images with automated quality review and refinement. Built on modern AI technologies including Google's Imagen4 and Gemini 2.5 Pro with advanced agent orchestration.
|
17 |
|
18 |
## Features
|
19 |
|
20 |
- **AI-Powered Image Generation**: Create stunning marketing images from text prompts using Google's Imagen4 via MCP server
|
21 |
+
- **Automated Quality Review**: Intelligent Gemini agent automatically reviews and refines generated images
|
22 |
- **Marketing-Focused**: Optimized for marketing materials, social media, and promotional content
|
23 |
- **Real-time Feedback**: Get instant quality scores and improvement suggestions
|
24 |
- **Professional Workflow**: Streamlined process from concept to final image
|
|
|
58 |
|
59 |
### Core Components
|
60 |
|
61 |
+
- **Agent 1 (Image Generator)**: Creates images using Google's Imagen4 via MCP server integration
|
62 |
- **Agent 2 (Marketing Reviewer)**: Analyzes image quality and provides marketing-focused feedback using Gemini Vision
|
63 |
- **Orchestrator**: Manages workflow between agents and handles handover
|
64 |
- **Web Interface**: Gradio-based user interface optimized for Hugging Face
|
65 |
+
- **MCP Server Integration**: Model Context Protocol for seamless Imagen4 access
|
66 |
|
67 |
### System Architecture and Workflow
|
68 |
|
|
|
75 |
│Reviewer     │───▶│             │───▶│ Agent 2 (Gemini) Marketing  │
|
76 |
│Prompt       │    │             │    │ Reviewer                    │
|
77 |
│             │    │             │    │                             │
|
78 |
+
│             │    │             │    │ ┌─────────────────────────┐ │
|
79 |
+
│             │    │             │    │ │ Imagen4 (via MCP)       │ │
|
80 |
+
│             │    │             │    │ │                         │ │
|
81 |
+
│             │    │             │    │ │ Draft Image Creation    │ │
|
82 |
+
│             │    │             │    │ └─────────────────────────┘ │
|
83 |
│             │    │             │    │                             │
|
84 |
+
│             │    │             │    │ ┌─────────────────────────┐ │
|
85 |
+
│             │    │             │    │ │ Draft Image Reviewed    │ │
|
86 |
+
│             │    │             │    │ │ & Changes Suggested     │ │
|
87 |
+
│             │    │             │    │ └─────────────────────────┘ │
|
88 |
│             │    │             │    │                             │
|
89 |
+
│ Image       │◀───│             │◀───│ Final Image Response        │
|
90 |
│ Response    │    │             │    │                             │
|
91 |
└─────────────┘    └─────────────┘    └─────────────────────────────┘
|
92 |
```
|
|
|
106 |
|
107 |
3. **Image Generation and Drafting (Top Right)**:
|
108 |
- **Agent 1 (Gemini) Drafter**: Receives Image Prompt, orchestrates image generation
|
109 |
+
- **Imagen4 (via MCP)**: Agent 1 interacts with Imagen4 through MCP server to create initial image draft
|
110 |
|
111 |
4. **Marketing Review and Refinement (Bottom Right)**:
|
112 |
- **Agent 2 (Gemini) Marketing Reviewer**: Receives Reviewer Prompt, evaluates generated image against marketing criteria
|
113 |
- **Draft Image Reviewed and Changes Suggested**: Agent 2's review process output
|
114 |
+
- **Iterative Refinement Loop**: Bidirectional feedback between Agent 2 and Imagen4 (via Agent 1) to refine image until it meets marketing standards
|
115 |
- Final **Image Response** sent back to Gradio UI
|
116 |
|
117 |
### Summary of Flow:
|
|
|
119 |
|
120 |
### Technology Stack
|
121 |
|
122 |
+
- **AI Models**: Google Imagen4 (via MCP), Gemini 2.5 Pro Vision
|
123 |
- **Framework**: Gradio (Web Interface)
|
124 |
- **Orchestration**: Custom agent handover system
|
125 |
- **Deployment**: Hugging Face Spaces
|
126 |
- **Authentication**: Google Cloud API Keys
|
127 |
+
- **Protocol**: MCP (Model Context Protocol) for Imagen4 integration
|
128 |
|
129 |
### Why A2A Was Not Applied
|
130 |
|
|
|
181 |
- **Quality Threshold**: Minimum quality score for auto-approval
|
182 |
- **Max Iterations**: Maximum refinement attempts
|
183 |
- **Review Settings**: Customize review criteria
|
184 |
+
- **MCP Configuration**: Imagen4 server settings
|
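The **Quality Threshold** and **Max Iterations** options above suggest a generate, review, retry loop. The following is a minimal sketch of how such a loop could work, not the project's actual orchestrator: `RefinementSettings`, `refine_until_approved`, the `generate` and `review` callables, and the 0-1 score scale are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RefinementSettings:
    """Hypothetical container for the two options named in the README."""
    quality_threshold: float = 0.8  # assumed 0-1 scale; the real scale may differ
    max_iterations: int = 3


def refine_until_approved(
    prompt: str,
    generate: Callable[[str, Optional[str]], bytes],
    review: Callable[[bytes], Tuple[float, str]],
    settings: RefinementSettings,
) -> Tuple[bytes, float, int]:
    """Generate, review, and retry until the score clears the threshold.

    Returns the best image seen, its score, and the number of attempts made.
    """
    if settings.max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    best_image, best_score = b"", float("-inf")
    feedback: Optional[str] = None
    for attempt in range(1, settings.max_iterations + 1):
        image = generate(prompt, feedback)   # stand-in for Agent 1
        score, feedback = review(image)      # stand-in for Agent 2
        if score > best_score:
            best_image, best_score = image, score
        if score >= settings.quality_threshold:
            break  # auto-approved, stop refining
    return best_image, best_score, attempt


# Demo with stub agents (no API calls): the second draft clears the threshold.
_scores = iter([0.5, 0.9])
_drafts = iter([b"draft-1", b"draft-2"])
image, score, attempts = refine_until_approved(
    "Professional corporate environment",
    generate=lambda prompt, feedback: next(_drafts),
    review=lambda img: (next(_scores), "increase contrast"),
    settings=RefinementSettings(quality_threshold=0.8, max_iterations=3),
)
print(attempts, score)
```

Returning the best-scoring draft rather than the last one means a run that exhausts its iterations still hands back the strongest candidate.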
185 |
|
186 |
## Development
|
187 |
|
|
|
270 |
1. **API Key Errors**: Ensure your Google API keys are valid and configured as HF secrets
|
271 |
2. **Image Generation Fails**: Check your internet connection and API quotas
|
272 |
3. **Review Not Working**: Verify the Gemini agent is running and configured correctly
|
273 |
+
4. **MCP Connection Issues**: Check Imagen4 server connectivity and configuration
|
274 |
+
|
275 |
+
### Content Policy & Brand Restrictions
|
276 |
+
|
277 |
+
Google's AI models have built-in safety guardrails that may cause timeouts or rejections for certain content types:
|
278 |
+
|
279 |
+
#### 🚫 **Highly Restricted Content** (Likely to cause stalls/timeouts):
|
280 |
+
- **Political Figures**: Named world leaders, politicians (e.g., "Putin", "Zelensky", "Biden")
|
281 |
+
- **Political Buildings**: Government buildings like "10 Downing Street", "White House"
|
282 |
+
- **Geopolitical Content**: War, conflict, or sensitive international relations
|
283 |
+
- **Financial Institution Brands**: Major banks like "HSBC", "Bank of America", "JPMorgan"
|
284 |
+
|
285 |
+
#### ⚠️ **Moderately Restricted Content** (May cause delays):
|
286 |
+
- **Regulated Industries**: Healthcare, pharmaceutical, financial services
|
287 |
+
- **Some Corporate Brands**: Varies by sector and brand sensitivity
|
288 |
+
|
289 |
+
#### ✅ **Generally Permitted Content**:
|
290 |
+
- **Technology Brands**: "Cognizant", "Microsoft", "IBM", "Accenture"
|
291 |
+
- **Generic Business**: "Professional office", "corporate environment"
|
292 |
+
- **Non-branded Content**: Generic descriptions without specific brand names
|
293 |
+
|
294 |
+
#### 🔧 **Workarounds for Restricted Content**:
|
295 |
+
|
296 |
+
**Instead of**: `"Professional boardroom with HSBC signage"`
|
297 |
+
**Use**: `"Professional boardroom with international banking corporation signage in red and white colors"`
|
298 |
+
|
299 |
+
**Instead of**: `"Meeting with political leaders"`
|
300 |
+
**Use**: `"Meeting with business executives in government-style building"`
|
301 |
+
|
302 |
+
**Strategy**: Move brand-specific requirements to **Review Guidelines** instead of the main prompt:
|
303 |
+
- **Main Prompt**: `"Professional corporate environment"`
|
304 |
+
- **Review Guidelines**: `"Ensure branding reflects HSBC corporate colors (red and white)"`
|
305 |
+
|
306 |
+
This approach bypasses content filters while still providing guidance for review.
|
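The split between **Main Prompt** and **Review Guidelines** described above can be modelled as two separate fields. This is only an illustrative sketch: `GenerationRequest` and `build_request` are hypothetical names, and the project's real request format is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape: two fields with different audiences."""
    main_prompt: str        # generic scene description for the image model
    review_guidelines: str  # brand guidance intended for the reviewer only


def build_request(scene: str, brand_guidance: str = "") -> GenerationRequest:
    """Keep the scene generic and carry brand guidance in its own field."""
    return GenerationRequest(
        main_prompt=scene.strip(),
        review_guidelines=brand_guidance.strip(),
    )


request = build_request(
    "Professional corporate environment",
    "Ensure branding reflects HSBC corporate colors (red and white)",
)
print(request.main_prompt)
```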
307 |
|
308 |
### Debug Mode
|
309 |
|
310 |
Enable debug logging by setting `LOG_LEVEL=DEBUG` in your environment variables.
|
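One way an application could honor `LOG_LEVEL` is sketched below using only the standard library. `resolve_log_level` is a hypothetical helper, and falling back to `INFO` for missing or unknown values is an assumption, not documented behavior of this project.

```python
import logging
import os

# Level names accepted from the environment.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(default: int = logging.INFO) -> int:
    """Map the LOG_LEVEL environment variable to a logging level constant."""
    name = os.environ.get("LOG_LEVEL", "").strip().upper()
    return _LEVELS.get(name, default)


logging.basicConfig(level=resolve_log_level())
logging.getLogger(__name__).debug("Debug logging is enabled")
```

On Hugging Face Spaces, the variable would typically be set in the Space settings rather than in code.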
311 |
|
312 |
+
### Content Policy Testing
|
313 |
+
|
314 |
+
Use the included diagnostic scripts to test content restrictions:
|
315 |
+
- `debug_hsbc_prompt.py` - Test financial brand restrictions
|
316 |
+
- `test_cognizant_brand.py` - Test tech brand accessibility
|
317 |
+
- `test_brand_workaround.py` - Test workaround strategies
|
318 |
+
|
319 |
### Support
|
320 |
|
321 |
For issues and questions:
|
|
|
329 |
|
330 |
## Acknowledgments
|
331 |
|
332 |
+
- Google AI for Imagen4 and Gemini 2.5 Pro technologies
|
333 |
- Hugging Face for the deployment platform
|
334 |
- Gradio for the web interface framework
|
335 |
+
- The open-source community for various dependencies
|