---
title: Amazon E-commerce Visual Assistant
emoji: 🛍️
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.28.0"
app_file: amazon_app.py
pinned: false
---

# Amazon E-commerce Visual Assistant

A multimodal AI assistant that uses the Amazon Product Dataset 2020 to support product search and recommendations through natural-language and image-based queries[1].

## Project Overview

This conversational system combines a language model (Mistral-7B) with a vision-language model (FashionCLIP) to support e-commerce customer service, giving context-aware answers to product-related queries[1].

## Project Structure

- `amazon_app.py`: Main Streamlit application
- `model.py`: Core AI model implementations
- `Vision_AI.ipynb`: Notebook covering exploratory data analysis (EDA), the embedding model, and the LLM
- `requirements.txt`: Project dependencies

## Setup and Installation

1. Clone the repository:
```bash
git clone https://github.com/wisdom196473/amazon-multimodal-product-assistant.git
cd amazon-multimodal-product-assistant
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
streamlit run amazon_app.py
```

## Technical Architecture

### Data Processing & Storage
- Standardized text fields and normalized numeric attributes
- Enhanced metadata indices for categories, price ranges, keywords, brands
- Validated image quality and managed duplicates
- Structured data storage in Parquet format[1]
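
The preprocessing steps listed above can be illustrated with a small standard-library sketch. The field names (`title`, `category`, `price`), the price-range cut-offs, and all helper names are assumptions made for illustration; they are not taken from the project's code. Writing the result to Parquet is left out here.

```python
import re


def standardize_text(value):
    """Lowercase, trim, and collapse whitespace; None becomes an empty string."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", str(value)).strip().lower()


def min_max_normalize(values):
    """Scale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def price_bucket(price):
    """Hypothetical price ranges; the project's real cut-offs are not documented."""
    if price < 25:
        return "budget"
    if price < 100:
        return "mid"
    return "premium"


def build_index(products, key):
    """Map each distinct value of key(product) to the row positions that carry it."""
    index = {}
    for position, product in enumerate(products):
        index.setdefault(key(product), []).append(position)
    return index


# Toy records standing in for dataset rows (field names are assumptions).
products = [
    {"title": "  Running   Shoes ", "category": "Footwear", "price": 59.99},
    {"title": "Leather Wallet", "category": "Accessories", "price": 19.50},
    {"title": "Trail Shoes", "category": " footwear", "price": 120.00},
]

for p in products:
    p["title"] = standardize_text(p["title"])
    p["category"] = standardize_text(p["category"])

normalized_prices = min_max_normalize([p["price"] for p in products])
category_index = build_index(products, lambda p: p["category"])
price_index = build_index(products, lambda p: price_bucket(p["price"]))
```

Keeping the indices as plain position lists makes it cheap to intersect a category filter with a price-range filter before any embedding search runs.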

### Model Components
- **Vision-Language Integration**: FashionCLIP for multimodal embedding generation
- **Vector Search**: FAISS with hybrid retrieval combining embedding similarity and metadata filtering
- **Language Model**: Mistral-7B with 4-bit quantization
- **RAG Framework**: Context-enhanced response generation[1]
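
As a rough illustration of hybrid retrieval, the sketch below filters candidates by metadata first and then ranks the survivors by cosine similarity. It uses brute-force NumPy scoring as a stand-in for a FAISS index, and the metadata fields, the function names, and the filter-then-rank order are all assumptions rather than the project's actual implementation.

```python
import numpy as np


def normalize_rows(matrix):
    """L2-normalize each row so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.clip(norms, 1e-12, None)


def hybrid_search(query_vec, item_vecs, metadata, k=3, category=None, max_price=None):
    """Return up to k (index, score) pairs, ranked by cosine similarity,
    considering only items that pass the optional metadata filters."""
    mask = np.ones(len(metadata), dtype=bool)
    if category is not None:
        mask &= np.array([m["category"] == category for m in metadata], dtype=bool)
    if max_price is not None:
        mask &= np.array([m["price"] <= max_price for m in metadata], dtype=bool)

    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return []

    q = normalize_rows(query_vec.reshape(1, -1))[0]
    scores = normalize_rows(item_vecs[candidates]) @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return [(int(candidates[i]), float(scores[i])) for i in order]


# Toy 2-D embeddings and metadata (illustrative values only).
item_vecs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.9, 0.1]])
metadata = [
    {"category": "shoes", "price": 60.0},
    {"category": "shoes", "price": 150.0},
    {"category": "bags", "price": 40.0},
    {"category": "shoes", "price": 30.0},
]
query = np.array([1.0, 0.0])

results = hybrid_search(query, item_vecs, metadata, k=2, category="shoes", max_price=100.0)
```

Filtering before scoring keeps the similarity computation small when filters are selective. With an approximate index, the alternative is to over-fetch neighbours and filter afterwards; which order works better depends on how restrictive the filters are.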

### Performance Metrics

#### FashionCLIP Embedding Model

- Recall@1: 0.6385
- Recall@10: 0.9008
- Precision@1: 0.6385
- NDCG@10: 0.7725[1]
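
For readers unfamiliar with these metrics, here is a minimal sketch of how they are commonly computed with binary relevance. The evaluation protocol used for the numbers above is not described in this README, so the function names and toy queries are assumptions. Note that Recall@1 and Precision@1 coincide whenever each query has exactly one relevant item, which would be consistent with the identical values reported.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant (assumes no duplicate results)."""
    return len(set(ranked[:k]) & set(relevant)) / k


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


# Two toy queries: (ranked result ids, set of relevant ids).
queries = [
    (["a", "b", "c"], {"a"}),  # relevant item ranked first
    (["b", "a", "c"], {"a"}),  # relevant item ranked second
]


def mean_metric(metric, k):
    """Average a per-query metric over all toy queries."""
    return sum(metric(ranked, rel, k) for ranked, rel in queries) / len(queries)


mean_recall_1 = mean_metric(recall_at_k, 1)
mean_recall_10 = mean_metric(recall_at_k, 10)
mean_precision_1 = mean_metric(precision_at_k, 1)
mean_ndcg_10 = mean_metric(ndcg_at_k, 10)
```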

## Implementation Details

### Core Features
- Text and image-based product search
- Product comparisons and recommendations
- Visual product recognition
- Detailed product information retrieval
- Price analysis and comparison[1]

### Technologies Used
- FashionCLIP for visual understanding
- Mistral-7B Language Model (4-bit quantized)
- FAISS for similarity search
- Google Vertex AI for vector storage
- Streamlit for user interface[1]

## Challenges & Solutions

### Technical Challenges Addressed
- Image processing with varying quality
- GPU memory optimization
- Efficient embedding storage
- Query response accuracy[1]

### Implemented Solutions
- Robust image validation pipeline
- 4-bit model quantization
- Optimized batch processing
- Enhanced metadata enrichment[1]
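
To make the batch-processing idea concrete, the sketch below embeds items in fixed-size batches so that only one batch is handed to the encoder at a time. The `toy_encoder` is a placeholder for a real embedding model such as FashionCLIP, and the helper names and batch size are assumptions for illustration only.

```python
import numpy as np


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(items, encode_batch, batch_size=2):
    """Encode items batch by batch and stack the results into one float32 array."""
    chunks = [
        np.asarray(encode_batch(batch), dtype=np.float32)
        for batch in iter_batches(items, batch_size)
    ]
    if not chunks:
        return np.empty((0, 0), dtype=np.float32)
    return np.concatenate(chunks, axis=0)


def toy_encoder(texts):
    """Placeholder encoder: maps each text to [length, vowel count]."""
    return [[len(t), sum(c in "aeiou" for c in t)] for t in texts]


titles = ["red shoes", "bag", "blue jeans", "hat", "scarf"]
embeddings = embed_in_batches(titles, toy_encoder, batch_size=2)
```

With a GPU-backed encoder, the batch size is the main knob for trading throughput against peak memory; the final batch may be smaller than the rest.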

## Future Directions

- [ ] Fine-tune the FashionCLIP embedding model on domain-specific data
- [ ] Fine-tune the large language model to improve its generalization
- [ ] Develop feedback loops for continuous improvement