Spaces:
Build error
Build error
metadata
title: Amazon E-commerce Visual Assistant
emoji: 🛍️
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: amazon_app.py
pinned: false
Amazon E-commerce Visual Assistant
A multimodal AI assistant leveraging the Amazon Product Dataset 2020 to provide comprehensive product search and recommendations through natural language and image-based interactions[1].
Project Overview
This conversational AI system combines advanced language and vision models to enhance e-commerce customer support, enabling accurate, context-aware responses to product-related queries[1].
Project Structure
amazon_app.py
: Main Streamlit applicationmodel.py
: Core AI model implementationsVision_AI.ipynb
: EDA, Embedding Model, LLMrequirements.txt
: Project dependencies
Setup and Installation
- Clone the repository:
git clone https://github.com/wisdom196473/amazon-multimodal-product-assistant.git
cd amazon-multimodal-product-assistant
- Install dependencies:
pip install -r requirements.txt
- Run the application:
streamlit run amazon_app.py
Technical Architecture
Data Processing & Storage
- Standardized text fields and normalized numeric attributes
- Enhanced metadata indices for categories, price ranges, keywords, brands
- Validated image quality and managed duplicates
- Structured data storage in Parquet format[1]
Model Components
- Vision-Language Integration: FashionCLIP for multimodal embedding generation
- Vector Search: FAISS with hybrid retrieval combining embedding similarity and metadata filtering
- Language Model: Mistral-7B with 4-bit quantization
- RAG Framework: Context-enhanced response generation[1]
Performance Metrics
FahisonClip Embedding Model
- Recall@1: 0.6385
- Recall@10: 0.9008
- Precision@1: 0.6385
- NDCG@10: 0.7725[1]
Implementation Details
Core Features
- Text and image-based product search
- Product comparisons and recommendations
- Visual product recognition
- Detailed product information retrieval
- Price analysis and comparison[1]
Technologies Used
- FashionCLIP for visual understanding
- Mistral-7B Language Model (4-bit quantized)
- FAISS for similarity search
- Google Vertex AI for vector storage
- Streamlit for user interface[1]
Challenges & Solutions
Technical Challenges Addressed
- Image processing with varying quality
- GPU memory optimization
- Efficient embedding storage
- Query response accuracy[1]
Implemented Solutions
- Robust image validation pipeline
- 4-bit model quantization
- Optimized batch processing
- Enhanced metadata enrichment[1]
Future Directions
- Fine-Tune FashionClip embedding model based on the specific domain data
- Fine-Tune large language model to improve its generalization capabilities
- Develop feedback loops for continuous improvement