Wisdom Chen
committed on
Update README.md
README.md
CHANGED
@@ -11,11 +11,11 @@ pinned: false
 
 # Amazon E-commerce Visual Assistant
 
-A multimodal AI assistant leveraging the Amazon Product Dataset 2020 to provide comprehensive product search and recommendations through natural language and image-based interactions
+A multimodal AI assistant leveraging the Amazon Product Dataset 2020 to provide comprehensive product search and recommendations through natural language and image-based interactions.
 
 ## Project Overview
 
-This conversational AI system combines advanced language and vision models to enhance e-commerce customer support, enabling accurate, context-aware responses to product-related queries
+This conversational AI system combines advanced language and vision models to enhance e-commerce customer support, enabling accurate, context-aware responses to product-related queries.
 
 ## Project Structure
 
@@ -48,13 +48,13 @@ streamlit run amazon_app.py
 - Standardized text fields and normalized numeric attributes
 - Enhanced metadata indices for categories, price ranges, keywords, brands
 - Validated image quality and managed duplicates
-- Structured data storage in Parquet format
+- Structured data storage in Parquet format
 
 ### Model Components
 - **Vision-Language Integration**: FashionCLIP for multimodal embedding generation
 - **Vector Search**: FAISS with hybrid retrieval combining embedding similarity and metadata filtering
 - **Language Model**: Mistral-7B with 4-bit quantization
-- **RAG Framework**: Context-enhanced response generation
+- **RAG Framework**: Context-enhanced response generation
 
 ### Performance Metrics
 
@@ -63,7 +63,7 @@ streamlit run amazon_app.py
 - Recall@1: 0.6385
 - Recall@10: 0.9008
 - Precision@1: 0.6385
-- NDCG@10: 0.7725
+- NDCG@10: 0.7725
 
 ## Implementation Details
 
@@ -72,14 +72,14 @@ streamlit run amazon_app.py
 - Product comparisons and recommendations
 - Visual product recognition
 - Detailed product information retrieval
-- Price analysis and comparison
+- Price analysis and comparison
 
 ### Technologies Used
 - FashionCLIP for visual understanding
 - Mistral-7B Language Model (4-bit quantized)
 - FAISS for similarity search
 - Google Vertex AI for vector storage
-- Streamlit for user interface
+- Streamlit for user interface
 
 ## Challenges & Solutions
 
@@ -87,16 +87,16 @@ streamlit run amazon_app.py
 - Image processing with varying quality
 - GPU memory optimization
 - Efficient embedding storage
-- Query response accuracy
+- Query response accuracy
 
 ### Implemented Solutions
 - Robust image validation pipeline
 - 4-bit model quantization
 - Optimized batch processing
-- Enhanced metadata enrichment
+- Enhanced metadata enrichment
 
 ## Future Directions
 
 - [ ] Fine-Tune FashionClip embedding model based on the specific domain data
 - [ ] Fine-Tune large language model to improve its generalization capabilities
-- [ ] Develop feedback loops for continuous improvement
+- [ ] Develop feedback loops for continuous improvement