---
language: en
tags:
- embedding
- transformers
- search
- e-commerce
- conversational-search
- semantic-search
license: mit
pipeline_tag: feature-extraction
---
# VectorPath SearchMap: Conversational E-commerce Search Embedding Model
## Model Description
SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis in the e-commerce domain: fine-tuned from the Stella Embed 400M v5 base model, SearchMap excels at understanding natural-language queries and matching them with relevant products.
## Key Features
- Optimized for conversational e-commerce queries
- Handles complex, natural language search intents
- Supports multi-attribute product search
- Efficient 1024-dimensional embeddings (configurable up to 8192)
- Specialized for product and hotel search scenarios
## Quick Start
Try out the model in our interactive [Colab Demo](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)!
## Model Details
- Base Model: Stella Embed 400M v5
- Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192)
- Training Data: 100,000+ e-commerce products across 32 categories
- License: MIT
- Framework: PyTorch / Sentence Transformers
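If the smaller dimensions are exposed Matryoshka-style (first-*k* truncation, an assumption this card does not confirm), a lower-dimensional embedding can be obtained by truncating the full vector and re-normalizing; sentence-transformers 2.7+ also accepts a `truncate_dim` argument when loading the model. A minimal sketch with a placeholder vector standing in for a real model output:

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length.
    Assumes the model's dimensions are Matryoshka-ordered (most important first)."""
    truncated = np.asarray(vec, dtype=np.float32)[:dim]
    return truncated / np.linalg.norm(truncated)

# Placeholder 1024-d embedding standing in for model.encode(...) output
full = np.random.default_rng(0).normal(size=1024).astype(np.float32)
small = truncate_embedding(full, 512)
print(small.shape)  # (512,)
```

With `truncate_dim`, the same effect is achieved at load time: `SentenceTransformer('vectopath/SearchMap_Preview', truncate_dim=512, trust_remote_code=True)`.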
## Usage
### Using Sentence Transformers
```python
# Install required packages first:
#   pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3
from sentence_transformers import SentenceTransformer
# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)
# Encode queries
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)
# Encode products
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
```
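A product's relevance to a query can then be scored with cosine similarity between the two embeddings. A minimal sketch, with random placeholder vectors standing in for real `model.encode(...)` outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings standing in for query_embedding / product_embedding
rng = np.random.default_rng(42)
query_embedding = rng.normal(size=1024)
product_embedding = rng.normal(size=1024)
score = cosine_similarity(query_embedding, product_embedding)
```

Higher scores indicate a closer semantic match; ranking products by this score gives a simple retrieval baseline before introducing a vector index.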
### Using with FAISS for Vector Search
```python
import numpy as np
import faiss  # pip install faiss-cpu (or faiss-gpu)

# `model`, `product_descriptions`, and `query` are defined in the previous snippet
# Create FAISS index
embedding_dimension = 1024 # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)
# Add product embeddings
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))
# Search
query_embedding = model.encode([query])
distances, indices = index.search(
np.array(query_embedding).astype('float32'),
k=10
)
```
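For small catalogs, an exact brute-force search in NumPy returns the same top-k results without a FAISS dependency. A minimal sketch over placeholder embeddings (random vectors stand in for real model outputs):

```python
import numpy as np

def top_k_cosine(query_vec, matrix, k=10):
    """Return indices of the k rows of `matrix` most cosine-similar to `query_vec`,
    best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    top = np.argpartition(-scores, k)[:k]      # unordered top-k candidates
    return top[np.argsort(-scores[top])]       # sort candidates by score

rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 1024)).astype(np.float32)
query = catalog[7] + 0.01 * rng.normal(size=1024)  # near-duplicate of row 7
hits = top_k_cosine(query, catalog, k=10)
print(hits[0])  # row 7 ranks first
```

The same cosine ranking can be reproduced in FAISS by L2-normalizing embeddings and using `IndexFlatIP` instead of `IndexFlatL2`.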
### Example Search Queries
The model excels at understanding natural language queries like:
- "A treat my dog and I can eat together"
- "Lightweight waterproof hiking backpack for summer trails"
- "Eco-friendly kitchen gadgets for a small apartment"
- "Comfortable shoes for standing all day at work"
- "Cereal for my 4 year old son that likes to miss breakfast"
## Performance and Limitations
### Evaluation
The model's evaluation metrics are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
- The model is currently by far the best embedding model under 1B parameters, and its modest memory footprint makes it easy to run locally on a small GPU
- The model is also No. 1 by a wide margin on the [SemRel24STS](https://huggingface.co/datasets/SemRel/SemRel2024) task, scoring 81.12% versus 73.14% for the second-placed Google Gemini embedding model (as of 30 March 2025). SemRel24STS evaluates a system's ability to measure the semantic relatedness between two sentences across 14 languages
- We noticed the model does exceptionally well on legal and news retrieval and similarity tasks from the MTEB leaderboard
### Strengths
- Excellent at understanding conversational and natural language queries
- Strong performance in e-commerce and hotel search scenarios
- Handles complex multi-attribute queries
- Efficient computation with configurable embedding dimensions
### Current Limitations
- May not fully prioritize weighted terms in queries
- Limited handling of slang and colloquial language
- Regional language variations might need fine-tuning
## Training Details
The model was trained using:
- Supervised learning with Sentence Transformers
- 100,000+ product dataset across 32 categories
- AI-generated conversational search queries
- Positive and negative product examples for contrastive learning
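The exact training objective is not specified on this card; a common choice in Sentence Transformers when training with positive and negative examples is an in-batch contrastive (InfoNCE-style) loss, where each query's matching product is the positive and every other product in the batch serves as a negative. A NumPy sketch of that objective, as an illustration rather than the model's confirmed recipe:

```python
import numpy as np

def info_nce_loss(query_embs, positive_embs, temperature=0.05):
    """In-batch contrastive loss: row i of `positive_embs` is the positive
    for row i of `query_embs`; all other rows act as negatives."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = positive_embs / np.linalg.norm(positive_embs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # diagonal = matching pairs

rng = np.random.default_rng(1)
queries = rng.normal(size=(8, 1024))
products = queries + 0.1 * rng.normal(size=(8, 1024))  # near-duplicates as positives
loss = info_nce_loss(queries, products)
```

Minimizing this loss pulls matching query/product embeddings together while pushing apart non-matching pairs within the batch.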
## Intended Use
This model is designed for:
- E-commerce product search and recommendations
- Hotel and accommodation search
- Product catalog vectorization
- Semantic similarity matching
- Query understanding and intent detection
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}
```
## Contact and Community
- Discord Community: [Join our Discord](https://discord.gg/gXvVfqGD)
- GitHub Issues: Report bugs and feature requests
- Interactive Demo: [Try it on Colab](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)
## License
This model is released under the MIT License. See the LICENSE file for more details.