---
language: en
tags:
- embedding
- transformers
- search
- e-commerce
- conversational-search
- semantic-search
license: mit
pipeline_tag: feature-extraction
---
# VectorPath SearchMap: Conversational E-commerce Search Embedding Model

## Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis by building a model tailored to e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural language queries and matching them with relevant products.
## Key Features
- Optimized for conversational e-commerce queries
- Handles complex, natural language search intents
- Supports multi-attribute product search
- Efficient 1024-dimensional embeddings (configurable up to 8192)
- Specialized for product and hotel search scenarios
## Quick Start
Try out the model in our interactive Colab Demo!
## Model Details
- Base Model: Stella Embed 400M v5
- Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192)
- Training Data: 100,000+ e-commerce products across 32 categories
- License: MIT
- Framework: PyTorch / Sentence Transformers
## Usage

### Using Sentence Transformers
```bash
# Install required packages
pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3
```

```python
from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode a query
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode a product description
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
```
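To rank candidate products for a query, you can compare embeddings with cosine similarity. A minimal sketch using sentence_transformers.util; the product snippets below are illustrative and not taken from the training data:

```python
from sentence_transformers import util

# Illustrative catalog snippets (not from the actual training set)
products = [
    "Organic peanut butter dog treats, safe for human consumption",
    "Stainless steel dog bowl with non-slip base",
    "Trail mix with peanuts, raisins and dark chocolate",
]

product_embeddings = model.encode(products)
query_embedding = model.encode("A treat my dog and I can eat together")

# Cosine similarity between the query and each product (higher = more relevant)
scores = util.cos_sim(query_embedding, product_embeddings)[0].tolist()
for product, score in sorted(zip(products, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {product}")
```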
### Using with FAISS for Vector Search
```python
import numpy as np
import faiss

# Create a FAISS index; the dimension must match the embedding size you use
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Add product embeddings (product_descriptions is a list of product texts)
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Search for the 10 nearest products to the query
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'),
    k=10
)
```
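The returned indices point back into the list you encoded, so mapping results to products is straightforward; a small sketch assuming the variables from the block above:

```python
# Map FAISS results back to the catalog; distances and indices have shape (1, k)
for rank, (dist, idx) in enumerate(zip(distances[0], indices[0]), start=1):
    print(f"{rank}. {product_descriptions[idx]} (L2 distance: {dist:.3f})")
```

IndexFlatL2 ranks by Euclidean distance; if you prefer cosine similarity, L2-normalize the embeddings with faiss.normalize_L2 and use faiss.IndexFlatIP instead.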
## Example Search Queries
The model excels at understanding natural language queries like:
- "A treat my dog and I can eat together"
- "Lightweight waterproof hiking backpack for summer trails"
- "Eco-friendly kitchen gadgets for a small apartment"
- "Comfortable shoes for standing all day at work"
- "Cereal for my 4 year old son that likes to miss breakfast"
## Performance and Limitations

### Evaluation

The model's evaluation metrics are available on the MTEB Leaderboard.
- The model is currently the best embedding model under 1B parameters on the leaderboard by a clear margin, and it is easy to run locally on a small GPU thanks to its small memory footprint
- It also ranks No. 1 by a wide margin on the SemRel24STS task, scoring 81.12 versus 73.14 for the second-place Google Gemini embedding model. SemRel24STS evaluates a system's ability to measure the semantic relatedness between two sentences across 14 languages
- We also found that the model performs exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard
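To reproduce scores on individual tasks, the mteb package can evaluate the model through its Sentence Transformers interface. A minimal sketch; the task name follows the leaderboard listing and the output folder is arbitrary:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Run a single task; pass several task names to cover more of the benchmark
evaluation = MTEB(tasks=["SemRel24STS"])
results = evaluation.run(model, output_folder="results/SearchMap_Preview")
```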
### Strengths
- Excellent at understanding conversational and natural language queries
- Strong performance in e-commerce and hotel search scenarios
- Handles complex multi-attribute queries
- Efficient computation with configurable embedding dimensions
### Current Limitations
- May not fully prioritize weighted terms in queries
- Limited handling of slang and colloquial language
- Regional language variations might need fine-tuning
## Training Details
The model was trained using:
- Supervised learning with Sentence Transformers
- 100,000+ product dataset across 32 categories
- AI-generated conversational search queries
- Positive and negative product examples for contrastive learning (see the sketch below)
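The exact loss, base checkpoint, and hyperparameters are not documented here, but a contrastive setup of this kind can be sketched with Sentence Transformers roughly as follows; the triplet, base model ID, and settings are illustrative assumptions, not the actual training recipe:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Illustrative (query, positive product, negative product) triplet
train_examples = [
    InputExample(texts=[
        "A treat my dog and I can eat together",                      # conversational query
        "Organic peanut butter dog treats, edible for humans too",    # positive product
        "Stainless steel dog bowl with non-slip base",                # negative product
    ]),
]

# Base model ID and hyperparameters below are assumptions for illustration
model = SentenceTransformer('dunzhang/stella_en_400M_v5', trust_remote_code=True)
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Treats the second text as the positive and the remaining texts,
# plus other in-batch examples, as negatives
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```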
## Intended Use
This model is designed for:
- E-commerce product search and recommendations
- Hotel and accommodation search
- Product catalog vectorization
- Semantic similarity matching
- Query understanding and intent detection
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}
```
## Contact and Community
- Discord Community: Join our Discord
- GitHub Issues: Report bugs and feature requests
- Interactive Demo: Try it on Colab
## License
This model is released under the MIT License. See the LICENSE file for more details.