metadata
language: en
tags:
  - embedding
  - transformers
  - search
  - e-commerce
  - conversational-search
  - semantic-search
license: mit
pipeline_tag: feature-extraction

VectorPath SearchMap: Conversational E-commerce Search Embedding Model

Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis by building a model tailored to e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural-language queries and matching them with relevant products.

Key Features

  • Optimized for conversational e-commerce queries
  • Handles complex, natural language search intents
  • Supports multi-attribute product search
  • Efficient 1024-dimensional embeddings (configurable up to 8192)
  • Specialized for product and hotel search scenarios

Quick Start

Try out the model in our interactive Colab Demo!

Model Details

  • Base Model: Stella Embed 400M v5
  • Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192)
  • Training Data: 100,000+ e-commerce products across 32 categories
  • License: MIT
  • Framework: PyTorch / Sentence Transformers
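
The configurable dimensions listed above are typically used Matryoshka-style: encode at full width, keep only the leading components, and re-normalize. A minimal numpy sketch of that truncation step (the random vectors are a stand-in for real `model.encode` output, used here purely for illustration):

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-normalize."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Stand-in for full-width 1024-dimensional model output.
full = np.random.rand(3, 1024).astype("float32")
small = truncate_embeddings(full, 512)
# small has shape (3, 512) and unit-length rows, ready for cosine scoring.
```

With sentence-transformers 2.7.0 and later, the `SentenceTransformer` constructor also accepts a `truncate_dim` argument that performs this truncation at encode time.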

Usage

Using Sentence Transformers

# Install required packages
!pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode queries
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode products
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
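
To turn the embeddings above into a product ranking, score each product by cosine similarity to the query and sort. A numpy-only sketch, where the toy 4-dimensional vectors stand in for real 1024-dimensional `model.encode` output:

```python
import numpy as np

def cosine_rank(query_emb: np.ndarray, product_embs: np.ndarray) -> np.ndarray:
    """Return product indices ordered from most to least similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ q  # cosine similarity of each product to the query
    return np.argsort(-scores)

# Toy embeddings for illustration only.
query_emb = np.array([1.0, 0.0, 0.0, 0.0])
product_embs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0, 0.0],   # in between
])
ranking = cosine_rank(query_emb, product_embs)
print(ranking)  # [0 2 1]
```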

Using with FAISS for Vector Search

import numpy as np
import faiss

# Create FAISS index
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Add product embeddings
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Search
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'), 
    k=10
)
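
`IndexFlatL2` ranks by squared Euclidean distance; on L2-normalized embeddings this produces the same ordering as cosine similarity. A small numpy brute-force equivalent can be handy for sanity-checking FAISS results on toy data (the vectors below are stand-ins for real embeddings):

```python
import numpy as np

def l2_topk(query: np.ndarray, corpus: np.ndarray, k: int):
    """Brute-force analogue of IndexFlatL2.search for a single query vector."""
    dists = np.sum((corpus - query) ** 2, axis=1)  # squared L2, as FAISS reports
    idx = np.argsort(dists)[:k]
    return dists[idx], idx

corpus = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
query = np.array([0.9, 0.0], dtype="float32")
dists, idx = l2_topk(query, corpus, k=2)
print(idx)  # [1 0] -- nearest neighbor first
```

If you want cosine similarity inside FAISS itself, the usual approach is to call `faiss.normalize_L2` on the vectors and use an `IndexFlatIP` (inner-product) index instead.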

Example Search Queries

The model excels at understanding natural language queries like:

  • "A treat my dog and I can eat together"
  • "Lightweight waterproof hiking backpack for summer trails"
  • "Eco-friendly kitchen gadgets for a small apartment"
  • "Comfortable shoes for standing all day at work"
  • "Cereal for my 4 year old son that likes to miss breakfast"

Performance and Limitations

Evaluation

The model's evaluation metrics are available on the MTEB Leaderboard.

  • The model is currently by far the best embedding model under 1B parameters and is very easy to run locally on a small GPU due to its modest memory footprint
  • The model also ranks No. 1 by a wide margin on the SemRel24STS task, with a score of 81.12% versus 73.14% for the second-place Google Gemini embedding model (as of 30 March 2025). SemRel24STS evaluates a system's ability to measure the semantic relatedness between two sentences across 14 languages.
  • We also observed that the model performs exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard

Strengths

  • Excellent at understanding conversational and natural language queries
  • Strong performance in e-commerce and hotel search scenarios
  • Handles complex multi-attribute queries
  • Efficient computation with configurable embedding dimensions

Current Limitations

  • May not fully prioritize weighted terms in queries
  • Limited handling of slang and colloquial language
  • Regional language variations might need fine-tuning

Training Details

The model was trained using:

  • Supervised learning with Sentence Transformers
  • 100,000+ product dataset across 32 categories
  • AI-generated conversational search queries
  • Positive and negative product examples for contrastive learning

Intended Use

This model is designed for:

  • E-commerce product search and recommendations
  • Hotel and accommodation search
  • Product catalog vectorization
  • Semantic similarity matching
  • Query understanding and intent detection

Citation

If you use this model in your research, please cite:

@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}

Contact and Community

License

This model is released under the MIT License. See the LICENSE file for more details.