metadata
language: en
tags:
  - embedding
  - transformers
  - search
  - e-commerce
  - conversational-search
  - semantic-search
license: mit
pipeline_tag: feature-extraction

VectorPath SearchMap: Conversational E-commerce Search Embedding Model

Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis by building a model tailored to e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural-language queries and matching them with relevant products.

Key Features

  • Optimized for conversational e-commerce queries
  • Handles complex, natural language search intents
  • Supports multi-attribute product search
  • Efficient 1024-dimensional embeddings (configurable up to 8192)
  • Specialized for product and hotel search scenarios

Quick Start

Try out the model in our interactive Colab Demo!

Model Details

  • Base Model: Stella Embed 400M v5
  • Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192)
  • Training Data: 100,000+ e-commerce products across 32 categories
  • License: MIT
  • Framework: PyTorch / Sentence Transformers
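
The configurable dimensions listed above are typically used Matryoshka-style: encode at full width, keep only the leading components, and re-normalize. A minimal numpy sketch of that truncation step (the random vectors are a stand-in for real `model.encode` output, used here purely for illustration):

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-normalize."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Stand-in for full-width 1024-dimensional model output.
full = np.random.rand(3, 1024).astype("float32")
small = truncate_embeddings(full, 512)
# small has shape (3, 512) and unit-length rows, ready for cosine scoring.
```

With sentence-transformers 2.7.0 and later, the `SentenceTransformer` constructor also accepts a `truncate_dim` argument that performs this truncation at encode time.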

Usage

Using Sentence Transformers

# Install required packages
!pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode queries
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode products
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
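
To turn the embeddings above into a product ranking, score each product by cosine similarity to the query and sort. A numpy-only sketch, where the toy 4-dimensional vectors stand in for real 1024-dimensional `model.encode` output:

```python
import numpy as np

def cosine_rank(query_emb: np.ndarray, product_embs: np.ndarray) -> np.ndarray:
    """Return product indices ordered from most to least similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ q  # cosine similarity of each product to the query
    return np.argsort(-scores)

# Toy embeddings for illustration only.
query_emb = np.array([1.0, 0.0, 0.0, 0.0])
product_embs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0, 0.0],   # in between
])
ranking = cosine_rank(query_emb, product_embs)
print(ranking)  # [0 2 1]
```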

Using with FAISS for Vector Search

import numpy as np
import faiss

# Create FAISS index
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Add product embeddings
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Search
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'), 
    k=10
)
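
`IndexFlatL2` ranks by squared Euclidean distance; on L2-normalized embeddings this produces the same ordering as cosine similarity. A small numpy brute-force equivalent can be handy for sanity-checking FAISS results on toy data (the vectors below are stand-ins for real embeddings):

```python
import numpy as np

def l2_topk(query: np.ndarray, corpus: np.ndarray, k: int):
    """Brute-force analogue of IndexFlatL2.search for a single query vector."""
    dists = np.sum((corpus - query) ** 2, axis=1)  # squared L2, as FAISS reports
    idx = np.argsort(dists)[:k]
    return dists[idx], idx

corpus = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
query = np.array([0.9, 0.0], dtype="float32")
dists, idx = l2_topk(query, corpus, k=2)
print(idx)  # [1 0] -- nearest neighbor first
```

If you want cosine similarity inside FAISS itself, the usual approach is to call `faiss.normalize_L2` on the vectors and use an `IndexFlatIP` (inner-product) index instead.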

Example Search Queries

The model excels at understanding natural language queries like:

  • "A treat my dog and I can eat together"
  • "Lightweight waterproof hiking backpack for summer trails"
  • "Eco-friendly kitchen gadgets for a small apartment"
  • "Comfortable shoes for standing all day at work"
  • "Cereal for my 4 year old son that likes to miss breakfast"

Performance and Limitations

Evaluation

The model's evaluation metrics are available on the MTEB Leaderboard.

  • The model is currently by far the best embedding model under 1B parameters and is very easy to run locally on a small GPU due to its modest memory footprint
  • The model also ranks No. 1 by a wide margin on the SemRel24STS task, with a score of 81.12% versus 73.14% for the second-place Google Gemini embedding model (as of 30 March 2025). SemRel24STS evaluates a system's ability to measure the semantic relatedness between two sentences across 14 languages.
  • We also observed that the model performs exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard

Strengths

  • Excellent at understanding conversational and natural language queries
  • Strong performance in e-commerce and hotel search scenarios
  • Handles complex multi-attribute queries
  • Efficient computation with configurable embedding dimensions

Current Limitations

  • May not fully prioritize weighted terms in queries
  • Limited handling of slang and colloquial language
  • Regional language variations might need fine-tuning

Training Details

The model was trained using:

  • Supervised learning with Sentence Transformers
  • 100,000+ product dataset across 32 categories
  • AI-generated conversational search queries
  • Positive and negative product examples for contrastive learning

Intended Use

This model is designed for:

  • E-commerce product search and recommendations
  • Hotel and accommodation search
  • Product catalog vectorization
  • Semantic similarity matching
  • Query understanding and intent detection

Citation

If you use this model in your research, please cite:

@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}

Contact and Community

License

This model is released under the MIT License. See the LICENSE file for more details.