---
language: en
tags:
- embedding
- transformers
- search
- e-commerce
- conversational-search
- semantic-search
license: mit
pipeline_tag: feature-extraction
---

# VectorPath SearchMap: Conversational E-commerce Search Embedding Model

## Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis by building a model suited to e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural language queries and matching them with relevant products.

## Key Features

- Optimized for conversational e-commerce queries
- Handles complex, natural language search intents
- Supports multi-attribute product search
- Efficient 1024-dimensional embeddings (configurable up to 8192)
- Specialized for product and hotel search scenarios

## Quick Start

Try out the model in our interactive [Colab Demo](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)!

## Model Details

- Base Model: Stella Embed 400M v5
- Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192); see the sketch below
- Training Data: 100,000+ e-commerce products across 32 categories
- License: MIT
- Framework: PyTorch / Sentence Transformers
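
How the output dimension is selected depends on how the checkpoint is exported. The following is a minimal sketch assuming Matryoshka-style truncation via the `truncate_dim` argument (available in sentence-transformers >= 2.7.0, the version pinned below); check the model repository for the exact mechanism:

```python
from sentence_transformers import SentenceTransformer

# Assumption: the checkpoint supports Matryoshka-style truncation, so asking
# for 512 dimensions keeps the first 512 components of each embedding.
model_512 = SentenceTransformer(
    'vectopath/SearchMap_Preview',
    trust_remote_code=True,
    truncate_dim=512,
)

embedding = model_512.encode("waterproof hiking backpack")
print(embedding.shape)  # expected: (512,)
```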

## Usage

### Using Sentence Transformers

```python
# Install the required packages first, for example:
#   pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode a query
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode a product
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
```
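
Once you have embeddings for a query and some products, you can rank products by cosine similarity. A minimal sketch using sentence-transformers' `util.cos_sim`, reusing `model` and `query_embedding` from above (the product texts are illustrative):

```python
from sentence_transformers import util

products = [
    "Organic peanut butter dog treats, safe for human consumption",
    "Stainless steel dog bowl, dishwasher safe",
    "Grain-free salmon kibble for adult dogs",
]
product_embeddings = model.encode(products)

# Cosine similarity between the query and every product: shape (1, 3)
scores = util.cos_sim(query_embedding, product_embeddings)

best = scores.argmax().item()
print(f"Best match: {products[best]} (score={scores[0, best]:.3f})")
```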

### Using with FAISS for Vector Search

```python
import numpy as np
import faiss

# Example catalog and query (reusing `model` from the previous snippet)
product_descriptions = [
    "Organic peanut butter dog treats, safe for human consumption",
    "Stainless steel dog bowl, dishwasher safe",
    "Grain-free salmon kibble for adult dogs",
]
query = "A treat my dog and I can eat together"

# Create a FAISS index over L2 distances
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Add product embeddings
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Search; if k exceeds the catalog size, FAISS pads the results with -1
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'),
    k=10
)
```
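
`IndexFlatL2` ranks by Euclidean distance. If you prefer cosine similarity, a common pattern is to L2-normalize the vectors and use an inner-product index instead; a sketch reusing `model`, `product_descriptions`, and `query` from above:

```python
import numpy as np
import faiss

# Cosine similarity == inner product on unit-length vectors
emb = np.array(model.encode(product_descriptions)).astype('float32')
faiss.normalize_L2(emb)

ip_index = faiss.IndexFlatIP(emb.shape[1])
ip_index.add(emb)

q = np.array(model.encode([query])).astype('float32')
faiss.normalize_L2(q)

scores, ids = ip_index.search(q, 3)  # higher score = more similar
```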

### Example Search Queries

The model excels at understanding natural language queries like:

- "A treat my dog and I can eat together"
- "Lightweight waterproof hiking backpack for summer trails"
- "Eco-friendly kitchen gadgets for a small apartment"
- "Comfortable shoes for standing all day at work"
- "Cereal for my 4 year old son that likes to miss breakfast"

## Performance and Limitations

### Evaluation

The model's evaluation metrics are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

- The model is currently by far the best embedding model under 1B parameters on the leaderboard, and its small memory footprint makes it easy to run locally on a small GPU.
- It is also No. 1 by a wide margin on the [SemRel24STS](https://huggingface.co/datasets/SemRel/SemRel2024) task with a score of 81.12%, ahead of Google's Gemini embedding model in second place at 73.14% (as of 30 March 2025). SemRel24STS evaluates a system's ability to measure the semantic relatedness of sentence pairs across 14 languages.
- We also noticed that the model does exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard.

### Strengths

- Excellent at understanding conversational and natural language queries
- Strong performance in e-commerce and hotel search scenarios
- Handles complex multi-attribute queries
- Efficient computation with configurable embedding dimensions

### Current Limitations

- May not fully prioritize weighted terms in queries
- Limited handling of slang and colloquial language
- Regional language variations might need fine-tuning

## Training Details

The model was trained using:

- Supervised learning with Sentence Transformers
- A dataset of 100,000+ products across 32 categories
- AI-generated conversational search queries
- Positive and negative product examples for contrastive learning (see the sketch below)
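
The full training recipe is not published here. As an illustration only, the following sketch shows one common way to fine-tune on (query, positive, negative) triples with sentence-transformers' `MultipleNegativesRankingLoss`; the base-model hub id, the example triple, and all hyperparameters are assumptions, not the actual configuration:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Assumed hub id for the Stella Embed 400M v5 base checkpoint
base = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)

# Illustrative (query, positive, negative) triple
train_examples = [
    InputExample(texts=[
        "A treat my dog and I can eat together",         # AI-generated query
        "Organic peanut butter dog treats, human-safe",  # matching product
        "Stainless steel dog bowl, dishwasher safe",     # non-matching product
    ]),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=1)

# Treats the positive as the target and the explicit negative (plus other
# in-batch examples) as negatives
train_loss = losses.MultipleNegativesRankingLoss(base)

base.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
```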

## Intended Use

This model is designed for:

- E-commerce product search and recommendations
- Hotel and accommodation search
- Product catalog vectorization
- Semantic similarity matching
- Query understanding and intent detection

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}
```

## Contact and Community

- Discord Community: [Join our Discord](https://discord.gg/gXvVfqGD)
- GitHub Issues: Report bugs and feature requests
- Interactive Demo: [Try it on Colab](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)

## License

This model is released under the MIT License. See the LICENSE file for more details.