---
language: en
tags:
- embedding
- transformers
- search
- e-commerce
- conversational-search
- semantic-search
license: mit
pipeline_tag: feature-extraction
---

# VectorPath SearchMap: Conversational E-commerce Search Embedding Model

## Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis by building a model tailored to e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural language queries and matching them with relevant products.

## Key Features

- Optimized for conversational e-commerce queries
- Handles complex, natural language search intents
- Supports multi-attribute product search
- Efficient 1024-dimensional embeddings (configurable up to 8192)
- Specialized for product and hotel search scenarios

## Quick Start

Try out the model in our interactive [Colab Demo](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)!

## Model Details

- Base Model: Stella Embed 400M v5
- Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192); see the dimension sketch at the end of the Usage section
- Training Data: 100,000+ e-commerce products across 32 categories
- License: MIT
- Framework: PyTorch / Sentence Transformers

## Usage

### Using Sentence Transformers

```python
# Install required packages first, e.g. in a notebook or shell:
#   pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode queries
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode products
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
```
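
To compare a query against a product directly, cosine similarity between the two embeddings works well; the snippet below uses the standard `util.cos_sim` helper from Sentence Transformers with the variables defined above:

```python
from sentence_transformers import util

# Cosine similarity between the query and product embeddings computed above;
# higher scores indicate a closer semantic match
score = util.cos_sim(query_embedding, product_embedding)
print(f"Similarity: {score.item():.4f}")
```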

### Using with FAISS for Vector Search

```python
import numpy as np
import faiss

# `model`, `query`, and a list of `product_descriptions` are assumed to be
# defined as in the previous snippet

# Create a FAISS index over L2 (Euclidean) distance
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Embed and add the product catalog
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Embed the query and retrieve the 10 nearest products
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'),
    k=10
)
```
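
`IndexFlatL2` ranks results by Euclidean distance. If you prefer cosine similarity (a common choice for text embeddings, not something specific to this model), one standard approach is to L2-normalize the vectors and use an inner-product index:

```python
# Cosine-similarity variant: normalize the vectors, then rank by inner product
index_ip = faiss.IndexFlatIP(embedding_dimension)

normalized_products = np.array(product_embeddings).astype('float32')
faiss.normalize_L2(normalized_products)  # in-place normalization
index_ip.add(normalized_products)

normalized_query = np.array(query_embedding).astype('float32')
faiss.normalize_L2(normalized_query)
scores, indices = index_ip.search(normalized_query, k=10)  # scores are now cosine similarities
```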

### Example Search Queries

The model excels at understanding natural language queries like:
- "A treat my dog and I can eat together"
- "Lightweight waterproof hiking backpack for summer trails"
- "Eco-friendly kitchen gadgets for a small apartment"
- "Comfortable shoes for standing all day at work"
- "Cereal for my 4 year old son that likes to miss breakfast"

## Performance and Limitations

### Evaluation
The model's evaluation metrics are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
- The model is currently the top-ranked embedding model under 1B parameters by a clear margin, and its small memory footprint makes it easy to run locally on a modest GPU
- The model is also No. 1 by a wide margin on the [SemRel24STS](https://huggingface.co/datasets/SemRel/SemRel2024) task, scoring 81.12% versus 73.14% for the second-place Google Gemini embedding model (as of 30 March 2025). SemRel24STS evaluates a system's ability to measure the semantic relatedness of sentence pairs across 14 languages
- We also observed that the model performs exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard


### Strengths
- Excellent at understanding conversational and natural language queries
- Strong performance in e-commerce and hotel search scenarios
- Handles complex multi-attribute queries
- Efficient computation with configurable embedding dimensions

### Current Limitations
- May not fully prioritize weighted terms in queries
- Limited handling of slang and colloquial language
- Regional language variations might need fine-tuning

## Training Details

The model was trained using:
- Supervised learning with Sentence Transformers
- 100,000+ product dataset across 32 categories
- AI-generated conversational search queries
- Positive and negative product examples for contrastive learning (a sketch follows below)
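
The exact training pipeline is not published here, so the following is only a sketch of what contrastive fine-tuning with (query, positive, negative) triplets typically looks like in Sentence Transformers; the base-model id and the triplet contents are illustrative assumptions:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Illustrative triplet: (conversational query, matching product, non-matching product)
train_examples = [
    InputExample(texts=[
        "A treat my dog and I can eat together",                         # query
        "Organic peanut butter dog treats, safe for human consumption",  # positive
        "Stainless steel elevated dog food bowl",                        # negative
    ]),
    # ... one InputExample per triplet in the training set
]

# Assumed Hugging Face id for the Stella Embed 400M v5 base model
model = SentenceTransformer('dunzhang/stella_en_400M_v5', trust_remote_code=True)

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# Uses in-batch negatives in addition to the explicit hard negative
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```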

## Intended Use

This model is designed for:
- E-commerce product search and recommendations
- Hotel and accommodation search
- Product catalog vectorization
- Semantic similarity matching
- Query understanding and intent detection

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={HuggingFace Model Hub}
}
```

## Contact and Community

- Discord Community: [Join our Discord](https://discord.gg/gXvVfqGD)
- GitHub Issues: Report bugs and feature requests
- Interactive Demo: [Try it on Colab](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)

## License

This model is released under the MIT License. See the LICENSE file for more details.