---
language: en
tags:
- embedding
- transformers
- search
- e-commerce
- conversational-search
- semantic-search
license: mit
pipeline_tag: feature-extraction
---

# VectorPath SearchMap: Conversational E-commerce Search Embedding Model

## Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis by building a model suited to e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural language queries and matching them with relevant products.

## Key Features

- Optimized for conversational e-commerce queries
- Handles complex, natural language search intents
- Supports multi-attribute product search
- Efficient 1024-dimensional embeddings (configurable up to 8192)
- Specialized for product and hotel search scenarios

## Quick Start

Try out the model in our interactive [Colab Demo](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)!

## Model Details

- Base Model: Stella Embed 400M v5
- Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192); see the sketch below
- Training Data: 100,000+ e-commerce products across 32 categories
- License: MIT
- Framework: PyTorch / Sentence Transformers
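
How the output dimension is selected depends on how the checkpoint is exported. The following is a minimal sketch assuming Matryoshka-style truncation via the `truncate_dim` argument (available in sentence-transformers >= 2.7.0, the version pinned below); check the model repository for the exact mechanism:

```python
from sentence_transformers import SentenceTransformer

# Assumption: the checkpoint supports Matryoshka-style truncation, so asking
# for 512 dimensions keeps the first 512 components of each embedding.
model_512 = SentenceTransformer(
    'vectopath/SearchMap_Preview',
    trust_remote_code=True,
    truncate_dim=512,
)

embedding = model_512.encode("waterproof hiking backpack")
print(embedding.shape)  # expected: (512,)
```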

## Usage

### Using Sentence Transformers

```python
# Install the required packages first, for example:
#   pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode a query
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode a product
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
```
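
Once you have embeddings for a query and some products, you can rank products by cosine similarity. A minimal sketch using sentence-transformers' `util.cos_sim`, reusing `model` and `query_embedding` from above (the product texts are illustrative):

```python
from sentence_transformers import util

products = [
    "Organic peanut butter dog treats, safe for human consumption",
    "Stainless steel dog bowl, dishwasher safe",
    "Grain-free salmon kibble for adult dogs",
]
product_embeddings = model.encode(products)

# Cosine similarity between the query and every product: shape (1, 3)
scores = util.cos_sim(query_embedding, product_embeddings)

best = scores.argmax().item()
print(f"Best match: {products[best]} (score={scores[0, best]:.3f})")
```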

### Using with FAISS for Vector Search

```python
import numpy as np
import faiss

# Example catalog and query (reusing `model` from the previous snippet)
product_descriptions = [
    "Organic peanut butter dog treats, safe for human consumption",
    "Stainless steel dog bowl, dishwasher safe",
    "Grain-free salmon kibble for adult dogs",
]
query = "A treat my dog and I can eat together"

# Create a FAISS index over L2 distances
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Add product embeddings
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Search; if k exceeds the catalog size, FAISS pads the results with -1
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'),
    k=10
)
```
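
`IndexFlatL2` ranks by Euclidean distance. If you prefer cosine similarity, a common pattern is to L2-normalize the vectors and use an inner-product index instead; a sketch reusing `model`, `product_descriptions`, and `query` from above:

```python
import numpy as np
import faiss

# Cosine similarity == inner product on unit-length vectors
emb = np.array(model.encode(product_descriptions)).astype('float32')
faiss.normalize_L2(emb)

ip_index = faiss.IndexFlatIP(emb.shape[1])
ip_index.add(emb)

q = np.array(model.encode([query])).astype('float32')
faiss.normalize_L2(q)

scores, ids = ip_index.search(q, 3)  # higher score = more similar
```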

### Example Search Queries

The model excels at understanding natural language queries like:

- "A treat my dog and I can eat together"
- "Lightweight waterproof hiking backpack for summer trails"
- "Eco-friendly kitchen gadgets for a small apartment"
- "Comfortable shoes for standing all day at work"
- "Cereal for my 4 year old son that likes to miss breakfast"

## Performance and Limitations

### Evaluation

The model's evaluation metrics are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

- The model is currently by far the best embedding model under 1B parameters on the leaderboard, and its small memory footprint makes it easy to run locally on a small GPU.
- It is also No. 1 by a wide margin on the [SemRel24STS](https://huggingface.co/datasets/SemRel/SemRel2024) task with a score of 81.12%, ahead of Google's Gemini embedding model in second place at 73.14% (as of 30 March 2025). SemRel24STS evaluates a system's ability to measure the semantic relatedness of sentence pairs across 14 languages.
- We also noticed that the model does exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard.

### Strengths

- Excellent at understanding conversational and natural language queries
- Strong performance in e-commerce and hotel search scenarios
- Handles complex multi-attribute queries
- Efficient computation with configurable embedding dimensions

### Current Limitations

- May not fully prioritize weighted terms in queries
- Limited handling of slang and colloquial language
- Regional language variations might need fine-tuning

## Training Details

The model was trained using:

- Supervised learning with Sentence Transformers
- A dataset of 100,000+ products across 32 categories
- AI-generated conversational search queries
- Positive and negative product examples for contrastive learning (see the sketch below)
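
The full training recipe is not published here. As an illustration only, the following sketch shows one common way to fine-tune on (query, positive, negative) triples with sentence-transformers' `MultipleNegativesRankingLoss`; the base-model hub id, the example triple, and all hyperparameters are assumptions, not the actual configuration:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Assumed hub id for the Stella Embed 400M v5 base checkpoint
base = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)

# Illustrative (query, positive, negative) triple
train_examples = [
    InputExample(texts=[
        "A treat my dog and I can eat together",         # AI-generated query
        "Organic peanut butter dog treats, human-safe",  # matching product
        "Stainless steel dog bowl, dishwasher safe",     # non-matching product
    ]),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=1)

# Treats the positive as the target and the explicit negative (plus other
# in-batch examples) as negatives
train_loss = losses.MultipleNegativesRankingLoss(base)

base.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
```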

## Intended Use

This model is designed for:

- E-commerce product search and recommendations
- Hotel and accommodation search
- Product catalog vectorization
- Semantic similarity matching
- Query understanding and intent detection

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}
```

## Contact and Community

- Discord Community: [Join our Discord](https://discord.gg/gXvVfqGD)
- GitHub Issues: Report bugs and feature requests
- Interactive Demo: [Try it on Colab](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)

## License

This model is released under the MIT License. See the LICENSE file for more details.