# Intent Encoder (MindPadi)

`intent_encoder` is a Sentence Transformer model used in the MindPadi mental health assistant to encode user messages into dense embeddings. These embeddings support intent classification, similarity search, and memory-recall workflows, making the model foundational to the semantic understanding of user input across MindPadi features.
## 🧠 Model Overview

- Architecture: Sentence-BERT (`all-MiniLM-L6-v2` base)
- Task: Sentence Embedding / Semantic Similarity
- Purpose: Embed user queries for intent classification, vector search, and memory retrieval
- Size: ~22.7M parameters
- Files: `config.json`, `pytorch_model.bin` or `model.safetensors`, `tokenizer.json`, `vocab.txt`, `1_Pooling/`, `2_Normalize/` (Sentence-BERT components)
## 🧾 Intended Use

### ✔️ Primary Use Cases

- Semantic embedding of user inputs for intent recognition
- Matching new messages against known intent samples in `data/processed_intents.json` (see the sketch below)
- Supporting vector similarity in MongoDB Atlas Search or ChromaDB
- Powering memory in LangGraph agentic workflows
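As a rough illustration of the matching use case, the following sketch embeds known intent samples and scores a new message against them with cosine similarity. The schema assumed for `data/processed_intents.json` (intent name mapped to example utterances) is a guess, not the project's confirmed format:

```python
import json

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mindpadi/intent_encoder")

# Assumed (unconfirmed) schema: {"book_session": ["Book a session", ...], ...}
with open("data/processed_intents.json") as f:
    intents = json.load(f)

labels, samples = [], []
for intent, examples in intents.items():
    labels.extend([intent] * len(examples))
    samples.extend(examples)

# Precompute embeddings for all known intent samples once at startup.
sample_embeddings = model.encode(samples, convert_to_tensor=True)

def match_intent(message: str) -> str:
    """Return the intent whose stored sample is most similar to the message."""
    query = model.encode(message, convert_to_tensor=True)
    scores = util.cos_sim(query, sample_embeddings)[0]
    return labels[int(scores.argmax())]

print(match_intent("Can I schedule an appointment?"))  # e.g. "book_session"
```

Precomputing the sample embeddings keeps per-message matching to a single `encode` call plus one similarity pass.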
### 🚫 Not Recommended For

- Direct intent classification (this model returns embeddings, not classes)
- Use outside of NLP (e.g., image or audio inputs)
## 🧪 Integration in MindPadi

- `app/chatbot/intent_classifier.py`: uses this model to compute sentence embeddings
- `app/chatbot/intent_router.py`: leverages vector similarity for intent matching
- `database/vector_search.py`: stores and queries embeddings in the MongoDB vector index
- `app/utils/embedding_search.py`: embeds utterances for real-time nearest-neighbor lookup
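Those modules are not reproduced here, so the following is only a hedged sketch of how this model's embeddings could back a vector index, using ChromaDB (one of the stores named above); the collection name, sample data, and metadata fields are illustrative:

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mindpadi/intent_encoder")
client = chromadb.Client()  # in-memory; use a persistent client in production

# Index a few intent samples with their labels as metadata.
collection = client.create_collection(name="intents")
samples = ["Book a session", "I'm feeling anxious", "Cancel my appointment"]
labels = ["book_session", "report_mood", "cancel_session"]
collection.add(
    ids=[str(i) for i in range(len(samples))],
    embeddings=model.encode(samples).tolist(),
    documents=samples,
    metadatas=[{"intent": label} for label in labels],
)

# Query: retrieve the stored samples nearest to a new user message.
result = collection.query(
    query_embeddings=model.encode(["I want to reschedule"]).tolist(),
    n_results=2,
)
print(result["metadatas"][0])  # intent labels of the two closest samples
```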
## 🏋️ Training Details

- Base Model: `sentence-transformers/all-MiniLM-L6-v2` (pretrained)
- Fine-tuning: optional domain-specific contrastive learning using pairs in `training/datasets/fallback_pairs.json`
- Script: `training/fine_tune_encoder.py` (if fine-tuned)
- Tokenizer: BERT-based WordPiece tokenizer
- Max Token Length: 128
## 📊 Evaluation

While this model is not evaluated via classification metrics, its embedding quality was assessed through:

- Cosine similarity tests (intent embedding similarity)
- Intent clustering accuracy with `KMeans` in vector space
- Recall@K for correct intent retrieval (a minimal version of this check is sketched below)
- Visualizations: UMAP plots (`logs/intent_umap.png`)

Results indicate:

- High-quality clustering of semantically similar intents
- ~91% Top-3 Recall for known intents
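A minimal Recall@K harness along those lines could look like this; the labeled samples, queries, and gold intents below are placeholders rather than the actual evaluation set:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mindpadi/intent_encoder")

# Placeholder data: labeled intent samples and held-out queries with gold intents.
samples = [
    "Book a session", "Schedule time with a therapist",
    "I'm feeling anxious", "I can't stop worrying",
    "Cancel my appointment", "Drop my booking",
]
sample_intents = [
    "book_session", "book_session",
    "report_mood", "report_mood",
    "cancel_session", "cancel_session",
]
queries = ["I'd like a therapy appointment", "I feel really nervous today"]
gold = ["book_session", "report_mood"]

K = 3
sample_emb = model.encode(samples, convert_to_tensor=True)
query_emb = model.encode(queries, convert_to_tensor=True)
scores = util.cos_sim(query_emb, sample_emb)  # (num_queries, num_samples)

# A query counts as a hit if its gold intent appears among the top-K neighbors.
hits = 0
for i, gold_intent in enumerate(gold):
    top_k = scores[i].topk(K).indices.tolist()
    if gold_intent in {sample_intents[j] for j in top_k}:
        hits += 1
print(f"Recall@{K}: {hits / len(gold):.2f}")
```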
## 💬 Example Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mindpadi/intent_encoder")
texts = ["I want to talk to a therapist", "Book a session", "I'm feeling anxious"]
embeddings = model.encode(texts)  # one 384-dimensional vector per input text
print(embeddings.shape)  # (3, 384)
```
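Given the `2_Normalize/` module listed above, the returned vectors should already be unit-length, so a plain dot product between two embeddings doubles as their cosine similarity.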
## 🧪 Deployment (API Example)

```python
import requests

endpoint = "https://api-inference.huggingface.co/models/mindpadi/intent_encoder"
headers = {"Authorization": "Bearer <your-token>"}  # replace with your HF access token
payload = {"inputs": "I need help managing stress"}

response = requests.post(endpoint, json=payload, headers=headers)
response.raise_for_status()  # surface HTTP errors (bad token, model still loading, etc.)
embedding = response.json()
```
## ⚠️ Limitations

- English-only
- Short, clean sentences work best (not optimized for long documents)
- Does not directly return intent labels; it must be paired with clustering or classification logic (see the sketch below)
- May yield ambiguous vectors for multi-intent or vague inputs
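One hedged way to add that classification layer is to train a lightweight classifier on top of the embeddings; the intents and utterances in this sketch are illustrative only:

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("mindpadi/intent_encoder")

# Illustrative labeled utterances; a real setup would use the project's intent data.
texts = [
    "Book a session", "Schedule an appointment",
    "I'm feeling anxious", "I feel so worried",
]
labels = ["book_session", "book_session", "report_mood", "report_mood"]

# Fit a simple linear classifier on the frozen sentence embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(model.encode(texts), labels)

print(clf.predict(model.encode(["Can we set up a meeting?"])))  # e.g. ['book_session']
```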
## 📄 License

MIT License: open for personal, academic, and commercial use with attribution.
## 📬 Contact

- Project: MindPadi Mental Health Assistant
- Team: MindPadi Developers
- Email: [email protected]
- GitHub: https://github.com/mindpadi
Last updated: May 2025