	---
	license: mit
	language:
	- en
	tags:
	- intent-classification
	- mental-health
	- transformer
	- conversational-ai
	pipeline_tag: text-classification
	base_model: distilbert-base-uncased
	---

	# 🧠 Intent Classifier (MindPadi)

	The `intent_classifier` is a transformer-based text classification model that detects user intents in a mental health support setting. The MindPadi assistant uses its predictions to route each conversation to the appropriate module, such as emotional support, scheduling, reflection, or journal analysis.



	## 📝 Model Overview

	- Model Architecture: DistilBERT (uncased) + classification head
	- Task: Intent Classification
	- Classes: Over 20 intent categories (e.g., `vent`, `gratitude`, `help_request`, `journal_analysis`)
	- Model Size: ~66M parameters
	- Files:
	  - `config.json`
	  - `pytorch_model.bin` or `model.safetensors`
	  - `tokenizer_config.json`, `vocab.txt`, `tokenizer.json`
	  - `checkpoint-*/` (optional training checkpoints)



	## ✅ Intended Use

	### ✔️ Use Cases
	- Detecting user intent in MindPadi mental health conversations
	- Enabling context-specific dialogue flows
	- Assisting with journal entry triage and tagging
	- Triggering therapy-related tools (e.g., emotion check-ins, PubMed summaries)

	### 🚫 Not Intended For
	- Multilingual intent classification (English-only)
	- Legal or medical diagnosis tasks
	- Multi-label classification (currently single-label per input)



	## 💡 Example Intents Detected

	| Intent             | Description                                    |
	|--------------------|------------------------------------------------|
	| `vent`             | User expressing frustration or emotion freely  |
	| `help_request`     | Seeking mental health support                  |
	| `schedule_session` | Booking a therapy check-in                     |
	| `gratitude`        | Showing appreciation for support               |
	| `journal_analysis` | Submitting a journal entry for AI feedback     |
	| `reflection`       | Talking about personal growth or setbacks      |
	| `not_sure`         | Unsure or unclear message from user            |



	## 🛠️ Training Details

	- Base Model: `distilbert-base-uncased`
	- Dataset: Curated and annotated conversations (`training/datasets/finetuned/intents/`)
	- Script: `training/train_intent_classifier.py`
	- Preprocessing:
	  - Text normalization (lowercasing, punctuation removal)
	  - Label encoding
	- Loss: CrossEntropyLoss
	- Metrics: Accuracy, F1-score
	- Tokenizer: WordPiece (DistilBERT tokenizer)
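
The preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the contents of `training/train_intent_classifier.py`: the helper names `normalize` and `build_label_maps` are hypothetical, punctuation removal is assumed to cover ASCII punctuation only, and label IDs are assumed to follow sorted label order (the same ordering scikit-learn's `LabelEncoder` uses).

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def build_label_maps(labels):
    """Assign each distinct label an integer ID in sorted order (assumed convention)."""
    classes = sorted(set(labels))
    label2id = {label: i for i, label in enumerate(classes)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


# Toy examples for illustration only; not taken from the MindPadi dataset.
examples = [
    ("I need HELP, please!", "help_request"),
    ("Thanks so much :)", "gratitude"),
]
label2id, id2label = build_label_maps(label for _, label in examples)
encoded = [(normalize(text), label2id[label]) for text, label in examples]
print(encoded)
```

Because `distilbert-base-uncased` already lowercases its input, the lowercasing step mainly keeps the stored training text consistent with what the tokenizer sees.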



	## 📊 Evaluation

	| Metric    | Score |
	|-----------|-------|
	| Accuracy  | 91.3% |
	| F1-score  | 89.8% |
	| Recall@3  | 97.1% |
	| Precision | 88.4% |

	Evaluation was performed on a held-out validation split of the MindPadi intent dataset.
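
For readers who want to reproduce metrics of this kind on their own data, here is a minimal sketch of accuracy and Recall@3. It assumes Recall@3 means the true intent appears among the three highest-scoring classes; the card does not define the metric, and the function names and toy scores below are illustrative only.

```python
def accuracy(scores, targets):
    """Fraction of examples whose highest-scoring class is the true class."""
    hits = 0
    for row, target in zip(scores, targets):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == target
    return hits / len(targets)


def recall_at_k(scores, targets, k=3):
    """Fraction of examples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy per-class scores for three examples over four classes (not real model output).
scores = [
    [0.1, 0.7, 0.2, 0.0],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
targets = [1, 2, 0]

print("accuracy:", accuracy(scores, targets))
print("recall@3:", recall_at_k(scores, targets, k=3))
```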



	## 🔍 Example Usage

	```python
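	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load the fine-tuned classifier and its tokenizer from the Hub
	model = AutoModelForSequenceClassification.from_pretrained("mindpadi/intent_classifier")
	tokenizer = AutoTokenizer.from_pretrained("mindpadi/intent_classifier")

	text = "I’m struggling with my emotions today"
	# Truncate long inputs; the model is not robust beyond 256 tokens
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
	with torch.no_grad():  # gradients are not needed for inference
	    outputs = model(**inputs)

	# Index of the highest-scoring intent class
	predicted_class = torch.argmax(outputs.logits, dim=1).item()
	print("Predicted intent ID:", predicted_class)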
	```

	To map the predicted intent ID to its label, load the label encoder:

	```python
	from joblib import load
	label_encoder = load("intent_encoder/label_encoder.joblib")
	print("Predicted intent:", label_encoder.inverse_transform([predicted_class])[0])
	```


	## 🔌 Inference Endpoint Example

	```python
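	import requests

	API_URL = "https://api-inference.huggingface.co/models/mindpadi/intent_classifier"
	# Replace the placeholder with a Hugging Face access token
	headers = {"Authorization": "Bearer <your-api-token>"}
	payload = {"inputs": "Can I book a mental health session?"}

	response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
	response.raise_for_status()  # surface HTTP errors instead of parsing an error body
	print(response.json())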
	```
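
The JSON returned by the endpoint still has to be reduced to a single intent. The sketch below assumes the common text-classification response shape, a list (sometimes nested one level) of `{"label": ..., "score": ...}` objects. That shape is an assumption about the hosted API rather than something this card specifies, and `top_intent` is a hypothetical helper name.

```python
def top_intent(response_json):
    """Return (label, score) for the best-scoring entry, or None for any other payload shape."""
    # Error payloads usually arrive as a dict such as {"error": "..."}.
    if not isinstance(response_json, list) or not response_json:
        return None
    # Unwrap one level of nesting when the scores arrive as [[{...}, ...]].
    first = response_json[0]
    candidates = first if isinstance(first, list) else response_json
    scored = [
        c for c in candidates
        if isinstance(c, dict) and "label" in c and "score" in c
    ]
    if not scored:
        return None
    best = max(scored, key=lambda c: c["score"])
    return best["label"], best["score"]


# Hand-written sample payload for illustration; not real API output.
sample = [[
    {"label": "vent", "score": 0.2},
    {"label": "schedule_session", "score": 0.7},
]]
print(top_intent(sample))
```

If the model config does not define `id2label`, the labels may come back as generic names such as `LABEL_3`, in which case the label encoder mapping is still needed.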



	## ⚠️ Limitations

	* Not robust to long-form texts (>256 tokens); truncate or summarize input.
	* May confuse overlapping intents like `vent` and `help_request`
	* False positives possible in vague or sarcastic inputs
	* Requires pairing with fallback model (`intent_fallback`) for reliability
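
One way to decide when to hand a message to the fallback model is a confidence gate on the classifier's logits. The following is a minimal sketch under stated assumptions: the thresholds `min_confidence` and `min_margin` are hypothetical values, not tuned MindPadi settings, and `needs_fallback` is an illustrative name rather than part of the `intent_fallback` integration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def needs_fallback(logits, min_confidence=0.6, min_margin=0.15):
    """Return True when the top prediction is weak or too close to the runner-up."""
    probs = sorted(softmax(logits), reverse=True)
    top = probs[0]
    runner_up = probs[1] if len(probs) > 1 else 0.0
    # A small margin is typical for overlapping intents such as vent vs. help_request.
    return top < min_confidence or (top - runner_up) < min_margin


print(needs_fallback([5.0, 0.0, 0.0]))  # one class clearly dominates
print(needs_fallback([1.0, 0.9, 0.0]))  # two classes score almost the same
```

The margin check targets the overlapping-intent case, while the confidence floor covers vague or sarcastic inputs where no class stands out.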



	## 🔐 Ethical Considerations

	* This model is for supportive routing, not clinical diagnosis
	* Use with user consent and proper data privacy safeguards
	* Intent predictions should not override human judgment in sensitive contexts



	## 📂 Integration Points

	| Location                           | Functionality                                 |
	| ---------------------------------- | --------------------------------------------- |
	| `app/chatbot/intent_classifier.py` | Main classifier logic                         |
	| `app/chatbot/intent_router.py`     | Routes based on predicted intent              |
	| `app/utils/embedding_search.py`    | Uses `intent_encoder` for similarity fallback |
	| `data/processed_intents.json`      | Annotated intent samples                      |
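
To illustrate what routing on a predicted intent can look like, here is a minimal dispatch-table sketch. It is not the contents of `app/chatbot/intent_router.py`: the handler functions, the `ROUTES` mapping, and the returned module names are all hypothetical, and only the intent labels come from this card.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


# Hypothetical handlers; each returns the name of the module it would invoke.
def handle_support(message: str) -> str:
    return "emotional_support"


def handle_scheduling(message: str) -> str:
    return "scheduling"


def handle_journal(message: str) -> str:
    return "journal_analysis"


def handle_clarify(message: str) -> str:
    return "clarify"


# Map intent labels to handlers; intents not listed fall through to clarification.
ROUTES: Dict[str, Handler] = {
    "vent": handle_support,
    "help_request": handle_support,
    "schedule_session": handle_scheduling,
    "journal_analysis": handle_journal,
    "not_sure": handle_clarify,
}


def route(intent: str, message: str) -> str:
    """Dispatch a message to the handler registered for its predicted intent."""
    handler = ROUTES.get(intent, handle_clarify)
    return handler(message)


print(route("schedule_session", "Can I book a mental health session?"))
```

Defaulting unknown intents to a clarification handler keeps the assistant from silently picking a module when the classifier emits a label the router does not know.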



	## 📜 License

	MIT License – freely available for commercial and non-commercial use.


	## 📬 Contact

	* Team: MindPadi AI Developers
	* Profile: [https://huggingface.co/mindpadi](https://huggingface.co/mindpadi)
	* Email: [email protected]

	Last updated: May 2025