**🧠 SMSDetection-DistilBERT-SMS** A DistilBERT-based binary classifier fine-tuned on the SMS Spam Collection dataset. It classifies messages as either **spam** or **ham** (not spam). This model is suitable for real-world applications like mobile SMS spam filters, automated customer message triage, and telecom fraud detection. --- ✨ **Model Highlights** - 📌 Based on `distilbert-base-uncased` - 🔍 Fine-tuned on the SMS Spam Collection dataset - ⚡ Supports binary classification: Spam vs Not Spam - 💾 Lightweight and optimized for both CPU and GPU environments --- 🧠 Intended Uses - ✅ Mobile SMS spam filtering - ✅ Telecom customer service automation - ✅ Fraudulent message detection - ✅ User inbox categorization - ✅ Regulatory compliance monitoring --- - 🚫 Limitations - ❌ Trained on English SMS messages only - ❌ May underperform on emails, social media texts, or non-English content - ❌ Not designed for multilingual datasets - ❌ Slight performance dip expected for long messages (>128 tokens) --- 🏋️‍♂️ Training Details | Field | Value | | -------------- | ------------------------------ | | **Base Model** | `distilbert-base-uncased` | | **Dataset** |SMS Spam Collection (UCI) | | **Framework** | PyTorch with 🤗 Transformers | | **Epochs** | 3 | | **Batch Size** | 16 | | **Max Length** | 128 tokens | | **Optimizer** | AdamW | | **Loss** | CrossEntropyLoss (token-level) | | **Device** | Trained on CUDA-enabled GPU | --- 📊 Evaluation Metrics | Metric | Score | | ----------------------------------------------- | ----- | | Accuracy | 0.99 | | F1-Score | 0.96 | | Precision | 0.98 | | Recall | 0.93 | --- --- 🚀 Usage ```python from transformers import BertTokenizerFast, BertForTokenClassification from transformers import pipeline import torch model_name = "AventIQ-AI/SMS-Spam-Detection-Model" tokenizer = BertTokenizerFast.from_pretrained(model_name) model = BertForTokenClassification.from_pretrained(model_name) model.eval() # Inference device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model.to(device) def predict_sms(text): inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128) inputs = {k: v.to(device) for k, v in inputs.items()} with torch.no_grad(): outputs = model(**inputs) logits = outputs.logits predicted = torch.argmax(logits, dim=1).item() return "spam" if predicted == 1 else "ham" # Test example print(predict_sms("You've won $1,000,000! Call now to claim your prize!")) ``` --- - 🧩 Quantization - Post-training static quantization applied using PyTorch to reduce model size and accelerate inference on edge devices. ---- 🗂 Repository Structure ``` . ├── model/ # Quantized model files ├── tokenizer_config/ # Tokenizer and vocab files ├── model.safensors/ # Fine-tuned model in safetensors format ├── README.md # Model card ``` --- 🤝 Contributing Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.