# DistilBERT Base Uncased Quantized Model for Spam Detection

This repository hosts a quantized version of the DistilBERT model, fine-tuned for spam detection tasks. The model has been optimized for efficient deployment while maintaining high accuracy, making it suitable for resource-constrained environments.

## Model Details

- **Model Architecture:** DistilBERT Base Uncased
- **Task:** Spam Detection
- **Dataset:** Hugging Face's `sms_spam`
- **Quantization:** BrainFloat16
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

```sh
pip install transformers torch
```

### Loading the Model

```python
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

model_name = "AventIQ-AI/distilbert-spam-detection"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

# Move the model to the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_spam(text, model, tokenizer, device):
    model.eval()  # Set to evaluation mode
    inputs = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True, max_length=128).to(device)

    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred_class = torch.argmax(probs).item()
    return "Spam" if pred_class == 1 else "Not Spam"

# Sample test messages
test_messages = [
    "Congratulations! You have won a lottery of $1,000,000. Claim now!",  # Spam
    "Hey, are we still meeting for dinner tonight?",  # Not Spam
    "URGENT: Your bank account is at risk! Click this link to secure it now.",  # Spam
    "Let's catch up this weekend. It's been a while!",  # Not Spam
    "Exclusive offer! Get 50% off on your next purchase. Limited time only!",  # Spam
]

# Run inference on the test messages
for i, msg in enumerate(test_messages):
    prediction = predict_spam(msg, model, tokenizer, device)
    print(f"Sample {i+1}: {msg} -> Prediction: {prediction}")
```

## 📊 Classification Report (Quantized Model - bfloat16)

| Metric        | Class 0 (Non-Spam) | Class 1 (Spam) | Macro Avg | Weighted Avg |
|---------------|--------------------|----------------|-----------|--------------|
| **Precision** | 1.00               | 0.98           | 0.99      | 0.99         |
| **Recall**    | 0.99               | 0.99           | 0.99      | 0.99         |
| **F1-Score**  | 0.99               | 0.99           | 0.99      | 0.99         |
| **Accuracy**  | **99%**            | **99%**        | **99%**   | **99%**      |

### 🔍 **Observations**

✅ **Precision:** High (1.00 for non-spam, 0.98 for spam) → **Few false positives**
✅ **Recall:** High (0.99 for both classes) → **Few false negatives**
✅ **F1-Score:** **Near-perfect balance** between precision & recall

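
These numbers can be regenerated in the standard `classification_report` format from scikit-learn. The snippet below is a minimal sketch rather than the evaluation script actually used: the evaluation texts and labels are placeholders, and it reuses `predict_spam`, `model`, `tokenizer`, and `device` from the usage example above.

```python
# Illustrative only: reproduce the report format with scikit-learn (`pip install scikit-learn`).
# `eval_texts` / `eval_labels` stand in for a held-out evaluation split and are not shipped here.
from sklearn.metrics import classification_report

eval_texts = ["Claim your free prize now!", "See you at the meeting tomorrow."]
eval_labels = [1, 0]  # 1 = spam, 0 = not spam

preds = [1 if predict_spam(t, model, tokenizer, device) == "Spam" else 0 for t in eval_texts]
print(classification_report(eval_labels, preds, target_names=["Not Spam", "Spam"]))
```
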
## Fine-Tuning Details

### Dataset

Hugging Face's `sms_spam` dataset was used, containing both spam and ham (non-spam) examples.

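
For reference, the dataset can be pulled directly with the `datasets` library. The short sketch below only shows what the records look like and assumes the published `sms` and `label` column names.

```python
# Minimal sketch: inspect the sms_spam dataset (requires `pip install datasets`).
from datasets import load_dataset

dataset = load_dataset("sms_spam", split="train")  # the dataset ships as a single "train" split
print(dataset[0])  # e.g. {"sms": "...", "label": 0}, where 0 = ham and 1 = spam
```
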
### Training

- Number of epochs: 7
- Batch size: 16
- Evaluation strategy: epoch
- Learning rate: 5e-6

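
These hyperparameters map naturally onto the Transformers `Trainer` API. The sketch below only illustrates that mapping and is not the exact training script used for this checkpoint; the base checkpoint, the 80/20 split, and the output directory are assumptions.

```python
# Illustrative fine-tuning setup; the split and output directory are placeholders.
from datasets import load_dataset
from transformers import (
    DistilBertForSequenceClassification,
    DistilBertTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
base_model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Hypothetical 80/20 split of sms_spam; the original train/validation split is not documented here.
data = load_dataset("sms_spam", split="train").train_test_split(test_size=0.2)

def tokenize(batch):
    return tokenizer(batch["sms"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./results",            # placeholder output directory
    num_train_epochs=7,
    per_device_train_batch_size=16,
    learning_rate=5e-6,
    evaluation_strategy="epoch",       # named `eval_strategy` in newer transformers releases
)

trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
```
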
### Quantization

Post-training quantization was applied using PyTorch's built-in bfloat16 support to reduce the model size and improve inference efficiency.

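
The exact conversion script is not included in this repository; as a rough sketch, a fine-tuned checkpoint can be cast to bfloat16 and re-saved with standard PyTorch and Transformers calls (the paths below are placeholders).

```python
# Minimal sketch of a bfloat16 post-training cast, assuming a fine-tuned checkpoint on disk;
# both paths are placeholders, not files shipped in this repository.
import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained("path/to/fine-tuned-checkpoint")
model = model.to(torch.bfloat16)              # cast all floating-point parameters to bfloat16
model.save_pretrained("path/to/quantized-checkpoint")
```
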
## Repository Structure

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── pytorch_model.bin    # Fine-tuned model weights
└── README.md            # Model documentation
```

## Limitations

- The model may not generalize well to domains outside the fine-tuning dataset.
- Quantization may result in minor accuracy degradation compared to full-precision models.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.