# BERT Base Uncased Quantized Model for Spam Detection

This repository hosts a quantized version of the BERT model, fine-tuned for spam detection tasks. The model has been optimized for efficient deployment while maintaining high accuracy, making it suitable for resource-constrained environments.

## Model Details

- **Model Architecture:** BERT Base Uncased
- **Task:** Spam Email Detection
- **Dataset:** Hugging Face's `mail_spam_ham_dataset` and `spam-mail`
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

```sh
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/bert-spam-detection"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Move the model to GPU (if available) and switch to inference mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def predict_spam_quantized(text):
    """Predicts whether a given text is spam (1) or ham (0) using the quantized BERT model."""
    # Tokenize input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

    # Move inputs to the same device as the model
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Perform inference
    with torch.no_grad():
        outputs = model(**inputs)

    # Get predicted label (0 = ham, 1 = spam)
    prediction = torch.argmax(outputs.logits, dim=1).item()

    return "Spam" if prediction == 1 else "Ham"


# Sample test messages
print(predict_spam_quantized("WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only."))
# Expected output: Spam

print(predict_spam_quantized("Hi, are we still on for lunch tomorrow at noon?"))
# Expected output: Ham
```

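If you prefer the high-level API, the same checkpoint can also be loaded with the Transformers `pipeline` helper. This is a minimal sketch rather than part of the original instructions; the label names it returns (e.g. `LABEL_0` / `LABEL_1`) depend on the `id2label` mapping stored in the model config.

```python
from transformers import pipeline

# Text-classification pipeline over the same checkpoint; pass device=0 to run on the first GPU
spam_classifier = pipeline("text-classification", model="AventIQ-AI/bert-spam-detection")

result = spam_classifier("Congratulations! You have been selected for a free cruise. Reply YES to claim.")
print(result)  # e.g. [{'label': 'LABEL_1', 'score': 0.99}], where label index 1 corresponds to spam
```
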
## 📊 Classification Report (Quantized Model - float16)

| Metric        | Class 0 (Non-Spam) | Class 1 (Spam) | Macro Avg | Weighted Avg |
|---------------|--------------------|----------------|-----------|--------------|
| **Precision** | 1.00               | 0.98           | 0.99      | 0.99         |
| **Recall**    | 0.99               | 0.99           | 0.99      | 0.99         |
| **F1-Score**  | 0.99               | 0.99           | 0.99      | 0.99         |

**Overall accuracy:** 99%

### 🔍 Observations

- ✅ **Precision:** High (1.00 for non-spam, 0.98 for spam) → **few false positives**
- ✅ **Recall:** High (0.99 for both classes) → **few false negatives**
- ✅ **F1-Score:** **Near-perfect balance** between precision and recall

## Fine-Tuning Details

### Dataset

The Hugging Face `spam-mail` and `mail_spam_ham_dataset` datasets were combined for fine-tuning; together they contain both spam and ham (non-spam) examples.

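The exact preprocessing script is not included in this repository. The sketch below shows one way the two datasets could be combined with the `datasets` library; the Hub paths and column names are placeholders, not the actual identifiers used for training.

```python
from datasets import load_dataset, concatenate_datasets

# Hypothetical Hub IDs: the README names the datasets but not their full Hub paths,
# so replace these with the actual `spam-mail` and `mail_spam_ham_dataset` repos.
spam_mail = load_dataset("your-org/spam-mail", split="train")
spam_ham = load_dataset("your-org/mail_spam_ham_dataset", split="train")

# Assumes both datasets already share a "text" column and a binary "label" column
# (0 = ham, 1 = spam); rename or remap columns first if the schemas differ.
combined = concatenate_datasets([spam_mail, spam_ham]).shuffle(seed=42)
dataset = combined.train_test_split(test_size=0.2, seed=42)
print(dataset)
```
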
### Training

Fine-tuning used the following hyperparameters (a training sketch using them follows the list):

- Number of epochs: 3
- Batch size: 8
- Evaluation strategy: epoch
- Learning rate: 2e-5

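The original training script is not shown here; the following is a minimal sketch of how these hyperparameters map onto the Hugging Face `Trainer` API, assuming the combined `dataset` with `text`/`label` columns from the sketch above. The output path and column names are assumptions.

```python
from transformers import (BertTokenizer, BertForSequenceClassification,
                          TrainingArguments, Trainer)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Assumes the combined dataset exposes a "text" column
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

tokenized = dataset.map(tokenize, batched=True)  # `dataset` from the dataset sketch above

args = TrainingArguments(
    output_dir="bert-spam-detection",   # placeholder output path
    num_train_epochs=3,                 # Number of epochs: 3
    per_device_train_batch_size=8,      # Batch size: 8
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",        # Evaluation strategy: epoch
    learning_rate=2e-5,                 # Learning rate: 2e-5
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```
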
### Quantization

Post-training quantization to float16 was applied with PyTorch to reduce the model size and improve inference efficiency.

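The precise conversion step is not shown in this repository; below is a minimal sketch of a float16 post-training conversion with PyTorch, assuming the fine-tuned checkpoint is available locally (the paths are placeholders).

```python
import torch
from transformers import BertForSequenceClassification

# Load the fine-tuned full-precision checkpoint (placeholder path)
model = BertForSequenceClassification.from_pretrained("bert-spam-detection")

# Convert all weights to float16 (half precision) and save the smaller checkpoint
model = model.half()
model.save_pretrained("bert-spam-detection-fp16")

# Rough size check: float16 weights take half the memory of float32
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters, ~{num_params * 2 / 1e6:.0f} MB in float16")
```

Float16 weights are mainly useful for GPU inference; on CPU they are typically cast back to float32 before running the model.
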
## Repository Structure

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation
```

## Limitations

- The model may not generalize well to domains outside the fine-tuning dataset.
- Quantization may result in minor accuracy degradation compared to full-precision models.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.