# DistilBERT-Based Quantized Model for Spam Detection
This repository hosts a quantized version of the DistilBERT model, fine-tuned for spam detection tasks. The model is optimized for efficient deployment, making it suitable for resource-constrained environments while maintaining high accuracy.
## Model Details
- Model Architecture: DistilBERT Base Uncased
- Task: Binary Spam Detection
- Dataset: Custom Spam Dataset (CSV format)
- Quantization: Float16
- Fine-tuning Framework: Hugging Face Transformers
## Usage

### Installation

```bash
pip install transformers torch datasets scikit-learn
```

### Loading the Model
```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load the quantized model and tokenizer
model_path = "quantized-model"
quantized_model = DistilBertForSequenceClassification.from_pretrained(model_path)
tokenizer = DistilBertTokenizerFast.from_pretrained(model_path)
quantized_model.eval()
quantized_model.half()  # run inference in float16

# Example inference
text = "Congratulations! You've won a $1000 Walmart gift card. Go to http://bit.ly/123456 to claim now."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

with torch.no_grad():
    outputs = quantized_model(**inputs)

predicted_class = torch.argmax(outputs.logits, dim=1).item()
label_map = {0: "Not Spam", 1: "Spam"}
print(f"Predicted Label: {label_map[predicted_class]}")
```
## Performance Metrics
- Accuracy: ~0.97
- F1 Score: optimized during training via early stopping and best-model selection
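
For reference, metrics like these can be computed with a sketch along the following lines, reusing the `quantized_model` and `tokenizer` loaded in the Usage section. The `test.csv` path and the `text`/`label` column names are assumptions about the evaluation split, not files shipped in this repo:

```python
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical held-out split; adjust the path and column names to your data.
df = pd.read_csv("test.csv")  # columns: text, label (0 = not spam, 1 = spam)

preds = []
with torch.no_grad():
    for start in range(0, len(df), 64):
        batch = df["text"].iloc[start:start + 64].tolist()
        inputs = tokenizer(batch, return_tensors="pt", truncation=True, padding=True, max_length=128)
        logits = quantized_model(**inputs).logits
        preds.extend(torch.argmax(logits, dim=1).tolist())

print("Accuracy:", accuracy_score(df["label"], preds))
print("F1 Score:", f1_score(df["label"], preds))
```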
## Fine-Tuning Details

### Dataset

The dataset consists of SMS/email messages labeled as spam or not spam.
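
As a rough sketch, such a CSV dataset can be loaded and tokenized with the `datasets` library. The file name `spam.csv` and the `text`/`label` column names are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

# Hypothetical file name; the CSV is assumed to have "text" and "label" columns.
dataset = load_dataset("csv", data_files="spam.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)
```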
### Training Configuration
- Epochs: 2
- Batch size: 16 (train), 64 (eval)
- Learning rate: 3e-5
- Evaluation strategy: per epoch
- Early stopping: enabled
- Mixed precision (fp16): enabled on GPU
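
A configuration along these lines can be expressed with the Hugging Face `Trainer` API. This is a sketch under the assumptions above (and using the `tokenized` splits from the dataset sketch), not the exact training script:

```python
import torch
from transformers import (
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
    EarlyStoppingCallback,
)

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

training_args = TrainingArguments(
    output_dir="spam_model",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=3e-5,
    evaluation_strategy="epoch",      # "eval_strategy" in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,      # best-model selection
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fp16=torch.cuda.is_available(),   # mixed precision only on GPU
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)],
)
trainer.train()
```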
## Quantization
Post-training quantization was applied using PyTorch to reduce model size and improve inference speed.
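
Concretely, float16 post-training quantization reduces to casting the fine-tuned weights and saving the result. A minimal sketch, assuming the fine-tuned model lives in `spam_model/`:

```python
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

# Load the fine-tuned model, cast its weights to float16, and save the result.
model = DistilBertForSequenceClassification.from_pretrained("spam_model")
model = model.half()
model.save_pretrained("quantized-model")

tokenizer = DistilBertTokenizerFast.from_pretrained("spam_model")
tokenizer.save_pretrained("quantized-model")
```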
## Repository Structure

```
.
├── spam_model/        # Original trained model
├── quantized-model/   # Quantized model for deployment
├── tokenizer/         # Tokenizer files
└── README.md          # Documentation
```
## Limitations

- May not generalize well to message formats unseen during training.
- Quantization might slightly impact accuracy.