# DistilBERT Quantized Model for SMS Spam Detection
This repository contains a production-ready quantized DistilBERT model fine-tuned for SMS spam classification, achieving 99.94% accuracy while optimizing inference speed using ONNX Runtime.
## Model Details

- Model Architecture: DistilBERT Base Uncased
- Task: Binary SMS spam classification (`ham=0`, `spam=1`)
- Dataset: Custom SMS Spam Collection (5,574 messages)
- Quantization: ONNX Runtime dynamic quantization
- Fine-tuning Framework: Hugging Face Transformers + Optimum
## Quick Start

### Installation

```bash
pip install -r requirements.txt
```
### Basic Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="./spam_model_quantized",
    tokenizer="./spam_model",
)

sample = "WINNER!! Claim your $1000 prize now!"
result = classifier(sample)
print(f"Prediction: {result[0]['label']} (confidence: {result[0]['score']:.2%})")
```
## Performance Metrics

| Metric     | Value  |
|------------|--------|
| Accuracy   | 99.94% |
| F1 Score   | 0.9977 |
| Precision  | 100%   |
| Recall     | 99.55% |
| Inference* | 2.7 ms |

\* Measured on an AWS `t3.xlarge` instance (4 vCPUs).
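The reported F1 score is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
# F1 is the harmonic mean of precision and recall.
precision = 1.0   # 100%, from the table
recall = 0.9955   # 99.55%, from the table

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # → 0.9977
```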
## Advanced Usage

### Load Quantized Model Directly

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    "./spam_model_quantized",
    provider="CPUExecutionProvider",
)
```
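When you bypass the `pipeline` API like this, the model returns raw logits rather than labels. A minimal post-processing sketch (the helper name is hypothetical; the `ham`/`spam` index mapping comes from the Model Details section, and the logit values below are made up purely for illustration):

```python
import math

LABELS = {0: "ham", 1: "spam"}  # mapping stated in Model Details

def logits_to_prediction(logits):
    """Softmax over the two logits, then return the argmax label and its probability."""
    exps = [math.exp(x - max(logits)) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return LABELS[idx], probs[idx]

label, score = logits_to_prediction([-2.1, 3.4])  # example logits only
print(label, round(score, 4))  # → spam 0.9959
```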
### Batch Processing

```python
import pandas as pd

df = pd.read_csv("messages.csv")

# Reuses the `classifier` pipeline from Basic Usage above.
predictions = classifier(list(df["text"]), batch_size=32)
```
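For very large files you may not want to materialize the whole column at once; a manual chunking sketch (the helper name and chunk size are illustrative):

```python
def chunked(items, size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: feed each chunk to the classifier instead of the full list,
# e.g. `for batch in chunked(texts, 32): results += classifier(batch)`.
texts = [f"message {i}" for i in range(70)]
batches = list(chunked(texts, 32))
print([len(b) for b in batches])  # → [32, 32, 6]
```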
## Training Details

### Hyperparameters

| Parameter     | Value                  |
|---------------|------------------------|
| Epochs        | 5 (early stopped at 3) |
| Batch Size    | 12 (train), 16 (eval)  |
| Learning Rate | 3e-5                   |
| Warmup Steps  | 10% of training steps  |
| Weight Decay  | 0.01                   |
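As an illustration of how a 10% warmup budget translates to optimizer steps, assuming an 80/20 train/eval split of the 5,574 messages (the split ratio is an assumption; the batch size and epoch count come from the table):

```python
import math

total_messages = 5574
train_size = int(total_messages * 0.8)  # assumed 80/20 split → 4459
batch_size = 12                         # train batch size from the table
epochs = 5                              # scheduled epochs

steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.10 * total_steps)  # 10% warmup
print(steps_per_epoch, total_steps, warmup_steps)  # → 372 1860 186
```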
## Quantization Benefits

| Metric      | Original | Quantized |
|-------------|----------|-----------|
| Model Size  | 255 MB   | 68 MB     |
| CPU Latency | 9.2 ms   | 2.7 ms    |
| Throughput  | 110/sec  | 380/sec   |
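The table implies roughly a 3.7× size compression and a ~3.4× speedup:

```python
size_ratio = 255 / 68        # model size compression
latency_ratio = 9.2 / 2.7    # per-request speedup
throughput_ratio = 380 / 110 # batch throughput gain

print(round(size_ratio, 2), round(latency_ratio, 2), round(throughput_ratio, 2))
# → 3.75 3.41 3.45
```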
## Repository Structure

```
.
├── spam_model/               # Original PyTorch model
│   ├── config.json
│   ├── model.safetensors
│   └── tokenizer.json
├── spam_model_quantized/     # Production-ready quantized model
│   ├── model.onnx
│   ├── quantized_model.onnx
│   └── tokenizer_config.json
├── examples/                 # Ready-to-use scripts
│   ├── predict.py            # CLI interface
│   └── api_server.py         # FastAPI service
├── requirements.txt          # Dependencies
└── README.md                 # This document
```
## Deployment Options

1. Local REST API

   ```bash
   uvicorn examples.api_server:app --port 8000
   ```

2. Docker Container

   ```dockerfile
   FROM python:3.9-slim
   COPY . /app
   WORKDIR /app
   RUN pip install -r requirements.txt
   CMD ["uvicorn", "examples.api_server:app", "--host", "0.0.0.0"]
   ```
## Limitations

- Optimized for English-language SMS messages
- May require retraining for regional languages or localized spam patterns
- Quantized model requires x86 CPUs with AVX2 support
## Contributions
Pull requests and suggestions are welcome! Please open an issue for feature requests or bug reports.