---
language: en
license: mit
tags:
- keras
- lstm
- spam-classification
- text-classification
- binary-classification
- email
- deep-learning
library_name: keras
pipeline_tag: text-classification
model_name: Spam Email Classifier (BiLSTM)
datasets:
- SetFit/enron_spam
---
# 📧 Spam Email Classifier using BiLSTM

This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham**. It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using GloVe word embeddings.

---

## 🧠 Model Architecture
- **Tokenizer**: Keras `Tokenizer` trained on the Enron dataset
- **Embedding**: Pretrained [GloVe.6B.100d](https://nlp.stanford.edu/projects/glove/)
- **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
- **Input**: English email/message text
- **Output**: `0 = Ham`, `1 = Spam`
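
The layer stack listed above could be sketched in Keras roughly as follows. This is a minimal illustration, not the released model's training code: the vocabulary size, LSTM width, dropout rate, optimizer, and the `build_bilstm` helper are all assumptions chosen for the example. Only the 100-dimensional embedding (from GloVe.6B.100d) and the sequence length of 50 (from the usage example) come from this card. In a real run, the GloVe vectors would be loaded into the embedding layer's weights.

```python
from tensorflow.keras import layers, models

# Illustrative values; only EMBED_DIM and MAX_LEN are taken from this card.
VOCAB_SIZE = 20000    # assumed tokenizer vocabulary size
EMBED_DIM = 100       # GloVe.6B.100d vectors are 100-dimensional
MAX_LEN = 50          # sequence length used in the usage example
LSTM_UNITS = 64       # assumed
DROPOUT_RATE = 0.5    # assumed

def build_bilstm(vocab_size=VOCAB_SIZE):
    """Hypothetical helper: Embedding -> BiLSTM -> Dropout -> Dense(sigmoid)."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(vocab_size, EMBED_DIM),        # GloVe weights would go here
        layers.Bidirectional(layers.LSTM(LSTM_UNITS)),  # reads the sequence in both directions
        layers.Dropout(DROPOUT_RATE),
        layers.Dense(1, activation="sigmoid"),          # single score in [0, 1]; 1 = Spam
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

sketch_model = build_bilstm()
sketch_model.summary()
```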
---
## 🧪 Example Usage

```python
import pickle

from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Download the model and tokenizer files from the Hugging Face Hub
model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")

# Load the Keras model and the fitted tokenizer
model = load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

def predict_spam(text):
    """Return a label for a single email/message string."""
    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
    pred = model.predict(padded)[0][0]  # sigmoid output: probability of spam
    return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"

# Example
print(predict_spam("Win a free iPhone now!"))
```
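
The `maxlen=50` setting in `predict_spam` matters because `pad_sequences`, with its default settings, left-pads short sequences with zeros and drops tokens from the front of long ones. The pure-Python sketch below mimics those defaults for a single sequence so the effect is easy to see. `pad_like_keras` is a hypothetical helper written for illustration and is not part of Keras.

```python
def pad_like_keras(seq, maxlen, value=0):
    """Mimic pad_sequences defaults (padding='pre', truncating='pre') for one sequence."""
    seq = list(seq)
    if len(seq) >= maxlen:
        # Keep only the last `maxlen` tokens; earlier tokens are dropped.
        return seq[len(seq) - maxlen:]
    # Left-pad with `value` until the sequence has `maxlen` entries.
    return [value] * (maxlen - len(seq)) + seq

short_seq = pad_like_keras([7, 8, 9], maxlen=5)
long_seq = pad_like_keras([1, 2, 3, 4, 5, 6], maxlen=4)
print(short_seq)  # [0, 0, 7, 8, 9]
print(long_seq)   # [3, 4, 5, 6]
```

With `maxlen=50`, only the last 50 tokens of a long email reach the model, so text near the start of a long message does not influence the prediction.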