---
language: en
license: mit
tags:
  - keras
  - lstm
  - spam-classification
  - text-classification
  - binary-classification
  - email
  - deep-learning
library_name: keras
pipeline_tag: text-classification
model_name: Spam Email Classifier (BiLSTM)
datasets:
  - SetFit/enron_spam
---

# 📧 Spam Email Classifier using BiLSTM

This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham**. It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using GloVe word embeddings.

---

## 🧠 Model Architecture

- **Tokenizer**: Keras `Tokenizer` trained on the Enron dataset  
- **Embedding**: Pretrained [GloVe 6B, 100-dimensional](https://nlp.stanford.edu/projects/glove/) word vectors (`glove.6B.100d`)
- **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
- **Input**: English email/message text  
- **Output**: Sigmoid spam probability; scores above 0.5 map to `1 = Spam`, otherwise `0 = Ham`
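
The card does not show how the GloVe vectors were wired into the `Embedding` layer. A common approach, sketched here as an assumption rather than the author's actual code, is to build a matrix whose row `i` holds the GloVe vector for the word the tokenizer mapped to index `i`. The function name `build_embedding_matrix` and the toy 3-dimensional vectors are hypothetical; the real file `glove.6B.100d.txt` has 100 values per word.

```python
def build_embedding_matrix(glove_lines, word_index, dim=100):
    """Return a (len(word_index) + 1) x dim matrix as nested lists.

    Row 0 stays all zeros because the Keras Tokenizer starts word
    indices at 1 and index 0 is used for padding. Words that are
    missing from GloVe also keep an all-zero row.
    """
    vectors = {}
    for line in glove_lines:
        # GloVe text format: a word followed by space-separated floats
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]

    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Toy data (hypothetical): 3-dimensional vectors instead of 100
glove_lines = ["free 0.1 0.2 0.3", "meeting -0.4 0.5 0.6"]
word_index = {"free": 1, "iphone": 2, "meeting": 3}

matrix = build_embedding_matrix(glove_lines, word_index, dim=3)
print(len(matrix))  # number of rows: vocabulary size + 1
print(matrix[1])    # vector for "free"
print(matrix[2])    # "iphone" is not in the toy GloVe data, so zeros
```

In a real pipeline the lines would come from iterating over the opened GloVe file, and the resulting matrix would typically be converted to an array and passed to the `Embedding` layer as its initial weights.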

---

## 🧪 Example Usage
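
The example in this section pads every tokenized message to `maxlen=50`. With its default arguments, `pad_sequences` pads and truncates at the front (`padding="pre"`, `truncating="pre"`), so short messages get leading zeros and long messages keep only their last 50 tokens. The pure-Python helper `pad_pre` is a hypothetical stand-in written only to illustrate that behavior on a single sequence; it is not part of Keras.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic Keras pad_sequences defaults for one list of token ids."""
    # truncating="pre": drop tokens from the front, keep the last maxlen
    trimmed = seq[max(len(seq) - maxlen, 0):]
    # padding="pre": fill the front with the padding value
    return [value] * (maxlen - len(trimmed)) + trimmed


short_ids = pad_pre([7, 3, 9], maxlen=5)
long_ids = pad_pre([1, 2, 3, 4, 5, 6, 7, 8], maxlen=5)

print(short_ids)  # [0, 0, 7, 3, 9]
print(long_ids)   # [4, 5, 6, 7, 8]
```

Because the model only ever saw inputs shaped this way during training, the `maxlen` used at inference has to match the training value.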

```python
from tensorflow.keras.models import load_model
from huggingface_hub import hf_hub_download
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load files from HF Hub
model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")

# Load model and tokenizer
model = load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

# Prediction function
def predict_spam(text):
    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
    pred = model.predict(padded)[0][0]
    return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"

# Example
print(predict_spam("Win a free iPhone now!"))
```