arkodeep committed on
Commit 94ce757 · verified · 1 Parent(s): 3333ce1

Update README.md

  1. README.md +69 -3
README.md CHANGED
@@ -1,3 +1,69 @@
1
- ---
2
- license: wtfpl
3
- ---
1
+ ---
2
+ license: wtfpl
3
+ datasets:
4
+ - arkodeep/spam-data
5
+ language:
6
+ - en
7
+ tags:
8
+ - spam
9
+ - spam classification
10
+ - text
11
+ - spam detection
12
+ - text classification
13
+ ---
14
+
15
+ # Spam Detection System
16
+
17
+ ## Lite Model
18
+
19
+ ### Introduction
20
+ The Lite model is a streamlined spam detector. It uses optimized parameters and additional hand-crafted text features to keep detection quick and efficient.
21
+
22
+ ### Features
23
+ - **Text Preprocessing**: Lemmatization, removal of stop words and punctuation.
24
+ - **Feature Extraction**: Text length, word count, unique word count, uppercase count, special character count.
25
+ - **Model Creation**: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
26
+ - **Visualization**: Generates graphs for dataset insights, word clouds, and performance metrics.
27
+ - **Metrics Saving**: Accuracy, precision, and F1 score.
28
+
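The five surface features named in the list above could be computed roughly as follows. This is a minimal sketch, not the project's actual code: the function name `extract_features`, the dictionary keys, and the exact definitions (case-insensitive uniqueness, uppercase counted per character, "special" meaning ASCII punctuation) are assumptions made for illustration.

```python
import string


def extract_features(text: str) -> dict:
    """Compute five simple surface features for one message (hypothetical helper)."""
    words = text.split()
    return {
        "text_length": len(text),
        "word_count": len(words),
        # Assumption: words that differ only in case count as the same word.
        "unique_word_count": len({w.lower() for w in words}),
        # Assumption: counts uppercase characters, not all-caps words.
        "uppercase_count": sum(1 for ch in text if ch.isupper()),
        # Assumption: "special character" means ASCII punctuation.
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


features = extract_features("WIN a FREE prize!!! Call now, win big.")
print(features)
```

Features like these are cheap to compute and tend to separate shouting, punctuation-heavy spam from ordinary messages, which is presumably why they complement the vectorized text.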
29
+ ### How to Run
30
+ 1. **Train the Model**:
31
+ ```bash
32
+ python training/train_model_lite.py
33
+ ```
34
+ 2. **Use the Model**:
35
+ ```python
36
+ import joblib
37
+ model = joblib.load('models/model.pkl')
38
+ vectorizer = joblib.load('models/vectorizer.pkl')
39
+ ```
40
+
41
+ ## Legacy Model
42
+
43
+ ### Introduction
44
+ The Legacy model keeps the original, unoptimized model logic. Only the code structure is updated, and visualizations are added.
45
+
46
+ ### Features
47
+ - **Text Preprocessing**: Porter Stemming, removal of stop words and punctuation.
48
+ - **Model Creation**: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
49
+ - **Visualization**: Generates graphs for dataset insights, word clouds, and performance metrics.
50
+ - **Metrics Saving**: Accuracy and precision.
51
+
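For reference, the two metrics saved by the Legacy model can be computed by hand as sketched below. The helper name `accuracy_and_precision` and the convention that label `1` means spam are assumptions; the training script may well rely on a library implementation instead.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for two equally long label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == positive for p in y_pred)

    accuracy = correct / len(y_true)
    # Precision is undefined when nothing is predicted positive; report 0.0 then.
    precision = true_pos / pred_pos if pred_pos else 0.0
    return accuracy, precision


# Toy labels, assuming 1 = spam and 0 = ham.
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Precision is worth tracking alongside accuracy for a spam filter, because a false positive means a legitimate message is flagged as spam.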
52
+ ### How to Run
53
+ 1. **Train the Model**:
54
+ ```bash
55
+ python training/train_model_legacy.py
56
+ ```
57
+ 2. **Use the Model**:
58
+ ```python
59
+ import joblib
60
+ model = joblib.load('models/model.pkl')
61
+ vectorizer = joblib.load('models/vectorizer.pkl')
62
+ ```
63
+
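Once the two artifacts are loaded, classifying new messages might look like the sketch below. It assumes that both objects follow the scikit-learn convention (`vectorizer.transform`, `model.predict`) and that the model was trained on the vectorizer output alone; neither is confirmed by this README. The helper name `classify_messages` is hypothetical, and the label encoding is not documented here, so the raw predictions are returned unchanged.

```python
def classify_messages(messages, model, vectorizer):
    """Return one predicted label per message (hypothetical helper)."""
    # Turn the raw text into the feature matrix the model expects.
    features = vectorizer.transform(messages)
    return list(model.predict(features))


# Usage with the artifacts produced by training (paths taken from the README):
# import joblib
# model = joblib.load('models/model.pkl')
# vectorizer = joblib.load('models/vectorizer.pkl')
# labels = classify_messages(["Free entry! Text WIN now"], model, vectorizer)
```

Messages most likely need the same preprocessing that was applied during training (stemming, stop-word and punctuation removal) before they are passed in.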
64
+ ## Additional Information
65
+ - **Dependencies**: Python 3.6 or higher, pip, and required packages listed in `requirements.txt`.
66
+ - **Dataset**: The dataset used for training is `spam.csv`.
67
+ - **Contact and Support**: For questions or support, please contact the project maintainers.
68
+
69
+ For more details, see the [README.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/README.md) and [models.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/models.md) in the GitHub repository.