ragkasi committed
Commit c82a30d · verified
1 Parent(s): 0403b6d

Update README.md

Files changed (1)
  1. README.md +177 -165
README.md CHANGED
@@ -1,165 +1,177 @@
1
- # Fake News Detection Project
2
-
3
- A machine learning project that classifies news articles as real or fake using both traditional NLP techniques and advanced transformer models.
4
-
5
- ## 🎯 Project Overview
6
-
7
- This project implements multiple approaches to detect fake news:
8
- - **Traditional ML**: TF-IDF vectorization with Logistic Regression
9
- - **Deep Learning**: Fine-tuned BERT model for sequence classification
10
-
11
- ## 📊 Performance Results
12
-
13
- ### TF-IDF + Logistic Regression Model
14
- - **Accuracy**: 98.62%
15
- - **F1 Score**: 98.67%
16
-
17
- #### Detailed Classification Report:
18
- ```
19
- precision recall f1-score support
20
-
21
- 0 0.98 0.99 0.99 4284 (Real News)
22
- 1 0.99 0.98 0.99 4696 (Fake News)
23
-
24
- accuracy 0.99 8980
25
- macro avg 0.99 0.99 0.99 8980
26
- weighted avg 0.99 0.99 0.99 8980
27
- ```
28
-
29
- ## 📁 Project Structure
30
-
31
- ```
32
- FakeNewsDetector/
33
- ├── README.md
34
- ├── requirements.txt
35
- ├── notebooks/
36
- │   └── FakeNewsClassifier_HuggingFace.ipynb
37
- ├── scripts/
38
- │   └── train.py
39
- ├── models/
40
- │   └── bert-fake-news/ (generated after training)
41
- ├── data/
42
- ├── app/
43
- └── venv/
44
- ```
45
-
46
- ## 🚀 Quick Start
47
-
48
- ### 1. Clone and Setup
49
- ```bash
50
- git clone <repository-url>
51
- cd FakeNewsDetector
52
- ```
53
-
54
- ### 2. Create Virtual Environment
55
- ```bash
56
- python -m venv venv
57
-
58
- # Windows PowerShell
59
- .\venv\Scripts\Activate.ps1
60
-
61
- # Windows CMD
62
- .\venv\Scripts\activate.bat
63
-
64
- # Git Bash
65
- source venv/Scripts/activate
66
- ```
67
-
68
- ### 3. Install Dependencies
69
- ```bash
70
- pip install -r requirements.txt
71
- ```
72
-
73
- ### 4. Launch Jupyter Notebook
74
- ```bash
75
- jupyter notebook
76
- ```
77
-
78
- ## 📚 Dataset
79
-
80
- The project uses the `mrm8488/fake-news` dataset from Hugging Face, which contains:
81
- - **Total articles**: ~45,000
82
- - **Training split**: 80% (~36,000 articles)
83
- - **Test split**: 20% (~9,000 articles)
84
- - **Classes**:
85
- - 0: Real News
86
- - 1: Fake News
87
-
88
- ## 🔧 Models Implemented
89
-
90
- ### 1. TF-IDF + Logistic Regression
91
- - **Vectorizer**: TF-IDF with 5,000 max features, n-grams (1,2)
92
- - **Classifier**: Logistic Regression with balanced class weights
93
- - **Performance**: 98.62% accuracy
94
-
95
- ### 2. BERT Fine-tuning
96
- - **Base Model**: `bert-base-uncased`
97
- - **Training**: 3 epochs with evaluation per epoch
98
- - **Optimizer**: AdamW with learning rate 2e-5
99
- - **Batch Size**: 8 per device
100
-
101
- ## 🛠️ Usage
102
-
103
- ### Running the Notebook
104
- 1. Ensure your virtual environment is activated
105
- 2. Start Jupyter: `jupyter notebook`
106
- 3. Open `notebooks/FakeNewsClassifier_HuggingFace.ipynb`
107
- 4. Make sure the kernel is set to "venv" or "FakeNewsDetector (venv)"
108
- 5. Run all cells
109
-
110
- ### Training BERT Model
111
- ```bash
112
- python scripts/train.py
113
- ```
114
-
115
- The trained model will be saved to `models/bert-fake-news/`
116
-
117
- ## 📋 Requirements
118
-
119
- - Python 3.8+
120
- - pandas
121
- - scikit-learn
122
- - datasets (Hugging Face)
123
- - transformers
124
- - torch
125
- - matplotlib
126
- - seaborn
127
- - jupyter
128
- - ipywidgets
129
-
130
- ## 🎯 Key Features
131
-
132
- - **High Accuracy**: Achieves 98.6% accuracy on test set
133
- - **Multiple Approaches**: Compares traditional ML vs. transformer models
134
- - **Easy Setup**: Simple virtual environment setup
135
- - **Comprehensive Analysis**: Includes confusion matrix and detailed metrics
136
- - **Production Ready**: Trained models can be saved and deployed
137
-
138
- ## 🔍 Model Analysis
139
-
140
- The TF-IDF + Logistic Regression model shows excellent performance:
141
- - **Balanced Performance**: High precision and recall for both classes
142
- - **Low False Positives**: 99% precision for fake news detection
143
- - **Low False Negatives**: 99% recall for real news detection
144
- - **Robust**: Handles class imbalance well with balanced weights
145
-
146
- ## 🚀 Future Improvements
147
-
148
- - [ ] Implement ensemble methods combining multiple models
149
- - [ ] Add cross-validation for more robust evaluation
150
- - [ ] Experiment with other transformer models (RoBERTa, DistilBERT)
151
- - [ ] Deploy model as a web API
152
- - [ ] Add real-time news article classification
153
- - [ ] Implement explainability features (LIME, SHAP)
154
-
155
- ## 🤝 Contributing
156
-
157
- Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](../../issues).
158
-
159
- ## 📧 Contact
160
-
161
- For questions or suggestions, please open an issue or contact the project maintainer.
162
-
163
- ---
164
-
165
- **Note**: This project is for educational and research purposes. Always verify news from multiple reliable sources.
1
+ ---
2
+ title: Fake News Detector
3
+ colorFrom: gray
4
+ colorTo: red
5
+ sdk: streamlit
6
+ sdk_version: "1.33.0"
7
+ app_file: app.py
8
+ pinned: false
9
+ ---
10
+
11
+ Detect fake news using a fine-tuned BERT model. Enter any headline or statement and get an instant prediction on whether it's likely real or fake. Built with Streamlit and Hugging Face Transformers.
12
+
13
+ # Fake News Detection Project
14
+
15
+ A machine learning project that classifies news articles as real or fake using both traditional NLP techniques and advanced transformer models.
16
+
17
+ ## 🎯 Project Overview
18
+
19
+ This project implements multiple approaches to detect fake news:
20
+ - **Traditional ML**: TF-IDF vectorization with Logistic Regression
21
+ - **Deep Learning**: Fine-tuned BERT model for sequence classification
22
+
23
+ ## 📊 Performance Results
24
+
25
+ ### TF-IDF + Logistic Regression Model
26
+ - **Accuracy**: 98.62%
27
+ - **F1 Score**: 98.67%
28
+
29
+ #### Detailed Classification Report:
30
+ ```
31
+ precision recall f1-score support
32
+
33
+ 0 0.98 0.99 0.99 4284 (Real News)
34
+ 1 0.99 0.98 0.99 4696 (Fake News)
35
+
36
+ accuracy 0.99 8980
37
+ macro avg 0.99 0.99 0.99 8980
38
+ weighted avg 0.99 0.99 0.99 8980
39
+ ```
40
+
41
+ ## 📁 Project Structure
42
+
43
+ ```
44
+ FakeNewsDetector/
45
+ ├── README.md
46
+ ├── requirements.txt
47
+ ├── notebooks/
48
+ │   └── FakeNewsClassifier_HuggingFace.ipynb
49
+ ├── scripts/
50
+ │   └── train.py
51
+ ├── models/
52
+ │   └── bert-fake-news/ (generated after training)
53
+ ├── data/
54
+ ├── app/
55
+ └── venv/
56
+ ```
57
+
58
+ ## 🚀 Quick Start
59
+
60
+ ### 1. Clone and Setup
61
+ ```bash
62
+ git clone <repository-url>
63
+ cd FakeNewsDetector
64
+ ```
65
+
66
+ ### 2. Create Virtual Environment
67
+ ```bash
68
+ python -m venv venv
69
+
70
+ # Windows PowerShell
71
+ .\venv\Scripts\Activate.ps1
72
+
73
+ # Windows CMD
74
+ .\venv\Scripts\activate.bat
75
+
76
+ # Git Bash
77
+ source venv/Scripts/activate
78
+ ```
79
+
80
+ ### 3. Install Dependencies
81
+ ```bash
82
+ pip install -r requirements.txt
83
+ ```
84
+
85
+ ### 4. Launch Jupyter Notebook
86
+ ```bash
87
+ jupyter notebook
88
+ ```
89
+
90
+ ## 📚 Dataset
91
+
92
+ The project uses the `mrm8488/fake-news` dataset from Hugging Face, which contains:
93
+ - **Total articles**: ~45,000
94
+ - **Training split**: 80% (~36,000 articles)
95
+ - **Test split**: 20% (~9,000 articles)
96
+ - **Classes**:
97
+ - 0: Real News
98
+ - 1: Fake News
99
+
100
+ ## 🔧 Models Implemented
101
+
102
+ ### 1. TF-IDF + Logistic Regression
103
+ - **Vectorizer**: TF-IDF with 5,000 max features, n-grams (1,2)
104
+ - **Classifier**: Logistic Regression with balanced class weights
105
+ - **Performance**: 98.62% accuracy
106
+
107
+ ### 2. BERT Fine-tuning
108
+ - **Base Model**: `bert-base-uncased`
109
+ - **Training**: 3 epochs with evaluation per epoch
110
+ - **Optimizer**: AdamW with learning rate 2e-5
111
+ - **Batch Size**: 8 per device
112
+
113
+ ## 🛠️ Usage
114
+
115
+ ### Running the Notebook
116
+ 1. Ensure your virtual environment is activated
117
+ 2. Start Jupyter: `jupyter notebook`
118
+ 3. Open `notebooks/FakeNewsClassifier_HuggingFace.ipynb`
119
+ 4. Make sure the kernel is set to "venv" or "FakeNewsDetector (venv)"
120
+ 5. Run all cells
121
+
122
+ ### Training BERT Model
123
+ ```bash
124
+ python scripts/train.py
125
+ ```
126
+
127
+ The trained model will be saved to `models/bert-fake-news/`
128
+
129
+ ## 📋 Requirements
130
+
131
+ - Python 3.8+
132
+ - pandas
133
+ - scikit-learn
134
+ - datasets (Hugging Face)
135
+ - transformers
136
+ - torch
137
+ - matplotlib
138
+ - seaborn
139
+ - jupyter
140
+ - ipywidgets
141
+
142
+ ## 🎯 Key Features
143
+
144
+ - **High Accuracy**: Achieves 98.6% accuracy on test set
145
+ - **Multiple Approaches**: Compares traditional ML vs. transformer models
146
+ - **Easy Setup**: Simple virtual environment setup
147
+ - **Comprehensive Analysis**: Includes confusion matrix and detailed metrics
148
+ - **Production Ready**: Trained models can be saved and deployed
149
+
150
+ ## 🔍 Model Analysis
151
+
152
+ The TF-IDF + Logistic Regression model shows excellent performance:
153
+ - **Balanced Performance**: High precision and recall for both classes
154
+ - **Low False Positives**: 99% precision for fake news detection
155
+ - **Low False Negatives**: 99% recall for real news detection
156
+ - **Robust**: Handles class imbalance well with balanced weights
157
+
158
+ ## 🚀 Future Improvements
159
+
160
+ - [ ] Implement ensemble methods combining multiple models
161
+ - [ ] Add cross-validation for more robust evaluation
162
+ - [ ] Experiment with other transformer models (RoBERTa, DistilBERT)
163
+ - [ ] Deploy model as a web API
164
+ - [ ] Add real-time news article classification
165
+ - [ ] Implement explainability features (LIME, SHAP)
166
+
167
+ ## 🤝 Contributing
168
+
169
+ Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](../../issues).
170
+
171
+ ## 📧 Contact
172
+
173
+ For questions or suggestions, please open an issue or contact the project maintainer.
174
+
175
+ ---
176
+
177
+ **Note**: This project is for educational and research purposes. Always verify news from multiple reliable sources.
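
Reviewer note (not part of the commit): the README reports accuracy, F1, and per-class precision and recall, but not the confusion matrix they come from. The sketch below shows how those figures relate to one another. Only the class supports (4284 real, 4696 fake) are taken from the classification report; the individual confusion counts are hypothetical, chosen merely to sum to those supports, and the function name `binary_metrics` is illustrative rather than anything in the repository.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# HYPOTHETICAL counts, with "fake" (label 1) as the positive class.
# Only the row totals are from the README: 4696 fake, 4284 real.
TP, FN = 4610, 86    # fake articles: caught vs. missed (4610 + 86 = 4696)
TN, FP = 4246, 38    # real articles: kept vs. wrongly flagged (4246 + 38 = 4284)

fake = binary_metrics(tp=TP, fp=FP, fn=FN, tn=TN)
# Swapping the roles gives the metrics for the "real" class (label 0).
real = binary_metrics(tp=TN, fp=FN, fn=FP, tn=TP)

for name, metrics in (("fake", fake), ("real", real)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Under these assumed counts, precision for the fake class is driven by real articles wrongly flagged as fake, while recall for the fake class is driven by fake articles that slip through, which is the distinction the "Low False Positives" and "Low False Negatives" bullets in Model Analysis are drawing.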