---
title: TruthCheck - Fake News Detection
emoji: πŸ“°
colorFrom: red
colorTo: blue
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit
---
# TruthCheck: Fake News Detection with Fine-Tuned BERT

TruthCheck is a fake news detection system built on a hybrid deep learning architecture: a pre-trained BERT-base-uncased encoder combined with a BiLSTM and an attention mechanism, fully fine-tuned on a curated dataset of real and fake news. The project includes preprocessing, feature extraction, model training, evaluation, and a Streamlit web app for interactive predictions.

---

## πŸš€ Features
- **Hybrid Model:** BERT-base-uncased + BiLSTM + Attention
- **Full Fine-Tuning:** All layers of BERT and additional layers are trainable and optimized on the fake news dataset
- **Comprehensive Preprocessing:** Cleaning, tokenization, lemmatization, and more
- **Training & Evaluation:** Scripts for training, validation, and test evaluation
- **Interactive App:** Streamlit web app for real-time news classification
- **Ready for Deployment:** Easily extendable for research or production
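
The preprocessing step (cleaning, tokenization, lemmatization) might look roughly like the minimal sketch below. The function name `clean_text` and the exact cleaning rules are illustrative assumptions; the project's actual pipeline may differ (e.g. it also lemmatizes, which is omitted here to keep the sketch dependency-free):

```python
import re

def clean_text(text: str) -> str:
    """Minimal cleaning sketch (illustrative, not the project's exact rules):
    lowercase, strip URLs and HTML tags, keep letters only, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
```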

---

## 🧠 Model Details
- **Base Model:** [BERT-base-uncased](https://huggingface.co/bert-base-uncased)
- **Architecture:**
  - BERT encoder (pre-trained, all layers fine-tuned)
  - BiLSTM layer for sequential context
  - Attention mechanism for interpretability
  - Fully connected classification head
- **Fine-Tuning Technique:**
  - All BERT layers are unfrozen and updated during training (full fine-tuning)
  - Additional layers (BiLSTM, attention, classifier) are trained from scratch
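
The architecture above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the project's actual implementation: the class name, hidden sizes, and the additive-attention formulation are assumptions, and a plain embedding layer stands in for `BertModel.from_pretrained("bert-base-uncased")` so the sketch runs without downloading weights:

```python
import torch
import torch.nn as nn

class HybridFakeNewsClassifier(nn.Module):
    """Illustrative sketch of a BERT + BiLSTM + attention classifier.
    Hyperparameters are assumptions; in the real project the encoder
    would be BertModel.from_pretrained("bert-base-uncased")."""

    def __init__(self, bert_hidden=768, lstm_hidden=256, num_classes=2):
        super().__init__()
        # Placeholder for the BERT encoder (30522 = BERT's WordPiece vocab size).
        self.encoder = nn.Embedding(30522, bert_hidden)
        self.bilstm = nn.LSTM(bert_hidden, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * lstm_hidden, 1)       # per-token scores
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids):
        x = self.encoder(input_ids)                          # (B, T, 768)
        h, _ = self.bilstm(x)                                # (B, T, 512)
        weights = torch.softmax(self.attention(h), dim=1)    # (B, T, 1)
        context = (weights * h).sum(dim=1)                   # (B, 512)
        return self.classifier(context)                      # (B, num_classes)
```

The attention weights computed in `forward` are what give the model a degree of interpretability: they indicate which tokens contributed most to the prediction.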

---

## πŸ“₯ Download Data and Model

**Raw and Processed Datasets:**  
[Google Drive Link](https://drive.google.com/drive/folders/1tAhWhhhDes5uCdcnMLmJdFBSGWFFl55M?usp=sharing)

**Trained Model(s):**  
[Google Drive Link](https://drive.google.com/drive/folders/1VEFa0y_vW6AzT5x0fRwmX8shoBhUGd7K?usp=sharing)

### **Instructions:**
1. Download the datasets and place them in the `data/` directory:
    - `data/raw/` for raw files
    - `data/processed/` for processed files
2. Download the trained model (e.g., `final_model.pt` or `best_model.pt`) and place it in `models/saved/`.

---

## βš™οΈ Setup

1. **Clone the repository:**
    ```bash
    git clone https://github.com/adnaan-tariq/fake-news-detection.git
    cd fake-news-detection
    ```
2. **Create and activate a virtual environment:**
    ```bash
    python -m venv venv
    # Windows
    .\venv\Scripts\activate
    # macOS/Linux
    source venv/bin/activate
    ```
3. **Install dependencies:**
    ```bash
    pip install --upgrade pip
    pip install -r requirements.txt
    ```

---

## πŸƒβ€β™‚οΈ Usage

### **Train the Model**
If you want to train from scratch (after placing the data as described above):
```bash
python -m src.train
```

### **Run the Streamlit App**
```bash
streamlit run app.py
```
- Open [http://localhost:8501](http://localhost:8501) in your browser.

### **Test the Model**
- The app and scripts will use the model in `models/saved/final_model.pt` by default.
- For custom inference, see the example in `src/app.py` or adapt it into a standalone script.
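
For turning the model's raw output into a prediction, a minimal sketch might look like this. The helper name `predict_label` and the label order `("real", "fake")` are assumptions, not taken from the project's code:

```python
import torch

def predict_label(logits, labels=("real", "fake")):
    """Map classifier logits to a label and confidence score.
    The label order is an assumption; check the project's label encoding."""
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return labels[idx.item()], conf.item()
```

For example, logits of `[2.0, 0.0]` would map to the first label with roughly 0.88 confidence.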

---

## πŸ“Š Results
- **Validation Accuracy:** ~93%
- **Validation F1 Score:** ~0.93
- (See training logs and visualizations for more details.)

---

## πŸ“¦ Data & Model Policy
- **Data and model files are NOT included in this repository.**
- Please download them from the provided Google Drive links above.

---

## 🀝 Contributing
Pull requests and suggestions are welcome! For major changes, please open an issue first to discuss what you would like to change.

---

## πŸ“„ License
This project is licensed under the MIT License.

---