---
title: TruthCheck - Fake News Detection
emoji: 📰
colorFrom: red
colorTo: blue
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit
---
# TruthCheck: Fake News Detection with Fine-Tuned BERT

TruthCheck is a fake news detection system built on a hybrid deep learning architecture. It combines a pre-trained BERT-base-uncased encoder with a BiLSTM and an attention mechanism, with the BERT layers fully fine-tuned on a curated dataset of real and fake news. The project includes preprocessing, feature extraction, model training, evaluation, and a Streamlit web app for interactive predictions.

---

## 🚀 Features
- **Hybrid Model:** BERT-base-uncased + BiLSTM + Attention
- **Full Fine-Tuning:** All layers of BERT and additional layers are trainable and optimized on the fake news dataset
- **Comprehensive Preprocessing:** Cleaning, tokenization, lemmatization, and more
- **Training & Evaluation:** Scripts for training, validation, and test evaluation
- **Interactive App:** Streamlit web app for real-time news classification
- **Ready for Deployment:** Easily extendable for research or production
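
As an illustration of the cleaning step listed above, here is a minimal sketch using only the standard library. The function names and the exact rules (strip HTML tags and URLs, lowercase, drop stray symbols) are assumptions for illustration, not the project's actual pipeline; lemmatization is left out because it needs an external NLP library.

```python
import html
import re

# Patterns are illustrative assumptions, not the project's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s.,!?'-]")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize a raw news string before tokenization."""
    text = html.unescape(text)         # decode entities such as "&amp;"
    text = TAG_RE.sub(" ", text)       # drop HTML tags
    text = URL_RE.sub(" ", text)       # drop URLs
    text = text.lower()                # consistent with an uncased BERT vocabulary
    text = NON_TEXT_RE.sub(" ", text)  # drop remaining symbols
    return SPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of the cleaned text (hypothetical helper)."""
    return clean_text(text).split()
```

In practice the BERT tokenizer does its own subword splitting, so a whitespace tokenizer like this would mainly be useful for inspection or for classical feature extraction.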

---

## 🧠 Model Details
- **Base Model:** [BERT-base-uncased](https://huggingface.co/bert-base-uncased)
- **Architecture:**
  - BERT encoder (pre-trained, all layers fine-tuned)
  - BiLSTM layer for sequential context
  - Attention mechanism for interpretability
  - Fully connected classification head
- **Fine-Tuning Technique:**
  - All BERT layers are unfrozen and updated during training (full fine-tuning)
  - Additional layers (BiLSTM, attention, classifier) are trained from scratch
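
The architecture described above could look roughly like the following PyTorch sketch. The class names, layer sizes, and the additive attention pooling are assumptions chosen for illustration; the project's own model code may differ.

```python
import torch
import torch.nn as nn


class BiLSTMAttentionHead(nn.Module):
    """BiLSTM + attention pooling + linear classifier (hypothetical sketch).

    Expects encoder token embeddings of shape (batch, seq_len, encoder_dim).
    """

    def __init__(self, encoder_dim: int = 768, lstm_dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(encoder_dim, lstm_dim, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * lstm_dim, 1)
        self.classifier = nn.Linear(2 * lstm_dim, num_classes)

    def forward(self, hidden_states, attention_mask):
        # Padding positions are still fed through the LSTM here; they are
        # only excluded from the attention pooling below.
        lstm_out, _ = self.lstm(hidden_states)                # (B, T, 2*lstm_dim)
        scores = self.attn_score(lstm_out).squeeze(-1)        # (B, T)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)               # per-token weights
        pooled = torch.bmm(weights.unsqueeze(1), lstm_out).squeeze(1)
        return self.classifier(pooled), weights


class HybridClassifier(nn.Module):
    """Wraps an encoder (e.g. a Hugging Face BertModel) and the head above."""

    def __init__(self, encoder: nn.Module, head: BiLSTMAttentionHead):
        super().__init__()
        self.encoder = encoder
        self.head = head
        # Full fine-tuning: every encoder parameter receives gradients.
        for param in self.encoder.parameters():
            param.requires_grad = True

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(outputs.last_hidden_state, attention_mask)
```

With the `transformers` library installed, the encoder could be created with `BertModel.from_pretrained("bert-base-uncased")`. The returned attention weights are what makes the pooling inspectable: they show which tokens contributed most to a prediction. A common choice with this setup, though not confirmed by this README, is to give the pre-trained encoder a smaller learning rate than the newly initialized layers.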

---

## 📥 Download Data and Model

**Raw and Processed Datasets:**  
[Google Drive Link](https://drive.google.com/drive/folders/1tAhWhhhDes5uCdcnMLmJdFBSGWFFl55M?usp=sharing)

**Trained Model(s):**  
[Google Drive Link](https://drive.google.com/drive/folders/1VEFa0y_vW6AzT5x0fRwmX8shoBhUGd7K?usp=sharing)

### **Instructions:**
1. Download the datasets and place them in the `data/` directory:
    - `data/raw/` for raw files
    - `data/processed/` for processed files
2. Download the trained model (e.g., `final_model.pt` or `best_model.pt`) and place it in `models/saved/`.
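
To confirm the files landed in the right place, a small check like the one below can help. This is a hypothetical helper, not part of the repository; it only assumes the directory names and checkpoint file names mentioned in the instructions above.

```python
from pathlib import Path

# Paths taken from the instructions above; the helper itself is hypothetical.
DATA_DIRS = ("data/raw", "data/processed")
MODEL_DIR = "models/saved"
CHECKPOINT_NAMES = ("final_model.pt", "best_model.pt")


def missing_artifacts(root=".") -> list[str]:
    """Return a list of layout problems; an empty list means everything is in place."""
    root = Path(root)
    problems = []
    for rel in DATA_DIRS:
        folder = root / rel
        if not folder.is_dir():
            problems.append(f"missing directory: {rel}")
        elif not any(folder.iterdir()):
            problems.append(f"empty directory: {rel}")
    saved = root / MODEL_DIR
    if not any((saved / name).is_file() for name in CHECKPOINT_NAMES):
        expected = " or ".join(CHECKPOINT_NAMES)
        problems.append(f"no checkpoint in {MODEL_DIR} (expected {expected})")
    return problems
```

Running `print(missing_artifacts())` from the repository root lists anything still missing before you start training or launch the app.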

---

## ⚙️ Setup

1. **Clone the repository:**
    ```bash
    git clone https://github.com/adnaan-tariq/fake-news-detection.git
    cd fake-news-detection
    ```
2. **Create and activate a virtual environment:**
    ```bash
    python -m venv venv
    # Windows:
    .\venv\Scripts\activate
    # macOS/Linux:
    # source venv/bin/activate
    ```
3. **Install dependencies:**
    ```bash
    pip install --upgrade pip
    pip install -r requirements.txt
    ```

---

## 🏃‍♂️ Usage

### **Train the Model**
To train the model yourself (after placing the data as described above):
```bash
python -m src.train
```

### **Run the Streamlit App**
```bash
streamlit run app.py
```
- Open [http://localhost:8501](http://localhost:8501) in your browser.

### **Test the Model**
- The app and scripts will use the model in `models/saved/final_model.pt` by default.
- For custom inference, see the example in `src/app.py`, or open an issue to request a sample script.

---

## 📊 Results
- **Validation Accuracy:** ~93%
- **Validation F1 Score:** ~0.93
- (See training logs and visualizations for more details.)
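
For reference, the two metrics above can be computed from predictions as in the sketch below. It assumes binary labels with the positive class encoded as `1`; this README does not state how the reported F1 score was averaged, so treat that choice as an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the target class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

On a roughly balanced dataset, accuracy and F1 tend to be close to each other, which is consistent with the two figures reported above.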

---

## 📦 Data & Model Policy
- **Data and model files are NOT included in this repository.**
- Please download them from the Google Drive links above.

---

## 🤝 Contributing
Pull requests and suggestions are welcome! For major changes, please open an issue first to discuss what you would like to change.

---

## 📄 License
This project is licensed under the MIT License.

---