---
title: TruthCheck - Fake News Detection
emoji: 📰
colorFrom: red
colorTo: blue
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit
---
# TruthCheck: Fake News Detection with Fine-Tuned BERT
TruthCheck is a fake news detection system built on a hybrid deep learning architecture. It combines a pre-trained BERT-base-uncased encoder with a BiLSTM and an attention mechanism, fine-tuned end to end on a curated dataset of real and fake news. The project includes preprocessing, feature extraction, model training, evaluation, and a Streamlit web app for interactive predictions.
---
## Features
- **Hybrid Model:** BERT-base-uncased + BiLSTM + Attention
- **Full Fine-Tuning:** All layers of BERT and additional layers are trainable and optimized on the fake news dataset
- **Comprehensive Preprocessing:** Cleaning, tokenization, lemmatization, and more
- **Training & Evaluation:** Scripts for training, validation, and test evaluation
- **Interactive App:** Streamlit web app for real-time news classification
- **Ready for Deployment:** Easily extendable for research or production
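
To make the preprocessing step concrete, here is a minimal sketch of the cleaning stage only. It is not the repository's actual pipeline: the function name `clean_text` and the specific rules (stripping HTML tags and URLs, lowercasing, dropping punctuation) are assumptions chosen for illustration, and lemmatization is left out because it needs an external NLP library.

```python
import html
import re

# Illustrative patterns; the project's own preprocessing rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize a raw article string before tokenization (hypothetical helper)."""
    text = html.unescape(text)         # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)       # drop HTML tags
    text = URL_RE.sub(" ", text)       # drop URLs
    text = text.lower()                # match the uncased BERT vocabulary
    text = NON_TEXT_RE.sub(" ", text)  # keep letters, digits, apostrophes
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "<p>BREAKING: Read more at https://example.com &amp; share!</p>"
    print(clean_text(sample))
```

Lowercasing is done here because the base model is uncased; a cased model would likely need that step removed.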
---
## Model Details
- **Base Model:** [BERT-base-uncased](https://huggingface.co/bert-base-uncased)
- **Architecture:**
  - BERT encoder (pre-trained, all layers fine-tuned)
  - BiLSTM layer for sequential context
  - Attention mechanism for interpretability
  - Fully connected classification head
- **Fine-Tuning Technique:**
  - All BERT layers are unfrozen and updated during training (full fine-tuning)
  - Additional layers (BiLSTM, attention, classifier) are trained from scratch
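
The attention layer is what makes predictions inspectable: it assigns each token a weight and pools the per-token BiLSTM outputs into a single vector for the classifier. The sketch below shows that computation in plain Python so it runs without dependencies. It assumes dot-product scoring against a single query vector; the repository's attention layer may score tokens differently, and the names `attention_pool`, `hidden_states`, and `query` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Pool per-token vectors into one vector using attention weights.

    hidden_states: one vector per token (for example, BiLSTM outputs).
    query: scoring vector (learned in a real model; fixed here).
    Returns (pooled_vector, weights); the weights sum to 1.
    """
    # Score each token by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    # Weighted sum of the token vectors, one dimension at a time.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return pooled, weights


if __name__ == "__main__":
    # Three toy "token" vectors; the second aligns best with the query.
    states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.3]]
    pooled, weights = attention_pool(states, query=[1.0, 1.0])
    print([round(w, 3) for w in weights])
```

Inspecting `weights` alongside the input tokens is one way to see which parts of an article influenced a prediction.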
---
## Download Data and Model
**Raw and Processed Datasets:**
[Google Drive Link](https://drive.google.com/drive/folders/1tAhWhhhDes5uCdcnMLmJdFBSGWFFl55M?usp=sharing)
**Trained Model(s):**
[Google Drive Link](https://drive.google.com/drive/folders/1VEFa0y_vW6AzT5x0fRwmX8shoBhUGd7K?usp=sharing)
### **Instructions:**
1. Download the datasets and place them in the `data/` directory:
   - `data/raw/` for raw files
   - `data/processed/` for processed files
2. Download the trained model (e.g., `final_model.pt` or `best_model.pt`) and place it in `models/saved/`.
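
After downloading, a quick check like the one below can confirm that everything landed in the expected places. This is a sketch, not a script shipped with the repository: the helper names `missing_dirs` and `find_model` are hypothetical, and the only paths it assumes are the ones listed in the instructions above.

```python
from pathlib import Path
from typing import List, Optional

# Checkpoint file names mentioned in the instructions above.
MODEL_NAMES = ("final_model.pt", "best_model.pt")


def missing_dirs(root: Path) -> List[Path]:
    """Return the expected data/model directories that do not exist under root."""
    expected = [
        root / "data" / "raw",
        root / "data" / "processed",
        root / "models" / "saved",
    ]
    return [path for path in expected if not path.is_dir()]


def find_model(root: Path) -> Optional[Path]:
    """Return the first checkpoint found in models/saved/, or None."""
    for name in MODEL_NAMES:
        candidate = root / "models" / "saved" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    root = Path(".")  # run from the repository root
    for path in missing_dirs(root):
        print(f"missing directory: {path}")
    model = find_model(root)
    print(f"model checkpoint: {model if model else 'not found'}")
```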
---
## Setup
1. **Clone the repository:**
```bash
git clone https://github.com/adnaan-tariq/fake-news-detection.git
cd fake-news-detection
```
2. **Create and activate a virtual environment:**
```bash
python -m venv venv
.\venv\Scripts\activate      # Windows
# source venv/bin/activate   # macOS/Linux
```
3. **Install dependencies:**
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
---
## Usage
### **Train the Model**
If you want to train from scratch (after placing the data as described above):
```bash
python -m src.train
```
### **Run the Streamlit App**
```bash
streamlit run app.py
```
- Open [http://localhost:8501](http://localhost:8501) in your browser.
### **Test the Model**
- The app and scripts will use the model in `models/saved/final_model.pt` by default.
- For custom inference, see the example in `src/app.py`, or open an issue to request a sample script.
---
## Results
- **Validation Accuracy:** ~93%
- **Validation F1 Score:** ~0.93
- (See training logs and visualizations for more details.)
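
For readers who want to reproduce these metrics from their own predictions, the sketch below computes accuracy and binary F1 from label lists. It assumes labels are encoded as `1` for fake and `0` for real and that F1 is reported for the fake class; the README does not state the encoding or the averaging mode, so treat both as assumptions. The toy labels are made up and are not project results.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Compute accuracy and binary F1 for the given positive label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); reported as 0.0 when the denominator is zero.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy labels, not project data: 1 = fake, 0 = real (assumed encoding).
    truth = [1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 0, 1, 1]
    acc, f1 = accuracy_and_f1(truth, preds)
    print(f"accuracy={acc:.3f} f1={f1:.3f}")
```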
---
## Data & Model Policy
- **Data and model files are NOT included in this repository.**
- Please download them from the provided Google Drive links above.
---
## Contributing
Pull requests and suggestions are welcome! For major changes, please open an issue first to discuss what you would like to change.
---
## License
This project is licensed under the MIT License.
---