---
title: Loan Prediction System
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: scripts/app.py
pinned: false
license: mit
tags:
- machine-learning
- deep-learning
- loan-prediction
- pytorch
- streamlit
- finance
- classification
datasets:
- lending-club
models:
- pytorch
---
# Loan Prediction System
A comprehensive machine learning system for predicting loan approval decisions using deep neural networks. This project implements an end-to-end ML pipeline with exploratory data analysis, feature engineering, model training, and deployment capabilities.
## Project Overview
This project uses the LendingClub dataset to build a loan prediction model intended to help financial institutions make data-driven lending decisions. The deep neural network reaches **70.1% accuracy** and **86.4% precision**; the full set of metrics is listed under Performance Metrics.
### Key Features
- **Advanced EDA**: Comprehensive exploratory data analysis with feature engineering
- **Deep Learning Model**: Multi-layer neural network with dropout regularization
- **Production Ready**: Streamlit web application for real-time predictions
- **Robust Pipeline**: End-to-end ML pipeline with data preprocessing and model training
- **Performance Monitoring**: Detailed metrics and visualization tools
## Performance Metrics
| Metric | Score |
|--------|-------|
| Accuracy | 70.1% |
| Precision | 86.4% |
| Recall | 74.5% |
| F1-Score | 80.0% |
| AUC-ROC | 69.0% |
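As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. A minimal sketch (the function name `f1_score` is illustrative and not part of this repository):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the table above.
f1 = f1_score(0.864, 0.745)
print(f"F1 = {f1:.1%}")  # prints "F1 = 80.0%"
```

The result agrees with the reported F1-Score, so the three values are at least mutually consistent.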
## Architecture
### Model Architecture
- **Input Layer**: 9 features (after feature selection)
- **Hidden Layers**:
  - Layer 1: 128 neurons (ReLU, Dropout 0.3)
  - Layer 2: 64 neurons (ReLU, Dropout 0.3)
  - Layer 3: 32 neurons (ReLU, Dropout 0.2)
  - Layer 4: 16 neurons (ReLU, Dropout 0.1)
- **Output Layer**: 1 neuron (Sigmoid activation)
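To give a sense of scale, the layer widths listed above imply a fairly small network. The sketch below counts trainable parameters under the assumption that every layer is a plain fully connected layer with a bias term and that nothing else (for example batch normalization) adds parameters. The actual definition lives in `src/model.py` and may differ; the helper name `count_dense_parameters` is made up for this example.

```python
# Layer widths from the list above: input, four hidden layers, output.
LAYER_SIZES = [9, 128, 64, 32, 16, 1]


def count_dense_parameters(sizes):
    """Sum weights (fan_in * fan_out) and biases (fan_out) over consecutive layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


total = count_dense_parameters(LAYER_SIZES)
print(total)  # prints 12161
```

Dropout and the ReLU/Sigmoid activations contribute no parameters, so they do not appear in the count.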
### Project Structure
```
loan_prediction/
├── README.md                     # Main project documentation
├── requirements.txt              # Python dependencies
├── src/                          # Source code
│   ├── model.py                  # Neural network architecture
│   ├── train.py                  # Training pipeline
│   └── inference.py              # Inference and prediction
├── scripts/                      # Utility scripts
│   └── app.py                    # Streamlit web application
├── notebooks/                    # Jupyter notebooks
│   └── EDA.ipynb                 # Exploratory data analysis
├── docs/                         # Documentation
│   ├── EDA_README.md             # EDA decisions and methodology
│   └── MODEL_ARCHITECTURE.md     # Model design details
├── data/                         # Data files
│   ├── lending_club_loan_two.csv
│   ├── lending_club_info.csv
│   └── processed/                # Processed data files
├── bin/                          # Model checkpoints
│   └── best_checkpoint.pth
└── __pycache__/                  # Python cache files
```
## Quick Start
### Prerequisites
- Python 3.8+
- PyTorch 1.12+
- Streamlit 1.28+
### Installation
1. **Clone the repository**
```bash
git clone <repository-url>
cd loan_prediction
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Run the web application**
```bash
streamlit run scripts/app.py
```
### Training the Model
```bash
python src/train.py
```
### Making Predictions
```bash
# Interactive single prediction
python src/inference.py --single
# Batch prediction
python src/inference.py --batch input.csv output.csv
# Sample prediction
python src/inference.py --sample
```
## Usage Examples
### Web Application
Launch the Streamlit app for an interactive loan prediction interface:
```bash
streamlit run scripts/app.py
```
### Command Line Inference
```bash
# Single prediction with interactive input
python src/inference.py --single
# Batch processing
python src/inference.py --batch data/test_file.csv results/predictions.csv
```
### Training Custom Model
```bash
python src/train.py --epochs 200 --batch_size 1536 --learning_rate 0.012
```
## Data & Features
### Dataset
- **Source**: LendingClub loan data
- **Size**: ~400,000 loan records
- **Features**: 23 original features reduced to 9 after feature selection
### Selected Features
1. **loan_amnt**: Loan amount requested
2. **int_rate**: Interest rate on the loan
3. **installment**: Monthly payment amount
4. **grade**: LC assigned loan grade
5. **emp_length**: Employment length in years
6. **annual_inc**: Annual income
7. **dti**: Debt-to-income ratio
8. **open_acc**: Number of open credit accounts
9. **pub_rec**: Number of derogatory public records
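For illustration, the nine selected features can be gathered into one typed record and flattened into a numeric vector. This is a hypothetical sketch: the `LoanFeatures` class, the ordinal `A` to `G` grade mapping, and the example values are assumptions made here, not the preprocessing actually implemented in `src/`. Feature scaling is deliberately left out.

```python
from dataclasses import astuple, dataclass

# Assumed ordinal encoding for the LendingClub grade (A = 0 ... G = 6).
GRADE_TO_INDEX = {grade: index for index, grade in enumerate("ABCDEFG")}


@dataclass
class LoanFeatures:
    loan_amnt: float    # loan amount requested
    int_rate: float     # interest rate on the loan, in percent
    installment: float  # monthly payment amount
    grade: str          # LC assigned loan grade, "A" to "G"
    emp_length: float   # employment length in years
    annual_inc: float   # annual income
    dti: float          # debt-to-income ratio
    open_acc: int       # number of open credit accounts
    pub_rec: int        # number of derogatory public records

    def to_vector(self):
        """Return the nine features as floats, in the order listed above."""
        values = list(astuple(self))
        values[3] = GRADE_TO_INDEX[self.grade]  # replace the letter with its index
        return [float(v) for v in values]


# Made-up applicant, for demonstration only.
example = LoanFeatures(10000, 11.44, 329.48, "B", 10, 117000, 26.24, 16, 0)
print(example.to_vector())
```

Keeping the field order identical to the feature list makes it harder to feed the model a vector with columns in the wrong positions.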
## Documentation
- **[EDA Analysis & Decisions](docs/EDA_README.md)** - Detailed explanation of exploratory data analysis and feature engineering decisions
- **[Model Architecture](docs/MODEL_ARCHITECTURE.md)** - Deep dive into neural network design and training methodology
## Configuration
### Training Configuration
```json
{
  "learning_rate": 0.012,
  "batch_size": 1536,
  "num_epochs": 200,
  "early_stopping_patience": 30,
  "weight_decay": 0.0001,
  "validation_split": 0.2
}
```
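To see what these settings mean in practice, the sketch below parses the configuration and estimates the number of optimizer steps per epoch. It assumes roughly 400,000 records (the approximate dataset size given in this README), that `validation_split` is the held-out fraction, and that the last partial batch is kept. None of this has been checked against `src/train.py`.

```python
import json
import math

CONFIG_JSON = """
{
  "learning_rate": 0.012,
  "batch_size": 1536,
  "num_epochs": 200,
  "early_stopping_patience": 30,
  "weight_decay": 0.0001,
  "validation_split": 0.2
}
"""

config = json.loads(CONFIG_JSON)

total_records = 400_000  # approximate figure, an assumption for this estimate
train_records = round(total_records * (1 - config["validation_split"]))
steps_per_epoch = math.ceil(train_records / config["batch_size"])

print(train_records, steps_per_epoch)  # prints "320000 209"
```

With a batch size this large, each epoch is only a couple of hundred gradient updates, which is worth keeping in mind when reading the epoch counts in the training history.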
## Model Performance
### Training History
- **Best Epoch**: 112
- **Training Loss**: Converged to ~0.32
- **Validation Loss**: Stabilized at ~0.34
- **Early Stopping**: Activated after 30 epochs without improvement
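The early-stopping rule described above can be sketched as a small patience counter. This is a generic illustration rather than the code in `src/train.py`; the class name `EarlyStopping` and the toy loss values are made up for the example.

```python
class EarlyStopping:
    """Stop when the loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy run with patience 3 (the configuration above uses 30).
stopper = EarlyStopping(patience=3)
losses = [0.50, 0.40, 0.34, 0.35, 0.36, 0.35, 0.33]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopped_at)  # prints "2 5"
```

In the toy run the best loss occurs at epoch 2 and training halts three epochs later, so the final improvement at index 6 is never seen; that is the trade-off a patience value controls.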
### Class Distribution
- **Default Rate**: ~22% (imbalanced dataset)
- **Handling**: Weighted loss function and class balancing techniques
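One common way to build such a weighted loss is inverse-frequency class weighting. The README does not say which scheme the project uses, so the sketch below is only an illustration: it derives weights from the ~22% default rate and applies them in a binary cross-entropy written in plain Python. The label convention (1 means "default") and both function names are assumptions.

```python
import math


def balanced_class_weights(positive_rate: float):
    """Inverse-frequency weights n / (2 * n_c) for a binary problem.

    Returns (weight_negative, weight_positive).
    """
    return 1 / (2 * (1 - positive_rate)), 1 / (2 * positive_rate)


def weighted_bce(y_true, y_prob, w_neg, w_pos, eps=1e-12):
    """Binary cross-entropy averaged over samples, each scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1 - p)
    return total / len(y_true)


# Assumed convention: label 1 is the minority "default" class (~22%).
w_neg, w_pos = balanced_class_weights(0.22)
print(round(w_neg, 3), round(w_pos, 3))  # prints "0.641 2.273"

# Two equally uncertain predictions: the minority-class error costs more.
loss = weighted_bce([1, 0], [0.5, 0.5], w_neg, w_pos)
```

Under this scheme a misclassified default contributes about 3.5 times as much to the loss as a misclassified repaid loan, which pushes the model away from simply predicting the majority class.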
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- LendingClub for providing the dataset
- PyTorch team for the deep learning framework
- Streamlit for the web application framework
## Contact
For questions or support, please open an issue in the repository.
---
**Note**: This model is for educational and research purposes. Always consult with financial experts before making actual lending decisions.