---
title: Loan Prediction System
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: scripts/app.py
pinned: false
license: mit
tags:
  - machine-learning
  - deep-learning
  - loan-prediction
  - pytorch
  - streamlit
  - finance
  - classification
datasets:
  - lending-club
models:
  - pytorch
---

# 🏦 Loan Prediction System

A machine learning system that predicts loan approval decisions with a deep neural network. The project implements an end-to-end ML pipeline: exploratory data analysis, feature engineering, model training, and deployment.

## 📊 Project Overview

This project uses the LendingClub dataset to build a robust loan prediction model that helps financial institutions make data-driven lending decisions. The system achieves **70.1% accuracy** with **86.4% precision** using a deep neural network architecture.

### Key Features

- **Advanced EDA**: Comprehensive exploratory data analysis with feature engineering
- **Deep Learning Model**: Multi-layer neural network with dropout regularization
- **Production Ready**: Streamlit web application for real-time predictions
- **Robust Pipeline**: End-to-end ML pipeline with data preprocessing and model training
- **Performance Monitoring**: Detailed metrics and visualization tools

## 🎯 Performance Metrics

| Metric | Score |
|--------|-------|
| Accuracy | 70.1% |
| Precision | 86.4% |
| Recall | 74.5% |
| F1-Score | 80.0% |
| AUC-ROC | 69.0% |

## 🏗️ Architecture

### Model Architecture
- **Input Layer**: 9 features (after feature selection)
- **Hidden Layers**: 
  - Layer 1: 128 neurons (ReLU, Dropout 0.3)
  - Layer 2: 64 neurons (ReLU, Dropout 0.3)
  - Layer 3: 32 neurons (ReLU, Dropout 0.2)
  - Layer 4: 16 neurons (ReLU, Dropout 0.1)
- **Output Layer**: 1 neuron (Sigmoid activation)
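
The layer list above could be expressed in PyTorch roughly as follows. This is a minimal sketch, not the contents of `src/model.py`: the class name `LoanNet` is hypothetical, and the actual implementation may include details (for example normalization layers or a different output head) that the list does not mention.

```python
import torch
from torch import nn


class LoanNet(nn.Module):
    """Hypothetical feed-forward net mirroring the layer list above."""

    def __init__(self, n_features: int = 9):
        super().__init__()
        sizes = [n_features, 128, 64, 32, 16]
        dropouts = [0.3, 0.3, 0.2, 0.1]
        layers = []
        # Each hidden block: Linear -> ReLU -> Dropout.
        for in_dim, out_dim, p in zip(sizes[:-1], sizes[1:], dropouts):
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Dropout(p)]
        # Output: a single neuron with sigmoid activation.
        layers += [nn.Linear(sizes[-1], 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns one probability per input row.
        return self.net(x).squeeze(-1)


model = LoanNet()
model.eval()  # disables dropout for inference
with torch.no_grad():
    probs = model(torch.randn(4, 9))  # 4 random rows, 9 features each
print(probs.shape)
```

Because the output passes through a sigmoid, each value lies in [0, 1] and can be read as a probability for the positive class.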

### Project Structure

```
loan_prediction/
├── README.md                 # Main project documentation
├── requirements.txt          # Python dependencies
├── src/                      # Source code
│   ├── model.py             # Neural network architecture
│   ├── train.py             # Training pipeline
│   └── inference.py         # Inference and prediction
├── scripts/                  # Utility scripts
│   └── app.py               # Streamlit web application
├── notebooks/               # Jupyter notebooks
│   └── EDA.ipynb           # Exploratory data analysis
├── docs/                    # Documentation
│   ├── EDA_README.md       # EDA decisions and methodology
│   └── MODEL_ARCHITECTURE.md # Model design details
├── data/                    # Data files
│   ├── lending_club_loan_two.csv
│   ├── lending_club_info.csv
│   └── processed/          # Processed data files
├── bin/                     # Model checkpoints
│   └── best_checkpoint.pth
└── __pycache__/            # Python cache files
```

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- PyTorch 1.12+
- Streamlit 1.28+

### Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd loan_prediction
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the web application**
   ```bash
   streamlit run scripts/app.py
   ```

### Training the Model

```bash
python src/train.py
```

### Making Predictions

```bash
# Interactive single prediction
python src/inference.py --single

# Batch prediction
python src/inference.py --batch input.csv output.csv

# Sample prediction
python src/inference.py --sample
```
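
For readers curious how three mutually exclusive modes like these might be wired up, here is a minimal `argparse` sketch. It is an assumption about one possible design, not the actual parser in `src/inference.py`; the function name `build_parser` and the help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the --single / --batch / --sample modes."""
    parser = argparse.ArgumentParser(description="Loan prediction inference (sketch)")
    # Exactly one mode must be chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--single", action="store_true",
                      help="prompt for one applicant interactively")
    mode.add_argument("--batch", nargs=2, metavar=("INPUT_CSV", "OUTPUT_CSV"),
                      help="read applicants from INPUT_CSV, write predictions to OUTPUT_CSV")
    mode.add_argument("--sample", action="store_true",
                      help="run a prediction on a built-in sample")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--batch", "input.csv", "output.csv"])
input_csv, output_csv = args.batch
print(input_csv, output_csv)
```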

## 📋 Usage Examples

### Web Application
Launch the Streamlit app for an interactive loan prediction interface:
```bash
streamlit run scripts/app.py
```

### Command Line Inference
```bash
# Single prediction with interactive input
python src/inference.py --single

# Batch processing
python src/inference.py --batch data/test_file.csv results/predictions.csv
```

### Training Custom Model
```bash
python src/train.py --epochs 200 --batch_size 1536 --learning_rate 0.012
```

## 📈 Data & Features

### Dataset
- **Source**: LendingClub loan data
- **Size**: ~400,000 loan records
- **Features**: 23 original features reduced to 9 after feature selection

### Selected Features
1. **loan_amnt**: Loan amount requested
2. **int_rate**: Interest rate on the loan
3. **installment**: Monthly payment amount
4. **grade**: LC assigned loan grade
5. **emp_length**: Employment length in years
6. **annual_inc**: Annual income
7. **dti**: Debt-to-income ratio
8. **open_acc**: Number of open credit accounts
9. **pub_rec**: Number of derogatory public records
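
To illustrate how one applicant record could be turned into the 9-value model input in the order listed above, here is a small sketch. The ordinal encoding of `grade` (A to G mapped to 0 to 6), the helper name `to_feature_vector`, and the sample values are all assumptions for illustration; the real pipeline may encode and scale features differently.

```python
FEATURES = [
    "loan_amnt", "int_rate", "installment", "grade", "emp_length",
    "annual_inc", "dti", "open_acc", "pub_rec",
]

# Assumption: grades A..G are encoded ordinally as 0..6.
GRADE_TO_INT = {grade: index for index, grade in enumerate("ABCDEFG")}


def to_feature_vector(record: dict) -> list:
    """Return the 9 features as floats, in the order of FEATURES."""
    missing = [name for name in FEATURES if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    row = dict(record)  # copy so the caller's record is not modified
    row["grade"] = GRADE_TO_INT[row["grade"]]
    return [float(row[name]) for name in FEATURES]


# Made-up applicant, for illustration only.
applicant = {
    "loan_amnt": 10000, "int_rate": 11.44, "installment": 329.48,
    "grade": "B", "emp_length": 10, "annual_inc": 117000,
    "dti": 26.24, "open_acc": 16, "pub_rec": 0,
}
print(to_feature_vector(applicant))
```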

## 📚 Documentation

- **[EDA Analysis & Decisions](docs/EDA_README.md)** - Detailed explanation of exploratory data analysis and feature engineering decisions
- **[Model Architecture](docs/MODEL_ARCHITECTURE.md)** - Deep dive into neural network design and training methodology

## 🔧 Configuration

### Training Configuration
```json
{
  "learning_rate": 0.012,
  "batch_size": 1536,
  "num_epochs": 200,
  "early_stopping_patience": 30,
  "weight_decay": 0.0001,
  "validation_split": 0.2
}
```
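
To get a feel for what these settings imply, the sketch below works out the train/validation sizes and the number of batches per epoch. It assumes the split is applied to the full dataset of roughly 400,000 records (the figure given in this README); if a separate test set is held out first, the numbers would be smaller.

```python
import math

# Values copied from the training configuration above.
config = {"batch_size": 1536, "validation_split": 0.2}

# Assumption: the split is taken over the whole (approximate) dataset.
n_records = 400_000

n_val = round(n_records * config["validation_split"])
n_train = n_records - n_val
# The last batch of an epoch may be smaller, hence the ceiling.
steps_per_epoch = math.ceil(n_train / config["batch_size"])

print(n_train, n_val, steps_per_epoch)
```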

## 📊 Model Performance

### Training History
- **Best Epoch**: 112
- **Training Loss**: Converged to ~0.32
- **Validation Loss**: Stabilized at ~0.34
- **Early Stopping**: Activated after 30 epochs without improvement
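
The early-stopping rule described above can be sketched as a small patience counter on the validation loss. This is a generic illustration, not the code in `src/train.py`; the names `EarlyStopping` and `run` are hypothetical and the loss values are synthetic.

```python
class EarlyStopping:
    """Stop once validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(losses, patience):
    """Feed a sequence of validation losses; return (stop_epoch, best_epoch)."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return None, stopper.best_epoch


# Synthetic losses: best at epoch 3, then three epochs without improvement.
print(run([0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30], patience=3))  # (6, 3)
```

Under this rule, a patience of 30 and a best epoch of 112 would end training at epoch 142, well before the configured 200 epochs.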

### Class Distribution
- **Default Rate**: ~22% (imbalanced dataset)
- **Handling**: Weighted loss function and class balancing techniques
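
One common way to derive weights for a weighted loss is inverse class frequency, sketched below for the ~22% default rate. This README does not state which weighting scheme the project uses, so treat this as an assumption; the function name and the class labels `default` and `fully_paid` are illustrative.

```python
def inverse_frequency_weights(class_fractions: dict) -> dict:
    """Weight each class by 1 / (n_classes * class_fraction).

    Rarer classes get larger weights, and the frequency-weighted
    average of the weights equals 1.
    """
    n_classes = len(class_fractions)
    return {
        label: 1.0 / (n_classes * fraction)
        for label, fraction in class_fractions.items()
    }


# Approximate class balance from the default rate above.
fractions = {"default": 0.22, "fully_paid": 0.78}
weights = inverse_frequency_weights(fractions)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these fractions the minority class is weighted about 3.5 times as heavily as the majority class, which is the ratio 0.78 / 0.22.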

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- LendingClub for providing the dataset
- PyTorch team for the deep learning framework
- Streamlit for the web application framework

## 📞 Contact

For questions or support, please open an issue in the repository.

---

**Note**: This model is for educational and research purposes. Always consult with financial experts before making actual lending decisions.