---
title: Loan Prediction System
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: scripts/app.py
pinned: false
license: mit
tags:
- machine-learning
- deep-learning
- loan-prediction
- pytorch
- streamlit
- finance
- classification
datasets:
- lending-club
models:
- pytorch
---
🏦 Loan Prediction System
A comprehensive machine learning system for predicting loan approval decisions using deep neural networks. This project implements an end-to-end ML pipeline with exploratory data analysis, feature engineering, model training, and deployment capabilities.
Project Overview
This project uses the LendingClub dataset to build a robust loan prediction model that helps financial institutions make data-driven lending decisions. The system achieves 70.1% accuracy with 86.4% precision using a deep neural network architecture.
Key Features
- Advanced EDA: Comprehensive exploratory data analysis with feature engineering
- Deep Learning Model: Multi-layer neural network with dropout regularization
- Production Ready: Streamlit web application for real-time predictions
- Robust Pipeline: End-to-end ML pipeline with data preprocessing and model training
- Performance Monitoring: Detailed metrics and visualization tools
🎯 Performance Metrics
| Metric | Score |
|---|---|
| Accuracy | 70.1% |
| Precision | 86.4% |
| Recall | 74.5% |
| F1-Score | 80.0% |
| AUC-ROC | 69.0% |
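The F1-score is the harmonic mean of precision and recall, so the table is internally consistent: 2 × 0.864 × 0.745 / (0.864 + 0.745) ≈ 0.800. A minimal sketch of how these metrics follow from confusion-matrix counts; the function names are hypothetical and not taken from the project's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics((tp + tn) / total, precision, recall, f1)


def f1_from_precision_recall(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)


# Cross-check the reported F1 against the reported precision and recall.
print(round(f1_from_precision_recall(0.864, 0.745), 3))  # 0.8
```

Note that accuracy and AUC-ROC cannot be recovered from precision and recall alone; they depend on the full confusion matrix and on the ranking of predicted probabilities, respectively.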
🏗️ Architecture
Model Architecture
- Input Layer: 9 features (after feature selection)
- Hidden Layers:
- Layer 1: 128 neurons (ReLU, Dropout 0.3)
- Layer 2: 64 neurons (ReLU, Dropout 0.3)
- Layer 3: 32 neurons (ReLU, Dropout 0.2)
- Layer 4: 16 neurons (ReLU, Dropout 0.1)
- Output Layer: 1 neuron (Sigmoid activation)
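The layer stack above can be sketched in PyTorch as follows. This is a minimal illustration assuming plain `Linear → ReLU → Dropout` blocks; the class name `LoanNet` is hypothetical, and the project's actual `src/model.py` may differ in details.

```python
import torch
from torch import nn


class LoanNet(nn.Module):
    """Feed-forward binary classifier with the layer widths listed above (hypothetical name)."""

    def __init__(self, in_features: int = 9) -> None:
        super().__init__()
        # (width, dropout probability) for each hidden layer.
        spec = [(128, 0.3), (64, 0.3), (32, 0.2), (16, 0.1)]
        layers = []
        prev = in_features
        for width, p in spec:
            layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(p)]
            prev = width
        # Single output unit squashed to a probability.
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns one probability in (0, 1) per input row.
        return self.net(x)


model = LoanNet()
model.eval()  # disables dropout for inference
with torch.no_grad():
    probs = model(torch.randn(4, 9))
print(probs.shape)  # torch.Size([4, 1])
```

Because the final layer already applies a sigmoid, the matching loss is `nn.BCELoss`; if the sigmoid were removed, `nn.BCEWithLogitsLoss` would be the numerically safer choice.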
Project Structure
loan_prediction/
├── README.md # Main project documentation
├── requirements.txt # Python dependencies
├── src/ # Source code
│   ├── model.py # Neural network architecture
│   ├── train.py # Training pipeline
│   └── inference.py # Inference and prediction
├── scripts/ # Utility scripts
│   └── app.py # Streamlit web application
├── notebooks/ # Jupyter notebooks
│   └── EDA.ipynb # Exploratory data analysis
├── docs/ # Documentation
│   ├── EDA_README.md # EDA decisions and methodology
│   └── MODEL_ARCHITECTURE.md # Model design details
├── data/ # Data files
│   ├── lending_club_loan_two.csv
│   ├── lending_club_info.csv
│   └── processed/ # Processed data files
├── bin/ # Model checkpoints
│   └── best_checkpoint.pth
└── __pycache__/ # Python cache files
Quick Start
Prerequisites
- Python 3.8+
- PyTorch 1.12+
- Streamlit 1.28+
Installation
Clone the repository
git clone <repository-url>
cd loan_prediction
Install dependencies
pip install -r requirements.txt
Run the web application
streamlit run scripts/app.py
Training the Model
python src/train.py
Making Predictions
# Interactive single prediction
python src/inference.py --single
# Batch prediction
python src/inference.py --batch input.csv output.csv
# Sample prediction
python src/inference.py --sample
Usage Examples
Web Application
Launch the Streamlit app for an interactive loan prediction interface:
streamlit run scripts/app.py
Command Line Inference
# Single prediction with interactive input
python src/inference.py --single
# Batch processing
python src/inference.py --batch data/test_file.csv results/predictions.csv
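Batch mode reads one CSV and writes predictions to another. A minimal standard-library sketch of that flow, assuming the output keeps the input columns and appends hypothetical `probability` and `prediction` columns using a 0.5 threshold; the real output format of `src/inference.py` is not documented here, and the `score` callable stands in for feature encoding plus a model forward pass.

```python
import csv
import io
from typing import Callable, Dict, TextIO


def predict_batch(
    in_file: TextIO,
    out_file: TextIO,
    score: Callable[[Dict[str, str]], float],
    threshold: float = 0.5,
) -> int:
    """Copy each input row to the output CSV with probability and label columns appended."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or []) + ["probability", "prediction"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        p = score(row)  # stand-in for feature encoding + model forward pass
        row["probability"] = f"{p:.4f}"
        row["prediction"] = str(int(p >= threshold))  # 0.5 threshold is an assumption
        writer.writerow(row)
        count += 1
    return count


def toy_score(row: Dict[str, str]) -> float:
    # Toy rule for illustration only; a real run would call the trained model.
    return 0.9 if float(row["dti"]) < 20 else 0.3


src = io.StringIO("loan_amnt,dti\n10000,15.0\n5000,30.0\n")
dst = io.StringIO()
n = predict_batch(src, dst, toy_score)
print(n)  # 2
# For files on disk, open the input with newline="" and the output with "w", newline="".
```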
Training Custom Model
python src/train.py --epochs 200 --batch_size 1536 --learning_rate 0.012
Data & Features
Dataset
- Source: LendingClub loan data
- Size: ~400,000 loan records
- Features: 23 original features reduced to 9 after feature selection
Selected Features
- loan_amnt: Loan amount requested
- int_rate: Interest rate on the loan
- installment: Monthly payment amount
- grade: LendingClub-assigned loan grade (A to G)
- emp_length: Employment length in years
- annual_inc: Annual income
- dti: Debt-to-income ratio
- open_acc: Number of open credit accounts
- pub_rec: Number of derogatory public records
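Two of the selected features are text in the raw LendingClub export and need encoding before they reach the network. A minimal sketch, assuming `grade` is mapped ordinally (A → 0 through G → 6) and `emp_length` strings look like `"10+ years"` or `"< 1 year"`; the encoding choices, the feature order, and the "missing → 0" rule are assumptions, not the project's documented preprocessing.

```python
import re
from typing import Dict, List, Optional

# Assumed feature order; the model only needs it to match between training and inference.
FEATURES = ["loan_amnt", "int_rate", "installment", "grade", "emp_length",
            "annual_inc", "dti", "open_acc", "pub_rec"]

# Hypothetical ordinal encoding: LendingClub grades run from A (best) to G.
GRADE_TO_ORDINAL = {g: i for i, g in enumerate("ABCDEFG")}


def parse_emp_length(value: Optional[str]) -> Optional[int]:
    """Convert strings such as '10+ years' or '< 1 year' to whole years; None if unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.startswith("<"):
        return 0
    match = re.search(r"\d+", value)
    return int(match.group()) if match else None


def encode_row(row: Dict[str, str]) -> List[float]:
    """Turn one raw record into a 9-element numeric feature vector."""
    encoded = []
    for name in FEATURES:
        raw = row[name]
        if name == "grade":
            encoded.append(float(GRADE_TO_ORDINAL[raw]))
        elif name == "emp_length":
            years = parse_emp_length(raw)
            encoded.append(float(years) if years is not None else 0.0)  # assumption: missing -> 0
        else:
            encoded.append(float(raw))
    return encoded
```

In practice the numeric columns would also be scaled (for example, standardized with statistics from the training split) before being fed to the network.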
Documentation
- EDA Analysis & Decisions (`docs/EDA_README.md`) - Detailed explanation of exploratory data analysis and feature engineering decisions
- Model Architecture (`docs/MODEL_ARCHITECTURE.md`) - Deep dive into neural network design and training methodology
🔧 Configuration
Training Configuration
{
"learning_rate": 0.012,
"batch_size": 1536,
"num_epochs": 200,
"early_stopping_patience": 30,
"weight_decay": 0.0001,
"validation_split": 0.2
}
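A minimal sketch of loading this configuration into a typed object and deriving split and batch counts from it. The `TrainConfig` class and helper functions are hypothetical, and the 400,000-row figure is the approximate dataset size quoted under Data & Features, not an exact count.

```python
import json
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    batch_size: int
    num_epochs: int
    early_stopping_patience: int
    weight_decay: float
    validation_split: float


def load_config(text: str) -> TrainConfig:
    """Parse the JSON config; missing or unknown keys raise TypeError."""
    cfg = TrainConfig(**json.loads(text))
    if not 0.0 < cfg.validation_split < 1.0:
        raise ValueError("validation_split must lie strictly between 0 and 1")
    return cfg


def split_sizes(n_samples: int, validation_split: float) -> Tuple[int, int]:
    """Return (train, validation) sample counts."""
    n_val = int(n_samples * validation_split)
    return n_samples - n_val, n_val


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    # Assumes the last, partial batch is kept (not dropped), so it counts as a step.
    return math.ceil(n_train / batch_size)


cfg = load_config("""{"learning_rate": 0.012, "batch_size": 1536, "num_epochs": 200,
  "early_stopping_patience": 30, "weight_decay": 0.0001, "validation_split": 0.2}""")
n_train, n_val = split_sizes(400_000, cfg.validation_split)
print(n_train, n_val, steps_per_epoch(n_train, cfg.batch_size))  # 320000 80000 209
```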
Model Performance
Training History
- Best Epoch: Achieved at epoch 112
- Training Loss: Converged to ~0.32
- Validation Loss: Stabilized at ~0.34
- Early Stopping: Activated after 30 epochs without improvement
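The patience rule above can be sketched as a small helper. The `EarlyStopping` class is hypothetical, and the synthetic loss curve below only mimics the reported shape (best epoch 112, patience 30); it is not the real training log.

```python
from typing import Optional


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 30, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch: Optional[int] = None
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            # A real training loop would save the best checkpoint here.
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=30)
stopped_at = None
for epoch in range(1, 201):
    # Synthetic curve: improves until epoch 112, then plateaus slightly above the best value.
    val_loss = 0.34 + 0.001 * (112 - epoch) if epoch <= 112 else 0.345
    if stopper.step(epoch, val_loss):
        stopped_at = epoch
        break
print(stopper.best_epoch, stopped_at)  # 112 142
```

With this rule, a best epoch of 112 and a patience of 30 mean the loop halts at epoch 142, well before the configured 200 epochs.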
Class Distribution
- Default Rate: ~22% (imbalanced dataset)
- Handling: Weighted loss function and class balancing techniques
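The README does not specify which weighting scheme is used. One common choice is inverse-frequency ("balanced") class weights, computed as n_samples / (n_classes × n_c), the same formula scikit-learn uses for `class_weight="balanced"`. A minimal sketch, with illustrative counts matching a ~22% default rate; the label mapping (1 = default) is an assumption.

```python
from typing import Dict


def balanced_class_weights(class_counts: Dict[int, int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * n_c) for each class c."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}


# Illustrative counts only: 78% fully paid (label 0), 22% default (label 1, assumed).
weights = balanced_class_weights({0: 78_000, 1: 22_000})
print({k: round(v, 3) for k, v in weights.items()})  # {0: 0.641, 1: 2.273}
```

With a sigmoid-output model, these per-class weights can be expanded into per-sample weights and passed to `nn.BCELoss(weight=...)`; with a logits-output model, `nn.BCEWithLogitsLoss(pos_weight=...)` with `pos_weight = n_negative / n_positive` achieves a similar rebalancing.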
🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- LendingClub for providing the dataset
- PyTorch team for the deep learning framework
- Streamlit for the web application framework
Contact
For questions or support, please open an issue in the repository.
Note: This model is for educational and research purposes. Always consult with financial experts before making actual lending decisions.