
---
title: Loan Prediction System
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: scripts/app.py
pinned: false
license: mit
tags:
  - machine-learning
  - deep-learning
  - loan-prediction
  - pytorch
  - streamlit
  - finance
  - classification
datasets:
  - lending-club
models:
  - pytorch
---

🏦 Loan Prediction System

A comprehensive machine learning system for predicting loan approval decisions using deep neural networks. This project implements an end-to-end ML pipeline with exploratory data analysis, feature engineering, model training, and deployment capabilities.

📊 Project Overview

This project uses the LendingClub dataset to build a loan prediction model that helps financial institutions make data-driven lending decisions. The deep neural network achieves 70.1% accuracy and 86.4% precision.

Key Features

  • Advanced EDA: Comprehensive exploratory data analysis with feature engineering
  • Deep Learning Model: Multi-layer neural network with dropout regularization
  • Production Ready: Streamlit web application for real-time predictions
  • Robust Pipeline: End-to-end ML pipeline with data preprocessing and model training
  • Performance Monitoring: Detailed metrics and visualization tools

🎯 Performance Metrics

| Metric | Score |
|---|---|
| Accuracy | 70.1% |
| Precision | 86.4% |
| Recall | 74.5% |
| F1-Score | 80.0% |
| AUC-ROC | 0.690 |
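Accuracy, precision, recall and F1 all come from a binary confusion matrix at a fixed decision threshold. AUC-ROC is different: it is computed from the ranking of predicted probabilities and does not depend on a threshold.

Below is a minimal plain-Python sketch of the standard formulas. The helper names `binary_metrics` and `f1_from` are hypothetical and are not taken from the project's code. The last lines check that the reported precision and recall are consistent with the reported F1.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Standard threshold-based metrics from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the table: P = 0.864 and R = 0.745 give F1 of about 0.800.
print(round(f1_from(0.864, 0.745), 3))  # 0.8
```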

πŸ—οΈ Architecture

Model Architecture

  • Input Layer: 9 features (after feature selection)
  • Hidden Layers:
    • Layer 1: 128 neurons (ReLU, Dropout 0.3)
    • Layer 2: 64 neurons (ReLU, Dropout 0.3)
    • Layer 3: 32 neurons (ReLU, Dropout 0.2)
    • Layer 4: 16 neurons (ReLU, Dropout 0.1)
  • Output Layer: 1 neuron (Sigmoid activation)
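Here is a minimal PyTorch sketch of the layer stack listed above. It assumes plain Linear → ReLU → Dropout blocks, with no batch normalization or other layers the list does not mention. The real `src/model.py` may differ, and the class name `LoanNet` is hypothetical.

```python
import torch
from torch import nn


class LoanNet(nn.Module):
    """Sketch of the documented architecture (hypothetical class name)."""

    def __init__(self, in_features: int = 9) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(32, 16), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(16, 1), nn.Sigmoid(),  # output is a probability in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = LoanNet()
model.eval()  # disables dropout for inference
with torch.no_grad():
    probs = model(torch.randn(4, 9))  # batch of 4 rows, 9 features each
print(probs.shape)  # torch.Size([4, 1])
```

Under these assumptions the network has 12,161 trainable parameters. The final Sigmoid pairs with `nn.BCELoss`. If the Sigmoid were dropped, `nn.BCEWithLogitsLoss` would be the numerically stabler choice.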

Project Structure

loan_prediction/
├── README.md                 # Main project documentation
├── requirements.txt          # Python dependencies
├── src/                      # Source code
│   ├── model.py             # Neural network architecture
│   ├── train.py             # Training pipeline
│   └── inference.py         # Inference and prediction
├── scripts/                  # Utility scripts
│   └── app.py               # Streamlit web application
├── notebooks/               # Jupyter notebooks
│   └── EDA.ipynb           # Exploratory data analysis
├── docs/                    # Documentation
│   ├── EDA_README.md       # EDA decisions and methodology
│   └── MODEL_ARCHITECTURE.md # Model design details
├── data/                    # Data files
│   ├── lending_club_loan_two.csv
│   ├── lending_club_info.csv
│   └── processed/          # Processed data files
├── bin/                     # Model checkpoints
│   └── best_checkpoint.pth
└── __pycache__/            # Python cache files

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • PyTorch 1.12+
  • Streamlit 1.28+

Installation

  1. Clone the repository

    git clone <repository-url>
    cd loan_prediction
    
  2. Install dependencies

    pip install -r requirements.txt
    
  3. Run the web application

    streamlit run scripts/app.py
    

Training the Model

python src/train.py

Making Predictions

# Interactive single prediction
python src/inference.py --single

# Batch prediction
python src/inference.py --batch input.csv output.csv

# Sample prediction
python src/inference.py --sample

📋 Usage Examples

Web Application

Launch the Streamlit app for an interactive loan prediction interface:

streamlit run scripts/app.py

Command Line Inference

# Single prediction with interactive input
python src/inference.py --single

# Batch processing
python src/inference.py --batch data/test_file.csv results/predictions.csv
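For orientation only, here is a standard-library sketch of what batch scoring over a CSV could look like. It makes several assumptions:

- The input CSV has one already-numeric column per selected feature.
- A stand-in `score_row` function replaces the trained network.
- Rows are labelled 1 when the probability reaches a threshold of 0.5.

The function name, column handling and output columns are hypothetical. They are not taken from `src/inference.py`.

```python
import csv
import io
from typing import Callable, Dict, TextIO

# Hypothetical: input columns assumed to be named after the nine selected features.
FEATURES = ["loan_amnt", "int_rate", "installment", "grade", "emp_length",
            "annual_inc", "dti", "open_acc", "pub_rec"]


def score_csv(src: TextIO, dst: TextIO,
              score_row: Callable[[Dict[str, float]], float],
              threshold: float = 0.5) -> int:
    """Append probability and 0/1 prediction columns to each row; return rows written."""
    reader = csv.DictReader(src)
    missing = [f for f in FEATURES if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["probability", "prediction"])
    writer.writeheader()
    n = 0
    for row in reader:
        prob = score_row({f: float(row[f]) for f in FEATURES})
        writer.writerow({**row, "probability": f"{prob:.4f}",
                         "prediction": int(prob >= threshold)})
        n += 1
    return n


# Usage with an in-memory CSV and a dummy scorer standing in for the model.
src = io.StringIO(",".join(FEATURES) + "\n10000,11.5,330.0,2,5,60000,15.2,8,0\n")
dst = io.StringIO()
n = score_csv(src, dst, score_row=lambda feats: 0.7)
print(dst.getvalue())
```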

Training Custom Model

python src/train.py --epochs 200 --batch_size 1536 --learning_rate 0.012
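The flags in the command above could be parsed along the lines below. This is a hypothetical `argparse` sketch. The flag spellings come from the example command and the defaults from the Training Configuration values in this README. Whether `src/train.py` accepts further options is not documented here.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags and configuration defaults.
    parser = argparse.ArgumentParser(description="Train the loan prediction model")
    parser.add_argument("--epochs", type=int, default=200)
    parser.add_argument("--batch_size", type=int, default=1536)
    parser.add_argument("--learning_rate", type=float, default=0.012)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse(["--epochs", "50"])
print(args.epochs, args.batch_size, args.learning_rate)  # 50 1536 0.012
```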

📈 Data & Features

Dataset

  • Source: LendingClub loan data
  • Size: ~400,000 loan records
  • Features: 23 original features reduced to 9 after feature selection

Selected Features

  1. loan_amnt: Loan amount requested
  2. int_rate: Interest rate on the loan
  3. installment: Monthly payment amount
  4. grade: LendingClub-assigned loan grade (A to G)
  5. emp_length: Employment length in years
  6. annual_inc: Annual income
  7. dti: Debt-to-income ratio
  8. open_acc: Number of open credit accounts
  9. pub_rec: Number of derogatory public records
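Two of these features are text in the raw LendingClub export: `grade` (for example `B`) and `emp_length` (for example `10+ years` or `< 1 year`). One possible numeric encoding is sketched below. The mappings are assumptions and not necessarily what the EDA notebook does:

- grade is encoded as an ordinal, with A = 0 through G = 6;
- `< 1 year` is encoded as 0;
- `10+ years` is capped at 10;
- missing values become None.

```python
import re
from typing import Optional

GRADES = "ABCDEFG"  # LendingClub grades, best to worst


def encode_grade(grade: str) -> int:
    """Ordinal encoding 'A' -> 0 ... 'G' -> 6 (assumed scheme)."""
    key = grade.strip().upper()
    if len(key) != 1 or key not in GRADES:
        raise ValueError(f"unknown grade: {grade!r}")
    return GRADES.index(key)


def encode_emp_length(value: Optional[str]) -> Optional[int]:
    """Map '< 1 year', '3 years', '10+ years' to 0..10; None if missing or unparseable."""
    if value is None or not value.strip():
        return None
    text = value.strip()
    if text.startswith("<"):
        return 0
    match = re.match(r"(\d+)", text)
    if match is None:
        return None
    return min(int(match.group(1)), 10)
```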

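📚 Documentation

  • docs/EDA_README.md: EDA decisions and methodology
  • docs/MODEL_ARCHITECTURE.md: Model design details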

🔧 Configuration

Training Configuration

{
  "learning_rate": 0.012,
  "batch_size": 1536,
  "num_epochs": 200,
  "early_stopping_patience": 30,
  "weight_decay": 0.0001,
  "validation_split": 0.2
}
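Below is a small sketch for loading this JSON into a typed object, so that misspelled keys fail loudly instead of being silently ignored. The `TrainConfig` dataclass and `load_config` helper are hypothetical and not part of the project. Their defaults copy the values above.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    # Defaults copied from the configuration shown in this README.
    learning_rate: float = 0.012
    batch_size: int = 1536
    num_epochs: int = 200
    early_stopping_patience: int = 30
    weight_decay: float = 0.0001
    validation_split: float = 0.2


def load_config(text: str) -> TrainConfig:
    """Parse JSON text; reject keys the dataclass does not define."""
    raw = json.loads(text)
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)


cfg = load_config('{"learning_rate": 0.012, "batch_size": 1536, "num_epochs": 200}')
print(cfg.early_stopping_patience)  # 30 (default)
```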

📊 Model Performance

Training History

  • Best Epoch: Achieved at epoch 112
  • Training Loss: Converged to ~0.32
  • Validation Loss: Stabilized at ~0.34
  • Early Stopping: Activated after 30 epochs without improvement
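Patience-based early stopping, as described above, can be sketched in a few lines. This is a generic version, not the code in `src/train.py`; the class name and the `min_delta` parameter are hypothetical. Under this rule, a best epoch of 112 and a patience of 30 mean training stops at epoch 142. The synthetic loss curve below reproduces that.

```python
import math


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 30, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a best checkpoint would typically be saved here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic validation losses: improve until epoch 112, then plateau.
stopper = EarlyStopping(patience=30)
for epoch in range(1, 201):
    loss = 0.34 + 1.0 / epoch if epoch <= 112 else 0.35
    if stopper.step(epoch, loss):
        break
print(stopper.best_epoch, epoch)  # 112 142
```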

Class Distribution

  • Default Rate: ~22% (imbalanced dataset)
  • Handling: Weighted loss function and class balancing techniques
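The weighting scheme is not specified above. One common choice is inverse-frequency ("balanced") class weights applied to binary cross-entropy, sketched below in plain Python. The sketch assumes label 1 marks a default, which may not match the project's label encoding. In PyTorch, per-sample weights of this kind can be passed through the `weight` argument of `nn.BCELoss`.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """weight_c = n_samples / (n_classes * n_c), the usual 'balanced' heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 weights: Dict[int, float], eps: float = 1e-7) -> float:
    """Mean over samples of class-weighted binary cross-entropy."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)


labels = [1] * 22 + [0] * 78  # ~22% defaults, matching the reported rate
w = balanced_class_weights(labels)
print({c: round(v, 3) for c, v in sorted(w.items())})  # {0: 0.641, 1: 2.273}
```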

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • LendingClub for providing the dataset
  • PyTorch team for the deep learning framework
  • Streamlit for the web application framework

📞 Contact

For questions or support, please open an issue in the repository.


Note: This model is for educational and research purposes. Always consult with financial experts before making actual lending decisions.