---
license: mit
tags:
- scikit-learn
- tabular
- nonprofit
- planned-giving
- classification
- snowflake
---

# Planned Giving Propensity Model

A machine learning solution to optimize planned giving donor targeting for the National Parks Conservation Association (NPCA).

> **Note**: This model is not currently deployed or downloadable due to data privacy constraints. This repository shares the modeling approach, evaluation strategy, and relevant pipeline components for reproducibility and educational use.

## Project Overview

This project implements a Random Forest classifier to identify potential planned giving donors, with the goal of improving mailing efficiency and response rates. Donor data is processed on Snowflake's compute infrastructure, and SMOTE (Synthetic Minority Over-sampling Technique) is used to handle class imbalance.

## Key Results

- **PR-AUC**: 0.88, strong performance on imbalanced data
- **F1 Score**: 0.8125
- **Precision**: 0.7558
- **Recall**: 0.8784, a high capture rate of known planned givers
- **1,019 new high-potential donor predictions** for targeted outreach
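
The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. A minimal check using only the figures listed above:

```python
# Precision and recall as reported in Key Results.
precision = 0.7558
recall = 0.8784

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

The result rounds to the reported 0.8125, so the three figures are mutually consistent.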

## Technical Implementation

### Data Pipeline
- Donor data extracted from CRM into Snowflake
- Modular Python scripts for feature engineering and cleaning
- SMOTE oversampling to address class imbalance
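
The project uses SMOTE, presumably through `imbalanced-learn` (it appears in the tools list); the exact call is not reproduced here. As a rough illustration of the idea only, the dependency-free sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbors. The function name `smote_like_oversample`, its parameters, and the sample points are hypothetical, and this is not the library's implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical 2-D minority-class points (illustrative values only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Oversampling is normally fit on the training split only, so that synthetic points never leak into validation or test data.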

### Machine Learning
- Random Forest classifier with `scikit-learn`
- Stratified cross-validation and grid search
- Several imputation strategies (MICE, mean, median)
- Key temporal features (e.g., time since last gift)
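
A minimal sketch of how stratified cross-validation and grid search might be combined for a Random Forest in `scikit-learn`. The synthetic data, the parameter grid, and the choice of `average_precision` as the scoring metric are assumptions for illustration and are not taken from the project's training script. MICE-style imputation and SMOTE are left out to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for donor features (not NPCA data).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# Knock out ~5% of values so the imputer has something to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline([
    ("impute", SimpleImputer()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid: imputation strategy is tuned alongside the forest.
param_grid = {
    "impute__strategy": ["mean", "median"],
    "forest__n_estimators": [25, 50],
    "forest__max_depth": [3, None],
}

# Stratified folds keep the class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="average_precision", cv=cv)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the imputer inside the pipeline means it is refit on each training fold, which avoids leaking validation data into the imputation statistics.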

See the [Training Script](./model/split_SMOTE_crossval.py) and the [Evaluation Script](./model/snowflake_model_evaluation.py).

## Model Performance Insights

Post-modeling analysis validated predictions against known donor engagement indicators:

- **66.3%** of predicted donors were already flagged as prospects by fundraisers
- **37.6%** are major donor households
- **18%** are members of the Mather Legacy Society
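
Concurrence figures like these are overlap shares between the predicted set and an existing flag. A small sketch of that calculation, using a hypothetical helper `share_flagged` and made-up donor IDs rather than anything from the project's data:

```python
def share_flagged(predicted_ids, flagged_ids):
    """Percentage of predicted donors that also appear in a flagged set."""
    predicted = set(predicted_ids)
    if not predicted:
        return 0.0
    return 100 * len(predicted & set(flagged_ids)) / len(predicted)


# Hypothetical donor IDs, for illustration only.
predicted = [101, 102, 103, 104, 105]
prospects = [102, 104, 105, 900]

print(round(share_flagged(predicted, prospects), 1))
```

The denominator is the number of predicted donors, so IDs that are flagged but not predicted (such as `900` here) do not affect the share.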

### Top 5 Most Important Features
1. Highest Previous Contribution, or HPC (22.8%)
2. Most Recent Contribution, or MRC (20.1%)
3. Years Since HPC Gift (14.6%)
4. Total Amount (14.3%)
5. Years Since MRC Gift (11.2%)
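
Two of the top features measure elapsed time since a gift. A sketch of how such a feature might be derived from gift dates; the helper `years_since`, the feature keys, and the dates are hypothetical and not taken from the project's feature engineering code:

```python
from datetime import date


def years_since(gift_date, as_of):
    """Elapsed time between a gift and a reference date, in fractional years."""
    return (as_of - gift_date).days / 365.25


# Hypothetical donor record; dates are illustrative only.
as_of = date(2024, 1, 1)
most_recent_gift = date(2021, 1, 1)
highest_gift = date(2014, 1, 1)

features = {
    "years_since_mrc_gift": round(years_since(most_recent_gift, as_of), 2),
    "years_since_hpc_gift": round(years_since(highest_gift, as_of), 2),
}
print(features)
```

Using a fixed `as_of` date rather than the current date keeps the feature reproducible between training and scoring runs.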

### Demographics of Predicted Donors
- Average age: 69
- Average giving history: 16 years
- Median total giving: $10,932
- Average number of transactions: 18

## Tools and Technologies

- `scikit-learn`, `pandas`, `numpy`
- Snowflake
- `imbalanced-learn`, `matplotlib`, `seaborn`

## Repository Structure

```plaintext
├── model/
│   ├── split_SMOTE_crossval.py           # ML model executed on Snowflake
│   └── snowflake_model_evaluation.py     # Model evaluation and visualization
├── predictions_analyzed/                 # Post-modeling analysis
│   └── predictions_analyzed.ipynb        # Model concurrence evaluation
├── requirements.txt
└── README.md
```

## Potential Future Improvements
- Schedule automated data refresh and model retraining
- Incorporate additional feature engineering
- Develop a dashboard for tracking model performance

*Note: The full project repository is on GitHub at [dbouquin/bequest_modeling](https://github.com/dbouquin/bequest_modeling).*