---
license: mit
tags:
- scikit-learn
- tabular
- nonprofit
- planned-giving
- classification
- snowflake
---

# Planned Giving Propensity Model

A machine learning solution to optimize planned giving donor targeting for the National Parks Conservation Association (NPCA).

> **Note**: This model is not currently deployed or downloadable due to data privacy constraints. This repository shares the modeling approach, evaluation strategy, and relevant pipeline components for reproducibility and educational use.

## Project Overview

This project implements a Random Forest classifier to identify potential planned giving donors, with the goal of improving mailing efficiency and response rates. The model processes donor data through Snowflake's computing infrastructure and uses SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance.

## Key Results

- **PR-AUC**: 0.88 (strong performance on imbalanced data)  
- **F1 Score**: 0.8125  
- **Precision**: 0.7558  
- **Recall**: 0.8784 (high capture rate of known planned givers)  
- **1,019 new high-potential donor predictions** for targeted outreach
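
As a quick consistency check on the metrics above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name below is hypothetical and is not taken from the project's scripts.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Key Results.
precision, recall = 0.7558, 0.8784
f1 = f1_from_precision_recall(precision, recall)
print(round(f1, 4))  # 0.8125
```

The recomputed value agrees with the reported F1 score to four decimal places.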

## Technical Implementation

### Data Pipeline
- Donor data extracted from CRM into Snowflake
- Modular Python scripts for feature engineering and cleaning
- SMOTE oversampling to address class imbalance

### Machine Learning
- Random Forest classifier with `scikit-learn`
- Stratified cross-validation and grid search
- Multiple imputation strategies (MICE, mean, median)
- Key temporal features (e.g., time since last gift)

📂 [Training Script](./model/split_SMOTE_crossval.py)  
📓 [Evaluation Notebook](./model/snowflake_model_evaluation.py)
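
The components listed above could be combined roughly as shown here. This is a sketch under stated assumptions rather than the contents of the training script: the data is synthetic, the parameter grid is illustrative, `IterativeImputer` stands in for MICE (it is scikit-learn's MICE-inspired imputer), and `average_precision` is used as one common estimate of PR-AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic imbalanced data with about 5% of values knocked out (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline([
    ("impute", SimpleImputer()),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Treat the imputation strategy as a hyperparameter alongside tree depth.
param_grid = {
    "impute": [
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
        IterativeImputer(random_state=0),
    ],
    "forest__max_depth": [4, None],
}

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="average_precision", cv=cv)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keeping the imputer inside the pipeline means it is refit on each training fold, which avoids leaking statistics from the validation fold.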

## Model Performance Insights

Post-modeling analysis validated predictions against known donor engagement indicators:

- **66.3%** of predicted donors were already flagged as prospects by fundraisers  
- **37.6%** are major donor households  
- **18%** are members of the Mather Legacy Society  
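
Overlap percentages of this kind can be computed as the mean of boolean indicator columns over the predicted donors. The column names and the four example rows below are hypothetical and only illustrate the calculation; they are not project data.

```python
import pandas as pd

# Hypothetical indicator flags for four predicted donors (illustrative only).
predicted = pd.DataFrame({
    "is_flagged_prospect": [True, True, False, True],
    "is_major_donor_household": [True, False, False, False],
    "is_legacy_society_member": [False, False, False, True],
})

# The mean of a boolean column is the share of True values.
overlap_pct = (predicted.mean() * 100).round(1)
print(overlap_pct)
```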

### Top 5 Most Important Features
1. Highest Previous Contribution, HPC (22.8%)  
2. Most Recent Contribution, MRC (20.1%)  
3. Years Since HPC Gift (14.6%)  
4. Total Amount (14.3%)  
5. Years Since MRC Gift (11.2%)
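
Together, these five features account for 83.0% of total importance. A ranking like this can be read from a fitted Random Forest's `feature_importances_` attribute, as in the sketch below. The feature names and synthetic data are assumptions for illustration, so the printed values will not match the figures above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names loosely modelled on the features listed above.
feature_names = [
    "highest_previous_contribution",
    "most_recent_contribution",
    "years_since_hpc_gift",
    "total_amount",
    "years_since_mrc_gift",
    "transaction_count",
]

X, y = make_classification(
    n_samples=500, n_features=len(feature_names), n_informative=4, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalised to sum to 1 across all features.
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values(ascending=False)

top5 = (importances.head(5) * 100).round(1)
print(top5)
```

Impurity-based importances can favour high-cardinality numeric features, so permutation importance is a common cross-check.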

### Demographics of Predicted Donors
- Average age: 69  
- Giving history: 16 years (on average)  
- Median total giving: $10,932  
- Average number of transactions: 18  

## Tools and Technologies

- `scikit-learn`, `pandas`, `numpy`
- Snowflake
- `imbalanced-learn`, `matplotlib`, `seaborn`

## Repository Structure

```plaintext
├── model/
│   ├── split_SMOTE_crossval.py        # ML model executed on Snowflake
│   └── snowflake_model_evaluation.py  # Model evaluation and visualization
├── predictions_analyzed/              # Post-modeling analysis
│   └── predictions_analyzed.ipynb     # Model concurrence evaluation
├── requirements.txt
└── README.md
```

## Potential Future Improvements
- Schedule automated data refresh and model retraining
- Incorporate additional feature engineering
- Develop dashboard for tracking model performance

*Note: Full project repository: [GitHub – dbouquin/bequest_modeling](https://github.com/dbouquin/bequest_modeling)*