File size: 6,604 Bytes
d9b4824 74a035d d9b4824 fcf4ff0 d9b4824 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 |
---
pipeline_tag: tabular-classification
language: en
library_name: transformers
license: mit
model-index:
- name: aai540-group3/diabetes-readmission
results:
- task:
type: binary-classification
dataset:
name: Diabetes 130-US Hospitals
type: hospital-readmission
metrics:
- type: accuracy
value: 0.8865474882652552
name: accuracy
- type: auc
value: 0.6467403398083669
name: auc
---
# aai540-group3/diabetes-readmission
## Model Description
This model predicts 30-day hospital readmissions for diabetic patients using historical patient data
and machine learning techniques. The model aims to identify high-risk individuals enabling targeted
interventions and improved healthcare resource allocation.
## Overview
- **Task:** Binary Classification (Hospital Readmission Prediction)
- **Model Type:** autogluon
- **Framework:** Python Autogluon
- **License:** MIT
- **Last Updated:** 2024-10-29
## Performance Metrics
- **Test Accuracy:** 0.8865
- **Test ROC-AUC:** 0.6467
## Feature Importance
Significant features and their importance scores:
| Feature | Importance | p-value | 99% CI |
|---------|------------|----------|----------|
| 0 | 0.0563 | 3.24e-04 | [0.0294, 0.0832] |
| 1 | 0.0358 | 8.45e-06 | [0.0290, 0.0426] |
| 2 | 0.0080 | 0.0083 | [-0.0013, 0.0173] |
| 3 | 0.0046 | 1.96e-04 | [0.0027, 0.0065] |
| 4 | 0.0023 | 0.0055 | [-0.0001, 0.0046] |
| 5 | 0.0008 | 0.1840 | [-0.0027, 0.0043] |
*Note: Only features with non-zero importance are shown. The confidence intervals (CI) are calculated at the 99% level. Features with p-value < 0.05 are considered statistically significant.*
## Features
### Numeric Features
- Patient demographics (age)
- Hospital stay metrics (time_in_hospital, num_procedures, num_lab_procedures)
- Medication metrics (num_medications, total_medications)
- Service utilization (number_outpatient, number_emergency, number_inpatient)
- Diagnostic information (number_diagnoses)
### Binary Features
- Patient characteristics (gender)
- Medication flags (diabetesmed, change, insulin_with_oral)
### Interaction Features
- Time-based interactions (medications × time, procedures × time)
- Complexity indicators (age × diagnoses, medications × procedures)
- Resource utilization (lab procedures × time, medications × changes)
### Ratio Features
- Resource efficiency (procedure/medication ratio, lab/procedure ratio)
- Diagnostic density (diagnosis/procedure ratio)
## Intended Use
This model is designed for healthcare professionals to assess the risk of 30-day readmission
for diabetic patients. It should be used as a supportive tool in conjunction with clinical judgment.
### Primary Intended Uses
- Predict likelihood of 30-day hospital readmission
- Support resource allocation and intervention planning
- Aid in identifying high-risk patients
- Assist in care management decision-making
### Out-of-Scope Uses
- Non-diabetic patient populations
- Predicting readmissions beyond 30 days
- Making final decisions without clinical oversight
- Use as sole determinant for patient care decisions
- Emergency or critical care decision-making
## Training Data
The model was trained on the [Diabetes 130-US Hospitals Dataset](https://doi.org/10.24432/C5230J)
(1999-2008) from UCI ML Repository. This dataset includes:
- Over 100,000 hospital admissions
- 50+ features including patient demographics, diagnoses, procedures
- Binary outcome: readmission within 30 days
- Comprehensive medication tracking
- Detailed hospital utilization metrics
## Training Procedure
### Data Preprocessing
- Missing value imputation using mean/mode
- Outlier handling using 5-sigma clipping
- Feature scaling using StandardScaler
- Categorical encoding using one-hot encoding
- Log transformation for skewed features
### Feature Engineering
- Created interaction terms between key variables
- Generated resource utilization ratios
- Aggregated medication usage metrics
- Developed time-based interaction features
- Constructed diagnostic density metrics
### Model Training
- Data split: 70% training, 15% validation, 15% test
- Cross-validation for model selection
- Hyperparameter optimization via grid search
- Early stopping to prevent overfitting
- Model selection based on ROC-AUC performance
## Limitations & Biases
### Known Limitations
- Model performance depends on data quality and completeness
- Limited to the scope of training data timeframe (1999-2008)
- May not generalize to significantly different healthcare systems
- Requires standardized input data format
### Potential Biases
- May exhibit demographic biases present in training data
- Performance may vary across different hospital systems
- Could be influenced by regional healthcare practices
- Might show temporal biases due to historical data
### Recommendations
- Regular model monitoring and retraining
- Careful validation in new deployment contexts
- Assessment of performance across demographic groups
- Integration with existing clinical workflows
## Monitoring & Maintenance
### Monitoring Requirements
- Track prediction accuracy across different patient groups
- Monitor input data distribution shifts
- Assess feature importance stability
- Evaluate performance metrics over time
### Maintenance Schedule
- Quarterly performance reviews recommended
- Annual retraining with updated data
- Regular bias assessments
- Ongoing validation against current practices
## Citation
```bibtex
@misc{diabetes-readmission-model,
title = {Hospital Readmission Prediction Model for Diabetic Patients},
author = {Agustin, Jonathan and Robertson, Zack and Vo, Lisa},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/{REPO_ID}}}
}
@misc{diabetes-dataset,
title = {Diabetes 130-US Hospitals for Years 1999-2008 Data Set},
author = {Strack, B. and DeShazo, J. and Gennings, C. and Olmo, J. and
Ventura, S. and Cios, K. and Clore, J.},
year = {2014},
publisher = {UCI Machine Learning Repository},
doi = {10.24432/C5230J}
}
```
## Model Card Authors
Jonathan Agustin, Zack Robertson, Lisa Vo
## For Questions, Issues, or Feedback
- GitHub Issues: [Repository Issues](https://github.com/aai540-group3/diabetes-readmission/issues)
- Email: [team contact information]
## Updates and Versions
- {pd.Timestamp.now().strftime('%Y-%m-%d')}: Initial model release
- Feature engineering pipeline implemented
- Comprehensive preprocessing system added
- Model evaluation and selection completed
---
Last updated: {pd.Timestamp.now().strftime('%Y-%m-%d')}
|