jonathanagustin's picture
Upload folder using huggingface_hub
d9b4824 verified
|
raw
history blame
6.54 kB
---
language: en
license: mit
model-index:
- name: aai540-group3/diabetes-readmission
results:
- task:
type: binary-classification
dataset:
name: Diabetes 130-US Hospitals
type: hospital-readmission
metrics:
- type: accuracy
value: 0.8865474882652552
name: accuracy
- type: auc
value: 0.6467403398083669
name: auc
---
# aai540-group3/diabetes-readmission
## Model Description
This model predicts 30-day hospital readmissions for diabetic patients using historical patient data
and machine learning techniques. The model aims to identify high-risk individuals enabling targeted
interventions and improved healthcare resource allocation.
## Overview
- **Task:** Binary Classification (Hospital Readmission Prediction)
- **Model Type:** autogluon
- **Framework:** Python Autogluon
- **License:** MIT
- **Last Updated:** 2024-10-29
## Performance Metrics
- **Test Accuracy:** 0.8865
- **Test ROC-AUC:** 0.6467
## Feature Importance
Significant features and their importance scores:
| Feature | Importance | p-value | 99% CI |
|---------|------------|----------|----------|
| 0 | 0.0563 | 3.24e-04 | [0.0294, 0.0832] |
| 1 | 0.0358 | 8.45e-06 | [0.0290, 0.0426] |
| 2 | 0.0080 | 0.0083 | [-0.0013, 0.0173] |
| 3 | 0.0046 | 1.96e-04 | [0.0027, 0.0065] |
| 4 | 0.0023 | 0.0055 | [-0.0001, 0.0046] |
| 5 | 0.0008 | 0.1840 | [-0.0027, 0.0043] |
*Note: Only features with non-zero importance are shown. The confidence intervals (CI) are calculated at the 99% level. Features with p-value < 0.05 are considered statistically significant.*
## Features
### Numeric Features
- Patient demographics (age)
- Hospital stay metrics (time_in_hospital, num_procedures, num_lab_procedures)
- Medication metrics (num_medications, total_medications)
- Service utilization (number_outpatient, number_emergency, number_inpatient)
- Diagnostic information (number_diagnoses)
### Binary Features
- Patient characteristics (gender)
- Medication flags (diabetesmed, change, insulin_with_oral)
### Interaction Features
- Time-based interactions (medications × time, procedures × time)
- Complexity indicators (age × diagnoses, medications × procedures)
- Resource utilization (lab procedures × time, medications × changes)
### Ratio Features
- Resource efficiency (procedure/medication ratio, lab/procedure ratio)
- Diagnostic density (diagnosis/procedure ratio)
## Intended Use
This model is designed for healthcare professionals to assess the risk of 30-day readmission
for diabetic patients. It should be used as a supportive tool in conjunction with clinical judgment.
### Primary Intended Uses
- Predict likelihood of 30-day hospital readmission
- Support resource allocation and intervention planning
- Aid in identifying high-risk patients
- Assist in care management decision-making
### Out-of-Scope Uses
- Non-diabetic patient populations
- Predicting readmissions beyond 30 days
- Making final decisions without clinical oversight
- Use as sole determinant for patient care decisions
- Emergency or critical care decision-making
## Training Data
The model was trained on the [Diabetes 130-US Hospitals Dataset](https://doi.org/10.24432/C5230J)
(1999-2008) from UCI ML Repository. This dataset includes:
- Over 100,000 hospital admissions
- 50+ features including patient demographics, diagnoses, procedures
- Binary outcome: readmission within 30 days
- Comprehensive medication tracking
- Detailed hospital utilization metrics
## Training Procedure
### Data Preprocessing
- Missing value imputation using mean/mode
- Outlier handling using 5-sigma clipping
- Feature scaling using StandardScaler
- Categorical encoding using one-hot encoding
- Log transformation for skewed features
### Feature Engineering
- Created interaction terms between key variables
- Generated resource utilization ratios
- Aggregated medication usage metrics
- Developed time-based interaction features
- Constructed diagnostic density metrics
### Model Training
- Data split: 70% training, 15% validation, 15% test
- Cross-validation for model selection
- Hyperparameter optimization via grid search
- Early stopping to prevent overfitting
- Model selection based on ROC-AUC performance
## Limitations & Biases
### Known Limitations
- Model performance depends on data quality and completeness
- Limited to the scope of training data timeframe (1999-2008)
- May not generalize to significantly different healthcare systems
- Requires standardized input data format
### Potential Biases
- May exhibit demographic biases present in training data
- Performance may vary across different hospital systems
- Could be influenced by regional healthcare practices
- Might show temporal biases due to historical data
### Recommendations
- Regular model monitoring and retraining
- Careful validation in new deployment contexts
- Assessment of performance across demographic groups
- Integration with existing clinical workflows
## Monitoring & Maintenance
### Monitoring Requirements
- Track prediction accuracy across different patient groups
- Monitor input data distribution shifts
- Assess feature importance stability
- Evaluate performance metrics over time
### Maintenance Schedule
- Quarterly performance reviews recommended
- Annual retraining with updated data
- Regular bias assessments
- Ongoing validation against current practices
## Citation
```bibtex
@misc{diabetes-readmission-model,
title = {Hospital Readmission Prediction Model for Diabetic Patients},
author = {Agustin, Jonathan and Robertson, Zack and Vo, Lisa},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/{REPO_ID}}}
}
@misc{diabetes-dataset,
title = {Diabetes 130-US Hospitals for Years 1999-2008 Data Set},
author = {Strack, B. and DeShazo, J. and Gennings, C. and Olmo, J. and
Ventura, S. and Cios, K. and Clore, J.},
year = {2014},
publisher = {UCI Machine Learning Repository},
doi = {10.24432/C5230J}
}
```
## Model Card Authors
Jonathan Agustin, Zack Robertson, Lisa Vo
## For Questions, Issues, or Feedback
- GitHub Issues: [Repository Issues](https://github.com/aai540-group3/diabetes-readmission/issues)
- Email: [team contact information]
## Updates and Versions
- {pd.Timestamp.now().strftime('%Y-%m-%d')}: Initial model release
- Feature engineering pipeline implemented
- Comprehensive preprocessing system added
- Model evaluation and selection completed
---
Last updated: {pd.Timestamp.now().strftime('%Y-%m-%d')}