---
pipeline_tag: tabular-classification
language: en
license: mit
model-index:
- name: aai540-group3/diabetes-readmission
  results:
  - task:
      type: binary-classification
    dataset:
      name: Diabetes 130-US Hospitals
      type: hospital-readmission
    metrics:
    - type: accuracy
      value: 0.8865474882652552
      name: accuracy
    - type: auc
      value: 0.6467403398083669
      name: auc
---

# aai540-group3/diabetes-readmission

## Model Description

This model predicts 30-day hospital readmission for diabetic patients from historical patient data.
It aims to identify high-risk individuals, enabling targeted interventions and better allocation
of healthcare resources.

## Overview

- **Task:** Binary Classification (Hospital Readmission Prediction)
- **Model Type:** AutoGluon
- **Framework:** AutoGluon (Python)
- **License:** MIT
- **Last Updated:** 2024-10-29

## Performance Metrics

- **Test Accuracy:** 0.8865
- **Test ROC-AUC:** 0.6467
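
The two metrics above measure different things: accuracy depends on a decision threshold, while ROC-AUC measures how well the scores rank readmitted patients above non-readmitted ones. When one class is rare, accuracy can be high while ROC-AUC stays modest. The sketch below illustrates this with made-up data; the function names and the 0.5 threshold are illustrative assumptions, not taken from the project's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 1 readmitted patient out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1, 0.3]

# Assumed threshold of 0.5: every patient is predicted "not readmitted".
y_pred = [int(s >= 0.5) for s in scores]

print(accuracy(y_true, y_pred))  # 0.9, although no readmission is caught
print(roc_auc(y_true, scores))   # 7.5 / 9, about 0.833
```

Reporting both numbers, as this card does, gives a fuller picture than accuracy alone.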

## Feature Importance

Significant features and their importance scores:

| Feature | Importance | p-value | 99% CI |
|---------|------------|----------|----------|
| 0 | 0.0563 | 3.24e-04 | [0.0294, 0.0832] |
| 1 | 0.0358 | 8.45e-06 | [0.0290, 0.0426] |
| 2 | 0.0080 | 0.0083 | [-0.0013, 0.0173] |
| 3 | 0.0046 | 1.96e-04 | [0.0027, 0.0065] |
| 4 | 0.0023 | 0.0055 | [-0.0001, 0.0046] |
| 5 | 0.0008 | 0.1840 | [-0.0027, 0.0043] |

*Note: Only features with non-zero importance are shown. The confidence intervals (CI) are calculated at the 99% level. Features with p-value < 0.05 are considered statistically significant.*
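
The columns in this table resemble the output of a permutation-based importance analysis, which is what AutoGluon's `TabularPredictor.feature_importance` reports. Assuming that is how the scores were produced, the idea can be sketched as follows: shuffle one column, re-score the model, and record the drop. The stand-in model, the toy rows, and all helper names are hypothetical. The p-value and interval columns, which such tools derive from the spread across repeated shuffles, are omitted.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, labels, column, metric,
                           n_repeats=5, seed=0):
    """Mean drop in `metric` when the values of `column` are shuffled across rows."""
    rng = random.Random(seed)
    baseline = metric(labels, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled column.
        permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
        score = metric(labels, [predict(r) for r in permuted])
        drops.append(baseline - score)
    return sum(drops) / len(drops)


# Hypothetical stand-in model: flags a patient with any prior inpatient visit.
def predict(row):
    return int(row["number_inpatient"] > 0)


rows = [{"number_inpatient": i % 2, "age": 40 + i} for i in range(20)]
labels = [predict(r) for r in rows]  # labels match the model: baseline accuracy 1.0

used = permutation_importance(predict, rows, labels, "number_inpatient", accuracy)
unused = permutation_importance(predict, rows, labels, "age", accuracy)
print(used, unused)  # `unused` is 0.0 because the model never reads `age`
```

A feature the model ignores scores exactly zero, which is consistent with the note above that zero-importance features are left out of the table.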

## Features

### Numeric Features
- Patient demographics (age)
- Hospital stay metrics (time_in_hospital, num_procedures, num_lab_procedures)
- Medication metrics (num_medications, total_medications)
- Service utilization (number_outpatient, number_emergency, number_inpatient)
- Diagnostic information (number_diagnoses)

### Binary Features
- Patient characteristics (gender)
- Medication flags (diabetesmed, change, insulin_with_oral)

### Interaction Features
- Time-based interactions (medications × time, procedures × time)
- Complexity indicators (age × diagnoses, medications × procedures)
- Resource utilization (lab procedures × time, medications × changes)

### Ratio Features
- Resource efficiency (procedure/medication ratio, lab/procedure ratio)
- Diagnostic density (diagnosis/procedure ratio)
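
As a rough illustration of the interaction and ratio features listed above, the sketch below derives a few of them from one patient record. The output column names, the numeric `age` value, and the `eps` term that guards against division by zero are assumptions made for this example; the actual pipeline may name and compute them differently.

```python
def add_engineered_features(row, eps=1.0):
    """Return a copy of `row` with example interaction and ratio features.

    `eps` is added to each denominator to avoid division by zero (an
    assumption, not necessarily what the original pipeline does).
    """
    out = dict(row)

    # Interaction features: products of two raw columns.
    out["medications_x_time"] = row["num_medications"] * row["time_in_hospital"]
    out["procedures_x_time"] = row["num_procedures"] * row["time_in_hospital"]
    out["age_x_diagnoses"] = row["age"] * row["number_diagnoses"]

    # Ratio features: one raw column divided by another.
    out["procedure_medication_ratio"] = row["num_procedures"] / (row["num_medications"] + eps)
    out["lab_procedure_ratio"] = row["num_lab_procedures"] / (row["num_procedures"] + eps)
    out["diagnosis_procedure_ratio"] = row["number_diagnoses"] / (row["num_procedures"] + eps)
    return out


# Made-up patient record using column names from the lists above.
patient = {
    "age": 65,
    "time_in_hospital": 4,
    "num_procedures": 1,
    "num_lab_procedures": 40,
    "num_medications": 15,
    "number_diagnoses": 8,
}
features = add_engineered_features(patient)
print(features["medications_x_time"])          # 60
print(features["procedure_medication_ratio"])  # 0.0625
```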

## Intended Use

This model is designed for healthcare professionals to assess the risk of 30-day readmission
for diabetic patients. It should be used as a supportive tool in conjunction with clinical judgment.

### Primary Intended Uses
- Predict likelihood of 30-day hospital readmission
- Support resource allocation and intervention planning
- Aid in identifying high-risk patients
- Assist in care management decision-making

### Out-of-Scope Uses
- Non-diabetic patient populations
- Predicting readmissions beyond 30 days
- Making final decisions without clinical oversight
- Use as sole determinant for patient care decisions
- Emergency or critical care decision-making

## Training Data

The model was trained on the [Diabetes 130-US Hospitals Dataset](https://doi.org/10.24432/C5230J)
(1999-2008) from the UCI Machine Learning Repository. This dataset includes:

- Over 100,000 hospital admissions
- 50+ features including patient demographics, diagnoses, procedures
- Binary outcome: readmission within 30 days
- Comprehensive medication tracking
- Detailed hospital utilization metrics

## Training Procedure

### Data Preprocessing
- Missing value imputation using mean/mode
- Outlier handling using 5-sigma clipping
- Feature scaling using StandardScaler
- Categorical encoding using one-hot encoding
- Log transformation for skewed features
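
The numeric steps above can be sketched for a single column using only the standard library. This is a minimal illustration, not the project's pipeline: the helper names are hypothetical, and the order in which the steps are applied is an assumption.

```python
import math
import statistics


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def clip_sigma(values, n_sigma=5.0):
    """Clip values to mean +/- n_sigma population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lo, hi = mean - n_sigma * std, mean + n_sigma * std
    return [min(max(v, lo), hi) for v in values]


def log_transform(values):
    """Apply log(1 + x), defined for x >= 0, to compress a right-skewed column."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Scale to zero mean and unit variance (population std, as StandardScaler does)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


# Made-up column with one missing value.
column = [1, None, 3, 2, 0, 4]
imputed = impute_mean(column)   # [1, 2.0, 3, 2, 0, 4]
clipped = clip_sigma(imputed)   # unchanged here: nothing lies beyond 5 sigma
scaled = standardize(log_transform(clipped))
print(imputed)
```

In a real pipeline the mean, standard deviation, and clipping bounds would be fitted on the training split only and then reused for validation and test data, so that no information leaks from held-out rows.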

### Feature Engineering
- Created interaction terms between key variables
- Generated resource utilization ratios
- Aggregated medication usage metrics
- Developed time-based interaction features
- Constructed diagnostic density metrics

### Model Training
- Data split: 70% training, 15% validation, 15% test
- Cross-validation for model selection
- Hyperparameter optimization via grid search
- Early stopping to prevent overfitting
- Model selection based on ROC-AUC performance
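
A 70/15/15 split like the one described above could look like the following sketch. It assumes a plain random shuffle with a fixed seed; whether the original split was stratified by outcome is not stated in this card, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n, train=0.70, val=0.15, seed=42):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_val = round(n * val)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],  # the remainder becomes the test set
    )


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

The test indices stay untouched until the final evaluation; model selection and early stopping use only the training and validation parts.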

## Limitations & Biases

### Known Limitations
- Model performance depends on data quality and completeness
- Limited to the scope of training data timeframe (1999-2008)
- May not generalize to significantly different healthcare systems
- Requires standardized input data format

### Potential Biases
- May exhibit demographic biases present in training data
- Performance may vary across different hospital systems
- Could be influenced by regional healthcare practices
- Might show temporal biases due to historical data

### Recommendations
- Regular model monitoring and retraining
- Careful validation in new deployment contexts
- Assessment of performance across demographic groups
- Integration with existing clinical workflows

## Monitoring & Maintenance

### Monitoring Requirements
- Track prediction accuracy across different patient groups
- Monitor input data distribution shifts
- Assess feature importance stability
- Evaluate performance metrics over time
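
One common way to monitor input distribution shift is the population stability index (PSI), which compares the binned distribution of a feature at training time with its distribution in recent data. This card does not name a specific drift metric, so PSI is used here purely as an example, and the bin count and `eps` floor are assumptions.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a new sample of one numeric feature.

    Bins are equal-width over the range of `expected`; values of `actual`
    outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up samples: the "recent" data is shifted upward by 30.
reference = list(range(100))
recent = [v + 30 for v in reference]

print(population_stability_index(reference, reference))  # 0.0 for identical samples
print(population_stability_index(reference, recent))     # positive for shifted data
```

Computing this per feature on a schedule gives a simple signal for when retraining may be worth considering.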

### Maintenance Schedule
- Quarterly performance reviews recommended
- Annual retraining with updated data
- Regular bias assessments
- Ongoing validation against current practices

## Citation

```bibtex
@misc{diabetes-readmission-model,
  title = {Hospital Readmission Prediction Model for Diabetic Patients},
  author = {Agustin, Jonathan and Robertson, Zack and Vo, Lisa},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aai540-group3/diabetes-readmission}}
}

@misc{diabetes-dataset,
  title = {Diabetes 130-US Hospitals for Years 1999-2008 Data Set},
  author = {Strack, B. and DeShazo, J. and Gennings, C. and Olmo, J. and
            Ventura, S. and Cios, K. and Clore, J.},
  year = {2014},
  publisher = {UCI Machine Learning Repository},
  doi = {10.24432/C5230J}
}
```

## Model Card Authors

Jonathan Agustin, Zack Robertson, Lisa Vo

## For Questions, Issues, or Feedback

- GitHub Issues: [Repository Issues](https://github.com/aai540-group3/diabetes-readmission/issues)
- Email: [team contact information]

## Updates and Versions

- 2024-10-29: Initial model release
- Feature engineering pipeline implemented
- Comprehensive preprocessing system added
- Model evaluation and selection completed

---
Last updated: 2024-10-29