metadata
pipeline_tag: tabular-classification
language: en
library_name: transformers
license: mit
model-index:
  - name: aai540-group3/diabetes-readmission
    results:
      - task:
          type: binary-classification
        dataset:
          name: Diabetes 130-US Hospitals
          type: hospital-readmission
        metrics:
          - type: accuracy
            value: 0.8865474882652552
            name: accuracy
          - type: auc
            value: 0.6467403398083669
            name: auc

aai540-group3/diabetes-readmission

Model Description

This model predicts 30-day hospital readmissions for diabetic patients using historical patient data and machine learning techniques. It aims to identify high-risk individuals, enabling targeted interventions and improved healthcare resource allocation.

Overview

  • Task: Binary Classification (Hospital Readmission Prediction)
  • Model Type: AutoGluon
  • Framework: Python (AutoGluon)
  • License: MIT
  • Last Updated: 2024-10-29

Performance Metrics

  • Test Accuracy: 0.8865
  • Test ROC-AUC: 0.6467
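Accuracy and ROC-AUC measure different things, which helps explain the gap between 0.8865 and 0.6467. When 30-day readmissions are the minority class, a classifier that always predicts "no readmission" already reaches an accuracy equal to the share of non-readmitted patients. Its ROC-AUC, however, is only 0.5.

The minimal pure-Python sketch below illustrates both metrics on hypothetical data. It is not the evaluation code used for this card, and the card does not state its decision threshold. In practice, scikit-learn's `accuracy_score` and `roc_auc_score` compute the same quantities.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of 0/1 predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random
    negative (the Mann-Whitney U formulation); ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical data: 9 non-readmitted patients and 1 readmitted patient.
y = [0] * 9 + [1]
print(accuracy(y, [0] * 10))   # always predict "no readmission": 0.9
print(roc_auc(y, [0.1] * 10))  # constant score, no ranking ability: 0.5
```

The pairwise loop is quadratic in the number of patients. It is meant for readability; a library implementation sorts scores instead.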

Feature Importance

Features and their importance scores:

Feature   Importance   p-value    99% CI
0         0.0563       3.24e-04   [0.0294, 0.0832]
1         0.0358       8.45e-06   [0.0290, 0.0426]
2         0.0080       0.0083     [-0.0013, 0.0173]
3         0.0046       1.96e-04   [0.0027, 0.0065]
4         0.0023       0.0055     [-0.0001, 0.0046]
5         0.0008       0.1840     [-0.0027, 0.0043]

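Note: Only features with non-zero importance are shown. The Feature column contains row indices only; the corresponding feature names are not listed. Confidence intervals (CI) are calculated at the 99% level, and features with p-value < 0.05 are considered statistically significant. By that criterion, row 5 (p = 0.1840) is not significant. Rows 2 and 4 are significant at the 0.05 level even though their 99% CIs include zero.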
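The Importance / p-value / 99% CI layout resembles the output of AutoGluon's `TabularPredictor.feature_importance`, which estimates permutation importance. The card does not say how the scores were produced, so that link is an assumption.

The sketch below shows the idea in plain Python: shuffle one feature's values across rows, re-score, and record the drop. It uses accuracy as the score, with a hypothetical toy model and made-up column values, purely for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    rows: List[Row],
    labels: Sequence[int],
    feature: str,
    n_shuffles: int = 5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of the accuracy drop observed when the
    values of `feature` are shuffled across rows (n_shuffles must be >= 2)."""
    rng = random.Random(seed)

    def acc(rs: List[Row]) -> float:
        preds = predict(rs)
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    baseline = acc(rows)
    drops = []
    for _ in range(n_shuffles):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        shuffled = [{**r, feature: v} for r, v in zip(rows, values)]
        drops.append(baseline - acc(shuffled))
    return statistics.mean(drops), statistics.stdev(drops)


# Hypothetical model that only looks at prior inpatient visits.
def toy_predict(rs: List[Row]) -> List[int]:
    return [int(r["number_inpatient"] > 0) for r in rs]


rows = [{"number_inpatient": float(i % 2), "num_procedures": float(i)} for i in range(10)]
labels = toy_predict(rows)  # the toy model is perfect on its own data
print(permutation_importance(toy_predict, rows, labels, "num_procedures"))  # (0.0, 0.0)
```

A feature the model ignores scores exactly zero. This is why a table of "non-zero importance" features naturally drops unused columns.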

Features

Numeric Features

  • Patient demographics (age)
  • Hospital stay metrics (time_in_hospital, num_procedures, num_lab_procedures)
  • Medication metrics (num_medications, total_medications)
  • Service utilization (number_outpatient, number_emergency, number_inpatient)
  • Diagnostic information (number_diagnoses)

Binary Features

  • Patient characteristics (gender)
  • Medication flags (diabetesmed, change, insulin_with_oral)

Interaction Features

  • Time-based interactions (medications × time, procedures × time)
  • Complexity indicators (age × diagnoses, medications × procedures)
  • Resource utilization (lab procedures × time, medications × changes)

Ratio Features

  • Resource efficiency (procedure/medication ratio, lab/procedure ratio)
  • Diagnostic density (diagnosis/procedure ratio)
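The interaction and ratio features above can be sketched as products and quotients of the base columns. This is a minimal illustration, not the project's pipeline, and it rests on several assumptions:

  • The output names (`meds_x_time`, `lab_per_proc`, and so on) are hypothetical.
  • `age` is already numeric.
  • `change` is encoded as 0/1.
  • A zero denominator returns 0.0.

```python
from typing import Dict


def _ratio(num: float, den: float) -> float:
    # Assumption: a zero denominator yields 0.0; the card does not say how this is handled.
    return num / den if den else 0.0


def add_engineered_features(row: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `row` with illustrative interaction and ratio features.
    Input keys follow the columns listed above; output names are hypothetical."""
    out = dict(row)
    # Interaction features: products of two base columns
    out["meds_x_time"] = row["num_medications"] * row["time_in_hospital"]
    out["procs_x_time"] = row["num_procedures"] * row["time_in_hospital"]
    out["age_x_diagnoses"] = row["age"] * row["number_diagnoses"]
    out["meds_x_procs"] = row["num_medications"] * row["num_procedures"]
    out["labs_x_time"] = row["num_lab_procedures"] * row["time_in_hospital"]
    out["meds_x_change"] = row["num_medications"] * row["change"]
    # Ratio features: resource efficiency and diagnostic density
    out["proc_per_med"] = _ratio(row["num_procedures"], row["num_medications"])
    out["lab_per_proc"] = _ratio(row["num_lab_procedures"], row["num_procedures"])
    out["diag_per_proc"] = _ratio(row["number_diagnoses"], row["num_procedures"])
    return out


patient = {"age": 75, "time_in_hospital": 3, "num_medications": 10, "num_procedures": 2,
           "num_lab_procedures": 40, "number_diagnoses": 6, "change": 1}
features = add_engineered_features(patient)
print(features["meds_x_time"], features["lab_per_proc"])  # 30 20.0
```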

Intended Use

This model is designed for healthcare professionals to assess the risk of 30-day readmission for diabetic patients. It should be used as a supportive tool in conjunction with clinical judgment.

Primary Intended Uses

  • Predict likelihood of 30-day hospital readmission
  • Support resource allocation and intervention planning
  • Aid in identifying high-risk patients
  • Assist in care management decision-making

Out-of-Scope Uses

  • Non-diabetic patient populations
  • Predicting readmissions beyond 30 days
  • Making final decisions without clinical oversight
  • Use as the sole determinant of patient care decisions
  • Emergency or critical care decision-making

Training Data

The model was trained on the Diabetes 130-US Hospitals dataset (1999-2008) from the UCI Machine Learning Repository. This dataset includes:

  • Over 100,000 hospital admissions
  • 50+ features, including patient demographics, diagnoses, and procedures
  • Binary outcome: readmission within 30 days
  • Comprehensive medication tracking
  • Detailed hospital utilization metrics
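In the raw UCI data, the `readmitted` column has three values: `<30`, `>30`, and `NO`. A binary 30-day target is presumably derived by treating `<30` as positive and the other two as negative. The card does not spell out the mapping, so the sketch below is an assumption, and the function name is hypothetical.

```python
from typing import Iterable, List

RAW_CODES = {"<30", ">30", "NO"}  # values of the raw `readmitted` column


def binarize_readmitted(values: Iterable[str]) -> List[int]:
    """Label '<30' as 1 (readmitted within 30 days); '>30' and 'NO' as 0."""
    labels = []
    for v in values:
        if v not in RAW_CODES:
            raise ValueError(f"unexpected readmitted value: {v!r}")
        labels.append(1 if v == "<30" else 0)
    return labels


print(binarize_readmitted(["<30", ">30", "NO"]))  # [1, 0, 0]
```

Raising on unexpected codes, rather than silently mapping them to 0, keeps a malformed extract from quietly shrinking the positive class.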

Training Procedure

Data Preprocessing

  • Missing value imputation using mean/mode
  • Outlier handling using 5-sigma clipping
  • Feature scaling using StandardScaler
  • Categorical encoding using one-hot encoding
  • Log transformation for skewed features
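The preprocessing steps listed above can be sketched column by column in plain Python. This is an illustration under stated assumptions, not the project's code:

  • The function names are hypothetical.
  • The 5-sigma rule clips each value to the column mean plus or minus five population standard deviations.
  • `log1p` is assumed for non-negative skewed counts.
  • `standardize` mirrors scikit-learn's `StandardScaler`, using the population standard deviation and mapping constant columns to 0.
  • The card does not state the order in which the steps run.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Optional, Sequence


def impute_mean(col: Sequence[Optional[float]]) -> List[float]:
    """Replace missing numeric values (None) with the mean of observed values."""
    mean = statistics.fmean([v for v in col if v is not None])
    return [mean if v is None else v for v in col]


def impute_mode(col: Sequence[Optional[str]]) -> List[str]:
    """Replace missing categories (None) with the most frequent observed one."""
    mode = Counter(v for v in col if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in col]


def clip_sigma(col: Sequence[float], k: float = 5.0) -> List[float]:
    """Clip values to mean +/- k population standard deviations (k=5 per the card)."""
    mu, sd = statistics.fmean(col), statistics.pstdev(col)
    lo, hi = mu - k * sd, mu + k * sd
    return [min(max(v, lo), hi) for v in col]


def log1p_col(col: Sequence[float]) -> List[float]:
    """log(1 + x), suitable for non-negative skewed counts."""
    return [math.log1p(v) for v in col]


def standardize(col: Sequence[float]) -> List[float]:
    """Zero mean and unit variance; a constant column maps to all zeros."""
    mu, sd = statistics.fmean(col), statistics.pstdev(col)
    return [(v - mu) / sd if sd else 0.0 for v in col]


def one_hot(col: Sequence[str]) -> Dict[str, List[int]]:
    """One 0/1 indicator column per category, keyed in sorted order."""
    return {c: [int(v == c) for v in col] for c in sorted(set(col))}


print(impute_mean([70.0, None, 50.0]))  # [70.0, 60.0, 50.0]
```

In a real pipeline, the following statistics should be fitted on the training split only and then reused on the validation and test splits:

  • means and modes
  • clip bounds
  • scaling statistics
  • category sets

This keeps test-set information from leaking into training.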

Feature Engineering

  • Created interaction terms between key variables
  • Generated resource utilization ratios
  • Aggregated medication usage metrics
  • Developed time-based interaction features
  • Constructed diagnostic density metrics
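"Aggregated medication usage metrics" and the `insulin_with_oral` flag could be derived from the dataset's per-drug columns, which record each medication as `No`, `Steady`, `Up`, or `Down`. The sketch below is one hypothetical definition: it counts a drug as in use when its code is anything other than `No`, and it uses only a subset of the oral-medication columns. The card does not define either feature, so treat both definitions as assumptions.

```python
from typing import Dict, Iterable

# A subset of the dataset's oral diabetes medication columns (the full list is longer).
ORAL_MEDS = ("metformin", "glipizide", "glyburide", "pioglitazone", "rosiglitazone")


def medication_aggregates(row: Dict[str, str],
                          oral_meds: Iterable[str] = ORAL_MEDS) -> Dict[str, int]:
    """Count medications in use (any code other than 'No') and flag insulin
    combined with at least one oral medication. Definitions are hypothetical."""
    oral_in_use = [m for m in oral_meds if row.get(m, "No") != "No"]
    on_insulin = row.get("insulin", "No") != "No"
    return {
        "total_medications": len(oral_in_use) + int(on_insulin),
        "insulin_with_oral": int(on_insulin and len(oral_in_use) > 0),
    }


print(medication_aggregates({"metformin": "Steady", "insulin": "Up", "glipizide": "No"}))
# {'total_medications': 2, 'insulin_with_oral': 1}
```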

Model Training

  • Data split: 70% training, 15% validation, 15% test
  • Cross-validation for model selection
  • Hyperparameter optimization via grid search
  • Early stopping to prevent overfitting
  • Model selection based on ROC-AUC performance
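A 70/15/15 split can be sketched as a single seeded shuffle followed by two cuts. The function name and seed below are hypothetical, and the card does not say whether the split was stratified or grouped.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_15_15(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into 70% / 15% / 15% partitions."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_train = len(idx) * 70 // 100  # integer arithmetic avoids float rounding surprises
    n_val = len(idx) * 15 // 100
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]  # remainder absorbs rounding
    return train, val, test


train, val, test = split_70_15_15(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

One design point is worth checking. The dataset contains repeated encounters for the same patient (`patient_nbr`), so a split grouped by patient avoids placing the same person in both the training and test data.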

Limitations & Biases

Known Limitations

  • Model performance depends on data quality and completeness
  • Limited to the timeframe of the training data (1999-2008)
  • May not generalize to significantly different healthcare systems
  • Requires standardized input data format

Potential Biases

  • May exhibit demographic biases present in training data
  • Performance may vary across different hospital systems
  • Could be influenced by regional healthcare practices
  • Might show temporal biases due to historical data

Recommendations

  • Regular model monitoring and retraining
  • Careful validation in new deployment contexts
  • Assessment of performance across demographic groups
  • Integration with existing clinical workflows
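Assessing performance across demographic groups amounts to computing the same metrics separately for each group. The hypothetical helper below reports three metrics per group:

  • accuracy
  • recall (the share of actual readmissions the model flags)
  • predicted-positive rate

The group labels and data are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def metrics_by_group(groups: Sequence[str], y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-group accuracy, recall (sensitivity), and predicted-positive rate."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "tp": 0, "pred_pos": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        s["pos"] += t
        s["tp"] += int(t == 1 and p == 1)
        s["pred_pos"] += p
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            # recall is undefined when a group has no actual positives
            "recall": s["tp"] / s["pos"] if s["pos"] else None,
            "predicted_positive_rate": s["pred_pos"] / s["n"],
        }
        for g, s in stats.items()
    }


report = metrics_by_group(["F", "F", "M", "M"], [1, 0, 1, 0], [1, 0, 0, 0])
print(report["F"]["recall"], report["M"]["recall"])  # 1.0 0.0
```

Recall is reported alongside accuracy because, with a rare positive class, a group can show high accuracy while most of its readmissions go unflagged.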

Monitoring & Maintenance

Monitoring Requirements

  • Track prediction accuracy across different patient groups
  • Monitor input data distribution shifts
  • Assess feature importance stability
  • Evaluate performance metrics over time
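One common way to monitor input distribution shift is the Population Stability Index (PSI). It compares the binned proportions of a feature between a reference sample (for example, training data) and recent inputs. The card does not name a drift metric, so PSI is a swapped-in example. The equal-width binning and the 1e-6 floor for empty bins are simplifying assumptions.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `current` against `reference`, using
    equal-width bins spanning the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            i = min(max(int((v - lo) / width), 0), bins - 1)  # clamp out-of-range values
            counts[i] += 1
        return [max(c / len(values), 1e-6) for c in counts]  # floor avoids log(0)

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


baseline = [float(x % 10) for x in range(1000)]  # hypothetical training-time values
print(psi(baseline, baseline))  # 0.0
```

A commonly cited rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift. These thresholds should still be validated for each feature.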

Maintenance Schedule

  • Quarterly performance reviews recommended
  • Annual retraining with updated data
  • Regular bias assessments
  • Ongoing validation against current practices

Citation

@misc{diabetes-readmission-model,
  title = {Hospital Readmission Prediction Model for Diabetic Patients},
  author = {Agustin, Jonathan and Robertson, Zack and Vo, Lisa},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aai540-group3/diabetes-readmission}}
}

@misc{diabetes-dataset,
  title = {Diabetes 130-US Hospitals for Years 1999-2008 Data Set},
  author = {Strack, B. and DeShazo, J. and Gennings, C. and Olmo, J. and
            Ventura, S. and Cios, K. and Clore, J.},
  year = {2014},
  publisher = {UCI Machine Learning Repository},
  doi = {10.24432/C5230J}
}

Model Card Authors

Jonathan Agustin, Zack Robertson, Lisa Vo

For Questions, Issues, or Feedback

Updates and Versions

  • 2024-10-29: Initial model release
  • Feature engineering pipeline implemented
  • Comprehensive preprocessing system added
  • Model evaluation and selection completed

Last updated: 2024-10-29