license: mit
title: Healthcare Data Analysis Project
sdk: streamlit
emoji: π
colorFrom: indigo
colorTo: red
short_description: Comprehensive Analysis of Healthcare
sdk_version: 1.38.0
Overview
This project analyzes healthcare data through Exploratory Data Analysis (EDA), machine learning, and a Google Gen AI-powered chatbot. The chatbot is built on Pandas AI, so users can explore the data through natural language queries. The goals are to extract meaningful insights from complex healthcare datasets, support patient care through predictive modeling, and make the data more accessible through AI-powered conversational tools.
Features
Exploratory Data Analysis (EDA):
- In-depth examination of healthcare data, including patient encounters, medical measurements, lab results, and diagnoses.
- Identification of patterns, trends, and anomalies in the data.
- Visualization of key metrics to provide clear insights into the data.
Machine Learning:
- Implementation of clustering algorithms to categorize patient data based on medical measurements, conditions, and severity indicators.
- Application of Principal Component Analysis (PCA) for dimensionality reduction and visualization.
- Development of predictive models to forecast patient outcomes and risk factors.
Google Gen AI Chatbot Integration:
- Integration of a Google Gen AI-powered chatbot using Pandas AI for interactive data analysis.
- Natural language processing capabilities to allow users to ask questions and receive data-driven responses.
- The chatbot can generate plots, provide statistical summaries, and assist with data exploration.
Project Structure
- data/: Contains the healthcare datasets used for analysis.
- notebooks/: Jupyter notebooks detailing the EDA, Machine Learning models, and chatbot integration.
- scripts/: Python scripts for data preprocessing, model training, and chatbot functionality.
- models/: Saved machine learning models for predictions and analysis.
- chatbot/: Implementation of the Google Gen AI chatbot integrated with Pandas AI.
- dash.py: The Streamlit dashboard for visualizing data and interacting with the chatbot.
Data
The dataset used in this project includes:
- Patient Encounter Data: Age, SystolicBP, DiastolicBP, Temperature, Pulse, Weight, Height, BMI, Respiration, SPO2, and PHQ_9 Score.
- Categorical Data: LegalSex, BPLocation, BPPosition, PregnancyStatus, LactationStatus, TemperatureSource, and various health conditions.
- Lab Test Components: Twenty lab test components related to specific diseases.
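As a rough illustration of how the numeric and categorical groups above might be tracked in code, the sketch below keeps the column names in two lists and reports which ones a loaded DataFrame actually contains. The constant names, the `split_columns` helper, and the exact spelling of the PHQ-9 column (`PHQ_9`) are assumptions made for this example, not part of the project's scripts.

```python
import pandas as pd

# Column names copied from the lists above. "PHQ_9" is an assumed spelling
# for the "PHQ_9 Score" field; adjust it to match the real dataset.
NUMERIC_COLS = [
    "Age", "SystolicBP", "DiastolicBP", "Temperature", "Pulse", "Weight",
    "Height", "BMI", "Respiration", "SPO2", "PHQ_9",
]
CATEGORICAL_COLS = [
    "LegalSex", "BPLocation", "BPPosition", "PregnancyStatus",
    "LactationStatus", "TemperatureSource",
]


def split_columns(df: pd.DataFrame):
    """Return (numeric, categorical, missing) column-name lists for df.

    Hypothetical helper: 'missing' lists expected columns that are absent.
    """
    numeric = [c for c in NUMERIC_COLS if c in df.columns]
    categorical = [c for c in CATEGORICAL_COLS if c in df.columns]
    missing = [c for c in NUMERIC_COLS + CATEGORICAL_COLS if c not in df.columns]
    return numeric, categorical, missing
```

Keeping the two groups separate matters later on, because numeric and categorical features are cleaned, encoded, and clustered differently.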
Exploratory Data Analysis (EDA)
The EDA phase involves:
- Data Cleaning: Handling missing values, outliers, and inconsistent data entries.
- Data Transformation: Encoding categorical variables, scaling numerical data, and feature engineering.
- Visualization: Creating informative charts and graphs to explore data distributions, correlations, and trends.
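The cleaning and transformation steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's actual pipeline: median imputation, 1.5 x IQR clipping for outliers, an "Unknown" category for missing labels, and one-hot encoding are all choices made for illustration, and `clean_and_transform` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean_and_transform(df, numeric_cols, categorical_cols):
    """Illustrative cleaning pass; every strategy here is an assumption."""
    out = df.copy()

    for col in numeric_cols:
        # Coerce unparseable entries to NaN, then impute with the median.
        out[col] = pd.to_numeric(out[col], errors="coerce").astype(float)
        out[col] = out[col].fillna(out[col].median())
        # Clip outliers to the 1.5 * IQR fences.
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    for col in categorical_cols:
        # Normalize inconsistent spellings such as " female" vs "FEMALE".
        cleaned = out[col].astype("string").str.strip().str.title()
        out[col] = cleaned.fillna("Unknown")

    # Scale numeric features to zero mean and unit variance.
    out[numeric_cols] = StandardScaler().fit_transform(out[numeric_cols])
    # One-hot encode the categorical features.
    return pd.get_dummies(out, columns=categorical_cols)
```

For example, `clean_and_transform(df, ["SystolicBP"], ["LegalSex"])` returns a frame with a scaled `SystolicBP` column and one indicator column per observed `LegalSex` value. One-hot encoding suits models that need purely numeric input; a clustering method that handles categorical features directly would keep those columns as labels instead.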
Machine Learning
The machine learning phase includes:
- Clustering Analysis: Implementing K-Prototypes to group data into clusters based on numerical and categorical features.
- PCA: Reducing dimensionality for visualization and understanding key factors influencing clusters.
- Predictive Modeling: Training models to predict patient outcomes and identify high-risk groups.
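To make the clustering and PCA steps more concrete, the sketch below shows the mixed dissimilarity that K-Prototypes is built on (squared Euclidean distance on numeric features plus a weighted count of categorical mismatches) together with a two-component PCA projection using scikit-learn. It covers only the assignment step for fixed prototypes; a full K-Prototypes run, such as the one in the third-party kmodes package, also updates the prototypes iteratively. The function names, the `gamma` default, and the choice to run PCA on scaled numeric features only are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Cost of assigning one record to one prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatching labels, weighted by gamma.
    """
    diff = np.asarray(x_num, dtype=float) - np.asarray(proto_num, dtype=float)
    numeric_part = np.sum(diff ** 2)
    categorical_part = np.sum(np.asarray(x_cat) != np.asarray(proto_cat))
    return float(numeric_part + gamma * categorical_part)


def assign_clusters(X_num, X_cat, protos_num, protos_cat, gamma=1.0):
    """Assign each record to the prototype with the lowest mixed cost."""
    labels = []
    for x_num, x_cat in zip(X_num, X_cat):
        costs = [
            mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
            for p_num, p_cat in zip(protos_num, protos_cat)
        ]
        labels.append(int(np.argmin(costs)))
    return np.array(labels)


def project_2d(X_num):
    """Scale numeric features and project them onto two principal components."""
    scaled = StandardScaler().fit_transform(X_num)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(scaled)
    return coords, pca.explained_variance_ratio_
```

The `gamma` weight controls how much a categorical mismatch counts relative to numeric distance, so it is usually tuned to the scale of the numeric features. The 2D coordinates from `project_2d` can be plotted and colored by cluster label to inspect how well the groups separate.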
Google Gen AI Chatbot Integration
- Pandas AI Integration: The chatbot leverages Pandas AI to process data queries, perform EDA tasks, and generate visualizations.
- Natural Language Interaction: Users can chat with the AI to explore the data, ask questions, and receive detailed answers.
- Interactive Dashboard: A Streamlit-based dashboard that allows users to interact with the chatbot and visualize data insights.
Usage
Run EDA and ML Models:
- Execute the Jupyter notebooks or Python scripts in the notebooks/ and scripts/ directories.
Interact with the Chatbot:
- Launch the Streamlit app from the dash.py file: streamlit run dash.py
- Use the chatbot to ask questions about the data, generate plots, and explore the dataset interactively.
View Results:
- Access the clustered data, PCA plots, and predictions through the interactive dashboard.
Requirements
- Python 3.8+ (Streamlit 1.38.0, the pinned sdk_version, does not support Python 3.7)
- Pandas
- NumPy
- Scikit-learn
- Plotly
- Streamlit
- Pandas AI
- Google Gen AI API
Installation
pip install -r requirements.txt