Akanksh Gatla committed
Commit 18b5fcf · 1 Parent(s): 005ffea
Add application file

Files changed:
- README.md +93 -8
- app.py +1289 -0
- requirements.txt +16 -0
README.md
CHANGED
@@ -1,13 +1,98 @@
 ---
-
-
-colorFrom: yellow
-colorTo: indigo
 sdk: streamlit
 sdk_version: 1.38.0
-app_file: app.py
-pinned: false
-license: mit
 ---
 
-
 ---
+license: mit
+title: Healthcare Data Analysis Project
 sdk: streamlit
+emoji: 📊
+colorFrom: indigo
+colorTo: red
+short_description: Comprehensive Analysis of Healthcare
 sdk_version: 1.38.0
 ---
 
+## Overview
+
+This project focuses on the comprehensive analysis of healthcare data using Exploratory Data Analysis (EDA), Machine Learning, and integration with a Google Gen AI-powered chatbot. The chatbot is integrated with Pandas AI, enabling interactive data exploration through natural language queries. The goal is to extract meaningful insights from complex healthcare datasets, improve patient care through predictive modeling, and enhance data accessibility using AI-powered conversational tools.
+
+## Features
+
+- **Exploratory Data Analysis (EDA):**
+  - In-depth examination of healthcare data, including patient encounters, medical measurements, lab results, and diagnoses.
+  - Identification of patterns, trends, and anomalies in the data.
+  - Visualization of key metrics to provide clear insights into the data.
+
+- **Machine Learning:**
+  - Implementation of clustering algorithms to categorize patient data based on medical measurements, conditions, and severity indicators.
+  - Application of Principal Component Analysis (PCA) for dimensionality reduction and visualization.
+  - Development of predictive models to forecast patient outcomes and risk factors.
+
+- **Google Gen AI Chatbot Integration:**
+  - Integration of a Google Gen AI-powered chatbot using Pandas AI for interactive data analysis.
+  - Natural language processing capabilities to allow users to ask questions and receive data-driven responses.
+  - Chatbot can generate plots, provide statistical summaries, and assist with data exploration.
+
+## Project Structure
+
+- **`data/`**: Contains the healthcare datasets used for analysis.
+- **`notebooks/`**: Jupyter notebooks detailing the EDA, Machine Learning models, and chatbot integration.
+- **`scripts/`**: Python scripts for data preprocessing, model training, and chatbot functionality.
+- **`models/`**: Saved machine learning models for predictions and analysis.
+- **`chatbot/`**: Implementation of the Google Gen AI chatbot integrated with Pandas AI.
+- **`dash.py`**: The Streamlit dashboard for visualizing data and interacting with the chatbot.
+
+## Data
+
+The dataset used in this project includes:
+
+- **Patient Encounter Data**: Age, SystolicBP, DiastolicBP, Temperature, Pulse, Weight, Height, BMI, Respiration, SPO2, and PHQ_9 Score.
+- **Categorical Data**: LegalSex, BPLocation, BPPosition, PregnancyStatus, LactationStatus, TemperatureSource, and various health conditions.
+- **Lab Test Components**: Twenty lab test components related to specific diseases.
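
The `PHQ_9 Score` field listed above is usually read through severity bands, and `app.py` filters on a `Depression Severity` column with labels such as `None-minimal` and `Unknown`. The sketch below shows one way such a label could be derived. It assumes the standard PHQ-9 cut-offs; the function name and the remaining label strings are hypothetical, since this commit does not show how the column was actually built.

```python
from typing import Optional


def phq9_severity(score: Optional[float]) -> str:
    """Map a PHQ-9 total (0-27) to a severity label.

    Hypothetical helper: only 'None-minimal' and 'Unknown' are label
    strings that appear in app.py; the others are assumed.
    """
    # Treat missing values (None or NaN) and out-of-range totals as unknown.
    if score is None or score != score or not 0 <= score <= 27:
        return "Unknown"
    if score <= 4:
        return "None-minimal"
    if score <= 9:
        return "Mild"
    if score <= 14:
        return "Moderate"
    if score <= 19:
        return "Moderately severe"
    return "Severe"
```

With pandas, a helper like this could be applied column-wise, for example `df["PHQ_9 Score"].map(phq9_severity)`, assuming that is the column's name in the dataset.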
+
+## Exploratory Data Analysis (EDA)
+
+The EDA phase involves:
+
+- **Data Cleaning**: Handling missing values, outliers, and inconsistent data entries.
+- **Data Transformation**: Encoding categorical variables, scaling numerical data, and feature engineering.
+- **Visualization**: Creating informative charts and graphs to explore data distributions, correlations, and trends.
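
The cleaning and transformation steps above can be illustrated with a small, dependency-free sketch. It is not the project's pipeline: `app.py` imports `IterativeImputer`, `StandardScaler`, and `LabelEncoder` from scikit-learn, whereas this example swaps in plain median imputation and hand-written scaling and encoding so the arithmetic is visible. All function names and sample values are hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def label_encode(categories):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


# Made-up sample values, for illustration only.
pulse = [72, None, 88, 64]
pulse_clean = impute_median(pulse)
pulse_scaled = standardize(pulse_clean)
sex_codes, sex_map = label_encode(["Female", "Male", "Female"])
```

Scaling matters for the clustering step that follows, because distance-based methods otherwise let wide-ranging columns such as Weight dominate narrow ones such as SPO2.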
+
+## Machine Learning
+
+The machine learning phase includes:
+
+- **Clustering Analysis**: Implementing K-Prototypes to group data into clusters based on numerical and categorical features.
+- **PCA**: Reducing dimensionality for visualization and understanding key factors influencing clusters.
+- **Predictive Modeling**: Training models to predict patient outcomes and identify high-risk groups.
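
K-Prototypes handles mixed data by combining squared Euclidean distance on the numerical features with a weighted count of mismatches on the categorical ones. The sketch below shows only that dissimilarity and the cluster-assignment step, in plain Python. It is an illustration under assumptions, not the `kmodes` implementation that `app.py` imports: the function names, the sample rows, and the `gamma` value are all hypothetical.

```python
def mixed_distance(row_num, row_cat, proto_num, proto_cat, gamma=1.0):
    """K-Prototypes style dissimilarity between a row and a prototype.

    Numerical part: squared Euclidean distance.
    Categorical part: number of mismatching categories, weighted by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(row_num, proto_num))
    categorical = sum(a != b for a, b in zip(row_cat, proto_cat))
    return numeric + gamma * categorical


def assign(rows, prototypes, gamma=1.0):
    """Return the index of the nearest prototype for every row."""
    labels = []
    for num, cat in rows:
        dists = [
            mixed_distance(num, cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(dists.index(min(dists)))
    return labels


# Made-up, already-scaled numerical features plus two categorical features.
rows = [
    ([0.1, -0.2], ["Male", "Sitting"]),
    ([1.9, 2.1], ["Female", "Standing"]),
    ([0.0, 0.3], ["Female", "Sitting"]),
]
prototypes = [
    ([0.0, 0.0], ["Male", "Sitting"]),
    ([2.0, 2.0], ["Female", "Standing"]),
]
labels = assign(rows, prototypes, gamma=0.5)
```

The `gamma` weight controls how much a categorical mismatch counts relative to numerical distance. A full fit would alternate this assignment step with recomputing each prototype (the mean of the numerical features, the mode of the categorical ones) until the labels stop changing.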
+
+## Google Gen AI Chatbot Integration
+
+- **Pandas AI Integration**: The chatbot leverages Pandas AI to process data queries, perform EDA tasks, and generate visualizations.
+- **Natural Language Interaction**: Users can chat with the AI to explore the data, ask questions, and receive detailed answers.
+- **Interactive Dashboard**: A Streamlit-based dashboard that allows users to interact with the chatbot and visualize data insights.
+
+## Usage
+
+1. **Run EDA and ML Models**:
+   - Execute the Jupyter notebooks or Python scripts in the `notebooks/` and `scripts/` directories.
+
+2. **Interact with the Chatbot**:
+   - Launch the Streamlit app using the `dash.py` file.
+   - Use the chatbot to ask questions about the data, generate plots, and explore the dataset interactively.
+
+3. **View Results**:
+   - Access the clustered data, PCA plots, and predictions through the interactive dashboard.
+
+## Requirements
+
+- Python 3.7+
+- Pandas
+- NumPy
+- Scikit-learn
+- Plotly
+- Streamlit
+- Pandas AI
+- Google Gen AI API
+
+## Installation
+
+```bash
+pip install -r requirements.txt
+```
app.py
ADDED
@@ -0,0 +1,1289 @@
+# pip install streamlit wordcloud
+import streamlit as st
+import pandas as pd
+import matplotlib.pyplot as plt
+import plotly.express as px
+import plotly.figure_factory as ff
+import warnings
+warnings.filterwarnings("ignore")
+from wordcloud import WordCloud
+from sklearn.preprocessing import StandardScaler
+import numpy as np
+from sklearn.preprocessing import LabelEncoder
+from pandasai import SmartDataframe
+from pandasai.llm.google_gemini import GoogleGemini
+from pandasai.responses.response_parser import ResponseParser
+# pip install wordcloud
+# !pip install kmodes
+
+from sklearn.decomposition import PCA
+from sklearn.experimental import enable_iterative_imputer
+from sklearn.impute import IterativeImputer
+from kmodes.kprototypes import KPrototypes
+import plotly.graph_objects as go
+# pip install google-generativeai
+import os
+from huggingface_hub import hf_hub_download
+
+repo_id = "Akankshg/ML_DATA"
+filename = "EDA_DATA.parquet"
+
+# Access the token
+token = os.environ["HUGGING_FACE_HUB_TOKEN"]
+
+# Download the file
+local_file = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset", token=token)
+
+
+
+class StreamlitResponse(ResponseParser):
+    def __init__(self, context) -> None:
+        super().__init__(context)
+
+    def format_dataframe(self, result):
+        st.dataframe(result["value"])
+        return
+
+    def format_plot(self, result):
+        st.image(result["value"])
+        return
+
+
+st.set_page_config(page_title="Healthcare Data Analysis", page_icon=":bar_chart:", layout="wide")
+st.title(':bar_chart: Healthcare Data Analysis Dashboard')
+st.markdown('<style>div.block-container{padding-top:1rem;}</style>',unsafe_allow_html=True)
+
+# Sidebar 1
+st.sidebar.title('Dashboard Options')
+analysis_option = st.sidebar.selectbox('Select Analysis', ['Data','EDA', 'Machine Learning','Health Care Chat Bot AI'])
+
+## Loading data
+@st.cache_data()
+def fetch_data():
+    # Read the parquet file downloaded from the Hugging Face Hub
+    data = pd.read_parquet(local_file)
+    return data
+data = fetch_data()
+
+def funnel_chart(df):
+    Patient_visit = df[['PatientID','EncounterDate','LegalSex']].copy()
+    Patient_visit['WeekDay'] = Patient_visit['EncounterDate'].dt.day_name()
+    Patient_visit['WeekDay'] = Patient_visit['WeekDay'].astype('string')
+    output_df = Patient_visit.groupby(['WeekDay', 'LegalSex']).size().unstack(fill_value=0)
+    output_df.reset_index(inplace=True)
+    if 'Male' in output_df.columns:
+        if 'Female' in output_df.columns:
+            desired_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
+            output_df = output_df.set_index('WeekDay').reindex(desired_order).reset_index()
+            stages = output_df['WeekDay']
+            df_female = pd.DataFrame(dict(number=output_df['Female'], stage=stages))
+            df_male = pd.DataFrame(dict(number=output_df['Male'], stage=stages))
+            df_female['Gender'] = 'Female'
+            df_male['Gender'] = 'Male'
+            df_graph = pd.concat([df_male, df_female], axis=0)
+            colors = {'Male': '#2986cc', 'Female': '#c90076'}
+            fig2 = px.funnel(df_graph, x='number', y='stage', color='Gender', color_discrete_map=colors, title='Patient Visits by Gender and Weekday')
+            fig2.update_layout(
+                template="plotly_dark",
+                xaxis_title='Number of Patients',
+                yaxis_title='Weekday',
+                height=500, width=250
+            )
+            return fig2
+        else:
+            desired_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
+            output_df = output_df.set_index('WeekDay').reindex(desired_order).reset_index()
+            stages = output_df['WeekDay']
+            df_male = pd.DataFrame(dict(number=output_df['Male'], stage=stages))
+            df_male['Gender'] = 'Male'
+            colors = {'Male': '#2986cc', 'Female': '#c90076'}
+            fig2 = px.funnel(df_male, x='number', y='stage', color='Gender', color_discrete_map=colors, title='Patient Visits by Gender and Weekday')
+            fig2.update_layout(
+                template="plotly_dark",
+                xaxis_title='Number of Patients',
+                yaxis_title='Weekday',height=500, width=250)
+            return fig2
+    else:
+        desired_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
+        output_df = output_df.set_index('WeekDay').reindex(desired_order).reset_index()
+        stages = output_df['WeekDay']
+        df_female = pd.DataFrame(dict(number=output_df['Female'], stage=stages))
+        df_female['Gender'] = 'Female'
+        colors = {'Male': '#2986cc', 'Female': '#c90076'}
+        fig2 = px.funnel(df_female, x='number', y='stage', color='Gender', color_discrete_map=colors, title='Patient Visits by Gender and Weekday')
+        fig2.update_layout(
+            template="plotly_dark",
+            xaxis_title='Number of Patients',
+            yaxis_title='Weekday',height=500, width=250)
+        return fig2
+
+def scatter_man(data):
+    Patient_Analysis = data[['PatientID', 'GroupedICD', 'Description', 'Age']].copy()
+    patients_diagnosis = Patient_Analysis[Patient_Analysis['GroupedICD'].notna()]
+    patients_diagnosis_info = patients_diagnosis[['PatientID', 'GroupedICD', 'Description', 'Age']]
+    patients_tests_info = patients_diagnosis_info[patients_diagnosis_info['Age'].notna()]
+    patients_tests_df = pd.DataFrame(patients_tests_info)
+
+    patients_icd_counts = patients_tests_df.groupby(['Age', 'GroupedICD','Description']).size().reset_index(name='Count')
+    patients_icd_counts = patients_icd_counts[patients_icd_counts['Count']> 1000]
+    import plotly.express as px
+    # sns.set(rc={"axes.facecolor":"#FFF9ED","figure.facecolor":"#FFF9ED"})
+    # Scatter plot
+    fig5 = px.scatter(patients_icd_counts, y='Age', x='Description', size='Count',
+                      hover_name='Age', color='Count', title='Age - ICD Relationship',color_continuous_scale='ylorrd')
+    fig5.update_layout(template="plotly_dark",xaxis_title='ICD Code', yaxis_title='Age',coloraxis_colorbar=dict(title='Count'),
+                       height=950, width=1400)
+    return fig5
+
+
+def barplot_lab(df):
+    df = df[['PatientID','EncounterDate','ComponentName', 'GroupedICD','Description']].copy()
+    df.sort_values(by=['EncounterDate'], ascending=True,inplace = True)
+    df['DaysSinceLastVisit'] = df.groupby('PatientID')['EncounterDate'].diff().dt.days
+    df = df[df['DaysSinceLastVisit'] <= 7]
+    lab = df[df['ComponentName'].notna()].copy()
+    lab = lab[lab['GroupedICD'].notna()].copy()
+    component= lab.groupby(['ComponentName','Description']).size().reset_index(name='Count')
+    sss = component.sort_values(by='Count', ascending=False)[:20].copy()
+    fig3 = px.bar(sss, x='ComponentName', y='Count',
+                  hover_data=['ComponentName', 'Count'], color='ComponentName', height=450, title='Lab Test')
+    fig3.update_xaxes(tickangle=45)
+    return fig3
+
+def scatterplot(df):
+    df = df[['PatientID','EncounterDate','ComponentName', 'GroupedICD','Description']].copy()
+    df.sort_values(by=['EncounterDate'], ascending=True,inplace = True)
+    df['DaysSinceLastVisit'] = df.groupby('PatientID')['EncounterDate'].diff().dt.days
+    df = df[df['DaysSinceLastVisit'] <= 7]
+    lab = df[df['ComponentName'].notna()].copy()
+    lab = lab[lab['GroupedICD'].notna()].copy()
+    component= lab.groupby(['ComponentName','Description']).size().reset_index(name='Count')
+    component = component[component['Count']> 2000]
+    component['Description'].nunique()
+    fig = px.scatter(component, y='ComponentName', x='Description', size='Count',
+                     hover_name='ComponentName', color='Count', title='Lab Component-ICD Relationship')
+    fig.update_layout(template="plotly_dark",xaxis_title='ICD Code', yaxis_title='Component Name', coloraxis_colorbar=dict(title='Count'),
+                      height=550, width=500)
+    return fig
+
+####################################### EDA ##################################################################
+def histplot_6(data):
+    disease_data = data[['Age','LegalSex']].copy()
+    disease_data = disease_data[disease_data['Age'].notna() & disease_data['LegalSex'].notna()].copy()
+    fig = px.histogram(disease_data,
+                       x='Age',
+                       color='LegalSex',
+                       nbins=10,
+                       opacity=0.5,
+                       title='Age Distribution by Legal Sex',
+                       color_discrete_sequence=px.colors.qualitative.Pastel)
+
+    # Update layout to match your desired style
+    fig.update_layout(
+        title_font=dict(size=20, color='white'),
+        xaxis_title_font=dict(size=16, color='white'),
+        yaxis_title_font=dict(size=16, color='white'),
+        xaxis=dict(tickfont=dict(size=14, color='white')),
+        yaxis=dict(tickfont=dict(size=14, color='white'))
+    )
+
+    return fig
+
+
+def histplot_7(data):
+    import plotly.graph_objects as go
+    graph3_data = data[['Age','BP Severity']].copy()
+    graph3_data = graph3_data[graph3_data['BP Severity'].notna()]
+    graph3_data = graph3_data[graph3_data['BP Severity'] != 'Unknown']
+    graph3_data = graph3_data[graph3_data['BP Severity'] != 'BP NORMAL']
+
+    severities = graph3_data['BP Severity'].unique()
+    lines = []
+
+    for severity in severities:
+        severity_data = graph3_data[graph3_data['BP Severity'] == severity]
+        age_counts = severity_data['Age'].value_counts().sort_index()
+        lines.append(go.Scatter(x=age_counts.index, y=age_counts.values, mode='lines+markers', name=severity))
+
+    fig = go.Figure(data=lines)
+
+    fig.update_layout(
+        title='Age Distribution by BP Severity',
+        xaxis_title='Age',
+        yaxis_title='Count',
+        title_font=dict(size=20, color='white')
+    )
+
+    return fig
+
+
+def pie_chart_7(data):
+    import plotly.graph_objects as go
+
+    # Prepare data
+    graph_4 = data[['Depression Severity']].copy()
+    graph_4 = graph_4[graph_4['Depression Severity'] != 'None-minimal']
+    graph_4 = graph_4[graph_4['Depression Severity'] != 'Unknown']
+    severity_counts = graph_4['Depression Severity'].value_counts()
+
+    # Define colors
+    colors_inner = ['#FF5733', '#FFC300', '#36A2EB', '#C71585']
+
+    # Create plotly figure
+    fig = go.Figure()
+
+    # Add donut chart
+    fig.add_trace(go.Pie(
+        labels=severity_counts.index,
+        values=severity_counts,
+        hole=0.6, # Hole size for donut chart
+        marker=dict(colors=colors_inner),
+        textinfo='label+percent',
+        textfont=dict(size=10),
+        insidetextorientation='radial'
+    ))
+
+    # Update layout for title and appearance
+    fig.update_layout(
+        title_text="Distribution of Patients by Depression",
+        title_font_size=20,
+        title_font_color='white',
+        # paper_bgcolor='black',
+        # plot_bgcolor='black',
+        autosize=False,
+        # width=500,
+        # height=450,
+    )
+
+    # Show figure
+    return fig
+
+def chart_8(data):
+    import plotly.graph_objects as go
+    graph_5 = data[['BP Severity', 'BMI', 'LegalSex']].copy()
+    graph_5 = graph_5.dropna(subset=['BP Severity', 'BMI', 'LegalSex'])
+    graph_5 = graph_5[graph_5['BP Severity'] != 'Unknown']
+    graph_5 = graph_5[graph_5['BP Severity'] != 'BP NORMAL']
+
+    # Create box plot
+    fig = go.Figure()
+
+    # Add box plot traces for each gender
+    for gender in graph_5['LegalSex'].unique():
+        filtered_data = graph_5[graph_5['LegalSex'] == gender]
+        fig.add_trace(go.Box(
+            y=filtered_data['BMI'],
+            x=filtered_data['BP Severity'],
+            name=gender,
+            boxmean='sd', # Show mean and standard deviation
+            marker_color='#1f77b4' if gender == 'Male' else '#ff7f0e', # Different colors for genders
+            text=filtered_data['BP Severity'], # Adding text for tooltips
+            hoverinfo='y+name+text'
+        ))
+
+    # Update layout with titles, axis labels, and other properties
+    fig.update_layout(
+        title='BMI by BP Severity and Legal Sex',
+        title_font=dict(size=20, color='white'),
+        xaxis_title='BP Severity',
+        yaxis_title='BMI',
+        xaxis=dict(tickfont=dict(size=14, color='white')),
+        yaxis=dict(tickfont=dict(size=14, color='white')),
+        boxmode='group', # Group box plots by BP Severity
+        height=600, # Set the height of the figure
+        width=800, # Set the width of the figure
+        # paper_bgcolor='#FAF5E6',
+        # plot_bgcolor='#FAF5E6'
+    )
+
+    return fig
+
+
+def chart_9(data):
+    import plotly.graph_objects as go
+    disease_data = data.copy()
+    disease_data = disease_data.select_dtypes(include=['int64', 'float64'])
+    columns_to_drop = ['PatientID']
+    disease_data.drop(columns=columns_to_drop, inplace=True)
+
+    # Calculate the correlation matrix
+    corrmat = disease_data.corr()
+    corrmat.fillna(0, inplace=True)
+
+    # Create a heatmap using Plotly
+    fig = go.Figure(data=go.Heatmap(
+        z=corrmat.values,
+        x=corrmat.columns,
+        y=corrmat.columns,
+        colorscale='RdYlGn',
+        # colorbar=dict(title='Correlation', tickvals=[-1, 0, 1], ticktext=['-1', '0', '1']),
+        text=corrmat.round(2).values, # Add annotations
+        texttemplate="%{text:.2f}", # Format annotations
+        textfont=dict(size=12, color='black') # Set annotation font size and color
+    ))
+
+    # Update layout
+    fig.update_layout(
+        title='Which Feature is Mainly Involved',
+        title_font=dict(size=20, color='white'),
+        xaxis_title='Features',
+        yaxis_title='Features',
+        xaxis=dict(tickfont=dict(size=14, color='white')),
+        yaxis=dict(tickfont=dict(size=14, color='white')),
+        height=600, # Set the height of the figure
+        width=800 # Set the width of the figure
+    )
+
+    return fig
+
+def chart_10(data):
+    import plotly.express as px
+    import plotly.graph_objects as go
+
+    graph_7 = data.copy()
+    graph_7 = graph_7[graph_7['Depression Severity'] != 'None-minimal']
+    graph_7 = graph_7[graph_7['Depression Severity'] != 'Unknown']
+    graph_7['Age'] = pd.to_numeric(graph_7['Age'], errors='coerce')
+    graph_7 = graph_7.dropna(subset=['Age','Depression Severity','LegalSex'])
+
+    # Create the violin plot
+    fig = go.Figure()
+
+    for sex in graph_7['LegalSex'].unique():
+        fig.add_trace(go.Violin(
+            x=graph_7['Depression Severity'][graph_7['LegalSex'] == sex],
+            y=graph_7['Age'][graph_7['LegalSex'] == sex],
+            legendgroup=sex, scalegroup=sex, name=sex, side='negative' if sex == 'Female' else 'positive',
+            line_color='blue' if sex == 'Female' else 'orange'
+        ))
+
+    # Update the layout
+    fig.update_layout(
+        title="Age by Depression Severity and Legal Sex",
+        xaxis_title="Depression Severity",
+        yaxis_title="Age",
+        xaxis=dict(tickmode='array', tickvals=graph_7['Depression Severity'].unique(), tickangle=20),
+        yaxis=dict(range=[0, 80]),
+        violingap=0.2, # gap between violins
+        violingroupgap=0.3, # gap between groups
+        violinmode='overlay', # plot violins over each other
+        font=dict(color='white', size=14),
+        title_font=dict(size=20, color='white'),
+        xaxis_tickfont=dict(size=14, color='white'),
+        yaxis_tickfont=dict(size=14, color='white'),
+        paper_bgcolor='rgba(0,0,0,0)',
+        plot_bgcolor='rgba(0,0,0,0)',
+        showlegend=True
+    )
+
+    return fig
+
+
390 |
+
def feature_analytics(disease_data):
|
391 |
+
corrmat = disease_data.corr( numeric_only = True)
|
392 |
+
corr_threshold = 0.7
|
393 |
+
selected_features = []
|
394 |
+
for column in corrmat.columns[:]:
|
395 |
+
correlated_features = corrmat.index[corrmat[column] > corr_threshold].tolist()
|
396 |
+
if correlated_features:
|
397 |
+
selected_features.extend(correlated_features)
|
398 |
+
selected_features = list(set(selected_features))
|
399 |
+
values_to_pop = ['Weight', 'DiastolicBP', 'SystolicBP', 'ComponentValue', 'Height', 'Age', 'BMI']
|
400 |
+
for value in values_to_pop:
|
401 |
+
if value in selected_features:
|
402 |
+
selected_features.remove(value)
|
403 |
+
values_to_find = ['PeakFlow', 'Temperature', 'Respiration', 'Pulse', 'SPO2']
|
404 |
+
found_values = []
|
405 |
+
l = []
|
406 |
+
m = []
|
407 |
+
not_found_values = []
|
408 |
+
for i, value in enumerate(selected_features):
|
409 |
+
if value in values_to_find:
|
410 |
+
found_values.append((i, value))
|
411 |
+
l.append(value)
|
412 |
+
else:
|
413 |
+
not_found_values.append((i, value))
|
414 |
+
m.append(value)
|
415 |
+
return l,m
|
416 |
+
|
417 |
+
|
418 |
+
|
419 |
+
def chart_11(disease_data):
|
420 |
+
import plotly.express as px
|
421 |
+
feature = feature_analytics(disease_data)
|
422 |
+
select,featurel = feature
|
423 |
+
Top_feature_Lab = select[0]
|
424 |
+
graph_8 = disease_data.copy()
|
425 |
+
graph_8 = graph_8.dropna(subset=[Top_feature_Lab, 'Age', 'LegalSex'])
|
426 |
+
|
427 |
+
# Create the scatter plot with Plotly
|
428 |
+
fig = px.scatter(
|
429 |
+
graph_8,
|
430 |
+
x=Top_feature_Lab,
|
431 |
+
y="Age",
|
432 |
+
color="LegalSex",
|
433 |
+
color_discrete_sequence=px.colors.qualitative.Set2,
|
434 |
+
title=f'Age group: {Top_feature_Lab}',
|
435 |
+
labels={Top_feature_Lab: Top_feature_Lab, 'Age': 'Age'},
|
436 |
+
size_max=200
|
437 |
+
)
|
438 |
+
|
439 |
+
# Add vertical line at the mean
|
440 |
+
mean_value = graph_8[Top_feature_Lab].mean()
|
441 |
+
fig.add_vline(x=mean_value, line=dict(color='red', dash='dash'))
|
442 |
+
|
443 |
+
# Customize the layout
|
444 |
+
fig.update_layout(
|
445 |
+
title_font=dict(size=20, color='white'),
|
446 |
+
xaxis_title_font=dict(size=16, color='white'),
|
447 |
+
yaxis_title_font=dict(size=16, color='white'),
|
448 |
+
xaxis=dict(tickangle=20, tickfont=dict(size=14, color='white')),
|
449 |
+
yaxis=dict(tickfont=dict(size=14, color='white'), range=[0, 80]),
|
450 |
+
plot_bgcolor='black',
|
451 |
+
paper_bgcolor='black'
|
452 |
+
)
|
453 |
+
|
454 |
+
return fig
|
455 |
+
|
456 |
+
|
457 |
+
|
458 |
+
|
459 |
+
def chart_12(filtered_data):
|
460 |
+
graph_10 = filtered_data.copy()
|
461 |
+
no_nan = graph_10.dropna(subset=['ImmunizationName'])
|
462 |
+
immu = list(no_nan['ImmunizationName'])
|
463 |
+
filtered_data = [item for item in immu if item and not pd.isna(item)]
|
464 |
+
unique_values = set(filtered_data)
|
465 |
+
my_string = ' '.join(unique_values)
|
466 |
+
lmao = my_string.strip(', ')
|
467 |
+
lmao = lmao.replace(',', '')
|
468 |
+
title = "Immunization Word Cloud"
|
469 |
+
cloud = WordCloud(scale=3,
|
470 |
+
max_words=150,
|
471 |
+
colormap='RdYlGn',
|
472 |
+
mask=None,
|
473 |
+
background_color='white',
|
474 |
+
stopwords=None,
|
475 |
+
collocations=True,
|
476 |
+
contour_color='black',
|
477 |
+
contour_width=1).generate(lmao)
|
478 |
+
# Draw the word cloud on a Matplotlib figure so the caller can render it (e.g. with st.pyplot)
fig, ax = plt.subplots()
ax.imshow(cloud, interpolation='bilinear')
ax.axis('off')
ax.set_title(title, fontsize=20)
return fig
|
482 |
+
|
483 |
+
|
484 |
+
|
485 |
+
def mean_of_values(cell_value):
|
486 |
+
if pd.isna(cell_value): # Check if cell value is NaN
|
487 |
+
return np.nan
|
488 |
+
values = [float(val) for val in cell_value.split(',')]
|
489 |
+
return sum(values) / len(values)
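`mean_of_values` assumes every comma-separated token parses as a float, so a single stray token such as `n/a` raises `ValueError`. A minimal sketch of a more tolerant variant is shown here. The name `mean_of_numeric_tokens` and the decision to skip unparseable tokens are assumptions for illustration and are not part of the original app.

```python
import math


def mean_of_numeric_tokens(cell_value):
    """Average the numeric tokens of a comma-separated string.

    Hypothetical helper: tokens that cannot be parsed as floats are
    skipped, and NaN is returned when nothing numeric remains.
    """
    # Treat None and float NaN as missing input.
    if cell_value is None or (isinstance(cell_value, float) and math.isnan(cell_value)):
        return math.nan
    values = []
    for token in str(cell_value).split(','):
        try:
            values.append(float(token))  # float() ignores surrounding whitespace
        except ValueError:
            continue  # e.g. "n/a" or an empty token
    return sum(values) / len(values) if values else math.nan


print(mean_of_numeric_tokens("4, bad, 6"))
```

Skipping bad tokens keeps one malformed lab value from aborting the whole `apply`, at the cost of silently ignoring it; logging the skipped tokens may be preferable in practice.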
|
490 |
+
|
491 |
+
def plots(original_data):
|
492 |
+
a = original_data.copy()
|
493 |
+
st.subheader("Clustering Analysis")
|
494 |
+
col1, col2 = st.columns(2)
|
495 |
+
## 1
|
496 |
+
cluster_counts = a['cluster'].value_counts().reset_index()
|
497 |
+
cluster_counts.columns = ['cluster', 'count'] # Rename columns
|
498 |
+
fig_1 = px.bar(cluster_counts, y='cluster', x='count',
|
499 |
+
labels={'cluster': 'Cluster', 'count': 'Count'},
|
500 |
+
text_auto=True, # text_auto=True displays the count on top of the bars
|
501 |
+
color='cluster', # Assign different colors to each bar
|
502 |
+
color_continuous_scale='plasma', # Use the plasma color scale
|
503 |
+
category_orders={'cluster': [0, 1, 2, 3, 4]},
|
504 |
+
) # Set the order of clusters
|
505 |
+
|
506 |
+
custom_labels = {0: 'Cluster 0', 1: 'Cluster 1', 2: 'Cluster 2', 3: 'Cluster 3', 4: 'Cluster 4'}
|
507 |
+
fig_1.update_yaxes(tickvals=[0, 1, 2, 3, 4], ticktext=list(custom_labels.values()))
|
508 |
+
|
509 |
+
fig_1.update_layout(
|
510 |
+
title={'text': "Count of Data Points per Cluster", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
511 |
+
yaxis_title='Cluster', xaxis_title='Count',
|
512 |
+
xaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
513 |
+
yaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
514 |
+
title_font=dict(color='white', size=18),
|
515 |
+
# plot_bgcolor='black', # Background color
|
516 |
+
# paper_bgcolor='black', # Paper background color
|
517 |
+
title_x=0.5, # Center the title
|
518 |
+
legend=dict(
|
519 |
+
font=dict(size=16, color='white'),
|
520 |
+
bgcolor='rgba(0,0,0,0)'
|
521 |
+
))
|
522 |
+
col1.plotly_chart(fig_1,use_container_width=True)
|
523 |
+
|
524 |
+
## 2
|
525 |
+
fig_2 = px.scatter(a, x='Age', y='BMI',
|
526 |
+
color='cluster',
|
527 |
+
title="Cluster's Profile Based On Age And BMI",
|
528 |
+
color_continuous_scale='plasma') # Use the plasma color palette
|
529 |
+
|
530 |
+
fig_2.update_layout(
|
531 |
+
title={'text': "Cluster's Profile Based On Age And BMI", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
532 |
+
xaxis=dict(showgrid=False, showticklabels=False, zeroline=False),
|
533 |
+
yaxis=dict(showgrid=False, showticklabels=False, zeroline=False),
|
534 |
+
# plot_bgcolor='black', # Background color
|
535 |
+
# paper_bgcolor='black', # Paper background color
|
536 |
+
title_font=dict(color='white', size=18), # Title font color and size
|
537 |
+
margin=dict(l=20, r=20, t=40, b=20), # Set margins to make the plot more compact
|
538 |
+
legend=dict(
|
539 |
+
font=dict(size=16, color='white'),
|
540 |
+
bgcolor='rgba(0,0,0,0)'
|
541 |
+
)
|
542 |
+
)
|
543 |
+
fig_2.update_traces(marker=dict(size=12, line=dict(width=2, color='DarkSlateGrey')))
|
544 |
+
|
545 |
+
col2.plotly_chart(fig_2,use_container_width=True)
|
546 |
+
|
547 |
+
col3, col4 = st.columns(2)
|
548 |
+
## 3
|
549 |
+
palette = ['#636EFA', '#EF553B'] # Adjust the colors as needed
|
550 |
+
fig_3 = go.Figure()
|
551 |
+
for sex in a['LegalSex'].unique():
|
552 |
+
fig_3.add_trace(go.Box(
|
553 |
+
y=a[a['LegalSex'] == sex]['cluster'],
|
554 |
+
name=f'Legal Sex: {sex}',
|
555 |
+
marker_color=palette[len(fig_3.data) % len(palette)], # Cycle through the palette so extra categories do not exhaust it
|
556 |
+
boxmean=True
|
557 |
+
))
|
558 |
+
fig_3.update_layout(
|
559 |
+
title={'text':"Clusters Distribution by Legal Sex", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
560 |
+
title_font=dict(color='white', size=18),
|
561 |
+
# plot_bgcolor='black',
|
562 |
+
# paper_bgcolor='black',
|
563 |
+
xaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
564 |
+
yaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
565 |
+
# plot_bgcolor='rgba(0,0,0,0)',
|
566 |
+
# paper_bgcolor='rgba(0,0,0,0)',
|
567 |
+
title_font_color='white',
|
568 |
+
showlegend=True,
|
569 |
+
legend=dict(
|
570 |
+
font=dict(size=16, color='white'),
|
571 |
+
bgcolor='rgba(0,0,0,0)'
|
572 |
+
)
|
573 |
+
)
|
574 |
+
|
575 |
+
col3.plotly_chart(fig_3,use_container_width=True)
|
576 |
+
|
577 |
+
## 4
|
578 |
+
# palette = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A'] # Example palette
|
579 |
+
fig_4 = px.violin(
|
580 |
+
a,
|
581 |
+
x="BP Severity",
|
582 |
+
y="cluster",
|
583 |
+
color="BP Severity",
|
584 |
+
color_discrete_sequence=px.colors.qualitative.Vivid,
|
585 |
+
box=True, # Adds a box plot inside the violin plot for more detail
|
586 |
+
points="all", # Shows all data points
|
587 |
+
title="Clusters Distribution by BP Severity"
|
588 |
+
)
|
589 |
+
fig_4.update_layout(
|
590 |
+
title={'text':"Clusters Distribution by BP Severity", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
591 |
+
title_font=dict(color='white', size=18),
|
592 |
+
xaxis_title="BP Severity",
|
593 |
+
yaxis_title="Cluster",
|
594 |
+
# plot_bgcolor='black',
|
595 |
+
# paper_bgcolor='black',
|
596 |
+
xaxis_title_font=dict(size=16, color='white'),
|
597 |
+
yaxis_title_font=dict(size=16, color='white'),
|
598 |
+
xaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
599 |
+
yaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
600 |
+
title_font_color='white',
|
601 |
+
legend=dict(
|
602 |
+
font=dict(size=16, color='white'),
|
603 |
+
bgcolor='rgba(0,0,0,0)'
|
604 |
+
)
|
605 |
+
)
|
606 |
+
|
607 |
+
fig_4.update_xaxes(tickangle=45) # Rotate the x-axis labels for better readability
|
608 |
+
|
609 |
+
col4.plotly_chart(fig_4,use_container_width=True)
|
610 |
+
|
611 |
+
col5, col6 = st.columns(2)
|
612 |
+
## 5
|
613 |
+
fig_5 = px.histogram(a, x="Depression Severity", color="cluster",
|
614 |
+
color_discrete_sequence=px.colors.diverging.RdYlBu,
|
615 |
+
title='Clusters Distribution by Depression Severity')
|
616 |
+
|
617 |
+
# Update layout to make it more attractive
|
618 |
+
fig_5.update_layout(
|
619 |
+
title={'text':"Clusters Distribution by Depression Severity", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
620 |
+
title_font=dict(color='white', size=18),
|
621 |
+
# plot_bgcolor='black',
|
622 |
+
# paper_bgcolor='black',
|
623 |
+
title_font_color='white',
|
624 |
+
xaxis_title='Depression Severity',
|
625 |
+
yaxis_title='Count',
|
626 |
+
xaxis_title_font_color='white',
|
627 |
+
yaxis_title_font_color='white',
|
628 |
+
legend=dict(
|
629 |
+
font=dict(size=16, color='white'),
|
630 |
+
bgcolor='rgba(0,0,0,0)'
|
631 |
+
),
|
632 |
+
xaxis=dict(
|
633 |
+
tickfont=dict(color='white', size=14),
|
634 |
+
title_font=dict(color='white', size=16),
|
635 |
+
showline=False,
|
636 |
+
showgrid=False,
|
637 |
+
ticks=''
|
638 |
+
),
|
639 |
+
yaxis=dict(
|
640 |
+
tickfont=dict(color='white', size=14),
|
641 |
+
title_font=dict(color='white', size=16),
|
642 |
+
showline=False,
|
643 |
+
showgrid=False,
|
644 |
+
ticks=''
|
645 |
+
),
|
646 |
+
coloraxis_colorbar=dict(
|
647 |
+
tickfont=dict(color='white')
|
648 |
+
)
|
649 |
+
)
|
650 |
+
|
651 |
+
# Show the plot
|
652 |
+
col5.plotly_chart(fig_5,use_container_width=True)
|
653 |
+
|
654 |
+
## 6
|
655 |
+
fig_6 = px.violin(a, y="cluster", x="Temp_condition", box=True, points="all",
|
656 |
+
color="Temp_condition", color_discrete_sequence=px.colors.diverging.RdYlBu,
|
657 |
+
title='Clusters Distribution by Temp_condition')
|
658 |
+
|
659 |
+
# Update layout to make it more attractive
|
660 |
+
fig_6.update_layout(
|
661 |
+
title={'text':"Clusters Distribution by Temp_condition", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
662 |
+
title_font=dict(color='white', size=18),
|
663 |
+
# plot_bgcolor='black',
|
664 |
+
# paper_bgcolor='black',
|
665 |
+
title_font_color='white',
|
666 |
+
xaxis_title='Temp_condition',
|
667 |
+
yaxis_title='Clusters',
|
668 |
+
xaxis_title_font_color='white',
|
669 |
+
yaxis_title_font_color='white',
|
670 |
+
legend=dict(
|
671 |
+
font=dict(size=16, color='white'),
|
672 |
+
bgcolor='rgba(0,0,0,0)'
|
673 |
+
),
|
674 |
+
xaxis=dict(
|
675 |
+
tickfont=dict(color='white', size=14),
|
676 |
+
title_font=dict(color='white', size=16),
|
677 |
+
showline=False,
|
678 |
+
showgrid=False,
|
679 |
+
ticks=''
|
680 |
+
),
|
681 |
+
yaxis=dict(
|
682 |
+
tickfont=dict(color='white', size=14),
|
683 |
+
title_font=dict(color='white', size=16),
|
684 |
+
showline=False,
|
685 |
+
showgrid=False,
|
686 |
+
ticks=''
|
687 |
+
),
|
688 |
+
coloraxis_colorbar=dict(
|
689 |
+
tickfont=dict(color='white')
|
690 |
+
)
|
691 |
+
)
|
692 |
+
|
693 |
+
# Show the plot
|
694 |
+
col6.plotly_chart(fig_6,use_container_width=True)
|
695 |
+
|
696 |
+
col7, col8 = st.columns(2)
|
697 |
+
|
698 |
+
##7
|
699 |
+
# Create the stacked bar chart
|
700 |
+
ad = a.groupby(['weight_condition', 'cluster']).size().reset_index(name='count')
|
701 |
+
|
702 |
+
fig_7 = px.bar(ad,
|
703 |
+
x='weight_condition',
|
704 |
+
y='count',
|
705 |
+
color='cluster',
|
706 |
+
title='Clusters Distribution by Weight Condition',
|
707 |
+
text='count',
|
708 |
+
barmode='stack',
|
709 |
+
color_discrete_sequence=px.colors.diverging.RdYlBu) # Use a color scale or palette of your choice
|
710 |
+
|
711 |
+
# Update layout to make it more attractive and remove axes elements
|
712 |
+
fig_7.update_layout(
|
713 |
+
title={'text': 'Clusters Distribution by Weight Condition', 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
714 |
+
title_font=dict(color='white', size=18),
|
715 |
+
xaxis=dict(
|
716 |
+
title='', # Remove x-axis title
|
717 |
+
showline=False,
|
718 |
+
showgrid=False,
|
719 |
+
zeroline=False,
|
720 |
+
tickfont=dict(size=14, color='white'),
|
721 |
+
tickangle=45 # Rotate x-axis labels for better readability
|
722 |
+
),
|
723 |
+
yaxis=dict(
|
724 |
+
title='', # Remove y-axis title
|
725 |
+
showline=False,
|
726 |
+
showgrid=False,
|
727 |
+
zeroline=False,
|
728 |
+
tickfont=dict(size=14, color='white')
|
729 |
+
),
|
730 |
+
# plot_bgcolor='black', # Background color
|
731 |
+
# paper_bgcolor='black', # Paper background color
|
732 |
+
margin=dict(l=20, r=20, t=40, b=20), # Set margins to make the plot more compact
|
733 |
+
legend=dict(
|
734 |
+
font=dict(size=16, color='white'),
|
735 |
+
bgcolor='rgba(0,0,0,0)'
|
736 |
+
)
|
737 |
+
)
|
738 |
+
|
739 |
+
# Update bar text style
|
740 |
+
fig_7.update_traces(texttemplate='%{text:.2s}', textfont_size=14, textposition='inside', marker=dict(line=dict(width=1, color='DarkSlateGrey')))
|
741 |
+
|
742 |
+
# Show the plot
|
743 |
+
col7.plotly_chart(fig_7,use_container_width=True)
|
744 |
+
|
745 |
+
|
746 |
+
## 8
|
747 |
+
fig_8 = px.box(a,
|
748 |
+
x='SPO2_condition',
|
749 |
+
y='Age',
|
750 |
+
points='all', # Show all points
|
751 |
+
title="Clusters Distribution by SPO2_condition",
|
752 |
+
color='cluster',
|
753 |
+
color_discrete_sequence=px.colors.sequential.Plasma_r)
|
754 |
+
|
755 |
+
# Update layout to remove axes titles, labels, and gridlines, and style the chart
|
756 |
+
fig_8.update_layout(
|
757 |
+
title={'text': "Clusters Distribution by SPO2_condition", 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
|
758 |
+
title_font=dict(color='white', size=18),
|
759 |
+
xaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
760 |
+
yaxis=dict(showline=False, showgrid=False, zeroline=False, tickfont=dict(size=14, color='white')),
|
761 |
+
# plot_bgcolor='black', # Background color
|
762 |
+
# paper_bgcolor='black', # Paper background color
|
763 |
+
margin=dict(l=20, r=20, t=40, b=20), # Set margins to make the plot more compact
|
764 |
+
legend=dict(
|
765 |
+
font=dict(size=16, color='white'),
|
766 |
+
bgcolor='rgba(0,0,0,0)'
|
767 |
+
)
|
768 |
+
)
|
769 |
+
|
770 |
+
# Customize the box plot appearance
|
771 |
+
fig_8.update_traces(
|
772 |
+
boxmean=True, # Add mean line
|
773 |
+
jitter=0.3, # Spread points along x-axis
|
774 |
+
marker=dict(size=10, line=dict(width=2, color='DarkSlateGrey'))
|
775 |
+
)
|
776 |
+
|
777 |
+
# Show the plot
|
778 |
+
col8.plotly_chart(fig_8,use_container_width=True)
|
779 |
+
|
780 |
+
col_11 = st.columns(1)[0]
|
781 |
+
fig_11 = px.scatter_matrix(
|
782 |
+
a[['Age', 'SystolicBP', 'Pulse', 'Weight', 'BMI', 'cluster']],
|
783 |
+
dimensions=['Age', 'SystolicBP', 'Pulse', 'Weight', 'BMI'],
|
784 |
+
color='cluster',
|
785 |
+
title="Scatter Matrix of Selected Features by Cluster",
|
786 |
+
labels={col: col for col in ['Age', 'SystolicBP', 'Pulse', 'Weight', 'BMI']},
|
787 |
+
color_continuous_scale= px.colors.diverging.Spectral
|
788 |
+
)
|
789 |
+
|
790 |
+
# Update layout for better visualization
|
791 |
+
fig_11.update_traces(diagonal_visible=True)
|
792 |
+
fig_11.update_layout(height=700, width=700, showlegend=True)
|
793 |
+
|
794 |
+
# Show the plot
|
795 |
+
col_11.plotly_chart(fig_11,use_container_width=True)
|
796 |
+
#
|
797 |
+
|
798 |
+
## Per-cluster summary
|
799 |
+
st.subheader("Summary")
|
800 |
+
meanvalue_columns = [col for col in list(a.columns) if 'meanvalue' in col]
|
801 |
+
# Group data by clusters
|
802 |
+
grouped_data = a.groupby('cluster')
|
803 |
+
|
804 |
+
# Calculate mean for numerical columns
|
805 |
+
numerical_columns = a.select_dtypes(include=['number']).columns
|
806 |
+
numerical_summary = grouped_data[numerical_columns].mean()
|
807 |
+
|
808 |
+
# Calculate mode for categorical columns
|
809 |
+
categorical_columns = a.select_dtypes(include=['object', 'category','string']).columns
|
810 |
+
categorical_summary = grouped_data[categorical_columns].agg(lambda x: x.value_counts().index[0])
|
811 |
+
|
812 |
+
for i in range(len(a['cluster'].value_counts())):
|
813 |
+
# Collect the summary traits for cluster i
|
814 |
+
cluster_traits = {
|
815 |
+
"Age": numerical_summary.loc[i, 'Age'],
|
816 |
+
"Age_Category": categorical_summary.loc[i,"Age_Category"],
|
817 |
+
"SystolicBP": numerical_summary.loc[i, 'SystolicBP'],
|
818 |
+
"Depression Severity": categorical_summary.loc[i, 'Depression Severity'],
|
819 |
+
"Weight Condition" : categorical_summary.loc[i, 'weight_condition'],
|
820 |
+
"BP Severity" : categorical_summary.loc[i, 'BP Severity'],
|
821 |
+
"Pulse_condition" : categorical_summary.loc[i, 'Pulse_condition'],
|
822 |
+
"Respiration_condition" : categorical_summary.loc[i, 'Respiration_condition'],
|
823 |
+
"SPO2_condition" : categorical_summary.loc[i, 'SPO2_condition'],
|
824 |
+
|
825 |
+
}
|
826 |
+
|
827 |
+
# if numerical_summary.loc[i, 'GLUCOSE_meanvalue'] > 100:
|
828 |
+
# glucose_condition = "High frequency of patients with slightly elevated glucose levels."
|
829 |
+
# else:
|
830 |
+
# glucose_condition = "Normal glucose levels."
|
831 |
+
|
832 |
+
|
833 |
+
|
834 |
+
# Writing the summary
|
835 |
+
summary = f"""
|
836 |
+
Cluster - {i} Traits
|
837 |
+
1. Age: Average age is {round(cluster_traits['Age'])} years.
|
838 |
+
2. SystolicBP: Average systolic blood pressure is {round(cluster_traits['SystolicBP'], 1)} mmHg.
|
839 |
+
3. Depression Severity: Predominantly '{cluster_traits['Depression Severity']}'.
|
840 |
+
4. "Weight Condition" : {cluster_traits['Weight Condition']}.
|
841 |
+
5. "Respiration_condition" : {cluster_traits['Respiration_condition']}.
|
842 |
+
6. "Pulse_condition" : {cluster_traits['Pulse_condition']}.
|
843 |
+
7. "SPO2_condition" : {cluster_traits['SPO2_condition']}.
|
844 |
+
|
845 |
+
Trait Summary: Cluster {i} mainly consists of {cluster_traits['Age_Category']} individuals with {cluster_traits['Depression Severity']} depression level, {cluster_traits['BP Severity'].lower()}.
|
846 |
+
"""
|
847 |
+
|
848 |
+
st.write(summary)
|
849 |
+
st.write(round(numerical_summary[meanvalue_columns],2))
|
850 |
+
|
851 |
+
st.subheader("Density Contour Plot")
|
852 |
+
with st.container():
|
853 |
+
# Loop through the columns and create plots
|
854 |
+
for i in meanvalue_columns:
|
855 |
+
fig = px.density_contour(
|
856 |
+
a, # DataFrame that carries the cluster labels
|
857 |
+
y="Age",
|
858 |
+
x=i,
|
859 |
+
color="cluster",
|
860 |
+
marginal_x="histogram",
|
861 |
+
marginal_y="histogram",
|
862 |
+
template="simple_white",
|
863 |
+
color_discrete_sequence=px.colors.qualitative.Set1
|
864 |
+
)
|
865 |
+
|
866 |
+
# Add fill to the contours for a similar effect to kde
|
867 |
+
fig.update_traces(bingroup="fill")
|
868 |
+
|
869 |
+
# Update layout for better aesthetics
|
870 |
+
fig.update_layout(
|
871 |
+
title=f"Joint Density Contour of {i} vs Age by Clusters",
|
872 |
+
yaxis_title="Age",
|
873 |
+
xaxis_title=i,
|
874 |
+
xaxis=dict(
|
875 |
+
title=i,
|
876 |
+
showline=False,
|
877 |
+
showgrid=False,
|
878 |
+
zeroline=False,
|
879 |
+
tickfont=dict(size=14, color='white'),
|
880 |
+
tickangle=45, # Rotate x-axis labels for better readability
|
881 |
+
titlefont=dict(size=16, color='white') # Set x-axis title to white
|
882 |
+
),
|
883 |
+
yaxis=dict(
|
884 |
+
title='Age',
|
885 |
+
showline=False,
|
886 |
+
showgrid=False,
|
887 |
+
zeroline=False,
|
888 |
+
tickfont=dict(size=14, color='white'),
|
889 |
+
titlefont=dict(size=16, color='white') # Set y-axis title to white
|
890 |
+
),
|
891 |
+
plot_bgcolor='black',
|
892 |
+
paper_bgcolor='black',
|
893 |
+
title_font_color='white',
|
894 |
+
legend_title="Clusters",
|
895 |
+
width=1500, # Adjust width as needed
|
896 |
+
height=800 # Increase height to make the plot taller
|
897 |
+
)
|
898 |
+
|
899 |
+
# Display the plot using st.plotly_chart within a column
|
900 |
+
st.plotly_chart(fig, use_container_width=True)
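The Legal Sex box plot in `plots()` takes one color per category from a two-entry palette, which only works while there are at most two categories. One possible way to map any number of categories onto a fixed palette is sketched here. `assign_colors` is a hypothetical helper, not something the app defines.

```python
from itertools import cycle


def assign_colors(categories, palette):
    """Map each category to a palette color, reusing colors when the palette runs out."""
    return dict(zip(categories, cycle(palette)))


colors = assign_colors(["Male", "Female", "Unknown"], ["#636EFA", "#EF553B"])
print(colors)
```

With a mapping like this, each trace could look up its color by category name instead of mutating the palette list inside the loop.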
|
901 |
+
|
902 |
+
|
903 |
+
def ML(filtered_data, scaler, unscaled_data):
|
904 |
+
man = filtered_data.copy()
|
905 |
+
man=man.dropna()
|
906 |
+
|
907 |
+
man.drop(columns=['PatientID','VisitID'],inplace=True)
|
908 |
+
numerical_columns = list(man.select_dtypes(include=['int', 'float']).columns)
|
909 |
+
categorial_columns = list(man.select_dtypes(exclude=['int', 'float','datetime']).columns)
|
910 |
+
categorical_indexes = []
|
911 |
+
|
912 |
+
for c in categorial_columns:
|
913 |
+
categorical_indexes.append(man.columns.get_loc(c))
|
914 |
+
|
915 |
+
t = man.shape
|
916 |
+
# st.write(t)
|
917 |
+
if 5 <= t[0] < 10:
|
918 |
+
ki = 3
|
919 |
+
elif t[0] <= 4 :
|
920 |
+
ki = 1
|
921 |
+
else:
|
922 |
+
ki = 4
|
923 |
+
kproto = KPrototypes(n_clusters= ki, init='Huang', n_init = 25, random_state=42)
|
924 |
+
kproto.fit_predict(man, categorical= categorical_indexes)
|
925 |
+
cluster_labels = kproto.labels_
|
926 |
+
|
927 |
+
original_numeric_data = scaler.inverse_transform(man[numerical_columns])
|
928 |
+
|
929 |
+
# Convert back to DataFrame and add cluster labels
|
930 |
+
original_data = pd.DataFrame(original_numeric_data, columns=numerical_columns)
|
931 |
+
original_data["cluster"] = cluster_labels
|
932 |
+
original_data["cluster"] = original_data["cluster"].astype('category')
|
933 |
+
|
934 |
+
## PCA Graph
|
935 |
+
pca = PCA(n_components=4)
|
936 |
+
pca_df = pca.fit_transform(original_data[numerical_columns])
|
937 |
+
d = list(original_data[numerical_columns].columns)
|
938 |
+
pca_df = pd.DataFrame(pca_df, columns=['PC1', 'PC2', 'PC3', 'PC4'])
|
939 |
+
|
940 |
+
import plotly.graph_objects as go
|
941 |
+
|
942 |
+
st.subheader("PCA")
|
943 |
+
fig_9 = go.Figure(
|
944 |
+
go.Scatter3d(mode='markers',
|
945 |
+
x = pca_df.iloc[:, 0],
|
946 |
+
y = pca_df.iloc[:, 1],
|
947 |
+
z = pca_df.iloc[:, 2],
|
948 |
+
marker=dict(size = 4, color = original_data['cluster'], colorscale = 'spectral')
|
949 |
+
)
|
950 |
+
)
|
951 |
+
|
952 |
+
fig_9.update_layout(
|
953 |
+
scene=dict(
|
954 |
+
xaxis_title='PC1',
yaxis_title='PC2',
zaxis_title='PC3',
|
957 |
+
# bgcolor='black', # Background color inside the 3D plot
|
958 |
+
xaxis=dict(color='white'), # Axis label color
|
959 |
+
yaxis=dict(color='white'),
|
960 |
+
zaxis=dict(color='white')
|
961 |
+
),
|
962 |
+
# plot_bgcolor='black', # Background color outside the 3D plot
|
963 |
+
# paper_bgcolor='black' # Paper (entire plot area) background color
|
964 |
+
)
|
965 |
+
col9 = st.columns(1)[0]
|
966 |
+
col9.plotly_chart(fig_9, use_container_width=True)
|
967 |
+
|
968 |
+
|
969 |
+
|
970 |
+
|
971 |
+
mann = man[categorial_columns].copy()
|
972 |
+
orig = original_data.reset_index(drop=True)
|
973 |
+
mann = mann.reset_index(drop=True)
|
974 |
+
original_data = pd.concat([orig, mann], axis=1)
|
975 |
+
|
976 |
+
return plots(original_data)
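`ML()` picks the number of clusters from the row count with a chain of range checks. A small sketch of the same idea as a standalone function is given here, which makes the thresholds easy to test. The name `choose_n_clusters` is hypothetical, and treating exactly five rows as a small sample (three clusters) is an assumption about the intended behavior.

```python
def choose_n_clusters(n_rows):
    """Pick a cluster count from the number of rows (hypothetical helper).

    Very small samples get a single cluster, small samples get three,
    and everything else gets four.
    """
    if n_rows <= 4:
        return 1
    if n_rows < 10:
        return 3
    return 4


for n in (3, 5, 9, 10):
    print(n, choose_n_clusters(n))
```

Ordering the checks from smallest to largest leaves no gap between ranges, so every row count falls into exactly one branch.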
|
977 |
+
|
978 |
+
|
979 |
+
|
980 |
+
def imputer(filtered_data):
|
981 |
+
numeric_columns = filtered_data.select_dtypes(include=['int', 'float'])
|
982 |
+
numeric_columns = numeric_columns.iloc[:,2:].copy()
|
983 |
+
|
984 |
+
# Setting the random_state argument for reproducibility
|
985 |
+
imputer = IterativeImputer(random_state=42)
|
986 |
+
imputed = imputer.fit_transform(numeric_columns)
|
987 |
+
Imputed_data = pd.DataFrame(imputed, columns=numeric_columns.columns)
|
988 |
+
Imputed_data = round(Imputed_data, 2)
|
989 |
+
columns_drop = Imputed_data.columns
|
990 |
+
filtered_data = filtered_data.drop(columns=columns_drop)
|
991 |
+
Ml_data = pd.concat([filtered_data, Imputed_data], axis=1)
|
992 |
+
unscaled_data = Ml_data.copy()
|
993 |
+
|
994 |
+
##Scaling
|
995 |
+
scaled_data = Ml_data.select_dtypes(include=['int', 'float'])
|
996 |
+
scaled_data = scaled_data.iloc[:,2:].copy()
|
997 |
+
scaler = StandardScaler()
|
998 |
+
scaler.fit(scaled_data)
|
999 |
+
scaled_data = pd.DataFrame(scaler.transform(scaled_data),columns= scaled_data.columns)
|
1000 |
+
columns_drop = scaled_data.columns
|
1001 |
+
Ml_data = Ml_data.drop(columns=columns_drop)
|
1002 |
+
Ml_data = pd.concat([Ml_data, scaled_data], axis=1)
|
1003 |
+
Ml_data = Ml_data.convert_dtypes() # change this to outlier_removed if you want outliers to be removed
|
1004 |
+
return ML(Ml_data, scaler, unscaled_data)
|
1005 |
+
|
1006 |
+
|
1007 |
+
filename_1 = "ML_DATA.parquet"

# Access the token
token = os.environ["HUGGING_FACE_HUB_TOKEN"]

# Download the file from the dataset repo
local_file_1 = hf_hub_download(repo_id=repo_id, filename=filename_1, repo_type="dataset", token=token)

@st.cache_data()
def fetch_data_1():
    data = pd.read_parquet(local_file_1)
    return data
|
1033 |
+
if analysis_option == 'Machine Learning':
|
1034 |
+
data = fetch_data_1()
|
1035 |
+
problem = list(data['Description'].unique())
|
1036 |
+
st.subheader("_Select Disease_:sunglasses:")
|
1037 |
+
health_option = st.selectbox("_Select Disease_:sunglasses:",['', *problem], label_visibility="collapsed")
|
1038 |
+
filtered_data = data[data['Description'] == health_option].copy()
|
1039 |
+
if filtered_data['key_lab2'].notna().any():
|
1040 |
+
column_list = ['PatientID', 'VisitID', 'GroupedICD'] + list(filtered_data['key_lab2'].iloc[0])
|
1041 |
+
pivot_data = pd.pivot_table(filtered_data, values='ComponentValue', index=['PatientID', 'VisitID', 'GroupedICD'], columns='ComponentName', aggfunc=lambda x: ', '.join(map(str, x)))
|
1042 |
+
pivot_data = pivot_data.reset_index(drop=False)
|
1043 |
+
pivot_data = pivot_data[column_list].copy()
|
1044 |
+
filtered_data = pd.merge(filtered_data, pivot_data, on=['PatientID', 'VisitID','GroupedICD'], how='left')
|
1045 |
+
|
1046 |
+
filtered_data.iloc[:, -20:] = filtered_data.iloc[:, -20:].convert_dtypes()
|
1047 |
+
hmm = pd.DataFrame()
|
1048 |
+
# num_columns = 20
|
1049 |
+
num_columns = len(list(filtered_data['key_lab2'].iloc[0]))
|
1050 |
+
for i in range(1, num_columns+1):
|
1051 |
+
existing_column = filtered_data.columns[-i]
|
1052 |
+
new_column_name = f'{existing_column}_meanvalue'
|
1053 |
+
hmm[new_column_name] = filtered_data[existing_column].apply(mean_of_values)
|
1054 |
+
filtered_data = pd.concat([filtered_data, hmm], axis=1)
|
1055 |
+
column_list = [
|
1056 |
+
## Necessary columns
|
1057 |
+
'PatientID', 'VisitID', 'GroupedICD',
|
1058 |
+
|
1059 |
+
## Numerical values
|
1060 |
+
'Age', 'SystolicBP',
|
1061 |
+
'DiastolicBP','Temperature',
|
1062 |
+
'Pulse', 'Weight', 'Height', 'BMI', 'Respiration',
|
1063 |
+
'SPO2', 'PHQ_9Score',
|
1064 |
+
# 'PeakFlow'
|
1065 |
+
|
1066 |
+
## Categorial Values
|
1067 |
+
'LegalSex','BPLocation', 'BPPosition', 'PregnancyStatus', 'LactationStatus', 'TemperatureSource',
|
1068 |
+
'Age_Category','BP Severity','Depression Severity','weight_condition', 'Temp_condition', 'Pulse_condition',
|
1069 |
+
'Respiration_condition', 'SPO2_condition', 'PeakF_condition']
|
1070 |
+
# last = list(filtered_data.columns[-20:])
|
1071 |
+
last = list(hmm.columns)
|
1072 |
+
required_columns = column_list + last
|
1073 |
+
filtered_data = filtered_data[required_columns].copy()
|
1074 |
+
filtered_data = filtered_data.drop_duplicates().reset_index(drop=True)
|
1075 |
+
filtered_data = filtered_data.dropna(axis=1, how='all')
|
1076 |
+
imputer(filtered_data)
|
1077 |
+
|
1140 |
+
if analysis_option == 'Data':
|
1141 |
+
age_min = int(data['Age'].min())
|
1142 |
+
age_max = int(data['Age'].max())
|
1143 |
+
age_range = st.sidebar.slider('Select Age Range', age_min, age_max, (age_min, age_max))
|
1144 |
+
data = data[(data['Age'] >= age_range[0]) & (data['Age'] <= age_range[1])].copy()
|
1145 |
+
|
1146 |
+
Sex = data.groupby('LegalSex')['PatientID'].nunique().reset_index(name='count')
|
1147 |
+
st.subheader("Distribution of Patients by Sex", divider='rainbow')
|
1148 |
+
col1, col2,col3 = st.columns(3)
|
1149 |
+
col1.metric(label="Male", value = Sex[Sex['LegalSex']=='Male']['count'].iloc[0])
|
1150 |
+
col2.metric(label="Female", value = Sex[Sex['LegalSex']=='Female']['count'].iloc[0])
|
1151 |
+
col4, col5 = st.columns(2)
|
1152 |
+
fig2 = funnel_chart(data)
|
1153 |
+
col4.plotly_chart(fig2, use_container_width=True)
|
1154 |
+
fig = scatterplot(data)
|
1155 |
+
col5.plotly_chart(fig, use_container_width=True)
|
1156 |
+
col6 = st.columns(1)[0]
|
1157 |
+
fig_man = scatter_man(data)
|
1158 |
+
col6.plotly_chart(fig_man, use_container_width=True)
|
1159 |
+
|
1160 |
+
st.dataframe(data.head(20).style.format({'PatientID': "{:.0f}"}))
|
1161 |
+
|
1162 |
+
if analysis_option == 'EDA':
|
1163 |
+
age_min = int(data['Age'].min())
|
1164 |
+
age_max = int(data['Age'].max())
|
1165 |
+
age_range = st.sidebar.slider('Select Age Range', age_min, age_max, (age_min, age_max))
|
1166 |
+
data = data[(data['Age'] >= age_range[0]) & (data['Age'] <= age_range[1])].copy()
|
1167 |
+
|
1168 |
+
problem = list(data['Description'].unique())
|
1169 |
+
st.subheader("_Select Disease_:sunglasses:")
|
1170 |
+
health_option = st.selectbox("_Select Disease_:sunglasses:",['', *problem], label_visibility="collapsed")
|
1171 |
+
if health_option in problem:
|
1172 |
+
health_data = data[data['Description'] == health_option].copy()
|
1173 |
+
Sex = health_data.groupby('LegalSex')['PatientID'].nunique().reset_index(name='count')
|
1174 |
+
st.subheader(f"Patients for '{health_option}' by Sex", divider='rainbow')
|
1175 |
+
col1, col2, col3 = st.columns(3)
|
1176 |
+
if 'Male' in Sex['LegalSex'].values:
|
1177 |
+
col1.metric(label="Male", value=Sex[Sex['LegalSex'] == 'Male']['count'].iloc[0])
|
1178 |
+
else:
|
1179 |
+
col1.metric(label="Male", value=0)
|
1180 |
+
if 'Female' in Sex['LegalSex'].values:
|
1181 |
+
col2.metric(label="Female", value=Sex[Sex['LegalSex'] == 'Female']['count'].iloc[0])
|
1182 |
+
else:
|
1183 |
+
col2.metric(label="Female", value=0)
|
1184 |
+
col4, col5 = st.columns(2)
|
1185 |
+
fig2 = funnel_chart(health_data)
|
1186 |
+
col4.plotly_chart(fig2, use_container_width=True)
|
1187 |
+
|
1188 |
+
fig3 = barplot_lab(health_data)
|
1189 |
+
col5.plotly_chart(fig3, use_container_width=True)
|
1190 |
+
|
1191 |
+
col6, col7 = st.columns(2)
|
1192 |
+
fig4 = histplot_6(health_data)
|
1193 |
+
col6.plotly_chart(fig4, use_container_width=True)
|
1194 |
+
|
1195 |
+
fig5 = histplot_7(health_data)
|
1196 |
+
col7.plotly_chart(fig5, use_container_width=True)
|
1197 |
+
|
1198 |
+
col8, col9 = st.columns(2)
|
1199 |
+
fig6 = pie_chart_7(health_data)
|
1200 |
+
col8.plotly_chart(fig6, use_container_width=True)
|
1201 |
+
|
1202 |
+
fig7 = chart_8(health_data)
|
1203 |
+
col9.plotly_chart(fig7, use_container_width=True)
|
1204 |
+
|
1205 |
+
|
1206 |
+
col10, col11 = st.columns(2)
|
1207 |
+
fig8 = chart_9(health_data)
|
1208 |
+
col10.plotly_chart(fig8, use_container_width=True)
|
1209 |
+
|
1210 |
+
fig9 = chart_10(health_data)
|
1211 |
+
col11.plotly_chart(fig9, use_container_width=True)
|
1212 |
+
|
1213 |
+
col12, col13 = st.columns(2)
|
1214 |
+
fig10 = chart_11(health_data)
|
1215 |
+
col12.plotly_chart(fig10, use_container_width=True)
|
1216 |
+
|
1217 |
+
st.dataframe(health_data.head(20).style.format({'PatientID': "{:.0f}"}))
|
1218 |
+
|
1219 |
+
|
1220 |
+
if analysis_option == 'Health Care Chat Bot AI':
|
1221 |
+
## TODO: start here; just add patient + vital information.
|
1222 |
+
# data = pd.read_parquet('Health-Data-3.parquet')
|
1223 |
+
google_key = st.secrets["api_keys"]["google_key"]
|
1224 |
+
llm = GoogleGemini(api_key=google_key)
|
1225 |
+
pandas_ai = SmartDataframe(data, config={"llm": llm, "response_parser": StreamlitResponse,"verbose": True})
|
1226 |
+
pandas_ai_2 = SmartDataframe(data, config={"llm": llm,"verbose": True}) ## string
|
1227 |
+
# Streamlit app title and description
|
1228 |
+
st.title("AI-Powered Data Analysis App")
|
1229 |
+
st.write("This application allows you to interact with your dataset using natural language prompts. Just ask a question, and the AI will provide insights based on your data.")
|
1230 |
+
|
1231 |
+
# Display the dataset
|
1232 |
+
st.subheader("Dataset Preview")
|
1233 |
+
st.dataframe(data.head())
|
1234 |
+
|
1235 |
+
# User input for natural language prompt
|
1236 |
+
prompt = st.text_input("Enter your prompt:", placeholder="e.g., What are the top diagnoses?")
|
1237 |
+
|
1238 |
+
# Process the input and display the result
|
1239 |
+
if st.button("Submit"):
|
1240 |
+
if any(word in prompt.lower() for word in ('plot', 'graph')):
|
1241 |
+
try:
|
1242 |
+
result = pandas_ai.chat(prompt)
|
1243 |
+
st.subheader("Result")
|
1244 |
+
except KeyError as e:
|
1245 |
+
st.error(f"Error: {e}. Unable to retrieve result.")
|
1246 |
+
elif prompt:
|
1247 |
+
try:
|
1248 |
+
result = pandas_ai_2.chat(prompt)
|
1249 |
+
st.subheader("Result")
|
1250 |
+
st.write(result)
|
1251 |
+
except KeyError as e:
|
1252 |
+
st.error(f"Error: {e}. Unable to retrieve result.")
|
1253 |
+
else:
|
1254 |
+
st.warning("Please enter a prompt.")
|
1255 |
+
|
1256 |
+
# Add a footer
|
1257 |
+
st.write("Powered by PandasAI and Google Gemini.")
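The chat section routes a prompt to the chart-capable `SmartDataframe` when it mentions a plot or graph. A minimal sketch of a case-insensitive check that also avoids matching words such as "paragraph" is shown here. The helper name `wants_chart` and the extra keyword `chart` are assumptions added for illustration.

```python
import re

# Match "plot", "graph" or "chart" only at the start of a word, in any letter case.
CHART_WORDS = re.compile(r"\b(?:plot|graph|chart)", re.IGNORECASE)


def wants_chart(prompt):
    """Return True when the prompt appears to ask for a visualization."""
    return bool(CHART_WORDS.search(prompt or ""))


print(wants_chart("PLOT the age distribution"))
print(wants_chart("Summarize this paragraph"))
```

Anchoring the match at a word boundary still accepts forms like "plotting" and "graphs" while ignoring incidental substrings.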
requirements.txt
ADDED
@@ -0,0 +1,16 @@
pip
kmodes
matplotlib==3.8.4
numpy==1.26.4
pandas
pandasai==2.2.14
plotly==5.22.0
scikit_learn==1.4.2
streamlit
wordcloud==1.9.3
google-generativeai