{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.14","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":9686989,"sourceType":"datasetVersion","datasetId":5921838}],"dockerImageVersionId":30787,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Active Graph Networks for Healthcare Risk Assessment and Patient Insights\n\nIn this notebook, we explore the use of Active Graph Networks (AGNs) to analyze healthcare data, revealing relationships among patient attributes, risk factors, and health outcomes. AGNs allow us to incorporate clinical insights through structured relationships and contextual analysis.\n\nWe will:\n- Load and understand the dataset.\n- Define relationships within an AGN and visualize its structure.\n- Calculate risk scores based on interactions in the AGN.\n- Apply feature importance and predictive modeling.\n- Segment patients using clustering techniques.\n- Conduct statistical tests to validate findings.\n\nLet's dive in!","metadata":{"_uuid":"4333c724-0a54-4ad1-90a7-05a62c033863","_cell_guid":"8e0493e7-5240-403b-b6a2-c27ae6945f9f","trusted":true,"collapsed":false,"jupyter":{"outputs_hidden":false}}},{"cell_type":"code","source":"!pip install node2vec","metadata":{"trusted":true,"jupyter":{"source_hidden":true}},"outputs":[],"execution_count":null},{"cell_type":"code","source":"# Step 1: Import Libraries\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport networkx as nx\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.preprocessing import StandardScaler, LabelEncoder\nfrom sklearn.model_selection import 
train_test_split\nfrom sklearn.metrics import classification_report\nfrom sklearn.cluster import KMeans\nfrom node2vec import Node2Vec","metadata":{"_uuid":"7fdc921a-9bc0-41a9-aab2-4fec1f140dbb","_cell_guid":"b17b876c-305f-4002-b1a2-5accd42f018b","trusted":true,"execution":{"iopub.status.busy":"2024-11-02T23:13:56.624502Z","iopub.execute_input":"2024-11-02T23:13:56.625017Z","iopub.status.idle":"2024-11-02T23:13:56.632568Z","shell.execute_reply.started":"2024-11-02T23:13:56.624966Z","shell.execute_reply":"2024-11-02T23:13:56.631085Z"},"jupyter":{"source_hidden":true}},"outputs":[],"execution_count":22},{"cell_type":"markdown","source":"## 1. Data Exploration and Initial Analysis","metadata":{"_uuid":"f203e0cd-411d-421b-b140-bff718c502bf","_cell_guid":"16c0721d-bfbc-45fa-979d-ccd61c0d91e2","trusted":true,"collapsed":false,"jupyter":{"outputs_hidden":false}}},{"cell_type":"code","source":"# Load the dataset\ndata = pd.read_excel('/kaggle/input/patients-data-for-medical-field/Patients Data ( Used for Heart Disease Prediction ).xlsx')\n\n# Display initial rows and column information\ndisplay(data.head())\nprint(\"\\nColumn Types and Summary Stats:\")\nprint(data.dtypes)\ndisplay(data.describe())","metadata":{"_uuid":"3c2baeed-9f31-44c6-861c-9369a7377a31","_cell_guid":"45016200-741d-47ee-9072-3b6b541d341a","trusted":true,"execution":{"iopub.status.busy":"2024-11-02T23:13:59.390242Z","iopub.execute_input":"2024-11-02T23:13:59.390720Z","iopub.status.idle":"2024-11-02T23:17:04.549206Z","shell.execute_reply.started":"2024-11-02T23:13:59.390675Z","shell.execute_reply":"2024-11-02T23:17:04.547897Z"},"jupyter":{"source_hidden":true}},"outputs":[{"output_type":"display_data","data":{"text/plain":" PatientID State Sex GeneralHealth AgeCategory HeightInMeters \\\n0 1 Alabama Female Fair Age 75 to 79 1.63 \n1 2 Alabama Female Very good Age 65 to 69 1.60 \n2 3 Alabama Male Excellent Age 60 to 64 1.78 \n3 4 Alabama Male Very good Age 70 to 74 1.78 \n4 5 Alabama Female Good Age 50 to 
54 1.68 \n\n WeightInKilograms BMI HadHeartAttack HadAngina ... \\\n0 84.820000 32.099998 0 1 ... \n1 71.669998 27.990000 0 0 ... \n2 71.209999 22.530001 0 0 ... \n3 95.250000 30.129999 0 0 ... \n4 78.019997 27.760000 0 0 ... \n\n ECigaretteUsage ChestScan \\\n0 Never used e-cigarettes in my entire life 1 \n1 Never used e-cigarettes in my entire life 0 \n2 Never used e-cigarettes in my entire life 0 \n3 Never used e-cigarettes in my entire life 0 \n4 Never used e-cigarettes in my entire life 1 \n\n RaceEthnicityCategory AlcoholDrinkers HIVTesting FluVaxLast12 \\\n0 White only, Non-Hispanic 0 0 0 \n1 White only, Non-Hispanic 0 0 1 \n2 White only, Non-Hispanic 1 0 0 \n3 White only, Non-Hispanic 0 0 1 \n4 Black only, Non-Hispanic 0 0 1 \n\n PneumoVaxEver TetanusLast10Tdap \\\n0 1 No, did not receive any tetanus shot in the pa... \n1 1 Yes, received Tdap \n2 0 Yes, received tetanus shot but not sure what type \n3 1 Yes, received tetanus shot but not sure what type \n4 0 No, did not receive any tetanus shot in the pa... \n\n HighRiskLastYear CovidPos \n0 0 1 \n1 0 0 \n2 0 0 \n3 0 0 \n4 0 0 \n\n[5 rows x 35 columns]","text/html":"
| | PatientID | State | Sex | GeneralHealth | AgeCategory | HeightInMeters | WeightInKilograms | BMI | HadHeartAttack | HadAngina | ... | ECigaretteUsage | ChestScan | RaceEthnicityCategory | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Alabama | Female | Fair | Age 75 to 79 | 1.63 | 84.820000 | 32.099998 | 0 | 1 | ... | Never used e-cigarettes in my entire life | 1 | White only, Non-Hispanic | 0 | 0 | 0 | 1 | No, did not receive any tetanus shot in the pa... | 0 | 1 |
| 1 | 2 | Alabama | Female | Very good | Age 65 to 69 | 1.60 | 71.669998 | 27.990000 | 0 | 0 | ... | Never used e-cigarettes in my entire life | 0 | White only, Non-Hispanic | 0 | 0 | 1 | 1 | Yes, received Tdap | 0 | 0 |
| 2 | 3 | Alabama | Male | Excellent | Age 60 to 64 | 1.78 | 71.209999 | 22.530001 | 0 | 0 | ... | Never used e-cigarettes in my entire life | 0 | White only, Non-Hispanic | 1 | 0 | 0 | 0 | Yes, received tetanus shot but not sure what type | 0 | 0 |
| 3 | 4 | Alabama | Male | Very good | Age 70 to 74 | 1.78 | 95.250000 | 30.129999 | 0 | 0 | ... | Never used e-cigarettes in my entire life | 0 | White only, Non-Hispanic | 0 | 0 | 1 | 1 | Yes, received tetanus shot but not sure what type | 0 | 0 |
| 4 | 5 | Alabama | Female | Good | Age 50 to 54 | 1.68 | 78.019997 | 27.760000 | 0 | 0 | ... | Never used e-cigarettes in my entire life | 1 | Black only, Non-Hispanic | 0 | 0 | 1 | 0 | No, did not receive any tetanus shot in the pa... | 0 | 0 |
5 rows × 35 columns

| | PatientID | HeightInMeters | WeightInKilograms | BMI | HadHeartAttack | HadAngina | HadStroke | HadAsthma | HadSkinCancer | HadCOPD | ... | DifficultyWalking | DifficultyDressingBathing | DifficultyErrands | ChestScan | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | HighRiskLastYear | CovidPos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | ... | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 | 237630.000000 |
| mean | 118815.500000 | 1.704990 | 83.667908 | 28.691602 | 0.055553 | 0.061512 | 0.041779 | 0.148517 | 0.085225 | 0.078281 | ... | 0.148933 | 0.034524 | 0.067567 | 0.426941 | 0.545285 | 0.342697 | 0.531907 | 0.407125 | 0.042823 | 0.295939 |
| std | 68598.016571 | 0.106776 | 21.360982 | 6.528065 | 0.229056 | 0.240267 | 0.200085 | 0.355612 | 0.279217 | 0.268614 | ... | 0.356023 | 0.182572 | 0.251002 | 0.494635 | 0.497946 | 0.474612 | 0.498982 | 0.491299 | 0.202458 | 0.456465 |
| min | 1.000000 | 0.910000 | 28.120001 | 12.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 59408.250000 | 1.630000 | 68.040001 | 24.280001 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 118815.500000 | 1.700000 | 81.650002 | 27.459999 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 178222.750000 | 1.780000 | 95.250000 | 31.900000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 1.000000 |
| max | 237630.000000 | 2.410000 | 292.570007 | 97.650002 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
8 rows × 26 columns

| | BMI_HeartRisk | Smoker_COPD |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 |