Commit 2e82d86 by Driisa · verified · 1 Parent(s): e6739f3

Upload 9 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ data/Findex_data.csv filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,12 +1,157 @@
- ---
- title: Final Submission BDS App.py
- emoji: 🌖
- colorFrom: gray
- colorTo: gray
- sdk: streamlit
- sdk_version: 1.39.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: My Streamlit Project
+ emoji: 🔥
+ colorFrom: blue
+ colorTo: indigo
+ sdk: streamlit
+ python_version: '3.10'
+ tags:
+ - streamlit
+ - data-visualization
+ app_file: app.py
+ sdk_version: 1.38.0
+ ---
+
+
+ # 🌍 **FINDEX Data Analysis & Prediction App** 📊
+
+ Welcome to the **FINDEX Data Analysis & Prediction App**! This app provides insights and predictions related to financial inclusion based on the Global Findex 2021 dataset from the World Bank. It lets users explore data distributions and summary statistics, and use machine learning to predict the likelihood of bank account ownership.
+
+ ## 🚀 **Table of Contents**
+ - [Introduction](#introduction)
+ - [Features](#features)
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [Pages Overview](#pages-overview)
+ - [Folder Structure](#folder-structure)
+ - [Contributing](#contributing)
+ - [License](#license)
+ - [Contact](#contact)
+
+ ---
+
+ ## 🌟 **Introduction**
+
+ The **FINDEX Data Analysis & Prediction App** is a multi-page Streamlit application for exploring financial inclusion data in detail. The app uses a machine learning model to predict the likelihood of an individual owning a bank account based on various socioeconomic factors, including income, education, and digital payment usage.
+
+ Whether you want to explore demographic trends or predict account ownership, the app offers visualizations and model-based predictions with explanations.
+
+ ---
+
+ ## 🔥 **Features**
+ - **Data Visualization**: Visualize the distribution of key features, including age, income, and account ownership.
+ - **Statistics Overview**: View summary statistics on different demographic groups.
+ - **Machine Learning Predictions**: Predict the likelihood of bank account ownership using an XGBoost model.
+ - **SHAP Explanations**: Understand the key factors behind the predictions with SHAP visualizations.
+ - **Filtering Options**: Customize your analysis by filtering data by economy, gender, education level, and age.
+
+ ---
+
+ ## 💻 **Installation**
+
+ To run the app locally, follow these steps:
+
+ 1. **Clone the repository**:
+    ```bash
+    git clone https://github.com/Driisa/Final-submission-BDS-app.py.git
+    ```
+
+ 2. Navigate to the project directory:
+    ```bash
+    cd Final-submission-BDS-app.py
+    ```
+
+ 3. Set up a virtual environment (optional but recommended):
+    ```bash
+    python -m venv venv
+    source venv/bin/activate # On Windows use: venv\Scripts\activate
+    ```
+
+ 4. Install the required dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 5. Run the app:
+    ```bash
+    streamlit run app.py
+    ```
+
+ 6. Open your browser: Visit http://localhost:8501 to start using the app!
+
+ ---
+
+ ## 🕹 **Usage**
+
+ - **Navigate the App**: Use the sidebar to switch between the Info, Distribution, Statistics, and Prediction pages.
+ - **Input Your Data**: On the Prediction page, enter demographic and socioeconomic values to predict bank account ownership.
+ - **Visualize Data**: Use the Distribution and Statistics pages to explore data distributions and key insights from the dataset.
+
+ ---
+
+ ## 📚 **Pages Overview**
+
+ 1️⃣ **Info**
+ Provides an introduction to the Global Findex dataset and the key variables used in the analysis.
+
+ 2️⃣ **Distribution**
+ Visualize the distribution of various features, including age, mobile ownership, internet access, and account ownership, with bar plots and histograms.
+
+ 3️⃣ **Statistics**
+ Analyze summary statistics for key demographic groups, including age and education, and view bar plots showing account ownership across income and age groups.
+
+ 4️⃣ **Prediction**
+ Predict whether an individual has a bank account based on the data you enter. The prediction is powered by an XGBoost model, and SHAP visualizations help explain the predictions.
+
+ ---
+
+ ## 📁 **Folder Structure**
+
+ ```bash
+ FINDEX-App/
+ ├── data/
+ │   └── Findex_data.csv
+ ├── models/
+ │   ├── df.joblib
+ │   ├── df_sample.joblib
+ │   ├── ohe.joblib
+ │   ├── scaler.joblib
+ │   └── xgb_clf.joblib
+ ├── app.py
+ ├── README.md
+ ├── requirements.txt
+ └── .streamlit/
+ ```
+
+ ## 💡 **Contributing**
+
+ Want to contribute to the project? Here’s how:
+
+ 1. **Fork the repository**.
+ 2. **Create a new branch** for your feature or bug fix:
+    ```bash
+    git checkout -b feature-new-feature
+    ```
+ 3. Make your changes and commit them:
+    ```bash
+    git commit -m "Added a new feature!"
+    ```
+ 4. Push the changes to your forked repo:
+    ```bash
+    git push origin feature-new-feature
+    ```
+ 5. Open a Pull Request, and let’s review your changes!
+
+ ---
+
+ ## 📜 **License**
+
+ This project is licensed under the MIT License. See the LICENSE file for more details.
+
+ ---
+
+ ## 📞 **Contact**
+
+ If you have any issues or suggestions, feel free to reach out by opening a GitHub issue. We’d love to hear from you and improve the app based on your feedback!
+
+ ---
+
+ 🎉 **Enjoy exploring financial inclusion data with the FINDEX Data Analysis & Prediction App!** 🎉
app.py ADDED
@@ -0,0 +1,392 @@
+ import pandas as pd
+ import seaborn as sns
+ import matplotlib.pyplot as plt
+ import streamlit as st
+ from pathlib import Path
+ import os
+ import joblib
+ import shap
+ from streamlit_shap import st_shap
+ from streamlit_folium import st_folium # Import st_folium to embed Folium map in Streamlit
+ import folium
+
+
+ # Load the model, scaler, one-hot encoder, and pre-processed DataFrame
+ @st.cache_resource # Cache the model objects to avoid reloading on every interaction
+ def load_model_objects():
+     models_dir = os.path.join(os.getcwd(), 'models') # Adjust the 'models' folder if needed
+     xgb_clf = joblib.load(os.path.join(models_dir, 'xgb_clf.joblib'))
+     scaler = joblib.load(os.path.join(models_dir, 'scaler.joblib'))
+     ohe = joblib.load(os.path.join(models_dir, 'ohe.joblib'))
+     df = joblib.load(os.path.join(models_dir, 'df.joblib')) # Pre-processed DataFrame
+     df_sample = joblib.load(os.path.join(models_dir, 'df_sample.joblib')) # Sampled DataFrame
+
+     return xgb_clf, scaler, ohe, df, df_sample
+
+ # Load the model, scaler, encoder, and pre-processed DataFrame
+ xgb_clf, scaler, ohe, df, df_sample = load_model_objects()
+
+
+
+ # =============================================================================================================================
+ # Sidebar navigation
+ # =============================================================================================================================
+
+ st.sidebar.title("Navigation")
+ page = st.sidebar.radio("Go to", ["Info", "Distribution", "Statistics", "Prediction"])
+
+
+
+ # =============================================================================================================================
+ # Info page
+ # =============================================================================================================================
+ if page == "Info": # Show the Info page
+     st.title("Info")
+     st.write("Welcome to the Streamlit Dashboard of the FINDEX dataset!")
+
+     st.write("This dashboard provides insights from the final submission in Introduction to Business Data Science. The data in this app is from the Global Findex 2021 / World Bank survey.")
+     st.write("The Findex dataset contains financial inclusion data from 2021. The data covers various demographics, income, and financial behaviors across multiple countries.")
+
+
+     st.subheader("Understand Business Context - Problem Definition")
+     st.write("Based on the data, this app helps to understand the financial inclusion of the respondents. The app will address the following questions:")
+     st.write("""
+     - Can we predict whether an individual is likely to own a bank account based on income, education, and other socioeconomic factors?
+     - Which factors influence having an account?
+     - How are the different variables correlated?
+     """)
+
+     st.subheader("Key Variable Descriptions (df_sample)")
+     st.write("""
+     - **Account**: Binary variable indicating whether the respondent has a bank account.
+     - **Income**: Income quintile of the respondent.
+     - **Remittances**: Channel through which the respondent receives remittances, if any.
+     - **Education Level**: Education level of the respondent.
+     - **Age**: Respondent's age.
+     - **Gender**: Gender of the respondent.
+     - **Mobile Owner**: Indicates whether the respondent owns a mobile phone.
+     - **Internet Access**: Indicates whether the respondent has access to the internet.
+     - **Pay Utilities**: How the respondent pays utility bills (from an account, in cash, by another method, or not at all).
+     - **Receive Transfers**: How the respondent receives money transfers (via an account, in cash, by another method, or not at all).
+     - **Receive Pension**: How the respondent receives a pension (via an account, in cash, by another method, or not at all).
+     - **Economy**: Country of the respondent.
+     - **Regionwb**: World Bank region of the respondent.
+     - **Digital Payment Usage**: Binary variable indicating if the respondent uses digital payment methods.
+     """)
+     st.write("All these variables are used to predict the account variable on the Prediction page.")
+
+
+ # =============================================================================================================================
+ # Distribution page
+ # =============================================================================================================================
+ elif page == "Distribution": # Show the Distribution page
+
+     st.title("Visualisation of the data distribution")
+
+     st.write("Here is a preview of the Age Distribution:")
+     def plot_age_distribution(data):
+         fig, ax = plt.subplots(figsize=(8, 6))
+         sns.histplot(data['age'], kde=True, ax=ax) # Draw on the figure passed to st.pyplot
+         st.pyplot(fig)
+     plot_age_distribution(df)
+
+
+     st.write("Here is a preview of the percentage distribution of the different features:")
+
+     # Dictionary to map numeric codes to their actual meanings
+     mapping_dict = {
+         'mobile_owner': {1: 'Owns mobile phone', 2: 'Does not own', 3: "Don't know"},
+         'internet_access': {1: 'Has access', 2: 'No access', 3: "Don't know"},
+         'pay_utilities': {1: 'Paid from account', 2: 'Paid in cash', 3: 'Other method', 4: 'Did not pay'},
+         'receive_transfers': {1: 'Received via account', 2: 'Received in cash', 3: 'Other method', 4: 'Did not receive'},
+         'receive_pension': {1: 'Received via account', 2: 'Received in cash', 3: 'Other method', 4: 'Did not receive'},
+         'education_level': {1: 'Primary or less', 2: 'Secondary', 3: 'Tertiary or more'},
+         'gender': {1: 'Female', 2: 'Male'},
+         'account': {1: 'Yes', 0: 'No'},
+         'digital_payment_usage': {1: 'Yes', 0: 'No'}
+     }
+
+     # List of categorical/binary features to plot
+     cat_features = [
+         'account', 'mobile_owner', 'internet_access',
+         'pay_utilities', 'receive_transfers', 'gender',
+         'education_level', 'digital_payment_usage'
+     ]
+
+     # Set up the figure for multiple subplots
+     fig, axes = plt.subplots(4, 2, figsize=(10, 20)) # 4 rows, 2 columns, and the figure size
+
+     # Flatten axes to easily iterate over them in a single loop
+     axes = axes.flatten()
+
+     # Loop through features to create bar plots (one loop instead of repeating the same code for each plot)
+     for i, col in enumerate(cat_features):
+         # Create a copy of the current column and apply mapping for the plot
+         data_for_plot = df_sample[col].copy().replace(mapping_dict.get(col, {})) # Use copy() to avoid modifying the original data
+
+         # Calculate percentages for each category
+         percentage_data = data_for_plot.value_counts(normalize=True) * 100
+
+         # Plot the bar plot showing percentage distribution
+         sns.barplot(x=percentage_data.index, y=percentage_data.values, ax=axes[i], palette="Blues_d")
+
+         # Set plot title and labels
+         axes[i].set_title(f'Percentage Distribution of {col}')
+         axes[i].set_ylabel('Percentage (%)')
+         axes[i].set_xlabel(col)
+
+         # Rotate x-axis labels if there are long categories
+         axes[i].set_xticklabels(axes[i].get_xticklabels(), rotation=45, ha='right')
+
+     # Adjust layout for better appearance
+     plt.tight_layout()
+
+     # Display the plot in Streamlit
+     st.pyplot(fig)
+
+
+
+ # =============================================================================================================================
+ # Statistics page
+ # =============================================================================================================================
+ elif page == "Statistics":
+
+     # Sidebar filtering settings
+     # Map gender and education level codes to readable labels for the select boxes
+     gender_mapping = {1: 'Female', 2: 'Male'}
+     education_level_mapping = {1: 'Primary or less', 2: 'Secondary', 3: 'Tertiary or more'}
+
+     # Create new columns for the labels in the df_sample dataframe
+     df_sample['gender_label'] = df_sample['gender'].map(gender_mapping)
+     df_sample['education_level_label'] = df_sample['education_level'].map(education_level_mapping)
+
+     # Sidebar economy dropdown
+     selected_economy = st.sidebar.multiselect('Select Economy', df_sample['economy'].unique(), default=[])
+
+     # Sidebar gender dropdown (using gender_label column)
+     selected_genders = st.sidebar.multiselect('Select Gender', df_sample['gender_label'].unique(), default=[])
+
+     # Sidebar education level dropdown (using education_level_label column)
+     selected_educational_level = st.sidebar.multiselect('Select educational level', df_sample['education_level_label'].unique(), default=[])
+
+     # Sidebar Age Slider
+     st.sidebar.header('Filter by Age')
+     age_range = st.sidebar.slider('Select Age Range', int(df_sample['age'].min()), int(df_sample['age'].max()), (15, 99))
+
+     # Initial filter - apply all conditions cumulatively
+     filtered_data = df_sample[df_sample['age'].between(age_range[0], age_range[1])]
+
+     # Apply economy filter if selections are made
+     if selected_economy:
+         filtered_data = filtered_data[filtered_data['economy'].isin(selected_economy)]
+
+     # Apply gender filter based on the gender_label column
+     if selected_genders:
+         filtered_data = filtered_data[filtered_data['gender_label'].isin(selected_genders)]
+
+     # Apply educational level filter based on the education_level_label column
+     if selected_educational_level:
+         filtered_data = filtered_data[filtered_data['education_level_label'].isin(selected_educational_level)]
+
+
+     st.title("Statistics Page")
+     # Check if filtered data is not empty and calculate statistics, otherwise use "N/A"
+     if not filtered_data.empty:
+         mean_age = f"{filtered_data['age'].mean():.2f}"
+         median_age = f"{filtered_data['age'].median():.2f}"
+         max_age = f"{filtered_data['age'].max():.2f}"
+         min_age = f"{filtered_data['age'].min():.2f}"
+     else:
+         mean_age = median_age = max_age = min_age = "N/A"
+
+     # Display the statistics in columns
+     st.subheader('Age Statistics')
+     col1, col2, col3, col4 = st.columns(4)
+
+     col1.metric('Mean Age', mean_age)
+     col2.metric('Median Age', median_age)
+     col3.metric('Max Age', max_age)
+     col4.metric('Min Age', min_age)
+
+
+
+     # Boxplot of age for the filtered data
+     st.subheader("Boxplot of Age")
+
+     # Draw on an explicit figure so st.pyplot receives a Figure rather than the pyplot module
+     st.write("Boxplot of Age - figure showing the distribution")
+     fig, ax = plt.subplots(figsize=(8, 4)) # Define the size of the figure
+     sns.boxplot(x='age', data=filtered_data, ax=ax) # Create a boxplot based on "age"
+     ax.set_title("Boxplot of Age") # Title of the plot
+     st.pyplot(fig) # Display the plot in Streamlit
+
+
+     # If filtered data is not empty, continue with analysis
+     if not filtered_data.empty:
+         # Barplot: Account Ownership Distribution by Education Level
+         st.subheader('Account Ownership Distribution by Education Level')
+
+         # Create a crosstab to show the distribution
+         education_account_dist = pd.crosstab(filtered_data['education_level'], filtered_data['account'], normalize='index') * 100
+
+         # Rename columns to be more descriptive
+         education_account_dist.columns = ['No Account (%)', 'Has Account (%)']
+
+         # Bar plot for education level distribution
+         fig, ax = plt.subplots(figsize=(10, 6))
+         education_account_dist.plot(kind='bar', stacked=True, color=['#3498db', '#2ecc71'], ax=ax)
+
+         ax.set_xlabel('Education Level', fontsize=12)
+         ax.set_ylabel('Percentage of Account Ownership (%)', fontsize=12)
+         ax.set_title('Account Ownership by Education Level', fontsize=14)
+         ax.legend(title='Account Ownership', loc='upper right')
+         plt.xticks(rotation=45, ha='right')
+
+         # Display the plot
+         st.pyplot(fig)
+
+
+
+         # Barplot: Account Ownership Distribution by Income Quintile
+         st.subheader('Account Ownership Distribution by Income Quintile')
+
+         # Create a crosstab to show the distribution
+         income_account_dist = pd.crosstab(filtered_data['income'], filtered_data['account'], normalize='index') * 100
+
+         # Rename columns to be more descriptive
+         income_account_dist.columns = ['No Account (%)', 'Has Account (%)']
+
+         # Bar plot for income quintile distribution
+         fig, ax = plt.subplots(figsize=(10, 6))
+         income_account_dist.plot(kind='bar', stacked=True, color=['#3498db', '#2ecc71'], ax=ax)
+
+         ax.set_xlabel('Income Quintile', fontsize=12)
+         ax.set_ylabel('Percentage of Account Ownership (%)', fontsize=12)
+         ax.set_title('Account Ownership by Income Quintile', fontsize=14)
+         ax.legend(title='Account Ownership', loc='upper right')
+         plt.xticks(rotation=45, ha='right')
+
+         # Display the plot
+         st.pyplot(fig)
+
+
+
+         # Barplot: Percentage of People Having an Account by Age Group
+         st.subheader('Percentage of People Having an Account by Age Group')
+
+         # Calculate the proportion of people having an account in each age group
+         account_by_age = filtered_data.groupby('age_group')['account'].mean().reset_index()
+         account_by_age['account'] = (account_by_age['account'] * 100).round(2)
+
+         # Create the bar plot using Matplotlib and Seaborn
+         fig, ax = plt.subplots(figsize=(10, 6))
+         sns.barplot(x='age_group', y='account', data=account_by_age, palette="Blues_d", ax=ax)
+         ax.set_xlabel('Age Group', fontsize=12)
+         ax.set_ylabel('Percentage of Account Ownership (%)', fontsize=12)
+         ax.set_title('Percentage of People with an Account by Age Group', fontsize=14)
+
+         # Add values on top of each bar
+         for index, value in enumerate(account_by_age['account']):
+             ax.text(index, value + 1, f'{value}%', ha='center', fontsize=10)
+
+         # Rotate x-axis labels for readability
+         ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha='right')
+
+         # Display the plot in Streamlit
+         st.pyplot(fig)
+
+     else:
+         st.write("No data available for the selected filters.")
+
+
+     # Display filtered data
+     st.write("You can download the filtered data here")
+     st.dataframe(filtered_data)
+
+
+
+ # =============================================================================================================================
+ # Prediction page
+ # =============================================================================================================================
+ elif page == "Prediction":
+     st.title("Prediction Page")
+
+     # Get valid categories for economy and regionwb from the OneHotEncoder
+     economy_categories = ohe.categories_[0]
+     regionwb_categories = ohe.categories_[1]
+
+     # Create SHAP explainer
+     explainer = shap.TreeExplainer(xgb_clf)
+
+     # App description
+     with st.expander("What's this app?"):
+         st.markdown("""
+         This app predicts whether an individual has a bank account based on their demographic and socioeconomic data.
+         Using an XGBoost model trained on the Findex data, it provides insights into financial inclusion.
+         Explore the SHAP explanations to understand the key factors behind the predictions!
+         """)
+
+     st.subheader('Input Your Data')
+
+     # User input section
+     col1, col2 = st.columns(2)
+
+     with col1:
+         inc_q = st.selectbox("Income Quintile", options=[1, 2, 3, 4, 5])
+         remittances = st.selectbox("Receives Remittances", options=[1, 2, 3, 4, 5, 6],
+                                    format_func=lambda x: ['Via Account', 'Via MTO', 'Cash Only', 'Other Methods', 'None', 'Don’t Know'][x-1])
+         educ = st.selectbox("Education Level", options=[1, 2, 3],
+                             format_func=lambda x: ['Primary or Less', 'Secondary', 'Tertiary'][x-1])
+         age = st.slider("Age", 18, 100, 30)
+         female = st.selectbox("Gender", options=[1, 2], format_func=lambda x: 'Female' if x == 1 else 'Male')
+
+     with col2:
+         mobileowner = st.selectbox("Owns Mobile Phone", options=[1, 2, 3, 4],
+                                    format_func=lambda x: ['Yes', 'No', 'Don’t Know', 'Refused'][x-1])
+         internetaccess = st.selectbox("Has Internet Access", options=[1, 2, 3, 4],
+                                       format_func=lambda x: ['Yes', 'No', 'Don’t Know', 'Refused'][x-1])
+         pay_utilities = st.selectbox("Utility Payment Method", options=[1, 2, 3, 4, 5],
+                                      format_func=lambda x: ['Account', 'Cash', 'Other', 'None', 'Don’t Know'][x-1])
+         receive_transfers = st.selectbox("Government Transfer Method", options=[1, 2, 3, 4, 5],
+                                          format_func=lambda x: ['Account', 'Cash', 'Other', 'None', 'Don’t Know'][x-1])
+         receive_pension = st.selectbox("Receives Pension", options=[1, 2, 3, 4, 5],
+                                        format_func=lambda x: ['Account', 'Cash', 'Other', 'None', 'Don’t Know'][x-1])
+         economy = st.selectbox("Economy", options=economy_categories) # Dynamically populated
+         regionwb = st.selectbox("World Bank Region", options=regionwb_categories) # Dynamically populated
+
+     # Prediction button
+     if st.button('Predict Bank Account Ownership 🚀'):
+         # Prepare categorical and numerical features
+         cat_features = pd.DataFrame({'economy': [economy], 'regionwb': [regionwb]})
+         cat_encoded = pd.DataFrame(ohe.transform(cat_features).toarray(), columns=ohe.get_feature_names_out(['economy', 'regionwb']))
+
+         num_features = pd.DataFrame({
+             'inc_q': [inc_q],
+             'remittances': [remittances],
+             'educ': [educ],
+             'age': [age],
+             'female': [female],
+             'mobileowner': [mobileowner],
+             'internetaccess': [internetaccess],
+             'pay_utilities': [pay_utilities],
+             'receive_transfers': [receive_transfers],
+             'receive_pension': [receive_pension]
+         })
+
+         # Scale numerical features
+         num_scaled = pd.DataFrame(scaler.transform(num_features), columns=num_features.columns)
+
+         # Combine categorical and numerical features
+         features = pd.concat([num_scaled, cat_encoded], axis=1)
+
+         # Make prediction
+         prediction = xgb_clf.predict(features)[0]
+
+         # Display prediction
+         st.metric(label="Bank Account Prediction", value='Has Account' if prediction == 1 else 'No Account')
+
+         # SHAP explanation
+         st.subheader('Factors Behind the Prediction 🤖')
+         shap_values = explainer.shap_values(features)
+         st_shap(shap.force_plot(explainer.expected_value, shap_values[0], features), height=400, width=600)
+
+
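Reviewer note on the Distribution page in app.py: each feature's numeric survey codes are mapped to labels and then converted to percentage shares before plotting. The sketch below reproduces that computation with the standard library only, so it can be checked without pandas. `percentage_distribution` is a hypothetical helper that is not part of this commit, and the sample codes are made up for illustration; the app itself uses `Series.replace` followed by `value_counts(normalize=True) * 100`.

```python
from collections import Counter


def percentage_distribution(codes, labels):
    """Return {label: percent share} for a list of numeric survey codes."""
    # Codes missing from the mapping are kept unchanged, which mirrors
    # how a partial mapping behaves with pandas Series.replace.
    mapped = [labels.get(code, code) for code in codes]
    counts = Counter(mapped)
    total = len(mapped)
    return {label: 100 * n / total for label, n in counts.items()}


# Illustrative input only; these values are not taken from the Findex data.
gender_labels = {1: "Female", 2: "Male"}
sample_codes = [1, 2, 2, 1, 1, 2, 2, 2]

shares = percentage_distribution(sample_codes, gender_labels)
print(shares)
```

Keeping unmapped codes visible, rather than dropping them, makes unexpected survey codes show up as their own bar instead of silently disappearing from the plot.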
data/Findex_data.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98ee1367d02f92b04d0933584a4620516b90ed5f9c554f867fa5037f3f721f7a
+ size 40174289
models/df.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d4dc76f85d2edff3a7fc698defeecc88ef65b4dc4a6ba5627e577e60bc069c2
+ size 15510187
models/df_sample.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df1a6247c842ca48c9dacc1e563bf9202eaaf4f6e1fc33b726c6aeaec2199065
+ size 559691
models/ohe.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:293ef71904d41f5f901442ae0f4d96e7496e48a2cf424ce2682601d2cf5bfe41
+ size 3789
models/scaler.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d1905891aa05668b71bfe61f2b9b425083b87120bbc0c0802414840b04ffc29
+ size 1319
models/xgb_clf.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f5ae910ddf664d98dad33a8d49e1c0d0a958f2d272a9835c01f9104b4cbce62
+ size 173738
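Reviewer note on the six hunks above: the dataset and the joblib artifacts are committed as Git LFS pointers, so the diff shows only a `version`, `oid`, and `size` line per file rather than the content. A minimal sketch of reading such a pointer, assuming the three-line layout shown in these hunks; `parse_lfs_pointer` is a hypothetical helper for illustration and is not part of this commit or of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from the models/xgb_clf.joblib hunk.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6f5ae910ddf664d98dad33a8d49e1c0d0a958f2d272a9835c01f9104b4cbce62
size 173738
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

If `joblib.load` fails on one of these paths after cloning, a file that still parses as a pointer like this suggests the LFS content was not fetched.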
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ streamlit==1.38.0
+ pandas==1.5.3
+ seaborn==0.12.2
+ joblib==1.2.0
+ shap==0.41.0
+ streamlit-shap==0.0.7
+ plotly==5.10.0
+ folium==0.14.0
+ streamlit-folium==0.11.0
+ xgboost==1.7.5