Spaces: Feiiisal / Streamlit_Income_Classification

Feiiisal committed on Jan 28, 2024

Commit e2e32ef

1 Parent(s): c6e9cf0

Final Updates

Files changed (5)

.streamlit/config.toml +9 -0
app.py +7 -6
pipeline.pkl +2 -2
rfc_model.pkl +2 -2
transformers.py +4 -0

.streamlit/config.toml ADDED

	@@ -0,0 +1,9 @@

+[global]
+developmentMode = false
+[theme]
+base = "light"
+primaryColor = "#AEDFF7"
+backgroundColor = "#FFFFFF"
+secondaryBackgroundColor = "#AEDFF7"
+textColor = "#000000"

app.py CHANGED

@@ -2,6 +2,7 @@ import streamlit as st
 import pandas as pd
 import pickle
 import os
+from transformers import log_transform
 # Load the model and encoder
 SRC = os.path.abspath('.')
@@ -82,20 +83,20 @@ if options == "Prediction":
     losses = st.number_input("Losses", min_value=0)
     stocks_status = st.number_input("Stocks Status", min_value=0)
     citizenship = st.selectbox("Citizenship", ['citizen', 'foreigner'])
-    importance_of_record = st.number_input("Importance of Record", min_value=0.0, format='%f')
     if st.button('Predict Income Level'):
         input_data = pd.DataFrame([[
             age, gender, education, worker_class, marital_status, race, is_hispanic, employment_commitment,
             employment_stat, wage_per_hour, working_week_per_year, industry_code, industry_code_main, occupation_code,
             occupation_code_main, total_employed, household_summary, vet_benefit, tax_status, gains, losses,
-            stocks_status, citizenship, importance_of_record
+            stocks_status, citizenship
         ]], columns=[
             'age', 'gender', 'education', 'worker_class', 'marital_status', 'race', 'is_hispanic',
             'employment_commitment', 'employment_stat', 'wage_per_hour', 'working_week_per_year',
             'industry_code', 'industry_code_main', 'occupation_code', 'occupation_code_main', 'total_employed',
             'household_summary', 'vet_benefit', 'tax_status', 'gains', 'losses', 'stocks_status',
-            'citizenship', 'importance_of_record'
+            'citizenship'
         ])
         # Preprocess the input data through the pipeline before making predictions
@@ -120,10 +121,10 @@ elif options == "Model Information":
           - The Random Forest is a versatile and robust machine learning method that combines multiple decision trees to produce more accurate and stable predictions. It's known for its high accuracy, ability to handle large datasets with higher dimensionality, and its robustness to overfitting.
         - **Training Data:**
-          - Our model is trained on comprehensive census data, encompassing a wide range of features such as age, education, marital status, race, occupation, and more. This rich dataset ensures a nuanced understanding of the socio-economic factors influencing income levels.
-        - **Accuracy:** 94%
-          - With an accuracy of 94%, our model stands as a reliable predictor, demonstrating its effectiveness in understanding and categorizing income levels.
+          - The model is trained on comprehensive census data, encompassing a wide range of features such as age, education, marital status, race, occupation, and more. This rich dataset ensures a nuanced understanding of the socio-economic factors influencing income levels.
+        - **F1 Score:** 98%
+          - With an F1 score of 98%, the model stands as a reliable predictor, demonstrating its effectiveness in understanding and categorizing income levels.
         - **What It Aims to Solve:**
           - **Economic Research:** Assists in socio-economic studies, understanding income distribution, and identifying key factors influencing income levels.
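
Review note: dropping `importance_of_record` required touching three places in app.py (the widget, the value list, and the column list), and the two lists must stay in the same order. A minimal sketch of one way to keep names and values together is below. The helper `build_record`, the shortened field set, and the sample values are hypothetical and only for illustration; the real form collects 23 fields.

```python
# Hypothetical helper, not part of the commit: pair each column name with
# its value once, so removing a feature is a single-line change.
def build_record(age, gender, losses, stocks_status, citizenship):
    """Map each model column name to the value collected from the form."""
    return {
        "age": age,
        "gender": gender,
        "losses": losses,
        "stocks_status": stocks_status,
        "citizenship": citizenship,
    }


# Sample values are made up for the demonstration.
record = build_record(35, "Female", 0, 0, "citizen")

columns = list(record)        # dicts preserve insertion order
row = list(record.values())   # values stay aligned with their names

# With pandas installed, pd.DataFrame([record]) builds the same one-row frame.
print(columns)
print(row)
```

Whether this is preferable to the explicit lists in the diff is a style choice; the order of the columns must still match what the fitted pipeline expects.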

pipeline.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dd28c23cc70beba35906a28c1b6937630ffb3bb4a7c5e8cb8276f66a33eb60e4
-size 4811
+oid sha256:5798dfdbe5793f5277903f528f13beb821822483d09768f514183eab7a6c7335
+size 5296

rfc_model.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cd6ad029aa941d54353a585b399b77c15380a06cfa80a2c81faf9697effe0aac
-size 267723561
+oid sha256:28b7b8c02725a3b1914fc171cdf0a03ac31360a2bc55515832273809804203ef
+size 353869321
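
Review note: the two `.pkl` entries are Git LFS pointer files, so the diff only shows the SHA-256 digest and byte size of each binary rather than its contents. A minimal sketch of checking a downloaded file against such a pointer is below. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo uses an in-memory stand-in rather than the real model file.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 oids")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Stand-in bytes; a real check would read the downloaded .pkl in binary mode.
blob = b"example model bytes"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:{}\n"
    "size {}\n"
).format(hashlib.sha256(blob).hexdigest(), len(blob))

pointer = parse_lfs_pointer(pointer_text)
print(matches_pointer(blob, pointer))
```

For a file as large as rfc_model.pkl, hashing in chunks with `hashlib.sha256().update(...)` would avoid loading the whole file into memory.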

transformers.py ADDED

	@@ -0,0 +1,4 @@

+import numpy as np
+def log_transform(x):
+    return np.log(x + 1)
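
Review note: `np.log(x + 1)` loses precision when `x` is very close to zero, because `x + 1` rounds to exactly `1.0`. NumPy provides `np.log1p` for this case. The sketch below shows the effect with the standard-library `math` module standing in for NumPy and scalar inputs standing in for arrays; `log_transform_stable` is a hypothetical name, not part of the commit. Since the form inputs in app.py use `min_value=0` and typical values are not tiny, the practical difference for this app is likely small.

```python
import math


def log_transform(x):
    """Same formula as the committed function, for a single float."""
    return math.log(x + 1)


def log_transform_stable(x):
    """Hypothetical variant using log1p, which is accurate for small x."""
    return math.log1p(x)


tiny = 1e-20
naive = log_transform(tiny)          # 1 + 1e-20 rounds to 1.0 in floating point
stable = log_transform_stable(tiny)  # keeps the small contribution of x

print(naive, stable)
print(log_transform(9), log_transform_stable(9))
```

Note that swapping the function body would not by itself change an already pickled pipeline if the pickle stores the function by reference, and the name `log_transform` would need to stay importable from the same module.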