shivalikasingh committed aea37d2 · 1 Parent(s): 70944a5

Update README.md

Files changed (1):
  1. README.md +51 -38
README.md CHANGED
@@ -8,22 +8,36 @@ tags:
 
 ## Model description
 
- This model is built using two important architectural components proposed by Bryan Lim et al. in [Temporal Fusion Transformers (TFT) for Interpretable Multi-horizon Time Series Forecasting](https://arxiv.org/abs/1912.09363) called GRN and VSN which are very useful for structured data classification tasks.
 
- 1. Gated Residual Networks(GRN) consist of skip connections and gating layers that facilitate information flow efficiently. They have the flexibility to apply non-linear processing only where needed.
- 2. Variable Selection Networks(VSN) help in carefully selecting the most important features from the input by getting rid of any unnecessary noisy inputs which could harm the model's performance.
 
- **Note:** This model is not based on the whole TFT model but only uses the GRN and VSN components described in the mentioned paper demonstrating that GRN and VSNs on their own also can be very useful for structured data learning tasks.
 
 ## Intended uses
 
- This model can be used for binary classification task to determine whether a person makes over $500K a year.
 
 ## Training and evaluation data
 
 This model was trained using the [United States Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29) provided by the UCI Machine Learning Repository.
- The dataset contains weighted census data extracted from 1994 and 1995 Current Population Surveys conducted by the US Census Bureau.
- The dataset comprises of ~300K samples with 41 input features containing 7 numerical features and 34 categorical features:
 
 | Numerical Features | Categorical Features |
 | :-- | :-- |
@@ -49,7 +63,6 @@ The dataset comprises of ~300K samples with 41 input features containing 7 numer
 || state of previous residence |
 || detailed household and family stat |
 || detailed household summary in household |
- || instance weight |
 || migration code-change in msa |
 || migration code-change in reg |
 || migration code-move within reg |
@@ -66,42 +79,36 @@ The dataset comprises of ~300K samples with 41 input features containing 7 numer
 || taxable income amount |
 || fill inc questionnaire for veteran's admin |
 
 ## Training procedure
 
- 0. **Prepare Data:** Download the data and convert the target column *income_level* from string to integer and finally split the data into train and validation.
-
- 1. **Prepare tf.data.Dataset:** Train and validation datasets created using Step 0 are passed to a function that converts the features and labels into a tf.data.Dataset for training and evaluation.
 
- 2. **Define logic for Encoding input features:** All features are encoded while also ensuring that they all have the same dimensionality.
 
- - **Categorical Features:** are encoded using *Embedding* layer provided by Keras with output dimension of embedding equal to *encoding_size*
 
 - **Numerical Features:** are projected into an *encoding_size*-dimensional vector by applying a linear transformation using the *Dense* layer provided by Keras
-
- 3. **Implement the Gated Linear Unit (GLU):** consists of two Dense layers where the last dense layer has a sigmoid activation. GLUs help in suppressing inputs that are not useful for a given task.
-
- 4. **Implement the Gated Residual Network:**
- - Applies a non-linear ELU transformation on its inputs
- - Applies linear transformation followed by dropout
- - Applies GLU and adds the original inputs to the output of the GLU to perform skip (residual) connection
- - Applies layer normalization and produces the output
-
- 5. **Implement the Variable Selection Network:**
- - Applies a Gated Residual Network (GRN) which was defined in step 4 to each feature individually.
- - Applies a GRN for the concatenation of all features followed by a softmax to produce feature weights
- - Produces a weighted sum of the output of the individual GRN
-
- 6. **Create Model:**
- - The model will have input layers corresponding to both numerical and categorical features of the given dataset
- - The features received by the input layers are then encoded using the encoding logic defined in Step 2.
- - The encoded features pass through the Variable Selection Network(VSN)
- - The output produced by the VSN are passed through a final *Dense* layer with sigmoid activation to produce the final output of the model
-
- 7. **Compile, Train and Evaluate Model**: The model is compiled using Adam optimizer and since the model is meant to binary classification, the loss function chosen is Binary Cross Entropy.
- The model is trained for 20 epochs and batch_size of 265 with a callback for early stopping.
- The model performance is evaluated based on the accuracy and loss being observed on the validation set.
-
 
 ### Training hyperparameters
 
@@ -126,4 +133,10 @@ The following hyperparameters were used during training:
 
 ![Model Image](./model.png)
 
- </details>
 
 
 ## Model description
 
+ This model is built using two important architectural components proposed by Bryan Lim et al. in [Temporal Fusion Transformers (TFT) for Interpretable Multi-horizon Time Series Forecasting](https://arxiv.org/abs/1912.09363): the Gated Residual Network (GRN) and the Variable Selection Network (VSN), which are very useful for structured data learning tasks.
 
+ 1. **Gated Residual Networks (GRN)**: consist of skip connections and gating layers that facilitate efficient information flow. They have the flexibility to apply non-linear processing only where needed.
+ GRNs make use of [Gated Linear Units](https://arxiv.org/abs/1612.08083) (or GLUs) to suppress inputs that are not relevant for a given task.
 
+ The GRN works as follows (see the sketch after this list):
+ - It first applies a non-linear ELU transformation to its inputs
+ - It then applies a linear transformation followed by dropout
+ - Next, it applies a GLU and adds the original inputs to the output of the GLU to form a skip (residual) connection
+ - Finally, it applies layer normalization and produces its output
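+
+ A minimal sketch of the GLU and GRN as Keras layers, in the spirit of the linked Keras example (class names and the `units`/`dropout_rate` arguments are illustrative, not necessarily identical to this model's code):
+
+ ```python
+ import tensorflow as tf
+ from tensorflow.keras import layers
+
+ class GatedLinearUnit(layers.Layer):
+     def __init__(self, units):
+         super().__init__()
+         self.linear = layers.Dense(units)
+         self.sigmoid = layers.Dense(units, activation="sigmoid")  # gating branch
+
+     def call(self, inputs):
+         # The sigmoid gate suppresses parts of the input that are not useful
+         return self.linear(inputs) * self.sigmoid(inputs)
+
+ class GatedResidualNetwork(layers.Layer):
+     def __init__(self, units, dropout_rate):
+         super().__init__()
+         self.units = units
+         self.elu_dense = layers.Dense(units, activation="elu")  # non-linear ELU transformation
+         self.linear_dense = layers.Dense(units)                 # linear transformation
+         self.dropout = layers.Dropout(dropout_rate)
+         self.gated_linear_unit = GatedLinearUnit(units)
+         self.layer_norm = layers.LayerNormalization()
+         self.project = layers.Dense(units)  # aligns dimensions for the residual connection
+
+     def call(self, inputs):
+         x = self.elu_dense(inputs)
+         x = self.dropout(self.linear_dense(x))
+         if inputs.shape[-1] != self.units:
+             inputs = self.project(inputs)
+         x = inputs + self.gated_linear_unit(x)  # skip (residual) connection
+         return self.layer_norm(x)
+ ```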
+
+ 2. **Variable Selection Networks (VSN)**: help in carefully selecting the most important features from the input and getting rid of any unnecessary noisy inputs which could harm the model's performance.
+ The VSN works as follows (see the sketch after this list):
+ - First, it applies a Gated Residual Network (GRN) to each feature individually.
+ - Then it concatenates all features and applies a GRN on the concatenated features, followed by a softmax to produce feature weights.
+ - Finally, it produces a weighted sum of the outputs of the individual per-feature GRNs.
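+
+ A sketch of the VSN built from the GRN above, again following the linked Keras example (it assumes each feature arrives as a separate encoded tensor of shape `[batch, units]`):
+
+ ```python
+ class VariableSelection(layers.Layer):
+     def __init__(self, num_features, units, dropout_rate):
+         super().__init__()
+         # One GRN per feature, plus one GRN for the concatenation of all features
+         self.grns = [GatedResidualNetwork(units, dropout_rate) for _ in range(num_features)]
+         self.grn_concat = GatedResidualNetwork(units, dropout_rate)
+         self.softmax = layers.Dense(units=num_features, activation="softmax")
+
+     def call(self, inputs):  # inputs: list of [batch, units] encoded features
+         # Feature weights: GRN over the concatenation, followed by a softmax
+         v = self.grn_concat(tf.concat(inputs, axis=-1))
+         v = tf.expand_dims(self.softmax(v), axis=-1)  # [batch, num_features, 1]
+         # Apply a GRN to each feature individually
+         x = tf.stack([grn(inp) for grn, inp in zip(self.grns, inputs)], axis=1)
+         # Weighted sum of the per-feature GRN outputs -> [batch, units]
+         return tf.squeeze(tf.matmul(v, x, transpose_a=True), axis=1)
+ ```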
+
+ **Note:** This model is not based on the whole TFT model described in the paper above; it only uses the paper's GRN and VSN components, demonstrating that GRNs and VSNs can be very useful on their own for structured data learning tasks.
 
  ## Intended uses
 
+ This model can be used for a binary classification task to determine whether a person makes over $50,000 a year.
 
  ## Training and evaluation data
 
 This model was trained using the [United States Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29) provided by the UCI Machine Learning Repository.
+ The dataset consists of weighted census data containing demographic and employment-related variables, extracted from the 1994 and 1995 Current Population Surveys conducted by the US Census Bureau.
+ The dataset comprises ~299K samples with 41 input variables and 1 target variable called *income_level*.
+ The variable *instance_weight* is not used as an input to the model, so the model ends up using 40 input features: 7 numerical and 33 categorical:
 
  | Numerical Features | Categorical Features |
  | :-- | :-- |
 
 || state of previous residence |
 || detailed household and family stat |
 || detailed household summary in household |
 || migration code-change in msa |
 || migration code-change in reg |
 || migration code-move within reg |
 
 || taxable income amount |
 || fill inc questionnaire for veteran's admin |
 
+ The dataset already comes in two parts, one meant for training and one for testing.
+ The training dataset has 199,523 samples whereas the test dataset has 99,762 samples.
 
 ## Training procedure
 
+ 1. **Prepare Data:** Load the training and test datasets and convert the target column *income_level* from string to integer. The training dataset is further split into train and validation sets.
+ Finally, the training and validation datasets are converted into tf.data.Dataset objects to be used for model training and evaluation (see the sketch below).
 
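+ A sketch of this preparation step, assuming pandas DataFrames `train_data` and `test_data`; the raw label string and the 85/15 split are assumptions based on the linked Keras example:
+
+ ```python
+ import numpy as np
+ import tensorflow as tf
+
+ # Binarize the target: 0 if the raw label is " - 50000.", else 1 (assumed label string)
+ train_data["income_level"] = (train_data["income_level"] != " - 50000.").astype(int)
+ test_data["income_level"] = (test_data["income_level"] != " - 50000.").astype(int)
+
+ # Split the training data into train and validation sets
+ mask = np.random.rand(len(train_data)) <= 0.85
+ train_df, valid_df = train_data[mask], train_data[~mask]
+
+ def dataframe_to_dataset(df, batch_size=265):
+     # Convert a DataFrame into a batched tf.data.Dataset of (features, label) pairs
+     df = df.copy()
+     labels = df.pop("income_level")
+     ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
+     return ds.shuffle(len(df)).batch(batch_size)
+
+ train_ds, valid_ds = dataframe_to_dataset(train_df), dataframe_to_dataset(valid_df)
+ test_ds = dataframe_to_dataset(test_data)
+ ```
+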
+ 2. **Define logic for Encoding input features:** We encode the categorical and numerical features as follows (see the sketch below):
 
+ - **Categorical Features:** are encoded using the *Embedding* layer provided by Keras, with the output dimension of the embedding equal to *encoding_size*
 
 - **Numerical Features:** are projected into an *encoding_size*-dimensional vector by applying a linear transformation using the *Dense* layer provided by Keras
+
+ Therefore, all the encoded features will have the same dimensionality.
+
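+ A sketch of this encoding logic, reusing the imports from the sketches above and assuming a `CATEGORICAL_FEATURES_WITH_VOCABULARY` dict that maps each categorical feature name to its vocabulary (the constant and helper names are illustrative):
+
+ ```python
+ def encode_inputs(inputs, encoding_size=16):
+     encoded_features = []
+     for name, feature in inputs.items():
+         if name in CATEGORICAL_FEATURES_WITH_VOCABULARY:
+             vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY[name]
+             # Map raw categories to integer indices (one extra index for OOV values)
+             index = layers.StringLookup(vocabulary=vocabulary, num_oov_indices=1)(feature)
+             # Embed the index into an encoding_size-dimensional vector
+             encoded = layers.Embedding(input_dim=len(vocabulary) + 1, output_dim=encoding_size)(index)
+         else:
+             # Linear projection of the numerical feature to encoding_size dimensions
+             encoded = layers.Dense(units=encoding_size)(tf.expand_dims(feature, -1))
+         encoded_features.append(encoded)
+     return encoded_features
+ ```
+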
+ 3. **Create Model** (see the sketch below):
+ - The model has input layers corresponding to both the numerical and categorical features of the given dataset
+ - The features received by the input layers are then encoded using the encoding logic defined in Step 2, with an *encoding_size* of 16 as the output dimension of the encoded features.
+ - The encoded features pass through the Variable Selection Network (VSN). The VSN internally makes use of the GRN as well, as explained in the *Model description* section.
+ - The features produced by the VSN are passed through a final *Dense* layer with sigmoid activation to produce the final output of the model, indicating the probability that a person's income is over $50K.
+
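+ Putting the pieces together, a sketch of the model assembly (`create_model_inputs`, which builds one keras.Input per feature, is assumed to exist as in the linked example):
+
+ ```python
+ def create_model(encoding_size=16, dropout_rate=0.15):
+     inputs = create_model_inputs()                   # dict of keras.Input, one per feature
+     features = encode_inputs(inputs, encoding_size)  # list of [batch, 16] encoded features
+     selected = VariableSelection(len(features), encoding_size, dropout_rate)(features)
+     # Sigmoid output: probability that a person makes over $50K a year
+     outputs = layers.Dense(units=1, activation="sigmoid")(selected)
+     return tf.keras.Model(inputs=inputs, outputs=outputs)
+ ```
+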
+ 4. **Compile, Train and Evaluate Model** (see the sketch below):
+ - Since the model is meant for binary classification, the loss function chosen was binary cross-entropy.
+ - The metric chosen for evaluating the model's performance was *accuracy*.
+ - The optimizer chosen was Adam with a learning rate of 0.001.
+ - The *dropout_rate* for the dropout layers of the GRN was 0.15.
+ - The batch_size chosen was 265 and the model was trained for 20 epochs.
+ - The training was done with the Keras *EarlyStopping* callback, which interrupts training as soon as the validation metric stops improving.
+ - Finally, the performance of the model was also evaluated on the test dataset, reaching an accuracy of ~95%.
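+
+ A sketch of this final step with the hyperparameters listed above (the EarlyStopping `monitor`/`patience` settings are assumptions; the dataset variables come from the data-preparation sketch):
+
+ ```python
+ model = create_model(encoding_size=16, dropout_rate=0.15)
+ model.compile(
+     optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
+     loss=tf.keras.losses.BinaryCrossentropy(),
+     metrics=["accuracy"],
+ )
+
+ early_stopping = tf.keras.callbacks.EarlyStopping(
+     monitor="val_loss", patience=5, restore_best_weights=True
+ )
+
+ model.fit(train_ds, epochs=20, validation_data=valid_ds, callbacks=[early_stopping])
+ model.evaluate(test_ds)  # reported accuracy on the test set: ~95%
+ ```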
 
  ### Training hyperparameters
 
 ![Model Image](./model.png)
 
+ </details>
+
+ ## Credits:
+
+ - Author: [Shivalika Singh](https://huggingface.co/shivi)
+ - Based on this [Keras example](https://keras.io/examples/structured_data/classification_with_grn_and_vsn) by [Khalid Salama](https://www.linkedin.com/in/khalid-salama-24403144)
+ - Check out the demo space [here](https://huggingface.co/spaces/shivi/structured-data-classification-grn-vsn)