shivalikasingh committed
Commit 22a528a · Parent(s): 79a7cee

Update README.md

Files changed (1):
1. README.md (+83 -4)

README.md CHANGED

## Model description

ShiftViT is a variation of the Vision Transformer (ViT) in which the attention operation is replaced by a shift operation.

The ShiftViT model was proposed in the paper [When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism](https://arxiv.org/abs/2201.10801).
Vision Transformers have recently become very popular for computer vision problems, and many researchers attribute their success to the attention layers. With ShiftViT, the authors show that ViTs can reach state-of-the-art (SoTA) results even without the attention operation.

## Model Architecture

The architecture of ShiftViT is inspired by the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030).

| ![ShiftViT Architecture](https://i.imgur.com/CHU40HX.png) |
| :--: |
| Figure 1: The entire architecture of ShiftViT. [Source](https://arxiv.org/abs/2201.10801) |

The authors propose a modular architecture with 4 stages. Each stage works at its own spatial resolution, creating a hierarchical architecture.

Each stage in the ShiftViT architecture consists of Shift Blocks, as shown in Figure 2.

| ![ShiftViT block](https://i.imgur.com/IDe35vo.gif) |
| :--: |
| Figure 2: From the Model to a Shift Block. |

The Shift Block, as shown in Fig. 3, consists of the following:

- Shift Operation
- Linear Normalization
- MLP Layer: a stack of densely-connected layers

The shift operation itself works as follows (see the code sketch after Figure 4):

1. Split the channels into groups.
2. Take the first four splits and shift and pad each one in its respective direction (left, right, up or down).
3. After shifting and padding, concatenate the channels back together.

| ![Manim rendered animation for shift operation](https://i.imgur.com/PReeULP.gif) |
| :--: |
| Figure 4: The TensorFlow-style shifting |
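
The sketch below illustrates one way the shift operation and Shift Block could be written in TensorFlow/Keras. It is not this model's actual implementation: the per-direction channel fraction (`num_div`), the MLP expansion ratio, the dropout rate and the residual connection are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def shift_channels(x, num_div=12):
    """Shift four groups of channels by one pixel (left, right, up, down),
    zero-padding the vacated border; the remaining channels pass through unchanged.

    `num_div` (how many groups the channels are divided into) is an assumption here.
    `x` has shape (batch, height, width, channels); channels must be known statically.
    """
    c = x.shape[-1]
    g = c // num_div
    # 1. Split the channels.
    left, right, up, down, rest = tf.split(x, [g, g, g, g, c - 4 * g], axis=-1)
    # 2. Shift and pad each of the first four splits in its own direction
    #    (pad one row/column of zeros, then crop to keep the spatial size fixed).
    left = tf.pad(left, [[0, 0], [0, 0], [0, 1], [0, 0]])[:, :, 1:, :]
    right = tf.pad(right, [[0, 0], [0, 0], [1, 0], [0, 0]])[:, :, :-1, :]
    up = tf.pad(up, [[0, 0], [0, 1], [0, 0], [0, 0]])[:, 1:, :, :]
    down = tf.pad(down, [[0, 0], [1, 0], [0, 0], [0, 0]])[:, :-1, :, :]
    # 3. Concatenate the channels back together.
    return tf.concat([left, right, up, down, rest], axis=-1)


class ShiftViTBlock(layers.Layer):
    """Shift operation -> LayerNorm -> MLP, combined with a residual connection."""

    def __init__(self, mlp_ratio=2, dropout=0.0, **kwargs):
        super().__init__(**kwargs)
        self.mlp_ratio = mlp_ratio
        self.dropout_rate = dropout

    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.norm = layers.LayerNormalization(epsilon=1e-5)
        self.mlp = keras.Sequential(
            [
                layers.Dense(dim * self.mlp_ratio, activation="gelu"),
                layers.Dropout(self.dropout_rate),
                layers.Dense(dim),
                layers.Dropout(self.dropout_rate),
            ]
        )

    def call(self, x):
        x = shift_channels(x)               # shift part of the channels
        return x + self.mlp(self.norm(x))   # normalize, MLP, residual add
```

For example, a `(1, 32, 32, 96)` feature map passed through `ShiftViTBlock()` keeps its shape: the shift operation only moves information spatially, and the MLP maps back to the same number of channels.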

## Intended uses

The ShiftViT model architecture has been designed to show that the attention operation is not essential to the strong performance of Vision Transformers. This particular model is intended for image classification, specifically on the CIFAR-10 classes listed below.

## Training and evaluation data

The model is trained on CIFAR-10, a popular image classification dataset. It contains images belonging to the following 10 classes:

| Classes |
| :-- |
| airplane |
| automobile |
| bird |
| cat |
| deer |
| dog |
| frog |
| horse |
| ship |
| truck |

The number of samples used for training and evaluation is:
- Training samples: 40000
- Validation samples: 10000
- Testing samples: 10000
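
For reference, one way to obtain this 40000/10000/10000 split is to hold out 10000 of CIFAR-10's 50000 training images for validation. A minimal sketch, not necessarily the exact split used for this model:

```python
from tensorflow import keras

# CIFAR-10 ships with 50000 training and 10000 test images (32x32 RGB, 10 classes).
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Hold out the last 10000 training images for validation -> 40000/10000/10000 split.
x_train, x_val = x_train[:40000], x_train[40000:]
y_train, y_val = y_train[:40000], y_train[40000:]

print(len(x_train), len(x_val), len(x_test))  # 40000 10000 10000
```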
 
## Training procedure

1. Data Preparation:

- Data Augmentation: the augmentation steps include rescaling, resizing, random cropping and horizontal flipping.

2. Building the ShiftViT Model:

- The model is built by stacking the Shift Blocks described above into 4 hierarchical stages.

3. Model Training:

The model is then trained using the following config (an end-to-end sketch of steps 1-4 is given after this list):

| Training Config | Value |
| :-- | :-- |
| Optimizer | Adam |
| Loss Function | sparse_categorical_crossentropy |
| Metric | Accuracy |
| Epochs | 5 |

4. Model Testing:

After training, the model is evaluated on the test data, achieving an accuracy of ~90%.
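
The four steps above map onto a standard Keras workflow. The sketch below is illustrative only and reuses `ShiftViTBlock` and the data splits from the earlier sketches; the augmentation parameters, the `build_shift_vit` helper, the patch-embedding layer and the batch size are assumptions, while the optimizer, loss, metric and epoch count follow the table above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 1. Data Preparation: rescaling, resizing, random cropping and horizontal flipping.
augmentation = keras.Sequential(
    [
        layers.Rescaling(1.0 / 255),
        layers.Resizing(36, 36),        # assumed resize target before cropping
        layers.RandomCrop(32, 32),
        layers.RandomFlip("horizontal"),
    ]
)

# 2. Building the ShiftViT Model: a single Shift Block as a stand-in for the 4 stages.
def build_shift_vit(num_classes=10):
    inputs = keras.Input(shape=(32, 32, 3))
    x = augmentation(inputs)
    x = layers.Conv2D(96, kernel_size=4, strides=4)(x)  # assumed patch embedding
    x = ShiftViTBlock()(x)                              # defined in the earlier sketch
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes)(x)
    return keras.Model(inputs, outputs)

model = build_shift_vit()

# 3. Model Training with the config from the table: Adam, sparse categorical
#    cross-entropy loss, accuracy metric, 5 epochs.
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=128)

# 4. Model Testing on the held-out test split.
model.evaluate(x_test, y_test)
```

In practice the full model stacks several Shift Blocks per stage with downsampling between stages, but the training loop itself is unchanged.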

### Training hyperparameters

The following hyperparameters were used during training: