keras-io/video-vision-transformer

pablorodriper committed on Feb 21, 2022

Commit 99a76b8 · 1 Parent(s): 55e85e6

Update README.md

Files changed (1)

README.md +28 -1

@@ -18,4 +18,31 @@ This repo contains the model [to this Keras example on Video Vision Transformer]
 This example implements [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Arnab et al., a pure Transformer-based model for video classification. The authors propose a novel embedding scheme and a number of Transformer variants to model video clips.
 ## Datasets
-We use the [MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification](https://medmnist.com/) dataset.
+We use the [MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification](https://medmnist.com/) dataset.
+## Training Parameters
+```
+# DATA
+DATASET_NAME = "organmnist3d"
+BATCH_SIZE = 32
+AUTO = tf.data.AUTOTUNE
+INPUT_SHAPE = (28, 28, 28, 1)
+NUM_CLASSES = 11
+# OPTIMIZER
+LEARNING_RATE = 1e-4
+WEIGHT_DECAY = 1e-5
+# TRAINING
+EPOCHS = 80
+# TUBELET EMBEDDING
+PATCH_SIZE = (8, 8, 8)
+NUM_PATCHES = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2
+# ViViT ARCHITECTURE
+LAYER_NORM_EPS = 1e-6
+PROJECTION_DIM = 128
+NUM_HEADS = 8
+NUM_LAYERS = 8
+```
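
The tubelet settings added above can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It assumes non-overlapping tubelets (stride equal to `PATCH_SIZE`, no padding), and `count_tubelets` is a hypothetical helper introduced only for illustration. Under that assumption, the `NUM_PATCHES` expression as written evaluates to 9, while a 28×28×28 volume split along all three axes would yield 27 tubelets.

```python
# Values copied from the training parameters in the diff.
INPUT_SHAPE = (28, 28, 28, 1)
PATCH_SIZE = (8, 8, 8)


def count_tubelets(input_shape, patch_size):
    """Count non-overlapping tubelets along the three spatial axes.

    Hypothetical helper: assumes stride == patch size and no padding,
    so each axis contributes floor(size / patch) tubelets.
    """
    total = 1
    for size, patch in zip(input_shape[:3], patch_size):
        total *= size // patch
    return total


# The expression exactly as it appears in the README.
num_patches_as_written = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

# The count over all three axes under the stated assumption.
num_tubelets = count_tubelets(INPUT_SHAPE, PATCH_SIZE)

print(num_patches_as_written, num_tubelets)
```

Whether the difference matters depends on how the model code uses `NUM_PATCHES`. If the positional encoding derives its token count from the embedded tensor's shape, the constant may simply go unused.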