Create README.md
README.md
ADDED
@@ -0,0 +1,9 @@
---
license: mit
---
This is a masked autoencoder (MAE) trained on an anime dataset. The main goal is a model that is efficient for image search, retrieval, and clustering.

The model has two parts, an encoder and a decoder. The encoder encodes the full image into an 8x512 embedding and the masked-out image into an 8 (28x28/10) x 512 embedding. The decoder tries to reconstruct the image.

The model architecture is LocalViT small, but with 16 layers instead of 12. The decoder is a simple transformer model with a LocalViT-style MLP.
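As a rough illustration of the image-search use case, the sketch below ranks gallery images by cosine similarity between flattened 8x512 embeddings. It does not load this model: `random_embedding` is a hypothetical stand-in for encoder output, and `flatten`, `cosine_similarity`, and `search` are illustrative names, not part of any released API. Flattening the 8 tokens into one vector and comparing with cosine similarity are assumptions; this README does not specify how the embeddings should be compared.

```python
import math
import random


def flatten(embedding):
    """Flatten an 8x512 embedding (a list of 8 rows) into one 4096-long vector."""
    return [value for row in embedding for value in row]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding, gallery_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k most similar gallery items."""
    query = flatten(query_embedding)
    scores = [
        (index, cosine_similarity(query, flatten(embedding)))
        for index, embedding in enumerate(gallery_embeddings)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def random_embedding(rng, tokens=8, dim=512):
    """Hypothetical stand-in for encoder output (assumption, not the real model)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens)]


rng = random.Random(0)
gallery = [random_embedding(rng) for _ in range(5)]
# The query is a slightly perturbed copy of gallery item 2,
# so item 2 should come back as the best match.
query = [[v + rng.gauss(0.0, 0.05) for v in row] for row in gallery[2]]
results = search(query, gallery, top_k=3)
print(results)
```

With real encoder outputs in place of the random stand-ins, the same ranking loop could serve retrieval, and the flattened vectors could be passed to a clustering routine. For a large gallery, an array library or a vector index would be a more practical choice than pure Python lists.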