Uploaded the PyTorch model weights
# <ins>FaKe-ViT-B/16: Robust and Fast AI-Generated Image Detection using Vision Transformer (ViT-B/16)</ins>
FaKe-ViT-B/16 is a fine-tuned ViT-Base model for **classifying images as AI-generated (fake) or real.**
This is a **90M-parameter transformer model** that is **robust (88% accuracy on the test set)**. It also **generalizes well** to images from **newer diffusion models** and has **fast (~5.4 sec/img) inference.**
The intuition behind using ViT for this task is that the Transformer architecture can capture **global features and global context**, much as Transformer language models such as BERT do. This matters because the task is not to detect a specific object in the image but to pick up on subtle differences between real images and fake images produced by diffusion models.
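To make the "global context" point concrete, here is a minimal sketch of how a ViT-B/16 turns an image into a token sequence in which every patch can attend to every other patch. The 224x224 input resolution is an assumption (it is the standard setting in the ViT paper; this card does not state the resolution used), and `num_tokens` is a hypothetical helper written for illustration.

```python
# ASSUMPTION: 224x224 input, the standard ViT-B/16 resolution in the ViT paper.
# The model card does not state the resolution FaKe-ViT-B/16 was trained on.
IMAGE_SIZE = 224
PATCH_SIZE = 16  # the "/16" in ViT-B/16


def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of tokens the encoder sees: one per patch plus one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


tokens = num_tokens(IMAGE_SIZE, PATCH_SIZE)

# Self-attention scores every token against every other token, so each layer
# relates all image regions to one another rather than only nearby pixels.
attention_pairs = tokens * tokens

print(f"{tokens} tokens, {attention_pairs} attention pairs per head per layer")
```

Under this assumption the image becomes 196 patches plus one class token, and the classifier reads its real/fake decision from a representation that has mixed information from the whole image.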
**Here's the demo:** https://huggingface.co/spaces/Zappy586/Fake-ViT
And here's the **Colab notebook** where I tried to train the model from scratch by replicating the paper: https://github.com/zappy586/FAKE-ViT/blob/main/ViT_Paper_replication.ipynb
The **original ViT paper**: https://arxiv.org/abs/2010.11929
- FaKe-ViT-B16.pth +3 -0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3a2d9f5edce776c627c3797b1f1a6be5d243a188ce39b9546da2ee031b363c30
+size 343286022
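The added file is a Git LFS pointer, not the weights themselves: it records the SHA-256 digest and byte size of the real `.pth` file. A minimal sketch of how a downloaded copy could be checked against this pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parameter estimate assumes the checkpoint stores float32 values (4 bytes each), which this commit does not confirm.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3a2d9f5edce776c627c3797b1f1a6be5d243a188ce39b9546da2ee031b363c30
size 343286022
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 in `pointer`."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so a ~343 MB checkpoint is never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)

# Rough upper bound on the parameter count, ASSUMING float32 storage
# (4 bytes per value) and ignoring checkpoint metadata overhead.
approx_params_millions = pointer["size"] / 4 / 1e6
print(f"~{approx_params_millions:.1f}M float32 values")
```

Calling `matches_pointer("FaKe-ViT-B16.pth", pointer)` on a local download would confirm the file is complete and unmodified. Under the float32 assumption the size corresponds to roughly 85.8M values, which is consistent with the approximately 90M parameters stated in the model description.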