---
license: cc-by-4.0
datasets:
- imagenet-1k
metrics:
- accuracy
pipeline_tag: image-classification
language:
- en
tags:
- vision transformer
- simpool
- computer vision
- deep learning
---

# Supervised ViT-S/16 (small-sized Vision Transformer with patch size 16) model

Official ViT-S model trained on ImageNet-1k for 100 epochs, reproduced for the ICCV 2023 [SimPool](https://arxiv.org/abs/2309.06891) paper.

SimPool is a simple attention-based pooling method applied at the end of the network. It is released in this [repository](https://github.com/billpsomas/simpool/).
Disclaimer: This model card was written by the author of SimPool, [Bill Psomas](http://users.ntua.gr/psomasbill/).
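To illustrate what "attention-based pooling at the end of the network" means, here is a minimal, dependency-free sketch of single-query attention pooling over patch tokens. It assumes the query is initialized by global average pooling (GAP) of the tokens and that the tokens themselves serve as values. The function names and the weight matrices `W_q` and `W_k` are hypothetical. This is not the official SimPool implementation, whose exact normalization and other details may differ; see the repository for the real code.

```python
import math

def layer_norm(v, eps=1e-6):
    # Normalize a vector to zero mean and unit variance (no learned affine).
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    # Numerically stable softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(tokens, W_q, W_k):
    """Pool n patch tokens (each a list of d floats) into one d-dim vector.

    W_q, W_k: hypothetical d x d query/key projection matrices.
    Returns (pooled_vector, attention_weights).
    """
    n, d = len(tokens), len(tokens[0])
    # 1. Initialize the single query with GAP over all tokens.
    gap = [sum(t[j] for t in tokens) / n for j in range(d)]
    q = matvec(W_q, layer_norm(gap))
    # 2. Project layer-normalized tokens to keys.
    keys = [matvec(W_k, layer_norm(t)) for t in tokens]
    # 3. Scaled dot-product attention of the query over all tokens.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # 4. Weighted average of the tokens (values = tokens, no projection).
    pooled = [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)]
    return pooled, weights

tokens = [[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(tokens, identity, identity)
```

A useful sanity check on this design: if the query projection is all zeros, every token receives the same score, the weights become uniform, and the pooled vector reduces to plain GAP. Attention pooling of this kind therefore generalizes average pooling by letting the weights depend on the tokens.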

## BibTeX entry and citation info

```
@misc{psomas2023simpool,
      title={Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?}, 
      author={Bill Psomas and Ioannis Kakogeorgiou and Konstantinos Karantzalos and Yannis Avrithis},
      year={2023},
      eprint={2309.06891},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```