---
license: cc-by-4.0
datasets:
- imagenet-1k
metrics:
- accuracy
pipeline_tag: image-classification
language:
- en
tags:
- vision transformer
- simpool
- computer vision
- deep learning
---
# Supervised ViT-S/16 (small-sized Vision Transformer with patch size 16) model

Official ViT-S model trained on ImageNet-1k for 100 epochs, reproduced for the ICCV 2023 [SimPool](https://arxiv.org/abs/2309.06891) paper.

SimPool is a simple attention-based pooling method applied at the end of the network, released in this [repository](https://github.com/billpsomas/simpool/).

Disclaimer: This model card was written by the author of SimPool, [Bill Psomas](http://users.ntua.gr/psomasbill/).
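For intuition only, the core idea of attention-based pooling can be reduced to a few lines: use the average of the patch tokens as a query, score every token against that query, and return the attention-weighted sum of the tokens. The sketch below is a simplified illustration under these assumptions (the function name and the NumPy implementation are mine, not the official PyTorch code in the repository):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def simple_attention_pool(tokens):
    """Pool N patch tokens of dimension d into a single d-dim vector.

    Illustrative sketch: the query is the mean (GAP) of the tokens,
    scores are scaled dot products between the query and each token,
    and the pooled vector is the attention-weighted sum of the tokens.
    """
    n, d = tokens.shape
    query = tokens.mean(axis=0)           # (d,)  GAP-based query
    scores = tokens @ query / np.sqrt(d)  # (n,)  scaled dot products
    weights = softmax(scores)             # (n,)  attention over patches
    return weights @ tokens               # (d,)  pooled representation

# Example: 196 patch tokens (14x14 grid for a 224px image with patch size 16),
# embedding dimension 384 as in ViT-S.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((196, 384))
pooled = simple_attention_pool(tokens)
print(pooled.shape)  # (384,)
```

The pooled vector then replaces the usual CLS token or global average pooling as the input to the classification head; see the repository for the exact SimPool formulation.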
## BibTeX entry and citation info
```bibtex
@misc{psomas2023simpool,
  title={Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?},
  author={Bill Psomas and Ioannis Kakogeorgiou and Konstantinos Karantzalos and Yannis Avrithis},
  year={2023},
  eprint={2309.06891},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```