	---
	license: mit
	tags:
	- vision-transformer
	- ViT
	- classification
	- cifar10
	- computer-vision
	- deep-learning
	- machine-learning
	---

	# ViT-Classification-CIFAR10

	## Model Description

	This model is a Vision Transformer (ViT) trained on the CIFAR-10 dataset for image classification. It was trained from scratch, with no pre-training on a larger dataset.

	Metrics:

	* Test accuracy: 78.31%
	* Test loss: 0.6296
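
To make the accuracy figure concrete, here is a minimal sketch of how a top-1 accuracy percentage is typically computed. It assumes the reported test accuracy is top-1 accuracy over the full test set; `top1_accuracy` is a hypothetical helper, not code from the author's repository.

```python
def top1_accuracy(predictions, labels):
    """Return the percentage of predicted class ids that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy example: 3 of 4 predictions are correct.
toy_accuracy = top1_accuracy([0, 1, 2, 3], [0, 1, 2, 0])

# Assumption: the 78.31% figure is measured over the 10,000-image test set,
# which would correspond to this many correctly classified images.
test_size = 10_000
correct_images = round(0.7831 * test_size)
```

Under that assumption, 78.31% corresponds to 7,831 of the 10,000 test images being classified correctly.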

	## Training Configuration

	Hardware: NVIDIA RTX 3090

	Training parameters:

	* Epochs: 200
	* Batch size: 4096
	* Input size: 3x32x32
	* Patch size: 4
	* Sequence length: 8*8 = 64 patches (32 / 4 = 8 patches per side)
	* Embed size: 128
	* Num of layers: 6
	* Num of heads: 4
	* Forward multiplier: 2
	* Dropout: 0.1
	* Optimizer: AdamW
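
The parameters above imply several derived dimensions. The sketch below works them out. The class and field names are hypothetical, and two points are assumptions rather than facts from this card: that "forward multiplier" is the ratio of the feed-forward hidden size to the embed size, and that the sequence length counts patch tokens only (no class token).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    image_size: int = 32
    in_channels: int = 3
    patch_size: int = 4
    embed_size: int = 128
    num_layers: int = 6
    num_heads: int = 4
    forward_mul: int = 2  # assumed: feed-forward hidden size / embed size
    dropout: float = 0.1

    @property
    def patches_per_side(self) -> int:
        return self.image_size // self.patch_size

    @property
    def num_patches(self) -> int:
        # Patch tokens only; a class token, if used, would add one more.
        return self.patches_per_side ** 2

    @property
    def patch_dim(self) -> int:
        # Number of raw pixel values in one flattened patch.
        return self.in_channels * self.patch_size ** 2

    @property
    def head_dim(self) -> int:
        return self.embed_size // self.num_heads

    @property
    def ff_hidden(self) -> int:
        return self.embed_size * self.forward_mul


cfg = ViTConfig()
print(cfg.patches_per_side, cfg.num_patches, cfg.patch_dim, cfg.head_dim, cfg.ff_hidden)
```

With the listed values, each image becomes an 8 x 8 grid of 64 patches, each patch flattens to 48 values, each attention head works on 32 dimensions, and the feed-forward hidden size would be 256 under the stated assumption.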

	## Intended Uses & Limitations

	This model is intended for practice and for exploring ViT architectures on the CIFAR-10 dataset. It may also serve as a starting point for image classification on similar small-image datasets.

	Limitations:

	* CIFAR-10 is a relatively small dataset, so the model might not generalize well to images outside its distribution.
	* The model is trained from scratch rather than fine-tuned from a pre-trained checkpoint, which potentially limits its performance compared to a fine-tuned model.
	* Training is performed on a single RTX 3090.

	## Training Data

	The model is trained on the CIFAR-10 dataset, containing 60,000 32x32 color images in 10 classes.

	* Training set: 50,000 images
	* Test set: 10,000 images
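
Combined with the batch size of 4096 and the 200 epochs listed under Training Configuration, the training set size gives a rough count of optimizer steps. This is a back-of-the-envelope sketch; whether the last partial batch was kept or dropped is not stated in this card, so both cases are shown.

```python
import math

train_images = 50_000
batch_size = 4096
epochs = 200

# Keeping the final partial batch: one extra, smaller step per epoch.
steps_keep_last = math.ceil(train_images / batch_size)

# Dropping the final partial batch: full batches only.
steps_drop_last = train_images // batch_size

# Size of the leftover partial batch in each epoch.
last_batch_size = train_images - steps_drop_last * batch_size

total_steps_keep_last = steps_keep_last * epochs
total_steps_drop_last = steps_drop_last * epochs

print(steps_keep_last, steps_drop_last, last_batch_size)
print(total_steps_keep_last, total_steps_drop_last)
```

Either way, the whole run amounts to only about 2,400 to 2,600 parameter updates, which is worth keeping in mind when comparing against runs with smaller batches.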

	Data Source: [https://paperswithcode.com/dataset/cifar-10](https://paperswithcode.com/dataset/cifar-10)

	## Documentation

	* GitHub Repository: [ViT-Classification-CIFAR10](https://github.com/nick8592/ViT-Classification-CIFAR10.git)