MBZUAI/swiftformer-xs

	---
	datasets:
	- imagenet-1k
	library_name: transformers
	pipeline_tag: image-classification
	---

	# SwiftFormer

	## Model description

	The SwiftFormer model was proposed in [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan.

	The SwiftFormer paper introduces an efficient additive attention mechanism that replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called 'SwiftFormer' is built on this mechanism and achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even the small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, making it more accurate and 2× faster than MobileViT-v2.
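To make the "linear element-wise" idea concrete, here is a minimal pure-Python sketch of additive attention. It is a simplification under stated assumptions, not the authors' implementation: the function name `additive_attention` and the parameter name `w` are hypothetical, softmax is swapped in as the score normalization for simplicity, and the learned projections and residual connection of the full model are omitted. The point it illustrates is that the queries are pooled into one global vector, so no n × n attention matrix is ever formed.

```python
import math


def additive_attention(queries, keys, w):
    """Simplified additive attention (illustrative sketch only).

    queries, keys: lists of n token vectors, each a list of d floats.
    w: scoring vector of length d (hypothetical name for a learned parameter).
    Runs in O(n * d) time instead of the O(n^2 * d) of standard self-attention.
    """
    d = len(w)

    # 1. Score every query token with a single dot product: n scalars.
    scores = [sum(qj * wj for qj, wj in zip(q, w)) / math.sqrt(d) for q in queries]

    # 2. Normalize the scores across tokens (softmax is an assumption here).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # 3. Pool all queries into one global query vector of length d.
    global_query = [sum(a * q[j] for a, q in zip(alphas, queries)) for j in range(d)]

    # 4. Interact the global query with each key element-wise.
    return [[g * kj for g, kj in zip(global_query, k)] for k in keys]
```

Each step touches every token once, which is where the linear cost in the number of tokens comes from.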

	## Intended uses & limitations




	## How to use


	import requests
	import torch
	from PIL import Image
	from transformers import SwiftFormerForImageClassification, ViTImageProcessor

	# Download a sample image from the COCO validation set.
	url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
	image = Image.open(requests.get(url, stream=True).raw)

	# Preprocess the image into a batch of pixel values.
	processor = ViTImageProcessor.from_pretrained('shehan97/swiftformer-xs')
	inputs = processor(images=image, return_tensors="pt")

	# Load the pretrained classifier and run inference without tracking gradients.
	model = SwiftFormerForImageClassification.from_pretrained('shehan97/swiftformer-xs')
	with torch.no_grad():
	    logits = model(**inputs).logits

	# Map the highest-scoring logit to its ImageNet-1K label.
	predicted_class_idx = logits.argmax(-1).item()
	print("Predicted class:", model.config.id2label[predicted_class_idx])
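The example above reports only the single most likely class. If you also want probabilities for the top few classes, one possible approach is to apply a softmax to the logits and sort the result. The helper below is a sketch: `top_k_predictions` is a hypothetical name, and the logits and labels in the demo are made-up illustrative values, not real model output.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices from most to least probable and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Illustrative values only (hypothetical labels and logits).
example_logits = [0.1, 2.0, -1.0, 0.5]
example_labels = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}
for label, prob in top_k_predictions(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as shown earlier, you could pass `logits[0].tolist()` together with the model's `config.id2label` mapping.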


	## Limitations and bias

	## Training data

	The classification model is trained on the ImageNet-1K dataset.


	## Training procedure

	## Evaluation results