---
datasets:
- imagenet-1k
library_name: transformers
pipeline_tag: image-classification
---

# SwiftFormer

## Model description

The SwiftFormer model was proposed in [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan.

The SwiftFormer paper introduces an efficient additive attention mechanism that replaces the quadratic matrix multiplications in the self-attention computation with linear element-wise multiplications. Building on this mechanism, the authors present a series of models called SwiftFormer that achieve state-of-the-art performance in both accuracy and mobile inference speed. Even the small variant reaches 78.5% top-1 accuracy on ImageNet-1K with only 0.8 ms latency on an iPhone 14, making it more accurate and 2× faster than MobileViT-v2.
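
The mechanism described above can be sketched in a few lines of NumPy. This is a simplified, illustrative reading of the paper's efficient additive attention, not the paper's code or the `transformers` implementation. The weight names (`w_q`, `w_k`, `w_a`, `w_proj`) are hypothetical, and normalization and the final linear layer are omitted. The key idea it shows is that each token receives a single scalar score, those scores pool the queries into one global query vector, and that vector interacts with the keys through element-wise multiplication, so no n × n attention matrix is ever formed.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along the given axis
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def additive_attention_sketch(x, w_q, w_k, w_a, w_proj):
    """Simplified efficient additive attention (illustrative sketch only).

    x:      (n_tokens, d_in) input tokens
    w_q:    (d_in, d) query projection      (hypothetical name)
    w_k:    (d_in, d) key projection        (hypothetical name)
    w_a:    (d,) learnable attention vector (hypothetical name)
    w_proj: (d, d) linear transform applied to the query-key interaction
    Returns an array of shape (n_tokens, d).
    """
    d = w_q.shape[1]
    q = x @ w_q  # (n, d)
    k = x @ w_k  # (n, d)

    # One scalar score per token, normalized over tokens: O(n * d), no n x n matrix
    alpha = softmax((q @ w_a) / np.sqrt(d), axis=0)  # (n,)

    # Pool the queries into a single global query vector
    global_q = (alpha[:, None] * q).sum(axis=0)  # (d,)

    # Element-wise interaction between the global query and every key (broadcast)
    context = k * global_q  # (n, d)

    # Linear transform plus a residual connection to the queries
    return context @ w_proj + q


rng = np.random.default_rng(0)
n_tokens, d_in, d = 16, 32, 8
x = rng.standard_normal((n_tokens, d_in))
out = additive_attention_sketch(
    x,
    w_q=rng.standard_normal((d_in, d)),
    w_k=rng.standard_normal((d_in, d)),
    w_a=rng.standard_normal(d),
    w_proj=rng.standard_normal((d, d)),
)
print(out.shape)  # (16, 8)
```

Because `alpha` holds one value per token rather than one per token pair, compute and memory in this sketch grow linearly with the number of tokens.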

## Intended uses & limitations

## How to use

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet-1K classes:

```python
import requests
import torch
from PIL import Image
from transformers import SwiftFormerForImageClassification, ViTImageProcessor

# Download an example image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image (resize, normalize) into a PyTorch tensor
processor = ViTImageProcessor.from_pretrained("shehan97/swiftformer-xs")
inputs = processor(images=image, return_tensors="pt")

# Load the classification model and run inference without tracking gradients
model = SwiftFormerForImageClassification.from_pretrained("shehan97/swiftformer-xs")
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring ImageNet-1K class and map it to its label
predicted_class_idx = outputs.logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
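
To inspect more than the single best class, the logits can be converted to probabilities and ranked. The helper below is a hypothetical, pure-Python sketch (`top_k_predictions` is not part of `transformers`). It assumes the logits for one image are passed as a list of floats, for example the first row of the model's `logits` converted with `.tolist()`, together with the model config's `id2label` mapping.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs for one image.

    logits:   list of floats, one per class (assumed input format)
    id2label: dict mapping class index -> label name (e.g. config.id2label)
    """
    # Numerically stable softmax over the class dimension
    max_logit = max(logits)
    exps = [math.exp(v - max_logit) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by probability, highest first, and keep the top k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy example with three made-up classes (not real model output)
toy_labels = {0: "cat", 1: "dog", 2: "bird"}
for label, prob in top_k_predictions([2.0, 1.0, 0.1], toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```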

## Limitations and bias

## Training data

The classification model is trained on the ImageNet-1K dataset.

## Training procedure

## Evaluation results