metadata

tags:
  - image-classification
library_name: coreml
license: other
license_name: apple-ascl
license_link: LICENSE
datasets:
  - imagenet-1k

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Image Classification.

Please observe the original license.

Model Details

Model Type: Image classification
Model Stats:
- Params (M): 44.1
- GMACs: 7.8
- Activations (M): 40.4
- Image size: 256 x 256
Papers:
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization: https://arxiv.org/abs/2303.14189
Original: https://github.com/apple/ml-fastvit
Dataset: ImageNet-1k

Evaluation - Variants

Variant	Parameters	Size (MB)	Weight precision	Activation precision	Δ accuracy vs. PyTorch
T8	3.6M	7.8	Float16	Float16	-0.9%
MA36	42.7M	84	Float16	Float16	-0.06%
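As a rough sanity check on the sizes above, Float16 weights take 2 bytes per parameter, so the weight storage can be estimated from the parameter count. The sketch below is only an approximation: it assumes MB means 10^6 bytes (the table does not say), and it ignores package metadata and rounding in the reported figures. The helper name `estimated_weight_mb` is hypothetical.

```python
# Bytes used by one Float16 value.
BYTES_PER_FLOAT16 = 2

# variant -> (parameters in millions, reported size in MB), copied from the table above.
VARIANTS = {
    "T8": (3.6, 7.8),
    "MA36": (42.7, 84.0),
}


def estimated_weight_mb(params_millions: float) -> float:
    """Estimate Float16 weight storage in MB, assuming 1 MB = 10^6 bytes."""
    # Millions of parameters times bytes per parameter gives millions of bytes.
    return params_millions * BYTES_PER_FLOAT16


for name, (params_m, reported_mb) in VARIANTS.items():
    estimate = estimated_weight_mb(params_m)
    print(f"{name}: estimated {estimate:.1f} MB, reported {reported_mb} MB")
```

The estimates land close to the reported sizes but do not match them exactly, which is expected given the assumptions above.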

Evaluation - Inference time

Variant	Device	OS version	Inference time (ms)	Dominant compute unit
T8	iPhone 12 Pro Max	17.5	0.79	Neural Engine
T8	M3 Max	14.4	0.62	Neural Engine
MA36	iPhone 12 Pro Max	18.0	4.50	Neural Engine
MA36	M3 Max	15.0	2.99	Neural Engine
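The inference times above can be converted into an idealized throughput figure. The sketch below assumes each time is for a single image and that images are processed back to back with no preprocessing or scheduling overhead. Neither assumption is stated in the table, so treat the result as an upper bound. The helper name `images_per_second` is hypothetical.

```python
# (variant, device) -> inference time in ms, copied from the table above.
LATENCY_MS = {
    ("T8", "iPhone 12 Pro Max"): 0.79,
    ("T8", "M3 Max"): 0.62,
    ("MA36", "iPhone 12 Pro Max"): 4.50,
    ("MA36", "M3 Max"): 2.99,
}


def images_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to an idealized throughput."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


for (variant, device), latency in LATENCY_MS.items():
    print(f"{variant} on {device}: {images_per_second(latency):.0f} images/s")
```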

Download

Install huggingface-cli (with Homebrew):

brew install huggingface-cli

To download one of the .mlpackage folders (here, the T8 variant) to the models directory:

huggingface-cli download \
  --local-dir models --local-dir-use-symlinks False \
  apple/coreml-FastViT-T8
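After the download finishes, it can help to confirm which .mlpackage folders actually landed in the models directory. The following is a minimal sketch using only the Python standard library. The helper name `find_mlpackages` is hypothetical, and the layout inside the downloaded repository is not assumed beyond the presence of folders whose names end in .mlpackage.

```python
from pathlib import Path


def find_mlpackages(models_dir: str = "models") -> list[Path]:
    """Return every .mlpackage folder found under models_dir, sorted by path."""
    root = Path(models_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet, or the path is wrong.
        return []
    # An .mlpackage is a directory, so skip any plain files that match the name.
    return sorted(p for p in root.rglob("*.mlpackage") if p.is_dir())


if __name__ == "__main__":
    for package in find_mlpackages("models"):
        print(package)
```

Run it from the directory in which the download command was executed. An empty result suggests the download did not complete or used a different target directory.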

Citation

@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}