--- tags: - image-classification library_name: coreml license: other license_name: apple-ascl license_link: LICENSE datasets: - imagenet-1k --- # FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization Image Classification. ![Image Classification](FastViT_2x.png) Please observe [original license](https://github.com/apple/ml-fastvit/blob/8af5928238cab99c45f64fc3e4e7b1516b8224ba/LICENSE). ## Model Details - **Model Type:** Image classification - **Model Stats:** - Params (M): 44.1 - GMACs: 7.8 - Activations (M): 40.4 - Image size: 256 x 256 - **Papers:** - FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization: https://arxiv.org/abs/2303.14189 - **Original:** https://github.com/apple/ml-fastvit - **Dataset:** ImageNet-1k ## Evaluation - Variants | Variant | Parameters | Size (MB) | Weight precision | Act. precision | Δ Pytorch acc | | ------------------------------------------------------- | ---------: | --------: | ---------------- | -------------- | ------------- | | [T8](https://huggingface.co/apple/FastViTT8F16.mlpackage) | 3.6M | 7.8 | Float16 | Float16 | -0.9% | | [MA36](https://huggingface.co/apple/FastViTMA36F16.mlpackage) | 42.7M | 84 | Float16 | Float16 | -0.06% | ## Evaluation - Inference time | Variant | Device | OS | Inference time (ms) | Dominant compute unit | | ------- | -------------------- | ---- | ------------------: | --------------------- | | T8 | iPhone 12 Pro Max | 17.5 | 0.79 | Neural Engine | | T8 | M3 Max | 14.4 | 0.62 | Neural Engine | | MA36 | iPhone 12 Pro Max | 18.0 | 4.50 | Neural Engine | | MA36 | M3 Max | 15.0 | 2.99 | Neural Engine | ## Download Install `huggingface-cli` ```bash brew install huggingface-cli ``` To download one of the `.mlpackage` folders to the `models` directory: ```bash huggingface-cli download \ --local-dir models --local-dir-use-symlinks False \ apple/coreml-FastViT-T8 ``` ## Citation ```bibtex @inproceedings{vasufastvit2023, author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan}, title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization}, booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, year = {2023} } ```