Spaces:
Runtime error
Runtime error
# TF-Vision Model Garden | |
⚠️ Disclaimer: Checkpoints are based on training with publicly available datasets. Some datasets contain limitations, including non-commercial use limitations. | |
Please review the terms and conditions made available by third parties before using | |
the datasets provided. Checkpoints are licensed under | |
[Apache 2.0](https://github.com/tensorflow/models/blob/master/LICENSE). | |
⚠️ Disclaimer: Datasets hyperlinked from this page are not owned or distributed | |
by Google. Such datasets are made available by third parties. Please review the | |
terms and conditions made available by the third parties before using the data. | |
## Introduction | |
TF-Vision modeling library for computer vision provides a collection of | |
baselines and checkpoints for image classification, object detection, and | |
segmentation. | |
## Image Classification | |
### ImageNet Baselines | |
#### ResNet models trained with vanilla settings | |
* Models are trained from scratch with batch size 4096 and 1.6 initial learning | |
rate. | |
* Linear warmup is applied for the first 5 epochs. | |
* Models trained with l2 weight regularization and ReLU activation. | |
| Model | Resolution | Epochs | Top-1 | Top-5 | Download | | |
| ------------ |:-------------:|--------:|--------:|--------:|---------:| | |
| ResNet-50 | 224x224 | 90 | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) | | |
| ResNet-50 | 224x224 | 200 | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet/resnet-50-i224.tar.gz) | | |
| ResNet-101 | 224x224 | 200 | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet/resnet-101-i224.tar.gz) | | |
| ResNet-152 | 224x224 | 200 | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet/resnet-152-i224.tar.gz) | | |
#### ResNet-RS models trained with various settings | |
We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image | |
classification models with features: | |
* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS | |
adopts ReLU activation in the paper.) | |
* Regularization methods include Random Augment, 4e-5 weight decay, stochastic | |
depth, label smoothing , and dropout. | |
* New training methods including a 350-epoch schedule, cosine learning rate , and | |
EMA. | |
* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification). | |
| Model | Resolution | Params (M) | Top-1 | Top-5 | Download | | |
| --------- | :--------: | ---------: | ----: | ----: | --------:| | |
| ResNet-RS-50 | 160x160 | 35.7 | 79.1 | 94.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) | | |
| ResNet-RS-101 | 160x160 | 63.7 | 80.2 | 94.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) | | |
| ResNet-RS-101 | 192x192 | 63.7 | 81.3 | 95.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) | | |
| ResNet-RS-152 | 192x192 | 86.8 | 81.9 | 95.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) | | |
| ResNet-RS-152 | 224x224 | 86.8 | 82.5 | 96.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) | | |
| ResNet-RS-152 | 256x256 | 86.8 | 83.1 | 96.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) | | |
| ResNet-RS-200 | 256x256 | 93.4 | 83.5 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) | | |
| ResNet-RS-270 | 256x256 | 130.1 | 83.6 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) | | |
| ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) | | |
| ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) | | |
#### Vision Transformer (ViT) | |
We support [ViT](https://arxiv.org/abs/2010.11929) and [DEIT](https://arxiv.org/abs/2012.12877) implementations. | |
ViT models trained under the DEIT settings: | |
model | resolution | Top-1 | Top-5 | Download | | |
--------- | :--------: | ----: | ----: | :-------: | | |
ViT-ti16 | 224x224 | 73.4 | 91.9 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/vit/vit-deit-imagenet-ti16.tar.gz) | | |
ViT-s16 | 224x224 | 79.4 | 94.7 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/vit/vit-deit-imagenet-s16.tar.gz) | | |
ViT-b16 | 224x224 | 81.8 | 95.8 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/vit/vit-deit-imagenet-b16.tar.gz) | | |
ViT-l16 | 224x224 | 82.2 | 95.8 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/vit/vit-deit-imagenet-l16.tar.gz) | | |
## Object Detection and Instance Segmentation | |
### Common Settings and Notes | |
* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and | |
[SpineNet](https://arxiv.org/abs/1912.05027) backbones based on detection frameworks: | |
* [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057) | |
* [Mask R-CNN](https://arxiv.org/abs/1703.06870) | |
* [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057) | |
* Models are all trained on [COCO](https://cocodataset.org/) train2017 and | |
evaluated on [COCO](https://cocodataset.org/) val2017. | |
* Training details: | |
* Models finetuned from [ImageNet](https://www.image-net.org/) pre-trained | |
checkpoints adopt the 12 or 36 epochs schedule. Models trained from scratch | |
adopt the 350 epochs schedule. | |
* The default training data augmentation implements horizontal flipping and | |
scale jittering with a random scale between [0.5, 2.0]. | |
* Unless noted, all models are trained with l2 weight regularization and ReLU | |
activation. | |
* We use batch size 256 and a stepwise learning rate that decays at the last 30 | |
and 10 epochs. | |
* We use a square image as input by resizing the long side of an image to the | |
target size and then padding the short side with zeros. | |
### COCO Object Detection Baselines | |
#### RetinaNet (ImageNet pretrained) | |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | | |
| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:| | |
| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config| | |
| R50-FPN | 640x640 | 72 | 97.0 | 34.0 | 36.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/retinanet.py#L187-L258) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) | | |
#### RetinaNet (Trained from scratch) with training features including: | |
* Stochastic depth with drop rate 0.2. | |
* Swish activation. | |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | | |
| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:| | |
| SpineNet-49 | 640x640 | 500 | 85.4| 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/spinenet/spinenet-49-i640.tar.gz) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| | |
| SpineNet-96 | 1024x1024 | 500 | 265.4 | 43.0 | 48.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/spinenet/spinenet-96-i1024.tar.gz) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| | |
| SpineNet-143 | 1280x1280 | 500 | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/spinenet/spinenet-143-i1280.tar.gz) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| | |
#### Mobile-size RetinaNet (Trained from scratch): | |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | | |
| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:| | |
| MobileNetv2 | 256x256 | 600 | - | 2.27 | 23.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) | | |
| Mobile SpineNet-49 | 384x384 | 600 | 1.0 | 2.32 | 28.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.googleapis.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) | | |
### Instance Segmentation Baselines | |
#### Mask R-CNN (Trained from scratch) | |
| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download | | |
| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:| | |
| ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) | | |
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) | | |
| SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) | | |
| SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) | | |
#### Cascade RCNN-RS (Trained from scratch) | |
| Backbone | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download | |
------------ | :--------: | -----: | ---------: | -----: | ------: | -------: | |
| SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)| | |
| SpineNet-96 | 1024x1024 | 500 | 70.8 | 50.9 | 43.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml)| | |
| SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)| | |
## Semantic Segmentation | |
* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and | |
[DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with | |
Dilated ResNet backbones. | |
* Backbones are pre-trained on ImageNet. | |
### PASCAL-VOC | |
| Model | Backbone | Resolution | Steps | mIoU | Download | | |
| ---------- | :----------------: | :--------: | ----: | ---: | --------:| | |
| DeepLabV3 | Dilated Resnet-101 | 512x512 | 30k | 78.7 | | | |
| DeepLabV3+ | Dilated Resnet-101 | 512x512 | 30k | 79.2 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/deeplabv3plus/dilated-resnet-101-deeplabv3plus.tar.gz) | | |
### CITYSCAPES | |
| Model | Backbone | Resolution | Steps | mIoU | Download | | |
| ---------- | :----------------: | :--------: | ----: | ----: | --------:| | |
| DeepLabV3+ | Dilated Resnet-101 | 1024x2048 | 90k | 78.79 | | | |
## Video Classification | |
### Common Settings and Notes | |
* We provide models for video classification with backbones: | |
* SlowOnly in | |
[SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982). | |
* ResNet-3D (R3D) in | |
[Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800). | |
* ResNet-3D-RS (R3D-RS) in | |
[Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf). | |
* Mobile Video Networks (MoViNets) in | |
[MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511). | |
* Training and evaluation details (SlowFast and ResNet): | |
* All models are trained from scratch with vision modality (RGB) for 200 | |
epochs. | |
* We use a batch size of 1024 and cosine learning rate decay with a linear warmup | |
in the first 5 epochs. | |
* We follow [SlowFast](https://arxiv.org/abs/1812.03982) to perform a 30-view | |
evaluation. | |
### Kinetics-400 Action Recognition Baselines | |
| Model | Input (frame x stride) | Top-1 | Top-5 | Download | | |
| -------- |:----------------------:|--------:|--------:|---------:| | |
| SlowOnly | 8 x 8 | 74.1 | 91.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) | | |
| SlowOnly | 16 x 4 | 75.6 | 92.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) | | |
| R3D-50 | 32 x 2 | 77.0 | 93.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) | | |
| R3D-RS-50 | 32 x 2 | 78.2 | 93.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) | | |
| R3D-RS-101 | 32 x 2 | 79.5 | 94.2 | - | |
| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | - | |
| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | - | |
| R3D-RS-200 | 48 x 2 | 81.0 | - | - | |
| MoViNet-A0-Base | 50 x 5 | 69.40 | 89.18 | - | |
| MoViNet-A1-Base | 50 x 5 | 74.57 | 92.03 | - | |
| MoViNet-A2-Base | 50 x 5 | 75.91 | 92.63 | - | |
| MoViNet-A3-Base | 120 x 2 | 79.34 | 94.52 | - | |
| MoViNet-A4-Base | 80 x 3 | 80.64 | 94.93 | - | |
| MoViNet-A5-Base | 120 x 2 | 81.39 | 95.06 | - | |
### Kinetics-600 Action Recognition Baselines | |
| Model | Input (frame x stride) | Top-1 | Top-5 | Download | | |
| -------- |:----------------------:|--------:|--------:|---------:| | |
| SlowOnly | 8 x 8 | 77.3 | 93.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) | | |
| R3D-50 | 32 x 2 | 79.5 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) | | |
| R3D-RS-200 | 32 x 2 | 83.1 | - | - | |
| R3D-RS-200 | 48 x 2 | 83.8 | - | - | |
| MoViNet-A0-Base | 50 x 5 | 72.05 | 90.92 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml) | | |
| MoViNet-A1-Base | 50 x 5 | 76.69 | 93.40 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml) | | |
| MoViNet-A2-Base | 50 x 5 | 78.62 | 94.17 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml) | | |
| MoViNet-A3-Base | 120 x 2 | 81.79 | 95.67 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml) | | |
| MoViNet-A4-Base | 80 x 3 | 83.48 | 96.16 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml) | | |
| MoViNet-A5-Base | 120 x 2 | 84.27 | 96.39 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml) | | |