|
 |
|
 |
|
|
|
# TensorFlow Object Detection API
|
|
|
Creating accurate machine learning models capable of localizing and identifying |
|
multiple objects in a single image remains a core challenge in computer vision. |
|
The TensorFlow Object Detection API is an open source framework built on top of |
|
TensorFlow that makes it easy to construct, train and deploy object detection |
|
models. At Google we’ve certainly found this codebase to be useful for our |
|
computer vision needs, and we hope that you will as well.

<p align="center">
  <img src="g3doc/img/kites_detections_output.jpg" width=676 height=450>
</p>
|
Contributions to the codebase are welcome and we would love to hear back from |
|
you if you find this API useful. Finally, if you use the TensorFlow Object
|
Detection API for a research publication, please consider citing: |
|
|
|
``` |
|
"Speed/accuracy trade-offs for modern convolutional object detectors." |
|
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, |
|
Song Y, Guadarrama S, Murphy K, CVPR 2017 |
|
``` |
|
|
|
\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\] |
|
|
|
<p align="center"> |
|
<img src="g3doc/img/tf-od-api-logo.png" width=140 height=195> |
|
</p> |
|
|
|
## Maintainers |
|
|
|
Name | GitHub |
|
-------------- | --------------------------------------------- |
|
Jonathan Huang | [jch1](https://github.com/jch1) |
|
Vivek Rathod | [tombstone](https://github.com/tombstone) |
|
Ronny Votel | [ronnyvotel](https://github.com/ronnyvotel) |
|
Derek Chow | [derekjchow](https://github.com/derekjchow) |
|
Chen Sun | [jesu9](https://github.com/jesu9) |
|
Menglong Zhu | [dreamdragon](https://github.com/dreamdragon) |
|
Alireza Fathi | [afathi3](https://github.com/afathi3) |
|
Zhichao Lu | [pkulzc](https://github.com/pkulzc) |
|
|
|
## Table of contents |
|
|
|
Setup: |
|
|
|
* <a href='g3doc/installation.md'>Installation</a><br> |
|
|
|
Quick Start: |
|
|
|
* <a href='colab_tutorials/object_detection_tutorial.ipynb'>
  Quick Start: Jupyter notebook for off-the-shelf inference</a>
  (a minimal inference sketch also appears after this table of contents)<br>
* <a href="g3doc/running_pets.md">Quick Start: Training a pet detector</a><br>
|
|
|
Customizing a Pipeline: |
|
|
|
* <a href='g3doc/configuring_jobs.md'> |
|
Configuring an object detection pipeline</a><br> |
|
* <a href='g3doc/preparing_inputs.md'>Preparing inputs</a><br> |
|
|
|
Running: |
|
|
|
* <a href='g3doc/running_locally.md'>Running locally</a><br> |
|
* <a href='g3doc/running_on_cloud.md'>Running on the cloud</a><br> |
|
|
|
Extras: |
|
|
|
* <a href='g3doc/detection_model_zoo.md'>TensorFlow detection model zoo</a><br>
|
* <a href='g3doc/exporting_models.md'> |
|
Exporting a trained model for inference</a><br> |
|
* <a href='g3doc/tpu_exporters.md'> |
|
Exporting a trained model for TPU inference</a><br> |
|
* <a href='g3doc/defining_your_own_model.md'> |
|
Defining your own model architecture</a><br> |
|
* <a href='g3doc/using_your_own_dataset.md'> |
|
Bringing in your own dataset</a><br> |
|
* <a href='g3doc/evaluation_protocols.md'> |
|
Supported object detection evaluation protocols</a><br> |
|
* <a href='g3doc/oid_inference_and_evaluation.md'> |
|
Inference and evaluation on the Open Images dataset</a><br> |
|
* <a href='g3doc/instance_segmentation.md'> |
|
Run an instance segmentation model</a><br> |
|
* <a href='g3doc/challenge_evaluation.md'> |
|
Run the evaluation for the Open Images Challenge 2018/2019</a><br> |
|
* <a href='g3doc/tpu_compatibility.md'> |
|
TPU compatible detection pipelines</a><br> |
|
* <a href='g3doc/running_on_mobile_tensorflowlite.md'> |
|
Running object detection on mobile devices with TensorFlow Lite</a><br> |
|
* <a href='g3doc/context_rcnn.md'> |
|
Context R-CNN documentation for data preparation, training, and export</a><br> |
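For a taste of what the Quick Start covers, the snippet below is a minimal,
hedged sketch of off-the-shelf inference with a TF1-style frozen detection
graph. The checkpoint and image paths are placeholders (any model from the
detection model zoo works); the tensor names are the standard outputs of the
API's exported graphs.

```python
# A minimal sketch, not a substitute for the tutorial notebook.
# Assumes a zoo model has been downloaded and unpacked (placeholder paths).
import numpy as np
import tensorflow as tf
from PIL import Image

PATH_TO_FROZEN_GRAPH = 'ssd_mobilenet_v1_coco/frozen_inference_graph.pb'

# Load the serialized GraphDef into a fresh graph.
graph = tf.Graph()
with graph.as_default():
  graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    graph_def.ParseFromString(fid.read())
  tf.import_graph_def(graph_def, name='')

# Batch of one uint8 image, shape [1, height, width, 3].
image = np.expand_dims(np.array(Image.open('image.jpg')), axis=0)

with tf.Session(graph=graph) as sess:
  boxes, scores, classes, num = sess.run(
      ['detection_boxes:0', 'detection_scores:0',
       'detection_classes:0', 'num_detections:0'],
      feed_dict={'image_tensor:0': image})
# Boxes are normalized [ymin, xmin, ymax, xmax]; detections are sorted by score.
```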
|
|
|
## Getting Help |
|
|
|
To get help with issues you may encounter using the TensorFlow Object Detection
|
API, create a new question on [StackOverflow](https://stackoverflow.com/) with |
|
the tags "tensorflow" and "object-detection". |
|
|
|
Please report bugs (actually broken code, not usage questions) to the |
|
tensorflow/models GitHub |
|
[issue tracker](https://github.com/tensorflow/models/issues), prefixing the |
|
issue name with "object_detection". |
|
|
|
Please check the [FAQ](g3doc/faq.md) before reporting an issue.
|
|
|
## Release information |
|
### June 17th, 2020 |
|
|
|
We have released [Context R-CNN](https://arxiv.org/abs/1912.03538), a model that |
|
uses attention to incorporate contextual information from images (e.g. from
|
temporally nearby frames taken by a static camera) in order to improve accuracy. |
|
Importantly, these contextual images need not be labeled. |
|
|
|
* When applied to a challenging wildlife detection dataset ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)), |
|
Context R-CNN with context from up to a month of images outperforms a |
|
  single-frame baseline by 17.9% mAP, and outperforms S3D (a 3D-convolution-based
  baseline) by 11.2% mAP.
|
* Context R-CNN leverages temporal context from the unlabeled frames of a |
|
novel camera deployment to improve performance at that camera, boosting |
|
  model generalizability.
|
|
|
We have provided code for generating data with associated context |
|
[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN |
|
model [here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config). |
|
|
|
Snapshot Serengeti-trained Faster R-CNN and Context R-CNN models can be found in |
|
the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#snapshot-serengeti-camera-trap-trained-models). |
|
|
|
A Colab demonstrating Context R-CNN is provided
|
[here](colab_tutorials/context_rcnn_tutorial.ipynb). |
|
|
|
<b>Thanks to contributors</b>: Sara Beery, Jonathan Huang, Guanhang Wu, Vivek |
|
Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and |
|
the Wildlife Insights AI Team. |
|
|
|
### May 19th, 2020 |
|
|
|
We have released [MobileDets](https://arxiv.org/abs/2004.14525), a set of |
|
high-performance models for mobile CPUs, DSPs and EdgeTPUs. |
|
|
|
* MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile |
|
CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by |
|
1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs and 3.4 mAP on DSPs while |
|
running equally fast. MobileDets also offer up to 2x speedup over MnasFPN on |
|
EdgeTPUs and DSPs. |
|
|
|
For each of the three hardware platforms we have released model definitions,
model checkpoints trained on the COCO14 dataset, and converted TFLite models in
fp32 and/or uint8.
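As a rough illustration of driving the converted TFLite models, the sketch
below uses the TFLite Python interpreter; the model filename is a placeholder,
and the output ordering shown is that of the standard TFLite detection
postprocessing op.

```python
# A minimal sketch assuming a converted detection model (placeholder filename).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='mobiledet_cpu.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The uint8 variants take quantized input and the fp32 variants take float
# input; use the dtype and shape the model itself reports.
_, height, width, _ = input_details[0]['shape']
image = np.zeros((1, height, width, 3), dtype=input_details[0]['dtype'])

interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()

boxes = interpreter.get_tensor(output_details[0]['index'])    # [1, N, 4]
classes = interpreter.get_tensor(output_details[1]['index'])  # [1, N]
scores = interpreter.get_tensor(output_details[2]['index'])   # [1, N]
count = interpreter.get_tensor(output_details[3]['index'])    # [1]
```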
|
|
|
<b>Thanks to contributors</b>: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin |
|
Akin, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen, |
|
Quoc Le, Zhichao Lu. |
|
|
|
### May 7th, 2020 |
|
|
|
We have released a mobile model with the |
|
[MnasFPN head](https://arxiv.org/abs/1912.01106). |
|
|
|
* MnasFPN with MobileNet-V2 backbone is the most accurate (26.6 mAP at 183ms |
|
on Pixel 1) mobile detection model we have released to date. With |
|
  a depth multiplier, MnasFPN with a MobileNet-V2 backbone is 1.8 mAP higher than
|
MobileNet-V3-Large with SSDLite (23.8 mAP vs 22.0 mAP) at similar latency |
|
(120ms) on Pixel 1. |
|
|
|
We have released the model definition, model checkpoints trained on the COCO14
dataset, and a converted TFLite model.
|
|
|
<b>Thanks to contributors</b>: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi |
|
Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc Le, Zhichao Lu, Jonathan Huang, Hao |
|
Xu. |
|
|
|
### Nov 13th, 2019 |
|
|
|
We have released the MobileNetEdgeTPU SSDLite model.
|
|
|
* SSDLite with a MobileNetEdgeTPU backbone, which achieves 10% higher mAP than
  MobileNetV2 SSDLite (24.3 mAP vs 22 mAP) on a Google Pixel 4 at comparable
  latency (6.6ms vs 6.8ms).
|
|
|
Along with the model definition, we are also releasing model checkpoints trained |
|
on the COCO dataset. |
|
|
|
<b>Thanks to contributors</b>: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, |
|
Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le |
|
|
|
### Oct 15th, 2019 |
|
|
|
We have released two MobileNet V3 SSDLite models (presented in |
|
[Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)). |
|
|
|
* SSDLite with MobileNet-V3-Large backbone, which is 27% faster than MobileNet
  V2 SSDLite (119ms vs 162ms) on a Google Pixel phone CPU at the same mAP.
* SSDLite with MobileNet-V3-Small backbone, which is 37% faster than MnasNet
  SSDLite reduced with a depth multiplier (43ms vs 68ms) at the same mAP.
|
|
|
Along with the model definition, we are also releasing model checkpoints trained |
|
on the COCO dataset. |
|
|
|
<b>Thanks to contributors</b>: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang |
|
|
|
### July 1st, 2019 |
|
|
|
We have released an updated set of utils and an updated |
|
[tutorial](g3doc/challenge_evaluation.md) for all three tracks of the |
|
[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)! |
|
|
|
The Instance Segmentation metric for |
|
[Open Images V5](https://storage.googleapis.com/openimages/web/index.html) and |
|
[Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html) |
|
is part of this release. Check out |
|
[the metric description](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval) |
|
on the Open Images website. |
|
|
|
<b>Thanks to contributors</b>: Alina Kuznetsova, Rodrigo Benenson |
|
|
|
### Feb 11, 2019 |
|
|
|
We have released detection models trained on the Open Images Dataset V4 in our |
|
detection model zoo, including |
|
|
|
* Faster R-CNN detector with Inception ResNet V2 feature extractor
* SSD detector with MobileNet V2 feature extractor
* SSD detector with ResNet-101 FPN feature extractor (aka RetinaNet-101)
|
|
|
<b>Thanks to contributors</b>: Alina Kuznetsova, Yinxiao Li |
|
|
|
### Sep 17, 2018 |
|
|
|
We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature |
|
extractors trained on the |
|
[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes). |
|
The models are trained on the training split of the iNaturalist data for 4M
iterations and achieve 55% and 58% mean AP@0.5, respectively, over 2854 classes.
|
For more details please refer to this [paper](https://arxiv.org/abs/1707.06642). |
|
|
|
<b>Thanks to contributors</b>: Chen Sun |
|
|
|
### July 13, 2018 |
|
|
|
There are many new updates in this release, extending the functionality and |
|
capability of the API: |
|
|
|
* Moving from slim-based training to |
|
[Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)-based |
|
training. |
|
* Support for [RetinaNet](https://arxiv.org/abs/1708.02002), and a |
|
[MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) |
|
adaptation of RetinaNet. |
|
* A novel SSD-based architecture called the |
|
[Pooling Pyramid Network](https://arxiv.org/abs/1807.03284) (PPN). |
|
* Releasing several [TPU](https://cloud.google.com/tpu/)-compatible models.
  These can be found in the `samples/configs/` directory with a comment in the
  pipeline configuration files indicating TPU compatibility (see the
  config-loading sketch after this list).
|
* Support for quantized training. |
|
* Updated documentation for new binaries, Cloud training, and
  [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/).
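As a quick illustration of working with these pipeline configuration files,
here is a hedged sketch that loads and tweaks one of the sample configs using
the API's `config_util` helpers (the config filename below is a placeholder;
substitute any file from `samples/configs/`):

```python
# A minimal sketch, assuming the object_detection package is importable and
# the placeholder config exists under samples/configs/.
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file(
    'samples/configs/ssd_mobilenet_v1_coco.config')

train_config = configs['train_config']  # optimizer, batch size, etc.
train_config.batch_size = 8             # example override

# Reassemble and write the modified pipeline.config into a directory.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, '/tmp/my_experiment')
```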
|
|
|
See also our |
|
[expanded announcement blogpost](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html) |
|
and accompanying tutorial at the |
|
[TensorFlow blog](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193). |
|
|
|
<b>Thanks to contributors</b>: Sara Robinson, Aakanksha Chowdhery, Derek Chow, |
|
Pengchong Jin, Jonathan Huang, Vivek Rathod, Zhichao Lu, Ronny Votel |
|
|
|
### June 25, 2018 |
|
|
|
Additional evaluation tools for the |
|
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) |
|
are out. Check out our short tutorial on data preparation and running evaluation |
|
[here](g3doc/challenge_evaluation.md)! |
|
|
|
<b>Thanks to contributors</b>: Alina Kuznetsova |
|
|
|
### June 5, 2018 |
|
|
|
We have released the implementation of evaluation metrics for both tracks of the |
|
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) |
|
as a part of the Object Detection API - see the |
|
[evaluation protocols](g3doc/evaluation_protocols.md) for more details. |
|
Additionally, we have released a tool for hierarchical labels expansion for the |
|
Open Images Challenge: check out |
|
[oid_hierarchical_labels_expansion.py](dataset_tools/oid_hierarchical_labels_expansion.py). |
|
|
|
<b>Thanks to contributors</b>: Alina Kuznetsova, Vittorio Ferrari, Jasper |
|
Uijlings |
|
|
|
### April 30, 2018 |
|
|
|
We have released a Faster R-CNN detector with ResNet-101 feature extractor |
|
trained on [AVA](https://research.google.com/ava/) v2.1. Compared with other |
|
commonly used object detectors, it changes the action classification loss |
|
function to a per-class sigmoid loss to handle boxes with multiple labels. The
model is trained on the training split of AVA v2.1 for 1.5M iterations and
achieves a mean AP of 11.25% over 60 classes on the validation split of AVA v2.1.
|
For more details please refer to this [paper](https://arxiv.org/abs/1705.08421). |
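To make the loss change concrete, here is a toy, purely illustrative snippet
(not the API's internal code) showing why independent per-class sigmoid losses
can represent a box carrying several action labels at once, which a single
softmax cross-entropy cannot:

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])  # one box, three action classes
labels = tf.constant([[1.0, 0.0, 1.0]])   # multi-hot: two actions co-occur

# One independent binary cross-entropy per class; classes do not compete, so
# multiple labels on the same box are handled naturally.
per_class_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels, logits=logits)
```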
|
|
|
<b>Thanks to contributors</b>: Chen Sun, David Ross |
|
|
|
### April 2, 2018 |
|
|
|
Supercharge your mobile phones with the next-generation mobile object detector!
|
We are adding support for MobileNet V2 with SSDLite presented in |
|
[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). |
|
This model is 35% faster than MobileNet V1 SSD on a Google Pixel phone CPU
|
(200ms vs. 270ms) at the same accuracy. Along with the model definition, we are |
|
also releasing a model checkpoint trained on the COCO dataset. |
|
|
|
<b>Thanks to contributors</b>: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek |
|
Rathod, Jonathan Huang |
|
|
|
### February 9, 2018 |
|
|
|
We now support instance segmentation! This API update includes a number of
|
instance segmentation models similar to those discussed in the |
|
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer |
|
to [our slides](http://presentations.cocodataset.org/Places17-GMRI.pdf) from the |
|
2017 COCO + Places Workshop. Refer to the section on
|
[Running an Instance Segmentation Model](g3doc/instance_segmentation.md) for |
|
instructions on how to configure a model that predicts masks in addition to |
|
object bounding boxes. |
|
|
|
<b>Thanks to contributors</b>: Alireza Fathi, Zhichao Lu, Vivek Rathod, Ronny |
|
Votel, Jonathan Huang |
|
|
|
### November 17, 2017 |
|
|
|
As part of the Open Images V3 release we have released:
|
|
|
* An implementation of the Open Images evaluation metric and the |
|
[protocol](g3doc/evaluation_protocols.md#open-images). |
|
* Additional tools to separate inference of detection and evaluation (see |
|
[this tutorial](g3doc/oid_inference_and_evaluation.md)). |
|
* A new detection model trained on the Open Images V2 data release (see |
|
[Open Images model](g3doc/detection_model_zoo.md#open-images-models)). |
|
|
|
See more information on the |
|
[Open Images website](https://github.com/openimages/dataset)! |
|
|
|
<b>Thanks to contributors</b>: Stefan Popov, Alina Kuznetsova |
|
|
|
### November 6, 2017 |
|
|
|
We have re-released faster versions of our (pre-trained) models in the |
|
<a href='g3doc/detection_model_zoo.md'>model zoo</a>. In addition to what was |
|
available before, we are also adding Faster R-CNN models trained on COCO with |
|
Inception V2 and ResNet-50 feature extractors, as well as a Faster R-CNN
ResNet-101 model trained on the KITTI dataset.
|
|
|
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Tal |
|
Remez, Chen Sun. |
|
|
|
### October 31, 2017 |
|
|
|
We have released a new state-of-the-art model for object detection using
Faster R-CNN with the
[NASNet-A image featurization](https://arxiv.org/abs/1707.07012). This model
achieves an mAP of 43.1% on COCO test-dev, improving on the best available
model in the zoo by 6% absolute mAP.
|
|
|
<b>Thanks to contributors</b>: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, |
|
Quoc Le |
|
|
|
### August 11, 2017 |
|
|
|
We have released an update to the |
|
[Android Detect demo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android) |
|
which will now run models trained using the TensorFlow Object Detection API on
an Android device. By default, it currently runs a frozen SSD w/ MobileNet
|
detector trained on COCO, but we encourage you to try out other detection |
|
models! |
|
|
|
<b>Thanks to contributors</b>: Jonathan Huang, Andrew Harp |
|
|
|
### June 15, 2017 |
|
|
|
In addition to our base TensorFlow detection model definitions, this release
|
includes: |
|
|
|
* A selection of trainable detection models, including:
    * Single Shot Multibox Detector (SSD) with MobileNet,
    * SSD with Inception V2,
    * Region-Based Fully Convolutional Networks (R-FCN) with ResNet-101,
    * Faster R-CNN with ResNet-101,
    * Faster R-CNN with Inception ResNet V2
|
* Frozen weights (trained on the COCO dataset) for each of the above models to |
|
be used for out-of-the-box inference purposes. |
|
* A [Jupyter notebook](colab_tutorials/object_detection_tutorial.ipynb) for |
|
  performing out-of-the-box inference with one of our released models.
|
* Convenient [local training](g3doc/running_locally.md) scripts as well as |
|
distributed training and evaluation pipelines via |
|
[Google Cloud](g3doc/running_on_cloud.md). |
|
|
|
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Chen |
|
Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, |
|
Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav |
|
Kovalevskyi, Kevin Murphy |
|
|