# Context R-CNN
Context R-CNN is an object detection model that uses contextual features from
other images (e.g., from the same camera) to improve detection performance. See
https://arxiv.org/abs/1912.03538 for more details.
## Table of Contents
* [Preparing Context Data for Context R-CNN](#preparing-context-data-for-context-r-cnn)
  * [Generating TfRecords from a set of images and a COCO-CameraTraps style JSON](#generating-tfrecords-from-a-set-of-images-and-a-coco-cameratraps-style-json)
  * [Generating weakly-supervised bounding box labels for image-labeled data](#generating-weakly-supervised-bounding-box-labels-for-image-labeled-data)
  * [Generating and saving contextual features for each image](#generating-and-saving-contextual-features-for-each-image)
  * [Building up contextual memory banks and storing them for each context group](#building-up-contextual-memory-banks-and-storing-them-for-each-context-group)
* [Training a Context R-CNN Model](#training-a-context-r-cnn-model)
* [Exporting a Context R-CNN Model](#exporting-a-context-r-cnn-model)
## Preparing Context Data for Context R-CNN
In this section, we will walk through the process of generating TfRecords with
contextual features. We focus on building context from object-centric features
generated with a pre-trained Faster R-CNN model, but you can adapt the provided
code to use alternative feature extractors.
Each of these data processing scripts uses Apache Beam, which can be installed
using
```
pip install apache-beam
```
and can be run locally or on a cluster for efficient processing of large
amounts of data. See the
[Apache Beam documentation](https://beam.apache.org/documentation/runners/dataflow/)
for more information.
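For example, to run one of these scripts on Google Cloud Dataflow instead of
locally, you would typically install the GCP extras and pass standard Beam
pipeline options. The sketch below assumes the script forwards unrecognized
flags to Beam's `PipelineOptions`; the script name, project, and bucket paths
are placeholders:

```
pip install apache-beam[gcp]
python object_detection/dataset_tools/context_rcnn/<script_name>.py \
  --<script-specific flags> \
  --runner=DataflowRunner \
  --project=your-gcp-project \
  --region=us-central1 \
  --temp_location=gs://your-bucket/tmp
```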
### Generating TfRecords from a set of images and a COCO-CameraTraps style JSON
If your data is already stored in TfRecords, you can skip this first step.
We assume a COCO-CameraTraps json format, as described on
[LILA.science](https://github.com/microsoft/CameraTraps/blob/master/data_management/README.md).
COCO-CameraTraps is a format that adds static-camera-specific fields, such as a
location ID and datetime, to the well-established COCO format. To generate
appropriate context later on, make sure each contextual group has a distinct
location ID (in the static-camera case, the ID of the camera) and that each
image includes the datetime at which it was taken. We assume that empty images
are labeled 'empty' with class ID 0.
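For reference, a minimal COCO-CameraTraps style JSON might look like the
sketch below. The field values are illustrative; see the LILA.science link
above for the full specification:

```
{
  "images": [{
    "id": "img_0001",
    "file_name": "img_0001.jpg",
    "location": "camera_A",
    "datetime": "2019-06-01 08:30:00"
  }],
  "annotations": [{
    "id": "ann_0001",
    "image_id": "img_0001",
    "category_id": 1
  }],
  "categories": [
    {"id": 0, "name": "empty"},
    {"id": 1, "name": "deer"}
  ]
}
```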
To generate TfRecords from your database and local image folder, run
```
python object_detection/dataset_tools/context_rcnn/create_cococameratraps_tfexample_main.py \
--alsologtostderr \
--output_tfrecord_prefix="/path/to/output/tfrecord/location/prefix" \
--image_directory="/path/to/image/folder/" \
--input_annotations_file="path/to/annotations.json"
```
### Generating weakly-supervised bounding box labels for image-labeled data
If all of your data already has bounding box labels, you can skip this step.
Many camera trap datasets do not have bounding box labels, or only have bounding
box labels for some of the data. We have provided code to add bounding boxes
from a pretrained model (such as the
[Microsoft AI for Earth MegaDetector](https://github.com/microsoft/CameraTraps/blob/master/megadetector.md))
and match the boxes to the image-level class label.
To export your pretrained detection model, run
```
python object_detection/export_inference_graph.py \
--alsologtostderr \
--input_type tf_example \
--pipeline_config_path path/to/faster_rcnn_model.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
```
To add bounding boxes to your dataset using the above model, run
```
python object_detection/dataset_tools/context_rcnn/generate_detection_data.py \
--alsologtostderr \
--input_tfrecord path/to/input_tfrecord@X \
--output_tfrecord path/to/output_tfrecord@X \
--model_dir path/to/exported_model_directory/saved_model
```
In the command above, `@X` denotes a sharded file pattern, where X is the
number of shards. If an image already has bounding box labels, those labels
are left unchanged. If an image is labeled 'empty' (class ID 0), we will not
generate boxes for that image.
### Generating and saving contextual features for each image
We next extract and store features for each image from a pretrained model. This
model can be the same model as above, or it can be a class-specific detection
model trained on data from your classes of interest.
To export your pretrained detection model, run
```
python object_detection/export_inference_graph.py \
--alsologtostderr \
--input_type tf_example \
--pipeline_config_path path/to/pipeline.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory \
--additional_output_tensor_names detection_features
```
Make sure that you have set `output_final_box_features: true` within
your config file before exporting. This is needed to export the features as an
output, but it does not need to be set during training.
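As a minimal sketch (assuming a Faster R-CNN based pipeline, with all other
model settings omitted), the flag belongs inside the `faster_rcnn` block:

```
model {
  faster_rcnn {
    # ... other Faster R-CNN settings ...
    # Export per-box features alongside the detections.
    output_final_box_features: true
  }
}
```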
To generate and save contextual features for your data, run
```
python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
--alsologtostderr \
--embedding_input_tfrecord path/to/input_tfrecords* \
--embedding_output_tfrecord path/to/output_tfrecords \
--embedding_model_dir path/to/exported_model_directory/saved_model
```
### Building up contextual memory banks and storing them for each context group
To assemble the context features you just generated for each image into memory
banks for each context group, run
```
python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
--input_tfrecord path/to/input_tfrecords* \
--output_tfrecord path/to/output_tfrecords \
--sequence_key image/location \
--time_horizon month
```
where the input_tfrecords for add_context_to_examples.py are the
output_tfrecords from generate_embedding_data.py. The `--sequence_key` flag
controls how images are grouped into context groups (here, by camera
location), and `--time_horizon` controls how the data is partitioned in time
when building the memory banks. For all options, see
add_context_to_examples.py. By default, this code builds TfSequenceExamples,
which are more data-efficient (context features are stored once per context
group instead of once per image). If you would like to export TfExamples
instead, set the flag `--output_type tf_example`.
If you use TfSequenceExamples, be sure to set `input_type:
TF_SEQUENCE_EXAMPLE` within your Context R-CNN configs for both the
train_input_reader and the eval_input_reader. See
`object_detection/test_data/context_rcnn_camera_trap.config`
for an example.
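For illustration, the relevant input reader settings might look like the
sketch below (paths are placeholders, and `load_context_features: true` tells
the reader to decode the stored context features):

```
train_input_reader: {
  label_map_path: "/path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/path/to/context_tfrecords-?????-of-?????"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}
```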
## Training a Context R-CNN Model
To train a Context R-CNN model, you must first set up your config file. See
`test_data/context_rcnn_camera_trap.config` for an example. The important
difference between this config and a Faster R-CNN config is the inclusion of a
`context_config` within the model, which defines the necessary Context R-CNN
parameters.
```
context_config {
  # Maximum number of contextual features stored in the memory bank
  # for each context group.
  max_num_context_features: 2000
  # Length of each contextual feature vector; this must match the
  # length of the features generated above (2057 by default).
  context_feature_length: 2057
}
```
Once your config file has been updated with your local paths, you can follow
along with documentation for running [locally](running_locally.md), or
[on the cloud](running_on_cloud.md).
## Exporting a Context R-CNN Model
Since Context R-CNN takes context features as well as images as input, we have
to explicitly define these additional inputs ("side_inputs") to the model when
exporting, as below. This example uses the default context feature shapes: in
`--side_input_shapes`, the shapes of the two side inputs are separated by `/`,
and the `2000` and `2057` dimensions must match the `max_num_context_features`
and `context_feature_length` set in your `context_config`.
```
python object_detection/export_inference_graph.py \
--input_type image_tensor \
--input_shape 1,-1,-1,3 \
--pipeline_config_path /path/to/context_rcnn_model/pipeline.config \
--trained_checkpoint_prefix /path/to/context_rcnn_model/model.ckpt \
--output_directory /path/to/output_directory \
--use_side_inputs True \
--side_input_shapes 1,2000,2057/1 \
--side_input_names context_features,valid_context_size \
--side_input_types float,int
```