# Context R-CNN

Context R-CNN is an object detection model that uses contextual features to
improve detection accuracy. See https://arxiv.org/abs/1912.03538 for more details.

## Table of Contents

*   [Preparing Context Data for Context R-CNN](#preparing-context-data-for-context-r-cnn)
    +   [Generating TfRecords from a set of images and a COCO-CameraTraps style
        JSON](#generating-tfrecords-from-a-set-of-images-and-a-coco-cameratraps-style-json)
    +   [Generating weakly-supervised bounding box labels for image-labeled data](#generating-weakly-supervised-bounding-box-labels-for-image-labeled-data)
    +   [Generating and saving contextual features for each image](#generating-and-saving-contextual-features-for-each-image)
    +   [Building up contextual memory banks and storing them for each context
        group](#building-up-contextual-memory-banks-and-storing-them-for-each-context-group)
*   [Training a Context R-CNN Model](#training-a-context-r-cnn-model)
*   [Exporting a Context R-CNN Model](#exporting-a-context-r-cnn-model)

## Preparing Context Data for Context R-CNN

In this section, we will walk through the process of generating TfRecords with
contextual features. We focus on building context from object-centric features
generated with a pretrained Faster R-CNN model, but you can adapt the provided
code to use alternative feature extractors.

Each of these data processing scripts uses Apache Beam, which can be installed
using

```
pip install apache-beam
```

and can be run locally, or on a cluster for efficient processing of large
amounts of data. See the
[Apache Beam documentation](https://beam.apache.org/documentation/runners/dataflow/)
for more information.

### Generating TfRecords from a set of images and a COCO-CameraTraps style JSON

If your data is already stored in TfRecords, you can skip this first step.

We assume a COCO-CameraTraps json format, as described on
[LILA.science](https://github.com/microsoft/CameraTraps/blob/master/data_management/README.md).

COCO-CameraTraps is a format that adds static-camera-specific fields, such as a
location ID and datetime, to the well-established COCO format. For context to be
built correctly later on, make sure each context group has its own location ID
(in the static-camera case, the ID of the camera) and that each image records
the datetime at which it was taken. We assume that empty images are labeled
'empty' with class ID 0.
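
For orientation, here is the rough shape of such a database, written out as a
Python dictionary (field names follow the LILA documentation linked above; the
values are made up):

```python
# A minimal COCO-CameraTraps style database. Note the per-image 'location'
# (the camera / context group ID) and 'datetime' fields, and the 'empty'
# category with ID 0.
database = {
    'images': [{
        'id': 'img_0001',
        'file_name': 'img_0001.jpg',
        'width': 1920,
        'height': 1080,
        'location': 'camera_A',             # context group ID
        'datetime': '2019-06-01 11:23:00',  # time the photo was taken
    }],
    'annotations': [{
        'id': 'ann_0001',
        'image_id': 'img_0001',
        'category_id': 1,
        # 'bbox' is optional; see the next section for generating boxes when
        # only image-level labels exist.
    }],
    'categories': [
        {'id': 0, 'name': 'empty'},
        {'id': 1, 'name': 'deer'},
    ],
}
```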

To generate TfRecords from your database and local image folder, run

```
python object_detection/dataset_tools/context_rcnn/create_cococameratraps_tfexample_main.py \
  --alsologtostderr \
  --output_tfrecord_prefix="/path/to/output/tfrecord/location/prefix" \
  --image_directory="/path/to/image/folder/" \
  --input_annotations_file="path/to/annotations.json"
```

### Generating weakly-supervised bounding box labels for image-labeled data

If all your data already has bounding box labels, you can skip this step.

Many camera trap datasets do not have bounding box labels, or only have bounding
box labels for some of the data. We have provided code to add bounding boxes
from a pretrained model (such as the
[Microsoft AI for Earth MegaDetector](https://github.com/microsoft/CameraTraps/blob/master/megadetector.md))
and match the boxes to the image-level class label.
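
Conceptually, the matching step amounts to keeping confident detector boxes and
assigning them the image-level class. A simplified sketch (the threshold and
data layout are illustrative assumptions, not the actual logic of
generate_detection_data.py):

```python
def label_boxes_with_image_class(detections, image_class_id,
                                 score_threshold=0.5):
  """Assign the image-level class to confident detector boxes.

  `detections` is assumed to be a list of (box, score) pairs from the
  pretrained detector; `image_class_id` is the single image-level label.
  """
  return [(box, image_class_id)
          for box, score in detections
          if score >= score_threshold]
```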

To export your pretrained detection model, run

```
python object_detection/export_inference_graph.py \
  --alsologtostderr \
  --input_type tf_example \
  --pipeline_config_path path/to/faster_rcnn_model.config \
  --trained_checkpoint_prefix path/to/model.ckpt \
  --output_directory path/to/exported_model_directory
```

To add bounding boxes to your dataset using the above model, run

```
python object_detection/dataset_tools/context_rcnn/generate_detection_data.py \
  --alsologtostderr \
  --input_tfrecord path/to/input_tfrecord@X \
  --output_tfrecord path/to/output_tfrecord@X \
  --model_dir path/to/exported_model_directory/saved_model
```

If an image already has bounding box labels, those labels are left unchanged. If
an image is labeled 'empty' (class ID 0), we will not generate boxes for that
image.
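
If you want to spot-check the generated boxes, you can read a few records back
from one output shard; a minimal sketch, assuming the standard TF Object
Detection API tf.Example fields (`image/object/bbox/*`,
`image/object/class/label`) and a placeholder path:

```python
import tensorflow as tf

# Placeholder path; point this at one shard of the output TfRecords.
dataset = tf.data.TFRecordDataset('path/to/one_output_shard')

for raw_record in dataset.take(3):
  example = tf.train.Example.FromString(raw_record.numpy())
  features = example.features.feature
  num_boxes = len(features['image/object/bbox/xmin'].float_list.value)
  labels = list(features['image/object/class/label'].int64_list.value)
  print('boxes:', num_boxes, 'labels:', labels)
```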

### Generating and saving contextual features for each image

We next extract and store features for each image from a pretrained model. This
model can be the same model as above, or be a class-specific detection model
trained on data from your classes of interest.

To export your pretrained detection model, run

```
python object_detection/export_inference_graph.py \
  --alsologtostderr \
  --input_type tf_example \
  --pipeline_config_path path/to/pipeline.config \
  --trained_checkpoint_prefix path/to/model.ckpt \
  --output_directory path/to/exported_model_directory \
  --additional_output_tensor_names detection_features
```

Make sure that you have set `output_final_box_features: true` within
your config file before exporting. This is needed to export the features as an
output, but it does not need to be set during training.

To generate and save contextual features for your data, run

```
python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
  --alsologtostderr \
  --embedding_input_tfrecord path/to/input_tfrecords* \
  --embedding_output_tfrecord path/to/output_tfrecords \
  --embedding_model_dir path/to/exported_model_directory/saved_model
```

### Building up contextual memory banks and storing them for each context group

To build the context features you just added for each image into memory banks,
run

```
python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
  --input_tfrecord path/to/input_tfrecords* \
  --output_tfrecord path/to/output_tfrecords \
  --sequence_key image/location \
  --time_horizon month
```

where the `--input_tfrecord` paths for `add_context_to_examples.py` are the
`--embedding_output_tfrecord` outputs from `generate_embedding_data.py`.

For all options, see `add_context_to_examples.py`. By default, this code builds
TfSequenceExamples, which are more data efficient (they store the context
features once per context group, as opposed to once per image). If you would
like to export TfExamples instead, set the flag `--output_type tf_example`.
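
To sanity-check a memory bank, you can parse one of the resulting
TfSequenceExamples; a rough sketch, assuming the context features are stored in
the SequenceExample context under keys like `image/context_features` and
`image/context_feature_length` (check `add_context_to_examples.py` for the
exact key names):

```python
import tensorflow as tf

# Placeholder path; point this at one shard of the add_context_to_examples
# output.
dataset = tf.data.TFRecordDataset('path/to/one_output_shard')

for raw_record in dataset.take(1):
  seq_example = tf.train.SequenceExample.FromString(raw_record.numpy())
  context = seq_example.context.feature
  flat = context['image/context_features'].float_list.value
  feature_length = context['image/context_feature_length'].int64_list.value[0]
  print('memory bank size:', len(flat) // feature_length,
        'feature length:', feature_length)
```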

If you use TfSequenceExamples, you must be sure to set `input_type:
TF_SEQUENCE_EXAMPLE` within your Context R-CNN configs for both
train_input_reader and test_input_reader. See
`object_detection/test_data/context_rcnn_camera_trap.config`
for an example.

## Training a Context R-CNN Model

To train a Context R-CNN model, you must first set up your config file. See
`test_data/context_rcnn_camera_trap.config` for an example. The important
difference between this config and a Faster R-CNN config is the inclusion of a
`context_config` within the model, which defines the necessary Context R-CNN
parameters.

```
context_config {
  max_num_context_features: 2000
  context_feature_length: 2057
}
```

Once your config file has been updated with your local paths, you can follow
along with documentation for running [locally](running_locally.md), or
[on the cloud](running_on_cloud.md).

## Exporting a Context R-CNN Model

Since Context R-CNN takes context features as well as images as input, we have
to explicitly define the other inputs ("side_inputs") to the model when
exporting, as below. This example is shown with default context feature shapes.

```
python export_inference_graph.py \
    --input_type image_tensor \
    --input_shape 1,-1,-1,3 \
    --pipeline_config_path /path/to/context_rcnn_model/pipeline.config \
    --trained_checkpoint_prefix /path/to/context_rcnn_model/model.ckpt \
    --output_directory /path/to/output_directory \
    --use_side_inputs True \
    --side_input_shapes 1,2000,2057/1 \
    --side_input_names context_features,valid_context_size \
    --side_input_types float,int
```
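
To confirm the side inputs were wired into the exported model, you can load the
SavedModel and inspect its serving signature; a quick check, assuming TF 2-style
loading and placeholder paths (the exact input names come from
`--side_input_names` above):

```python
import tensorflow as tf

# Placeholder path to the directory written by export_inference_graph.py.
model = tf.saved_model.load('/path/to/output_directory/saved_model')
detect = model.signatures['serving_default']

# With --use_side_inputs, the context features and valid context size should
# appear here alongside the image tensor; feed them by these names at
# inference time.
print(detect.structured_input_signature)
print(detect.structured_outputs)
```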