# Use Models
## Build Models from Yacs Config
From a yacs config object,
models (and their sub-models) can be built by
functions such as `build_model`, `build_backbone`, `build_roi_heads`:

    from detectron2.modeling import build_model
    model = build_model(cfg)  # returns a torch.nn.Module

`build_model` only builds the model structure and fills it with random parameters.
See below for how to load an existing checkpoint into the model and how to use the `model` object.
### Load/Save a Checkpoint

    from detectron2.checkpoint import DetectionCheckpointer
    DetectionCheckpointer(model).load(file_path_or_url)  # load a file, usually from cfg.MODEL.WEIGHTS

    checkpointer = DetectionCheckpointer(model, save_dir="output")
    checkpointer.save("model_999")  # save to output/model_999.pth

Detectron2's checkpointer recognizes models in PyTorch's `.pth` format, as well as the `.pkl` files
in our model zoo.
See [API doc](../modules/checkpoint.html#detectron2.checkpoint.DetectionCheckpointer)
for more details about its usage.
The model files can be arbitrarily manipulated using `torch.{load,save}` for `.pth` files or
`pickle.{dump,load}` for `.pkl` files.
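
As a rough illustration of manipulating a `.pkl` file with `pickle`, the sketch below renames one parameter key and writes the result to a new file. The file layout (a dict whose `"model"` entry maps parameter names to arrays), the key names, and the helper `rename_key` are assumptions made for this example, not a description of every file in the model zoo; plain lists stand in for weight arrays.

```python
import os
import pickle
import tempfile


def rename_key(pkl_path, old_name, new_name, out_path):
    """Load a pickled checkpoint dict, rename one parameter, and save a copy."""
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    # Assumed layout: data["model"] maps parameter names to weight arrays.
    weights = data["model"]
    weights[new_name] = weights.pop(old_name)
    with open(out_path, "wb") as f:
        pickle.dump(data, f)
    return data


# Toy checkpoint; plain lists stand in for real weight arrays.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "toy.pkl")
dst = os.path.join(tmp_dir, "toy_renamed.pkl")
with open(src, "wb") as f:
    pickle.dump({"model": {"conv1.w": [0.1, 0.2]}}, f)

renamed = rename_key(src, "conv1.w", "stem.conv1.weight", dst)

# Read the rewritten file back to confirm the change was persisted.
with open(dst, "rb") as f:
    reloaded = pickle.load(f)
```

Only unpickle files from sources you trust, since `pickle.load` can execute arbitrary code.
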
### Use a Model
A model can be called by `outputs = model(inputs)`, where `inputs` is a `list[dict]`.
Each dict corresponds to one image and the required keys
depend on the type of model, and whether the model is in training or evaluation mode.
For example, in order to do inference,
all existing models expect the "image" key, and optionally "height" and "width".
The detailed format of inputs and outputs of existing models is explained below.
__Training__: When in training mode, all models are required to be used under an `EventStorage`.
The training statistics will be put into the storage:

    from detectron2.utils.events import EventStorage
    with EventStorage() as storage:
        losses = model(inputs)

__Inference__: If you only want to do simple inference using an existing model,
`DefaultPredictor` (in `detectron2.engine`) is a wrapper around the model that provides such basic functionality.
It provides default behavior, including model loading and preprocessing,
and operates on a single image rather than batches. See its documentation for usage.
You can also run inference directly like this:

    model.eval()  # switch to inference mode
    with torch.no_grad():
        outputs = model(inputs)

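
For direct inference, the `inputs` list can be assembled by hand. The sketch below builds one input dict from an image stored as an (H, W, C) tensor. The helper name `make_inference_input`, the conversion to float32, and the random stand-in image are illustrative assumptions; the channel order must still match `cfg.INPUT.FORMAT`.

```python
import torch


def make_inference_input(image_hwc, out_height=None, out_width=None):
    """Build one input dict from an (H, W, C) image tensor."""
    # Builtin models expect the image in (C, H, W) layout.
    image = image_hwc.permute(2, 0, 1).contiguous().float()
    record = {"image": image}
    # "height" and "width" are optional: the desired output resolution.
    if out_height is not None and out_width is not None:
        record["height"] = out_height
        record["width"] = out_width
    return record


# Random stand-in for an image resized to 480x640 from a 960x1280 original.
resized = torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8)
inputs = [make_inference_input(resized, out_height=960, out_width=1280)]
```

A list built this way has the `list[dict]` shape that `model(inputs)` expects.
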
### Model Input Format
Users can implement custom models that support any arbitrary input format.
Here we describe the standard input format that all builtin models support in detectron2.
They all take a `list[dict]` as the inputs. Each dict
corresponds to information about one image.
The dict may contain the following keys:
* "image": `Tensor` in (C, H, W) format. The meaning of the channels is defined by `cfg.INPUT.FORMAT`.
  Image normalization, if any, will be performed inside the model using
  `cfg.MODEL.PIXEL_{MEAN,STD}`.
* "height", "width": the **desired** output height and width **in inference**, which are not necessarily the same
  as the height or width of the `image` field.
For example, the `image` field contains the resized image, if resize is used as a preprocessing step.
But you may want the outputs to be in **original** resolution.
If provided, the model will produce output in this resolution,
rather than in the resolution of the `image` as input into the model. This is more efficient and accurate.
* "instances": an [Instances](../modules/structures.html#detectron2.structures.Instances)
  object for training, with the following fields:
  + "gt_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each instance.
  + "gt_classes": `Tensor` of long type, a vector of N labels, in range [0, num_categories).
  + "gt_masks": a [PolygonMasks](../modules/structures.html#detectron2.structures.PolygonMasks)
    or [BitMasks](../modules/structures.html#detectron2.structures.BitMasks) object storing N masks, one for each instance.
  + "gt_keypoints": a [Keypoints](../modules/structures.html#detectron2.structures.Keypoints)
    object storing N keypoint sets, one for each instance.
* "sem_seg": `Tensor[int]` in (H, W) format. The semantic segmentation ground truth for training.
  Values represent category labels starting from 0.
* "proposals": an [Instances](../modules/structures.html#detectron2.structures.Instances)
  object used only in Fast R-CNN style models, with the following fields:
  + "proposal_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing P proposal boxes.
  + "objectness_logits": `Tensor`, a vector of P scores, one for each proposal.
For inference of builtin models, only the "image" key is required, and "width/height" are optional.
We currently don't define a standard input format for panoptic segmentation training,
because models now use custom formats produced by custom data loaders.
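
To make the role of "height" and "width" concrete, the sketch below rescales a box predicted on the resized `image` to the desired output resolution. This is a conceptual illustration in plain Python, not detectron2's actual post-processing code; the function name `rescale_box` and the (x0, y0, x1, y1) tuple representation are assumptions for this example.

```python
def rescale_box(box, image_size, output_size):
    """Map an (x0, y0, x1, y1) box from the model-input image to the output resolution.

    image_size and output_size are (height, width) pairs.
    """
    in_h, in_w = image_size
    out_h, out_w = output_size
    scale_x = out_w / in_w
    scale_y = out_h / in_h
    x0, y0, x1, y1 = box
    return (x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)


# The model saw a 480x640 image; outputs are requested at 960x1280.
box_out = rescale_box((10.0, 20.0, 110.0, 220.0), (480, 640), (960, 1280))
```
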
#### How it connects to data loader:
The output of the default `DatasetMapper` is a dict
that follows the above format.
After the data loader performs batching, it becomes `list[dict]` which the builtin models support.
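
Conceptually, batching here amounts to grouping the per-image dicts into a list, with no stacking of images across the batch. The sketch below mimics this in plain Python; `toy_mapper` and `batch_dicts` are hypothetical stand-ins, not detectron2 functions, and a nested list stands in for the image tensor.

```python
def toy_mapper(dataset_dict):
    """Stand-in for a mapper: turn one dataset record into a model input dict."""
    h, w = dataset_dict["height"], dataset_dict["width"]
    image = [[[0] * w for _ in range(h)] for _ in range(3)]  # fake (C, H, W) image
    return {"image": image, "height": h, "width": w}


def batch_dicts(records, batch_size):
    """Group mapped records into lists of dicts, one list per batch."""
    batch = []
    for record in records:
        batch.append(toy_mapper(record))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller, batch
        yield batch


dataset = [{"height": 2, "width": 3}, {"height": 4, "width": 5}, {"height": 1, "width": 1}]
batches = list(batch_dicts(dataset, batch_size=2))
```

Because each image keeps its own dict, images of different sizes can share a batch, as the toy dataset above shows.
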
### Model Output Format
When in training mode, the builtin models output a `dict[str->ScalarTensor]` with all the losses.
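
In a training loop, these per-term losses are typically summed into one scalar before backpropagation. A minimal sketch, using plain floats as stand-ins for scalar tensors; the loss names are assumptions modeled on an R-CNN style model:

```python
# Stand-in for `loss_dict = model(inputs)` in training mode.
loss_dict = {"loss_cls": 0.5, "loss_box_reg": 0.25, "loss_rpn_cls": 0.125}

# Total loss: with real tensors, this scalar is what `.backward()` is called on.
total_loss = sum(loss_dict.values())
```
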
When in inference mode, the builtin models output a `list[dict]`, one dict for each image.
Based on the tasks the model is doing, each dict may contain the following fields:
* "instances": [Instances](../modules/structures.html#detectron2.structures.Instances)
  object with the following fields:
  * "pred_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each detected instance.
  * "scores": `Tensor`, a vector of N confidence scores.
  * "pred_classes": `Tensor`, a vector of N labels in range [0, num_categories).
  * "pred_masks": a `Tensor` of shape (N, H, W), masks for each detected instance.
  * "pred_keypoints": a `Tensor` of shape (N, num_keypoint, 3).
    Each row in the last dimension is (x, y, score). Confidence scores are larger than 0.
* "sem_seg": `Tensor` of (num_categories, H, W), the semantic segmentation prediction.
* "proposals": [Instances](../modules/structures.html#detectron2.structures.Instances)
  object with the following fields:
  * "proposal_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes)
    object storing N boxes.
  * "objectness_logits": a torch vector of N confidence scores.
* "panoptic_seg": A tuple of `(pred: Tensor, segments_info: Optional[list[dict]])`.
  The `pred` tensor has shape (H, W), containing the segment id of each pixel.
  * If `segments_info` exists, each dict describes one segment id in `pred` and has the following fields:
    * "id": the segment id
    * "isthing": whether the segment is a thing or stuff
    * "category_id": the category id of this segment.

    If a pixel's id does not exist in `segments_info`, it is considered to be the void label
    defined in Panoptic Segmentation.
  * If `segments_info` is None, all pixel values in `pred` must be ≥ -1.
    Pixels with value -1 are assigned void labels.
    Otherwise, the category id of each pixel is obtained by
    `category_id = pixel // metadata.label_divisor`.
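
The two `segments_info` cases can be sketched in plain Python as follows. Nested lists stand in for the `pred` tensor, `pixel_categories` is a hypothetical helper, and the `label_divisor` value of 1000 is an assumption for this example; `None` marks void pixels.

```python
def pixel_categories(pred, segments_info=None, label_divisor=1000):
    """Return a per-pixel category id map; None marks void pixels."""
    if segments_info is not None:
        # Case 1: look up each segment id in segments_info; unknown ids are void.
        id_to_category = {s["id"]: s["category_id"] for s in segments_info}
        return [[id_to_category.get(p) for p in row] for row in pred]
    # Case 2: no segments_info; -1 is void, otherwise decode by integer division.
    return [[None if p == -1 else p // label_divisor for p in row] for row in pred]


segments_info = [
    {"id": 1, "isthing": True, "category_id": 17},
    {"id": 2, "isthing": False, "category_id": 40},
]
with_info = pixel_categories([[1, 1, 2], [0, 2, 2]], segments_info)
without_info = pixel_categories([[17001, 17002], [-1, 40000]])
```
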
### Partially execute a model:
Sometimes you may want to obtain an intermediate tensor inside a model,
such as the input of a certain layer or the output before post-processing.
Since there are typically hundreds of intermediate tensors, there isn't an API that provides you
the intermediate result you need.
You have the following options:
1. Write a (sub)model. Following the tutorial on writing models, you can
rewrite a model component (e.g. a head of a model), such that it
does the same thing as the existing component, but returns the output
you need.
2. Partially execute a model. You can create the model as usual,
but use custom code to execute it instead of its `forward()`. For example,
   the following code obtains mask features before the mask head.

       images = ImageList.from_tensors(...)  # preprocessed input tensor
       model = build_model(cfg)
       model.eval()  # inference mode, so roi_heads returns predicted instances
       features = model.backbone(images.tensor)
       proposals, _ = model.proposal_generator(images, features)
       instances, _ = model.roi_heads(images, features, proposals)
       # Pool features inside the predicted boxes: this is the mask head's input.
       mask_features = [features[f] for f in model.roi_heads.in_features]
       mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])

3. Use forward hooks (PyTorch's `register_forward_hook`).
Forward hooks can help you obtain inputs or outputs of a certain module.
If they are not exactly what you want, they can at least be used together with partial execution
to obtain other tensors.
All options require you to read documentation and sometimes code
of the existing models to understand the internal logic,
in order to write code to obtain the internal tensors.
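
As an illustration of the forward-hook option, the sketch below captures the output of one layer using PyTorch's `register_forward_hook`. The small `nn.Sequential` network is a stand-in chosen so the example runs on its own; with a real detector you would register the hook on whichever submodule you are interested in, and the names `captured` and `save_output` are arbitrary.

```python
import torch
from torch import nn

# Small stand-in network; with a real detector you would pick one of its submodules.
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.Flatten())

captured = {}


def save_output(module, inputs, output):
    """Forward hook: stash the module's output for later inspection."""
    captured["conv_out"] = output.detach()


# Register the hook on the first layer; the handle lets us remove it afterwards.
handle = net[0].register_forward_hook(save_output)
with torch.no_grad():
    final = net(torch.zeros(1, 3, 4, 4))
handle.remove()
```

Removing the handle once the tensor has been captured keeps the hook from firing on later forward passes.
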