# Quantize DeepLab model for faster on-device inference

This page describes the steps required to quantize a DeepLab model and convert
it to TFLite for on-device inference. The main steps include:

1.  Quantization-aware training
1.  Exporting the model
1.  Converting to a TFLite FlatBuffer

We provide details for each step below.

## Quantization-aware training

DeepLab supports two approaches to quantize your model.

1.  **[Recommended]** Training a non-quantized model until convergence, then
    fine-tuning the trained float model with quantization using a small
    learning rate (on PASCAL we use the value of 3e-5). This fine-tuning step
    usually takes 2k to 5k steps to converge.

1.  Training a DeepLab float model with delayed quantization. Usually we delay
    quantization until the last few thousand steps of training by setting
    `--quantize_delay_step` to the step at which quantization should begin
    (see the sketch after this list).
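
For reference, here is a minimal sketch of the mechanism behind
`--quantize_delay_step`, assuming the TF 1.x `tf.contrib.quantize` graph
rewriter; the loss-building helper is hypothetical and only stands in for the
DeepLab model:

```
# Rough sketch only: rewrite the float training graph with fake-quantization ops.
# quant_delay=0 quantizes from the first step (fine-tuning a converged float
# checkpoint); a large quant_delay trains in float and turns quantization on
# only for the last few thousand steps (delayed quantization).
import tensorflow.compat.v1 as tf
from tensorflow.contrib import quantize as contrib_quantize

loss = build_deeplab_loss()  # hypothetical helper that builds the float model
contrib_quantize.create_training_graph(
    input_graph=tf.get_default_graph(),
    quant_delay=0)  # e.g. 0 for fine-tuning, or a large value to delay
train_op = tf.train.MomentumOptimizer(3e-5, 0.9).minimize(loss)
```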

In the current implementation, quantization is only supported with 1)
`num_clones=1` for training and 2) single-scale inference for evaluation,
visualization and model export. To get the best performance for the quantized
model, we strongly recommend training the float model with a larger
`num_clones` and then fine-tuning it with a single clone.

The command line below quantizes a DeepLab model trained on the PASCAL VOC
dataset using fine-tuning:

```
# From tensorflow/models/research/
python deeplab/train.py \
  --logtostderr \
  --training_number_of_steps=3000 \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --output_stride=16 \
  --train_crop_size="513,513" \
  --train_batch_size=8 \
  --base_learning_rate=3e-5 \
  --dataset="pascal_voc_seg" \
  --initialize_last_layer \
  --quantize_delay_step=0 \
  --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
  --train_logdir=${PATH_TO_TRAIN_DIR} \
  --dataset_dir=${PATH_TO_DATASET}
```

## Converting to TFLite FlatBuffer

First use the following command line to export your trained model:

```
# From tensorflow/models/research/
python deeplab/export_model.py \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --quantize_delay_step=0 \
  --export_path=${OUTPUT_DIR}/frozen_inference_graph.pb
```
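
Optionally, you can sanity-check the exported graph before conversion. The
snippet below is a minimal sketch (not part of the DeepLab scripts; the file
path is an illustrative assumption) that prints the nodes matching the
input/output tensors used by `tflite_convert` below:

```
# Load the frozen GraphDef and look for the tensors referenced by tflite_convert
# ("MobilenetV2/MobilenetV2/input" and "ArgMax").
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:  # assumed path
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.name in ('MobilenetV2/MobilenetV2/input', 'ArgMax'):
        print(node.name, node.op)
```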

The command line below shows how to convert the exported GraphDef to a TFLite
model:

```
tflite_convert \
  --graph_def_file=${OUTPUT_DIR}/frozen_inference_graph.pb \
  --output_file=${OUTPUT_DIR}/frozen_inference_graph.tflite \
  --output_format=TFLITE \
  --input_shape=1,513,513,3 \
  --input_arrays="MobilenetV2/MobilenetV2/input" \
  --inference_type=QUANTIZED_UINT8 \
  --inference_input_type=QUANTIZED_UINT8 \
  --std_dev_values=128 \
  --mean_values=128 \
  --change_concat_input_ranges=true \
  --output_arrays="ArgMax"
```

**[Important]** Note that the converted model expects 513x513 RGB input and does
not include preprocessing (resizing and padding the input image) or
post-processing (cropping the padded region and resizing to the original input
size). These steps can be implemented outside of the TFLite model.
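
For example, a minimal sketch of such pre- and post-processing around the TFLite
interpreter could look like the following (the zero pad value and the
bilinear/nearest resampling choices are illustrative assumptions, not the
reference DeepLab implementation):

```
# Illustrative only: resize/pad a uint8 RGB image to 513x513, run the quantized
# TFLite model, then crop the padded region and resize the label map back.
import numpy as np
from PIL import Image
import tensorflow as tf

INPUT_SIZE = 513  # the converted model expects 513x513 RGB input


def run_segmentation(image_path, model_path):
    image = Image.open(image_path).convert('RGB')
    width, height = image.size

    # Preprocess: scale the longer side to 513 and pad the rest with zeros.
    scale = INPUT_SIZE / max(width, height)
    resized_w, resized_h = int(width * scale), int(height * scale)
    resized = image.resize((resized_w, resized_h), Image.BILINEAR)
    input_tensor = np.zeros((1, INPUT_SIZE, INPUT_SIZE, 3), dtype=np.uint8)
    input_tensor[0, :resized_h, :resized_w, :] = np.asarray(resized)

    # Run the quantized model.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']
    interpreter.set_tensor(input_index, input_tensor)
    interpreter.invoke()
    seg_map = np.squeeze(interpreter.get_tensor(output_index))  # ArgMax output

    # Post-process: crop the padded region and resize to the original size.
    seg_map = seg_map[:resized_h, :resized_w].astype(np.uint8)
    seg_map = Image.fromarray(seg_map).resize((width, height), Image.NEAREST)
    return np.asarray(seg_map)
```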

## Quantized model on PASCAL VOC

We provide float and quantized checkpoints that have been pretrained on the VOC
2012 train_aug set, using a MobileNet-v2 backbone with different depth
multipliers. The quantized models typically show about a 1% drop in mIoU.

For the quantized (8-bit) models, the un-tarred directory includes:

*   a frozen inference graph (frozen_inference_graph.pb)
*   a checkpoint (model.ckpt.data*, model.ckpt.index)
*   a converted TFLite FlatBuffer file (frozen_inference_graph.tflite)

Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Quantize | PASCAL mIOU | Folder Size | TFLite File Size
--------------- | :-----: | :---------: | :-------------: | :-----------: | :------: | :---------: | :---------: | :--------------:
[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | No | 70.19% (val) | 7.6MB | N/A
[mobilenetv2_dm05_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 0.88B | Yes | 69.65% (val) | 8.2MB | 751.1KB
[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16 | [1.0] | No | 2.75B | No | 75.32% (val) | 23MB | N/A
[mobilenetv2_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 2.75B | Yes | 74.26% (val) | 24MB | 2.2MB

Note that you might need the nightly build of TensorFlow (see
[here](https://www.tensorflow.org/install) for install instructions) to convert
the above quantized models to TFLite.