ResNet v1
Use case : Image classification
Model description
ResNet models perform image classification: they take an image as input and assign the main object in it to one of a set of predefined classes. They offer high accuracy at moderate model sizes, which makes them a good fit when classification accuracy is the priority.
ResNet models are built from residual blocks. They were introduced to counter the accuracy degradation seen as networks get deeper, which happens because the initial layers of the network stop learning effectively.
ResNet v1 uses post-activation in its residual blocks, meaning the activation is applied after the residual addition. The models below use the ResNet v1 architecture with 8 and 32 layers.
(source: https://keras.io/api/applications/resnet/)
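As a rough illustration of the post-activation ordering described above, the sketch below applies ReLU after the skip addition. It is a toy, pure-Python version and not the model's actual implementation: `relu`, `residual_block_v1`, and `toy_transform` are hypothetical names, and `toy_transform` only stands in for the convolution and normalization layers of a real block.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block_v1(x, transform):
    """Post-activation residual block: ReLU is applied after the skip addition.

    `transform` is a hypothetical stand-in for the block's learned layers F(x).
    """
    fx = transform(x)
    summed = [a + b for a, b in zip(fx, x)]  # skip connection: F(x) + x
    return relu(summed)


def toy_transform(x):
    """Toy F(x) used only for this example: scales the input by -0.5."""
    return [-0.5 * v for v in x]


out = residual_block_v1([2.0, -4.0, 1.0], toy_transform)
print(out)  # [1.0, 0.0, 0.5]
```

If `transform` returns all zeros, the block reduces to `relu(x)`, which is the identity-like path that makes deep residual networks easier to train.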
The model is quantized to int8 using the TensorFlow Lite converter.
Network information
The models are quantized using the TensorFlow Lite converter.
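To make the int8 representation more concrete, here is a minimal sketch of the affine mapping that TensorFlow Lite documents for quantized tensors, `real = scale * (q - zero_point)`. The `scale` and `zero_point` values below are made up for the example and are not taken from these models; `quantize` and `dequantize` are hypothetical helper names.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to int8 using an affine scheme, clamping to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its int8 representation."""
    return scale * (q - zero_point)


# Illustrative parameters only (assumed, not read from the model files).
scale, zero_point = 0.25, -3

q = quantize(1.1, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(q, x_hat)  # 1 1.0
```

The round trip loses at most half a quantization step for values inside the representable range, which is why a smaller `scale` gives finer resolution but a narrower range.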
Network inputs / outputs
For an image resolution of NxM and P classes

| Input Shape | Description |
| ----- | ----- |
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape | Description |
| ----- | ----- |
| (1, P) | Per-class confidence for P classes in FLOAT32 |

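The input and output shapes above can be handled as in the following sketch, which uses plain nested lists in place of real tensors. It checks that a batch matches (1, N, M, 3) with UINT8 values and picks the most confident classes from a (1, P) output. The helper names `is_valid_input` and `top_k` and the sample values are hypothetical.

```python
def is_valid_input(batch, n, m):
    """Check a nested-list batch against shape (1, n, m, 3) and the 0..255 range."""
    if len(batch) != 1 or len(batch[0]) != n:
        return False
    for row in batch[0]:
        if len(row) != m:
            return False
        for pixel in row:
            if len(pixel) != 3:
                return False
            if any(not (isinstance(c, int) and 0 <= c <= 255) for c in pixel):
                return False
    return True


def top_k(confidences, k=1):
    """Return the k (class_index, confidence) pairs with the highest confidence."""
    ranked = sorted(enumerate(confidences), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# A 2x2 RGB image in a batch of one: shape (1, 2, 2, 3).
image = [[[[0, 128, 255], [10, 20, 30]], [[1, 2, 3], [4, 5, 6]]]]
# A made-up model output for P = 3 classes: shape (1, 3).
output = [[0.05, 0.80, 0.15]]

print(is_valid_input(image, 2, 2))  # True
print(top_k(output[0]))             # [(1, 0.8)]
```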
Recommended Platforms

| Platform | Supported | Optimized |
| ----- | ----- | ----- |
| STM32L0 | [] | [] |
| STM32L4 | [x] | [] |
| STM32U5 | [x] | [] |
| STM32H7 | [x] | [x] |
| STM32MP1 | [x] | [x]* |
| STM32MP2 | [x] | [] |
| STM32N6 | [x] | [] |

* Only for Cifar 100 models
Performances
Metrics
- Measurements are made with the default STM32Cube.AI configuration, with the input / output allocated option enabled.
- tfs stands for "training from scratch", meaning that the model weights were randomly initialized before training.
- tl stands for "transfer learning", meaning that the model backbone weights were initialized from a pre-trained model, then only the last layer was unfrozen during the training.
- fft stands for "full fine-tuning", meaning that the full model weights were initialized from a transfer learning pre-trained model, and all the layers were unfrozen during the training.
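The three training modes differ in how weights are initialized and which layers are trainable. The sketch below encodes just that distinction as described in the list above; `training_plan` is a hypothetical helper and is not part of the model zoo tooling.

```python
def training_plan(num_layers, mode):
    """Describe weight initialization and per-layer trainability for a training mode."""
    if mode == "tfs":
        # Training from scratch: random weights, every layer trained.
        return {"init": "random", "trainable": [True] * num_layers}
    if mode == "tl":
        # Transfer learning: pre-trained backbone, only the last layer unfrozen.
        return {
            "init": "pre-trained backbone",
            "trainable": [False] * (num_layers - 1) + [True],
        }
    if mode == "fft":
        # Full fine-tuning: start from a transfer-learning model, unfreeze all layers.
        return {"init": "transfer-learning model", "trainable": [True] * num_layers}
    raise ValueError(f"unknown training mode: {mode!r}")


print(training_plan(3, "tl")["trainable"])  # [False, False, True]
```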
Reference MCU memory footprint based on Cifar 10 dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Series | Activation RAM | Runtime RAM | Weights Flash | Code Flash | Total RAM | Total Flash | STM32Cube.AI version |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| ResNet v1 8 tfs | Int8 | 32x32x3 | STM32H7 | 62.51 KiB | 7.21 KiB | 76.9 KiB | 55.32 KiB | 69.72 KiB | 132.22 KiB | 10.2.0 |

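The reported totals are consistent with simple sums: Total RAM matches Activation RAM plus Runtime RAM, and Total Flash matches Weights Flash plus Code Flash. This is an observation from the published numbers rather than a statement about how STM32Cube.AI computes them. The check below uses a hypothetical helper, `memory_totals`, with all values in KiB.

```python
def memory_totals(activation_ram, runtime_ram, weights_flash, code_flash):
    """Return (total_ram, total_flash) in KiB, rounded to two decimals."""
    total_ram = round(activation_ram + runtime_ram, 2)
    total_flash = round(weights_flash + code_flash, 2)
    return total_ram, total_flash


# ResNet v1 8 tfs on STM32H7, values taken from the table above.
print(memory_totals(62.51, 7.21, 76.9, 55.32))  # (69.72, 132.22)
```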
Reference MCU inference time based on Cifar 10 dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) | STM32Cube.AI version |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| ResNet v1 8 tfs | Int8 | 32x32x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 28.59 ms | 10.2.0 |

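For a sense of scale, the inference time above can be converted into an approximate throughput and a CPU cycle count. This is back-of-the-envelope arithmetic that assumes back-to-back inferences with no other work on the core; the helper names are hypothetical.

```python
def frames_per_second(inference_ms):
    """Upper bound on inferences per second if inferences run back to back."""
    return 1000.0 / inference_ms


def cycles_per_inference(inference_ms, frequency_mhz):
    """Approximate CPU cycles spent on one inference."""
    return inference_ms * frequency_mhz * 1000.0


# ResNet v1 8 tfs on STM32H747I-DISCO: 28.59 ms at 400 MHz.
fps = frames_per_second(28.59)
cycles = cycles_per_inference(28.59, 400)
print(f"{fps:.1f} inferences/s, {cycles / 1e6:.1f} million cycles")
```

With these figures the result is roughly 35 inferences per second and about 11.4 million cycles per inference.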
Reference MPU inference time based on Flowers dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| ResNet v1 8 tfs | Int8 | 32x32x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 2.09 ms | 15.63 | 84.37 | 0 | v6.1.0 | OpenVX |
| ResNet v1 8 tfs | Int8 | 32x32x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 6.49 ms | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| ResNet v1 8 tfs | Int8 | 32x32x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 10.34 ms | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.18.0 |

** To get the most out of MP25 NPU hardware acceleration, please use per-tensor quantization
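The difference between per-tensor and per-channel quantization comes down to how many scales are used for a weight tensor. The sketch below assumes symmetric int8 weights in the range [-127, 127], as in the TensorFlow Lite quantization scheme, and uses made-up weights with two output channels. The helper names are hypothetical.

```python
def symmetric_scale(values, qmax=127):
    """Scale that maps the largest magnitude in `values` to qmax."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0  # avoid a zero scale for all-zero data


def per_tensor_scale(channels):
    """One scale shared by every channel of the tensor."""
    return symmetric_scale([v for channel in channels for v in channel])


def per_channel_scales(channels):
    """One scale per output channel."""
    return [symmetric_scale(channel) for channel in channels]


# Illustrative weights: the second channel is much smaller than the first.
weights = [[0.5, -1.27], [0.0127, -0.00635]]

shared = per_tensor_scale(weights)
separate = per_channel_scales(weights)

# Quantize the first weight of the small channel under both schemes.
print(round(weights[1][0] / shared))       # 1
print(round(weights[1][0] / separate[1]))  # 127
```

In this toy case the shared scale leaves the small channel with almost no resolution, which is the accuracy cost that per-channel quantization avoids. Per-tensor quantization trades that for a simpler layout, which is what the note above recommends for NPU acceleration on MP25.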
Reference MCU memory footprint based on Cifar 100 dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Series | Activation RAM | Runtime RAM | Weights Flash | Code Flash | Total RAM | Total Flash |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| ResNet v1 32 tfs | Int8 | 32x32x3 | STM32H7 | 45.41 KiB | 24.98 KiB | 464.38 KiB | 78.65 KiB | 70.39 KiB | 543.03 KiB |

Reference MCU inference time based on Cifar 100 dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| ResNet v1 32 tfs | Int8 | 32x32x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 177.7 ms |

Reference MPU inference time based on Flowers dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| ResNet v1 32 tfs | Int8 | 32x32x3 | per-channel | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 9.160 ms | 14.75 | 85.25 | 0 | v6.1.0 | OpenVX |
| ResNet v1 32 tfs | Int8 | 32x32x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 34.78 ms | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.11.0 |
| ResNet v1 32 tfs | Int8 | 32x32x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 55.32 ms | NA | NA | 100 | v6.1.0 | TensorFlowLite 2.11.0 |

Accuracy with Cifar10 dataset
Dataset details: link, License CC BY 4.0, Quotation[1], Number of classes: 10, Number of images: 60 000
Accuracy with Cifar100 dataset
Dataset details: link, License CC0 4.0, Quotation[2], Number of classes: 100, Number of images: 60 000
Retraining and Integration in a simple example:
Please refer to the stm32ai-modelzoo-services GitHub here
References
[1] "Tf_flowers : tensorflow datasets," TensorFlow. [Online]. Available: https://www.tensorflow.org/datasets/catalog/tf_flowers.
[2] J, ARUN PANDIAN; GOPAL, GEETHARAMANI (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network", Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1
[3] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests." European Conference on Computer Vision, 2014.