EfficientNet v2

Use case: `Image classification`

Model description

The EfficientNet v2 family is among the strongest topologies for image classification. It was obtained through neural architecture search, with particular attention to reducing training time and parameter count.

The family comprises several variants, listed in order of increasing depth and width: B0 (224x224), B1 (240x240), B2 (260x260), B3 (300x300), and S (384x384). M, L, and XL variants also exist, but they are too large to run efficiently on STM32N6.

All of these networks are available, pre-trained on ImageNet, at https://www.tensorflow.org/api_docs/python/tf/keras/applications/.

Network information

Network Information	Value
Framework	TensorFlow Lite/ONNX quantizer
Parameters (B0)	7.1 M
Quantization	int8
Provenance	https://www.tensorflow.org/api_docs/python/tf/keras/applications/efficientnet_v2
Paper	https://arxiv.org/pdf/2104.00298

The models are quantized using the TensorFlow Lite converter or the ONNX quantizer.
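As a rough illustration of what int8 quantization does to a value, the sketch below applies the affine mapping commonly used by TensorFlow Lite-style quantizers, real ≈ scale × (q − zero_point). The function names and the example scale and zero-point are hypothetical; they are not parameters taken from these models.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to an int8 code with an affine (scale, zero-point) scheme."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the int8 range


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its int8 code."""
    return scale * (q - zero_point)


# Hypothetical parameters for an activation covering roughly [0, 1].
scale, zero_point = 1.0 / 255.0, -128

q = quantize(0.25, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(f"0.25 -> int8 code {q} -> {x_hat:.5f}")
```

The round trip is lossy: the recovered value differs from the original by at most half a quantization step, which is one source of the accuracy gap between the Float and Int8 rows reported later in this card.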

Network inputs / outputs

For an image resolution of NxM and P classes:

Input Shape	Description
(1, N, M, 3)	Single NxM RGB image with UINT8 values between 0 and 255 for tflite
(1, 3, N, M)	Single NxM RGB image with INT8 values between -128 and 127 for ONNX

Output Shape	Description
(1, P)	Per-class confidence for P classes in FLOAT32 for tflite
(1, P)	Per-class confidence for P classes in FLOAT32 for ONNX
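The two input layouts differ in both axis order and value range. The sketch below converts a channels-last image with values in 0..255 into a channels-first tensor with values in -128..127. It assumes that the int8 input is simply the pixel value shifted by 128; the real mapping depends on the input quantization parameters of the model you deploy, so check them first. The helper name is hypothetical, and plain nested lists are used to keep the example dependency-free.

```python
def hwc_uint8_to_nchw_int8(image):
    """Convert an H x W x C image (values 0..255) to a 1 x C x H x W tensor (values -128..127).

    Assumption: the int8 encoding is a plain shift by 128.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    return [[
        [[image[y][x][c] - 128 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]]


# Tiny 2x2 RGB image standing in for a real NxM input.
image = [
    [[0, 128, 255], [10, 20, 30]],
    [[255, 255, 255], [0, 0, 0]],
]
tensor = hwc_uint8_to_nchw_int8(image)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))
print(shape)
```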

Recommended platforms

Platform	Supported	Recommended
STM32L0	[]	[]
STM32L4	[]	[]
STM32U5	[]	[]
STM32H7	[]	[]
STM32MP1	[x]	[x]
STM32MP2	[x]	[x]
STM32N6	[x]	[x]

Performances

Metrics

Measurements are made with the default STM32Cube.AI configuration, with the input/output allocated option enabled.
fft stands for "full fine-tuning": the model weights were initialized from a pre-trained model (transfer learning), and all layers were unfrozen during training.

Reference NPU memory footprint on the food101 and imagenet datasets (see Accuracy for details on the datasets)

Model	Dataset	Format	Resolution	Series	Internal RAM (KiB)	External RAM (KiB)	Weights Flash (KiB)	STEdgeAI Core version
efficientnetv2b0_224_fft onnx	food101	Int8	224x224x3	STM32N6	1911.56	0.0	6839.39	3.0.0
efficientnetv2b0_224_fft onnx	food101	Int8/Int4	224x224x3	STM32N6	1911.56	0.0	4237.52	3.0.0
efficientnetv2b1_240_fft onnx	food101	Int8	240x240x3	STM32N6	2604.03	0.0	8089.27	3.0.0
efficientnetv2b1_240_fft onnx	food101	Int8/Int4	240x240x3	STM32N6	2604.03	0.0	4995.39	3.0.0
efficientnetv2b2_260_fft onnx	food101	Int8	260x260x3	STM32N6	2712.19	528.12	10328.52	3.0.0
efficientnetv2b2_260_fft onnx	food101	Int8/Int4	260x260x3	STM32N6	2712.19	528.12	6865.39	3.0.0
efficientnetv2s_384_fft onnx	food101	Int8	384x384x3	STM32N6	2757	3456	24262.34	3.0.0
efficientnetv2s_384_fft onnx	food101	Int8/Int4	384x384x3	STM32N6	2757	3456	14836.94	3.0.0
efficientnetv2b0_224 onnx	imagenet	Int8	224x224x3	STM32N6	1911.56	0.0	7967.05	3.0.0
efficientnetv2b0_224 onnx	imagenet	Int8/Int4	224x224x3	STM32N6	1911.56	0.0	5710.05	3.0.0
efficientnetv2b1_240 onnx	imagenet	Int8	240x240x3	STM32N6	2604.03	0.0	9216.92	3.0.0
efficientnetv2b1_240 onnx	imagenet	Int8/Int4	240x240x3	STM32N6	2604.03	0.0	6342.67	3.0.0
efficientnetv2b2_260 onnx	imagenet	Int8	260x260x3	STM32N6	2712.19	528.12	11568.55	3.0.0
efficientnetv2b2_260 onnx	imagenet	Int8/Int4	260x260x3	STM32N6	2712.19	528.12	8273.17	3.0.0
efficientnetv2b3_300 onnx	imagenet	Int8	300x300x3	STM32N6	2574.47	1757.81	16510.05	3.0.0
efficientnetv2b3_300 onnx	imagenet	Int8/Int4	300x300x3	STM32N6	2574.47	1757.81	10376.74	3.0.0
efficientnetv2s_384 onnx	imagenet	Int8	384x384x3	STM32N6	2800	2592	25390	3.0.0
efficientnetv2s_384 onnx	imagenet	Int8/Int4	384x384x3	STM32N6	2800	2592	15458.97	3.0.0
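To compare the two formats, it can help to express the flash saving of the Int8/Int4 variant relative to Int8. The sketch below does this for two rows of the table above; the helper name is hypothetical, and the figures are copied from the Weights Flash column.

```python
def flash_saving_percent(int8_kib, mixed_kib):
    """Percentage of weights flash saved by the Int8/Int4 variant relative to Int8."""
    return 100.0 * (int8_kib - mixed_kib) / int8_kib


# Weights Flash (KiB) from the table above: (Int8, Int8/Int4).
flash_kib = {
    "efficientnetv2b0_224_fft (food101)": (6839.39, 4237.52),
    "efficientnetv2s_384 (imagenet)": (25390.0, 15458.97),
}

savings = {
    name: flash_saving_percent(int8_kib, mixed_kib)
    for name, (int8_kib, mixed_kib) in flash_kib.items()
}
for name, saving in savings.items():
    print(f"{name}: {saving:.1f} % less weights flash")
```

In the table, internal and external RAM are identical for both formats of each model; only the weights flash shrinks.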

Reference NPU inference time on the food101 and imagenet datasets (see Accuracy for details on the datasets)

Model	Dataset	Format	Resolution	Board	Execution Engine	Inference time (ms)	Inf / sec	STEdgeAI Core version
efficientnetv2b0_224_fft onnx	food101	Int8	224x224x3	STM32N6570-DK	NPU/MCU	62.48	16	3.0.0
efficientnetv2b0_224_fft onnx	food101	Int8/Int4	224x224x3	STM32N6570-DK	NPU/MCU	57.05	17.53	3.0.0
efficientnetv2b1_240_fft onnx	food101	Int8	240x240x3	STM32N6570-DK	NPU/MCU	86.55	11.55	3.0.0
efficientnetv2b1_240_fft onnx	food101	Int8/Int4	240x240x3	STM32N6570-DK	NPU/MCU	80.5	12.42	3.0.0
efficientnetv2b2_260_fft onnx	food101	Int8	260x260x3	STM32N6570-DK	NPU/MCU	147.21	6.79	3.0.0
efficientnetv2b2_260_fft onnx	food101	Int8/Int4	260x260x3	STM32N6570-DK	NPU/MCU	140.38	7.12	3.0.0
efficientnetv2s_384_fft onnx	food101	Int8	384x384x3	STM32N6570-DK	NPU/MCU	1089.83	0.92	3.0.0
efficientnetv2s_384_fft onnx	food101	Int8/Int4	384x384x3	STM32N6570-DK	NPU/MCU	1078.35	0.93	3.0.0
efficientnetv2b0_224 onnx	imagenet	Int8	224x224x3	STM32N6570-DK	NPU/MCU	65.44	15.28	3.0.0
efficientnetv2b0_224 onnx	imagenet	Int8/Int4	224x224x3	STM32N6570-DK	NPU/MCU	59.54	16.80	3.0.0
efficientnetv2b1_240 onnx	imagenet	Int8	240x240x3	STM32N6570-DK	NPU/MCU	89.71	11.15	3.0.0
efficientnetv2b1_240 onnx	imagenet	Int8/Int4	240x240x3	STM32N6570-DK	NPU/MCU	83.2	12.02	3.0.0
efficientnetv2b2_260 onnx	imagenet	Int8	260x260x3	STM32N6570-DK	NPU/MCU	150.04	6.66	3.0.0
efficientnetv2b2_260 onnx	imagenet	Int8/Int4	260x260x3	STM32N6570-DK	NPU/MCU	141.94	7.05	3.0.0
efficientnetv2b3_300 onnx	imagenet	Int8	300x300x3	STM32N6570-DK	NPU/MCU	224.03	4.46	3.0.0
efficientnetv2b3_300 onnx	imagenet	Int8/Int4	300x300x3	STM32N6570-DK	NPU/MCU	219.31	4.56	3.0.0
efficientnetv2s_384 onnx	imagenet	Int8	384x384x3	STM32N6570-DK	NPU/MCU	839.14	1.19	3.0.0
efficientnetv2s_384 onnx	imagenet	Int8/Int4	384x384x3	STM32N6570-DK	NPU/MCU	826.23	1.21	3.0.0
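The Inf / sec column is the reciprocal of the inference time. A minimal sketch of that conversion, using two rows of the table above (the helper name is hypothetical):

```python
def inferences_per_second(latency_ms):
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


# Inference times (ms) for efficientnetv2b0_224 on imagenet, from the table above.
int8_ms, mixed_ms = 65.44, 59.54

print(f"Int8:      {inferences_per_second(int8_ms):.2f} inf/s")
print(f"Int8/Int4: {inferences_per_second(mixed_ms):.2f} inf/s")
print(f"Speed-up:  {int8_ms / mixed_ms:.3f}x")
```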

Accuracy with Food-101 dataset

Dataset details: link, Quotation[3], Number of classes: 101, Number of images: 101 000

Model	Format	Resolution	Top 1 Accuracy
efficientnetv2b0_224_fft	Float	224x224x3	86.59 %
efficientnetv2b0_224_fft onnx	Int8	224x224x3	85.98 %
efficientnetv2b0_224_fft onnx	Int8/Int4	224x224x3	84.47 %
efficientnetv2b1_240_fft	Float	240x240x3	87.71 %
efficientnetv2b1_240_fft onnx	Int8	240x240x3	87.09 %
efficientnetv2b1_240_fft onnx	Int8/Int4	240x240x3	85.71 %
efficientnetv2b2_260_fft	Float	260x260x3	88.67 %
efficientnetv2b2_260_fft onnx	Int8	260x260x3	88.44 %
efficientnetv2b2_260_fft onnx	Int8/Int4	260x260x3	87.24 %
efficientnetv2s_384_fft	Float	384x384x3	91.69 %
efficientnetv2s_384_fft onnx	Int8	384x384x3	91.34 %
efficientnetv2s_384_fft onnx	Int8/Int4	384x384x3	89.87 %

Accuracy with ImageNet dataset

Dataset details: link, Quotation[4]. Number of classes: 1000. To perform the quantization, we calibrated the activations with a random subset of the training set. For the sake of simplicity, the accuracy reported here was estimated on the 10000 labelled images of the validation set.

Model	Format	Resolution	Top 1 Accuracy
efficientnetv2b0_224	Float	224x224x3	75.18 %
efficientnetv2b0_224 onnx	Int8	224x224x3	73.75 %
efficientnetv2b0_224 onnx	Int8/Int4	224x224x3	73.38 %
efficientnetv2b1_240	Float	240x240x3	76.14 %
efficientnetv2b1_240 onnx	Int8	240x240x3	75.19 %
efficientnetv2b1_240 onnx	Int8/Int4	240x240x3	73.92 %
efficientnetv2b2_260	Float	260x260x3	76.58 %
efficientnetv2b2_260 onnx	Int8	260x260x3	76.14 %
efficientnetv2b2_260 onnx	Int8/Int4	260x260x3	74.71 %
efficientnetv2b3_300	Float	300x300x3	79.18 %
efficientnetv2b3_300 onnx	Int8	300x300x3	79.05 %
efficientnetv2b3_300 onnx	Int8/Int4	300x300x3	78.11 %
efficientnetv2s_384	Float	384x384x3	83.52 %
efficientnetv2s_384 onnx	Int8	384x384x3	83.07 %
efficientnetv2s_384 onnx	Int8/Int4	384x384x3	82.25 %
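Top 1 accuracy counts a prediction as correct when the class with the highest confidence in the (1, P) output matches the ground-truth label. The sketch below shows that computation on made-up scores; the function name and the data are hypothetical and unrelated to the evaluation runs reported above.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Four made-up samples with P = 3 classes each.
scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]

accuracy = top1_accuracy(scores, labels)
print(f"Top 1 accuracy: {100 * accuracy:.2f} %")
```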

Retraining and integration in a simple example

Please refer to the stm32ai-modelzoo-services GitHub repository.

References

[1] "Tf_flowers : tensorflow datasets," TensorFlow. [Online]. Available: https://www.tensorflow.org/datasets/catalog/tf_flowers.

[2] J, Arun Pandian; Gopal, Geetharamani (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network", Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1

[3] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests." European Conference on Computer Vision, 2014.

[4] Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei (* = equal contribution), "ImageNet Large Scale Visual Recognition Challenge."
