2. Model Conversion
This chapter describes how to convert network models from different platforms into offline IPU network models that can run on development boards. The model conversion flow is shown below:

Model conversion process description:
① Model file
② Configuration file input_config.ini
③ Data images required for model conversion, preprocessing script preprocess.py
Input these into the model conversion tool SGS_converter, and you will obtain the IPU offline network model img file that can be run on the development board.
The IPU toolchain currently supports converting models from the following platforms to offline networks:

Among them, ONNX is the main model framework supported by the IPU toolchain, which can provide relatively complete functionality iteration and technical service support. Models from other frameworks are in maintenance support status. It is recommended to prioritize converting models under the ONNX framework to obtain the best experience.
The following will detail how to use the IPU Toolchain to generate offline models.
1. Model Conversion Configuration File — Guide to Using input_config.ini¶
In the instructions for converting models from each platform to IPU offline network models, the required parameter --input_config must be used to specify the path of the input tensor configuration file input_config.ini. This section mainly describes how to configure the input_config.ini file required during the model conversion process.
1.1 Overview of input_config.ini Contents¶

This file is mainly divided into three parts for information configuration:
[INPUT_CONFIG] # Configure the normalization information for image preprocessing of the network model
[OUTPUT_CONFIG] # Configure the quantization processing information for network model inputs and outputs
[CONV_CONFIG] # optional, configure the quantization information for convolution in the network model
1.2 Basic Usage of input_config.ini¶
Configuration example and explanation:
[INPUT_CONFIG]
inputs= data; # Input node name, multiple inputs should be separated by ","
training_input_formats=RGB; # Input format during model training, the number and order should match the configuration of `inputs`
input_formats=BGR; # Input format on the board side, the number and order should match the configuration of `inputs`
quantizations=TRUE; # Whether to enable input quantization, options are TRUE or FALSE, the number and order should match the configuration of `inputs`
mean=127.5:127.5:127.5; # Mean value, order is RGB, multiple inputs should be separated by ","
std_value = 255; # Standard deviation, multiple inputs should be separated by ",", if each channel has a corresponding `std_value`, separate with a colon ( : ), order is RGB
[OUTPUT_CONFIG]
outputs= prob; # Output node name, multiple outputs should be separated by ","
dequantizations=TRUE; # Whether to enable dequantization, fill in according to actual needs, recommended to be TRUE, the number and order should match the configuration of `outputs`
[CONV_CONFIG] # optional
tensor_arrays=conv1-1,conv2-1; # Specify the quantization method for convolution in certain layers of the network, configure when higher accuracy is needed.
1.3 Advanced Usage of input_config.ini¶
The following will explain the configurable parameters for [INPUT_CONFIG], [OUTPUT_CONFIG], and [CONV_CONFIG]:
1.3.1 Detailed Explanation of Configurable Information in [INPUT_CONFIG]¶
1. Configurable Parameter Explanations
Currently supported parameters include: inputs, training_input_formats, input_formats, quantizations, mean, std_value, input_width_alignment (optional), input_height_alignment (optional), input_layouts (optional).
(1) inputs: Name of the input Tensor for the network.
Usage
- Multiple input Tensors should be separated by a comma ( , ). The number and order of the model's input Tensors must match the configuration of inputs.
- The total length of all input names must not exceed 2048 characters.
(2) training_input_formats: Image format during network training.
Usage
- Multiple input Tensors should be separated by a comma ( , ). The number and order of the model's input Tensors must match the configuration of inputs.
- Optional formats include:
- RGB
- BGR
- GRAY
- RAWDATA_U8_NHWC
- RAWDATA_S16_NHWC
- RAWDATA_F32_NHWC
- Recommended configuration values for each model can be found in the next section's table
(3) input_formats: Image input format when the network model runs on the IPU chip.
Usage
- Multiple input Tensors should be separated by a comma ( , ). The number and order of the model's input Tensors must match the configuration of inputs.
- Optional formats include:
- RGB
- BGR
- RGBA
- BGRA
- YUV_NV12
- GRAY
- RAWDATA_U8_NHWC
- RAWDATA_S16_NHWC
- RAWDATA_F32_NHWC
- Recommended configuration values for each model can be found in the next section's table
(4) quantizations: Used to indicate whether the data of all input Tensors needs to be quantized, TRUE or FALSE.
Usage
- Multiple input Tensors should be separated by a comma ( , ). The number and order of the model's input Tensors must match the configuration of inputs.
(5) mean: Used for preprocessing images during network training.
The formula for normalizing images is as follows: $$ \text{normalized\_image} = \frac{\text{original\_image} - (\text{mean}_R, \text{mean}_G, \text{mean}_B)}{\text{std\_value}} $$
Usage
- The order is RGB, representing the mean value for each channel.
- The number equals that of the inputs; multiple input Tensors should be separated by a comma ( , ) without spaces in between.
- For RAWDATA format inputs, do not configure mean; for other formats that have not undergone normalization, set this value to 0.
(6) std_value: Used for preprocessing images during network training.
Usage
- The number equals that of the inputs; multiple input Tensors should be separated by a comma ( , ) without spaces in between.
- If each channel has its own std_value, separate the channel values with a colon ( : ), in RGB order.
- For RAWDATA format inputs, do not configure std_value; for other formats that have not undergone normalization, set this value to 1.
(7) input_width_alignment (optional): Used to indicate the alignment quantity in the Width direction when data is input into the network.
Usage
- If there are multiple input Tensors, separate them with a comma ( , ) without spaces in between.
- Recommended configuration values for each model can be found in the 'Alignment_rules' column of the next section's table.
(8) input_height_alignment (optional): Used to indicate the alignment quantity in the Height direction when data is input into the network.
Usage
- If there are multiple input Tensors, separate them with a comma ( , ) without spaces in between.
- Recommended configuration values for each model can be found in the 'Alignment_rules' column of the next section's table.
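For instance, a minimal sketch (the alignment values here are hypothetical; use the values from the 'Alignment_rules' column for your model):
[INPUT_CONFIG]
......
input_width_alignment=16;
input_height_alignment=2;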
(9) input_layouts (optional): Used to indicate the layout format of the model input Tensors.
Usage
- Default is NHWC.
- Optional data layout formats include:
  - NCHW: Only applicable to 4D input Tensors.
    - 4D Tensor handling: when specified as NCHW, the input is forced to be laid out in NCHW.
    - Non-4D Tensor handling: the input dimensions remain unchanged, but the NHWC placeholder must be used in input_layouts.
  - NHWC: Only applicable to 4D input Tensors.
    - 4D Tensor handling: when specified as NHWC, the input is forced to be laid out in NHWC.
    - Non-4D Tensor handling: the input dimensions remain unchanged, but the NHWC placeholder must be used in input_layouts.
- The quantity equals that of the inputs, and the order of the model's input Tensors must match the configuration order of input_layouts; multiple entries are separated by a comma ( , ) without spaces in between.
- Note: When input_layouts for a 4D input is set to NCHW, ensure that the corresponding preprocessing script (introduced in the next section) returns data laid out in NCHW, and that the input data supplied when running the offline model on the board is also laid out in NCHW.
- Note: When input_formats is set to YUV_NV12 and input_layouts is set to NCHW, ensure that the corresponding preprocessing script returns data laid out in NCHW, while the input data on the board side is laid out in YUV_NV12 as usual.
- Note: When input_formats is set to RGBA or BGRA, specifying a corresponding input_layouts is not supported.
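As an illustration, a minimal sketch for a model with a single 4D input that should be fed in NCHW (the tensor name data follows the earlier sample; other fields are omitted):
[INPUT_CONFIG]
inputs=data;
......
input_layouts=NCHW;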
2. Model Parameter Configuration Constraints and Selection Combinations

Usage
- RAWDATA_U16_NHWC: Only applicable when the input is directly connected to the embedding layer, for example: the embedding layer of the CLIP Text model can be configured as RAWDATA_U16_NHWC when the index of its embedding layer satisfies the condition 32727 < index < 65535.
Usage
- Gray Model Conversion Notes:
  - When input_formats is GRAY: the actual data provided to the Fixed or Offline model is a gray image. The corresponding configuration is:
    [INPUT_CONFIG]
    ......
    training_input_formats=GRAY;
    input_formats=GRAY;
    ......
    input_width_alignment and input_height_alignment default to 1. If input_width_alignment and input_height_alignment are not set, the actual input will match the model input shape.
  - When input_formats is YUV_NV12: the actual data provided to the Fixed or Offline model is YUV_NV12 data. The corresponding configuration is:
    [INPUT_CONFIG]
    ......
    training_input_formats=GRAY;
    input_formats=YUV_NV12;
    ......
    input_width_alignment and input_height_alignment default to 2.
  - The input_config.ini file only needs to set the mean value for the single channel:
    [INPUT_CONFIG]
    ......
    mean=33.318;
    std_value=1.0;
3. Considerations for Configuring Multi-Input Models:
(1) When the model has multiple inputs and mixed data formats, if rawdata format input is included, please note:
Usage
- 1) Configuring mean/std normalization parameters for RAWDATA format inputs is prohibited.
- 2) In the ini configuration file, RAWDATA inputs must be placed at the end.
For example: input_list = [input0,input1,input2], where input0 and input2 are RGB format and input1 is RAWDATA format. Configure the inputs in the ini as [input0,input2,input1]:
[INPUT_CONFIG]
inputs=input0,input2,input1;              # Adjust the input order, placing the RAWDATA format tensor at the end
input_formats=RGB,RGB,RAWDATA_F32_NHWC;
quantizations=TRUE,TRUE,TRUE;
mean=127.5:127.5:127.5,0:0:0;             # Only two sets are configured; normalization parameters are prohibited for input1
std_value=255,255;                        # Only two sets are configured; normalization parameters are prohibited for input1
(2) When the model has multiple inputs and mixed data dimensions, if input_layouts need to be configured, please note:
Usage
- 1) Non-4D Tensors must be represented using NHWC in input_layouts.
For example: input_list = [input0,input1,input2], where input0 is a 3D tensor and input1 and input2 are 4D tensors, specify input1 to be arranged according to NCHW:
[INPUT_CONFIG]
inputs=input0,input1,input2;
......
input_layouts=NHWC,NCHW,NHWC;   # input0 is a non-4D tensor and needs the NHWC placeholder
1.3.2 Detailed Explanation of Configurable Information in [OUTPUT_CONFIG]¶
Currently supported parameters include: outputs, dequantizations, output_layouts (optional), output_formats (optional).
(1) outputs: Name of the network output node.
Usage
- Multiple output Tensors should be separated by a comma ( , ), maintaining the same order as in the model.
- The total length of all output names must not exceed 2048 characters.
- When converting networks with post-processing, the outputs of the Backbone network have different names from the full network outputs; all other settings should be identical.
(2) dequantizations: Used to indicate whether the data of all output Tensors needs to be dequantized, TRUE or FALSE.
Usage
- Only effective when running on the board. The quantity equals the number of outputs, and the output Tensors should be separated by commas ( , ) without spaces in between.
- If dequantizations is configured as TRUE, a Fix2Float operator is added to the model output and the output data type is float32.
- If dequantizations is configured as FALSE, the output data type when running on the board is int16.
(3) output_layouts (optional): Used to indicate the data layout format of the output Tensors.
Usage
- If this item is not set, the default data layout format is NHWC.
- Optional data layout formats include:
  - NCHW: the corresponding 4D output Tensors are laid out in NCHW format.
  - NHWC: the corresponding 4D output Tensors are laid out in NHWC format.
- The quantity equals the number of outputs, and the order of the model's output Tensors must match the configuration order of output_layouts; multiple output Tensors are separated by a comma ( , ) without spaces in between.
(4) output_formats (optional): Indicates the data format of the output Tensors.
Usage
- If this item is not set, the default data format is based on the configuration of dequantizations: when dequantizations is TRUE, the output is RAWDATA_F32_NHWC; when dequantizations is FALSE, the output is RAWDATA_S16_NHWC and is not dequantized.
- Optional data formats include:
  - RAWDATA_F32_NHWC: the model outputs float32 data. dequantizations can only be configured as TRUE.
  - RAWDATA_S16_NHWC: the model outputs int16 data. When dequantizations is TRUE, the int16 output is dequantized back to actual values; when dequantizations is FALSE, the int16 output is not dequantized.
  - RAWDATA_U8_NHWC: the model outputs uint8 data. When dequantizations is TRUE, the uint8 output is dequantized back to actual values; when dequantizations is FALSE, the uint8 output is not dequantized.
- If there are multiple output Tensors, they should be separated by a comma ( , ) without spaces in between. The quantity equals the number of outputs, and the order of the model's output Tensors must match the configuration order of output_formats.
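For instance, a hedged sketch (the output name prob follows the earlier sample) that keeps the board-side output in int16:
[OUTPUT_CONFIG]
outputs=prob;
dequantizations=FALSE;
output_formats=RAWDATA_S16_NHWC;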
1.3.3 Detailed Explanation of Configurable Information in [CONV_CONFIG] (optional)¶
This section's parameters are optional; currently supported parameters include: input_format (optional) and tensor_arrays (optional).
(1) input_format (optional): Specify the quantization method for all convolutions in the network.
Usage
- The conversion tool defaults to the quantization method recommended by the IPU Toolchain; you can also specify the quantization by configuring input_format. Options include:
  - ALL_UINT8: all convolutions use UINT8 quantization.
  - ALL_INT16: all convolutions use INT16 quantization.
  - CONV2D_INT16: only normal convolutions use INT16 quantization.
  - DEPTHWISE_INT16: only Depthwise convolutions use INT16 quantization.
- In ALL_UINT8 mode, convolution operations occupy less bandwidth and run faster; in ALL_INT16 mode, accuracy can be greatly improved but running speed may be affected.
- After configuring input_format, the quantization strategies recommended by calibrator/torch_calibrator become ineffective. If you want to use the strategies recommended by calibrator/torch_calibrator, delete input_format and convert the model again.
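For example, a minimal sketch forcing INT16 quantization for all convolutions:
[CONV_CONFIG]
input_format=ALL_INT16;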
(2) tensor_arrays (optional): Specify the quantization method for certain convolution layers in the network.
Usage
- The conversion tool defaults to the quantization method recommended by the IPU Toolchain; if certain convolution layers require higher precision, configure tensor_arrays to specify the quantization method for those layers.
- Fill in the name of the first input tensor of the convolution layer. To configure multiple convolution layers for higher precision, separate the input tensor names with commas ( , ).
- Specifying quantization has no effect on the first convolution layer in the network.
2. Model Conversion Configuration File - Preprocessing Script Writing Guide¶
When converting models for different platforms to IPU offline network models, it is necessary to use the required parameter -n to specify the path to the preprocessing script. This section mainly introduces how to configure the preprocessing files required during the model conversion process.

2.1 Preprocessing File Writing Example¶
Taking the onnx_yolov8s network as an example, this section explains how to write the image preprocessing file. Demo:
import cv2
import numpy as np

def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=False, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)
    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

def image_preprocess(image_file, norm=True, new_shape=(640, 640)):
    im = cv2.imread(image_file)
    im, ratio, (dw, dh) = letterbox(im, new_shape)
    im = im[:, :, ::-1]  # BGR to RGB
    im = np.expand_dims(im, 0)
    im = np.ascontiguousarray(im)
    if norm:
        return im.astype(np.float32) / 255
    else:
        return im
Usage
- The function prototype image_preprocess(img_path, norm=True) must be declared for the tool to call, and the function should return data of type numpy.ndarray.
- When the model's training_input_formats is RGB / BGR / GRAY, the image_preprocess(img_path, norm=True) function must accept the following 2 parameters:
  - Image path
  - Normalization flag (norm=True). The normalization flag distinguishes whether the network model is the floating-point model. In the floating-point model stage, image normalization must be performed before the data is fed to the network. The fixed-point and offline models, however, already contain the setup information from the input_config.ini file and normalize the image data automatically, so the data fed to them must not be normalized; this matches how the data is processed on the IPU hardware.
- The preprocessing file's handling of input data must remain consistent with model training, and the preprocessing script must process the data files into the model's training_input_formats format.
2.2 Instructions for Writing Preprocessing Files¶
The preprocessing for different data formats varies slightly, and examples will be provided below:
① Image Input Model Preprocessing Instructions
Applicable to models where training_input_formats are RGB / BGR / GRAY.
In the preprocessing script for an image-input model, the user's main task is to process the data files into the model's training_input_formats format.
Using the preprocessing script of torchvision's resnet18 as an example:
from PIL import Image
import numpy as np
from torchvision.transforms import transforms

def sim_standard_preprocess(image_file, input_size=224, norm=True):
    ori_image = Image.open(image_file).convert('RGB')
    if input_size == 299:
        scaled_size = input_size
    else:
        scaled_size = 256
    image = transforms.Resize(scaled_size)(ori_image)
    image = transforms.CenterCrop(input_size)(image)
    if norm:
        image = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            ]
        )(image)
        image = image.numpy()
        image = np.expand_dims(image, 0).astype(np.float32)
        image = np.transpose(image, axes=(0, 2, 3, 1)).copy()
    else:
        image = np.array(image)  # PIL Image to ndarray (PIL images have no .numpy())
        image = np.expand_dims(image, 0).astype(np.uint8)
    return image

def image_preprocess(image_file, input_size=224, norm=True):
    sim_processed = sim_standard_preprocess(image_file, input_size, norm=norm)
    return sim_processed
Usage
- The mean/std information from the preprocessing script above converts to input_config.ini as follows:
[INPUT_CONFIG]
mean=123.675:116.28:103.53;
std_value=58.395:57.12:57.375;
- It is essential that the mean and std parameters in the preprocessing script align with the information configured in input_config.ini.
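The conversion is a plain multiplication by 255, since torchvision normalizes images in the [0, 1] range while input_config.ini expects the [0, 255] range; a quick check in Python:
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
print([round(m * 255, 3) for m in mean])  # [123.675, 116.28, 103.53]
print([round(s * 255, 3) for s in std])   # [58.395, 57.12, 57.375]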
The following is a preprocessing script example for caffe_lenet, a GRAY-input Caffe model:
import cv2
import numpy as np

def get_image(img_path, resizeH=28, resizeW=28, norm=True, mean=33.318, std=1.0):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError('No such image: {}'.format(img_path))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_norm = cv2.resize(img, (resizeW, resizeH), interpolation=cv2.INTER_LINEAR)
    if norm:
        img_norm = (img_norm - mean) / std
        img_norm = np.expand_dims(img_norm, axis=2)
        img_norm = img_norm.astype('float32')
    else:
        img_norm = np.expand_dims(img_norm, axis=2)
    return np.expand_dims(img_norm, 0)

def image_preprocess(img_path, norm=True):
    return get_image(img_path, norm=norm)
② Non-Image Input Model Preprocessing Example

The example input data is stored in an npy file. Please replace it with the actual processing logic for real use.
Users of non-image input models need to consider the input data formats for float / fixed / offline models. When norm is True, it returns the format required for the float model; when norm is False, it returns the format required for the fixed / offline models.
The float model generally matches the original data format of the model's input.
1) Applicable to models where both training_input_formats and input_formats are RAWDATA_F32_NHWC.
Preprocessing script example:
import numpy as np

def image_preprocess(img_path, norm=True):
    data = np.load(img_path)  # img_path is an npy file containing a float32 ndarray
    data = data.astype(np.float32)
    return data
2) Applicable to models where both training_input_formats and input_formats are RAWDATA_S16_NHWC.
Preprocessing script example:
import numpy as np

def image_preprocess(img_path, norm=True):
    data = np.load(img_path)  # img_path is an npy file containing an int32 or bool ndarray
    if norm:
        data = data.astype(np.int32)  # or astype(np.bool_) / astype(np.float32), etc.
    else:
        data = data.astype(np.int16)
    return data
3) Applicable to models where both training_input_formats and input_formats are RAWDATA_U16_NHWC.
Usage
- RAWDATA_U16_NHWC: Only applicable when the input is directly connected to the embedding layer, for example: the embedding layer of the CLIP Text model can be configured as RAWDATA_U16_NHWC when the index of its embedding layer satisfies the condition 32727 < index < 65535.
Preprocessing script example:
import numpy as np

def image_preprocess(img_path, norm=True):
    data = np.load(img_path)  # img_path is an npy file containing an int32 ndarray
    if norm:
        data = data.astype(np.int32)
    else:
        data = data.astype(np.uint16)
    return data
4) Applicable to models where both training_input_formats and input_formats are RAWDATA_U8_NHWC.
Preprocessing script example:
import numpy as np

def image_preprocess(img_path, norm=True):
    data = np.load(img_path)  # img_path is an npy file containing a uint8 ndarray
    # uint8 data is fed unchanged to both the float and the fixed/offline models
    data = data.astype(np.uint8)
    return data
3. Guide to Converting Original Models to Edge Models¶
The SGS_converter tool in the IPU Toolchain supports one-click generation of IPU offline network models for development boards.

3.1. Introduction to the SGS_converter Tool¶
1. Tool Location: SGS_IPU_Toolchain/Scripts/ConvertTool/SGS_converter.py.
2. Tool Function: Converts models into SGS offline network models with a single click.
3. Usage Example: Taking the conversion of an ONNX framework model to an offline network as an example:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/SGS_converter.py onnx \
--model_file ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.onnx \
--input_config ~/SGS_Models/onnx/onnx_yolov8s/input_config.ini \
-i ~/SGS_Models/resource/classify/ILSVRC2012_test_00000002.bmp \
-n ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.py \
--output_file ./onnx_yolov8s_offline.img \
--soc_version CHIP
The required parameter information for converting models from different platforms to SGS offline network models varies slightly. To further understand the parameters needed for conversion on each platform (specific instructions on the use of the parameters are provided in the following sections), execute:
python3 SGS_converter.py {platform} -h
For example, to view the execution parameters of the CAFFE framework:
python3 SGS_converter.py caffe -h
3.2 Detailed Use of the SGS_converter Tool¶
This section mainly introduces how to configure parameters when using the SGS_converter tool to convert models from various platforms.
3.2.1 Converting ONNX Models¶

- Example of tool usage:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/SGS_converter.py onnx \
--model_file ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.onnx \
--input_config ~/SGS_Models/onnx/onnx_yolov8s/input_config.ini \
-i ~/SGS_Models/resource/classify/ILSVRC2012_test_00000002.bmp \
-n ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.py \
--output_file ./onnx_yolov8s_offline.img \
--export_models \
--soc_version CHIP
(1) Required Parameters:
- --model_file: Specify the path to the ONNX model file to be converted.
- --input_config: Path to the input_config.ini file.
- -i: Input file path / input folder path / input path list file.
Usage
- When the -i parameter is passed as an input path list file, create a new input_list.txt with the following content:
  - For a single-input network model:
    /path/to/image_test/2007000364.jpg
    /path/to/image_test/2007000365.jpg
    /path/to/image_test/2007000366.jpg
  - For a multiple-input network model, separate the paths of each input with a comma ( , ):
    /path/to/image_test/2007000364.jpg,/path/to/image_test/ILSVRC2012_test_00000002.bmp
    /path/to/image_test/2007000365.jpg,/path/to/image_test/ILSVRC2012_test_00000003.bmp
    /path/to/image_test/2007000366.jpg,/path/to/image_test/ILSVRC2012_test_00000004.bmp
  Additional sets of data are written on subsequent lines; each line is treated as one input to the model. After completing input_list.txt, pass the -i parameter as: /path/to/input_list.txt
- When the -i/--image parameter is the path of a single image, the model must be a single-input model.
- When the -i/--image parameter is the path of an image folder, the model must be a single-input model; all data within the folder will be read.
- -n: Path to the preprocessing Python file.
Usage
- For a multiple-input model, multiple preprocessing methods are needed; the number and order of the preprocessing Python file paths must match the number and order of the model inputs. For example:
  -n /path/to/preprocess1.py,/path/to/preprocess2.py or --preprocess /path/to/preprocess1.py,/path/to/preprocess2.py
- Use the same image preprocessing method as during training; each input's preprocessing method should be written in an independent Python file.
- --output_file: Specify the name of the generated img file.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- --export_models: Export the intermediate model files generated during conversion. If not configured, only the offline file is generated.
- -q: Must be specified when using torch_calibrator to quantize a float model.
- --quant_config: When setting the -q parameter, this parameter must be used to pass the quant_config.yaml quantization parameter file.
Usage
If the onnx model contains post-processing operators, please remove them first. Currently, only conversion of the backbone network is supported. If the onnx model is generated by another framework, be sure to disable the option for generating post-processing during model conversion.
3.2.2 Converting CAFFE Models¶

- Example of tool usage:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/SGS_converter.py caffe \
--model_file ~/SGS_Models/caffe/caffe_mobilenet_v2/caffe_mobilenet_v2.prototxt \
--weight_file ~/SGS_Models/caffe/caffe_mobilenet_v2/caffe_mobilenet_v2.caffemodel \
--input_config ~/SGS_Models/caffe/caffe_mobilenet_v2/input_config.ini \
-i ~/SGS_Models/resource/classify/ILSVRC2012_test_00000002.bmp \
-n ~/SGS_Models/caffe/caffe_mobilenet_v2/caffe_mobilenet_v2.py \
--output_file ./caffe_mobilenet_v2.img \
--export_models \
--soc_version CHIP
(1) Required Parameters:
- --model_file: Specify the path to the CAFFE model file to be converted.
- --weight_file: Specify the path to the CAFFE weight file to be converted.
- --input_config: Path to the input_config.ini file.
- -i: Input file path / input folder path / input path list file.
Usage
- When the -i parameter is passed as an input path list file, create a new input_list.txt with the following content:
  - For a single-input network model:
    /path/to/image_test/2007000364.jpg
    /path/to/image_test/2007000365.jpg
    /path/to/image_test/2007000366.jpg
  - For a multiple-input network model, separate the paths of each input with a comma ( , ):
    /path/to/image_test/2007000364.jpg,/path/to/image_test/ILSVRC2012_test_00000002.bmp
    /path/to/image_test/2007000365.jpg,/path/to/image_test/ILSVRC2012_test_00000003.bmp
    /path/to/image_test/2007000366.jpg,/path/to/image_test/ILSVRC2012_test_00000004.bmp
  Additional sets of data are written on subsequent lines; each line is treated as one input to the model. After completing input_list.txt, pass the -i parameter as: /path/to/input_list.txt
- When the -i/--image parameter is the path of a single image, the model must be a single-input model.
- When the -i/--image parameter is the path of an image folder, the model must be a single-input model; all data within the folder will be read.
- -n: Path to the preprocessing Python file.
Usage
- For a multiple-input model, multiple preprocessing methods are needed; the number and order of the preprocessing Python file paths must match the number and order of the model inputs. For example:
  -n /path/to/preprocess1.py,/path/to/preprocess2.py or --preprocess /path/to/preprocess1.py,/path/to/preprocess2.py
- Use the same image preprocessing method as during training; each input's preprocessing method should be written in an independent Python file.
- --output_file: Specify the name of the generated img file.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- --export_models: Export the intermediate model files generated during conversion. If not configured, only the offline file is generated.
- -q: Must be specified when using torch_calibrator to quantize a float model.
- --quant_config: When setting the -q parameter, this parameter must be used to pass the quant_config.yaml quantization parameter file.
3.2.3 Converting TFLITE Models¶

- Example of tool usage:
python3 SGS_converter.py tflite \
--model_file Debug_save_model_float.tflite \
--input_config input_config.ini \
-i file.list \
-n preprocess.py \
--output_file save_model.img \
--soc_version CHIP
(1) Required Parameters:
- --model_file: Specify the path to the TFLITE model file to be converted.
- --input_config: Path to the input_config.ini file.
- -i: Input file path / input folder path / input path list file.
Usage
- When the -i parameter is passed as an input path list file, create a new input_list.txt with the following content:
  - For a single-input network model:
    /path/to/image_test/2007000364.jpg
    /path/to/image_test/2007000365.jpg
    /path/to/image_test/2007000366.jpg
  - For a multiple-input network model, separate the paths of each input with a comma ( , ):
    /path/to/image_test/2007000364.jpg,/path/to/image_test/ILSVRC2012_test_00000002.bmp
    /path/to/image_test/2007000365.jpg,/path/to/image_test/ILSVRC2012_test_00000003.bmp
    /path/to/image_test/2007000366.jpg,/path/to/image_test/ILSVRC2012_test_00000004.bmp
  Additional sets of data are written on subsequent lines; each line is treated as one input to the model. After completing input_list.txt, pass the -i parameter as: /path/to/input_list.txt
- When the -i/--image parameter is the path of a single image, the model must be a single-input model.
- When the -i/--image parameter is the path of an image folder, the model must be a single-input model; all data within the folder will be read.
- -n: Path to the preprocessing Python file.
Usage
- For a multiple-input model, multiple preprocessing methods are needed; the number and order of the preprocessing Python file paths must match the number and order of the model inputs. For example:
  -n /path/to/preprocess1.py,/path/to/preprocess2.py or --preprocess /path/to/preprocess1.py,/path/to/preprocess2.py
- Use the same image preprocessing method as during training; each input's preprocessing method should be written in an independent Python file.
- --output_file: Specify the name of the generated img file.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- --export_models: Export the intermediate model files generated during conversion. If not configured, only the offline file is generated.
- -q: Must be specified when using torch_calibrator to quantize a float model.
- --quant_config: When setting the -q parameter, this parameter must be used to pass the quant_config.yaml quantization parameter file.
3.2.4 Converting TENSORFLOW_GRAPHDEF Models¶

- Example of tool usage:
python3 SGS_converter.py tensorflow_graphdef \
--graph_def_file yolo_v3_7/yolo_v3_7.pb \
--input_config input_config.ini \
--input_shapes 1,299,299,3 \
-i file.list \
-n yolo_v3_7.py \
--output_file yolo_v3_7.img \
--soc_version CHIP
(1) Required Parameters:
- --graph_def_file: Specify the path to the model file to be converted.
- --input_config: Path to the input_config.ini file.
- --input_shapes: Specify the shape of the network input Tensor; the number of shapes corresponds to the number of inputs, and multiple shapes are separated by colons ( : ).
- -i: Input file path / input folder path / input path list file.
Usage
- When the -i parameter is passed as an input path list file, create a new input_list.txt with the following content:
  - For a single-input network model:
    /path/to/image_test/2007000364.jpg
    /path/to/image_test/2007000365.jpg
    /path/to/image_test/2007000366.jpg
  - For a multiple-input network model, separate the paths of each input with a comma ( , ):
    /path/to/image_test/2007000364.jpg,/path/to/image_test/ILSVRC2012_test_00000002.bmp
    /path/to/image_test/2007000365.jpg,/path/to/image_test/ILSVRC2012_test_00000003.bmp
    /path/to/image_test/2007000366.jpg,/path/to/image_test/ILSVRC2012_test_00000004.bmp
  Additional sets of data are written on subsequent lines; each line is treated as one input to the model. After completing input_list.txt, pass the -i parameter as: /path/to/input_list.txt
- When the -i/--image parameter is the path of a single image, the model must be a single-input model.
- When the -i/--image parameter is the path of an image folder, the model must be a single-input model; all data within the folder will be read.
- -n: Path to the preprocessing Python file.
Usage
- For a multiple-input model, multiple preprocessing methods are needed; the number and order of the preprocessing Python file paths must match the number and order of the model inputs. For example:
  -n /path/to/preprocess1.py,/path/to/preprocess2.py or --preprocess /path/to/preprocess1.py,/path/to/preprocess2.py
- Use the same image preprocessing method as during training; each input's preprocessing method should be written in an independent Python file.
- --output_file: Specify the name of the generated img file.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- --export_models: Export the intermediate model files generated during conversion. If not configured, only the offline file is generated.
- -q: Must be specified when using torch_calibrator to quantize a float model.
- --quant_config: When setting the -q parameter, this parameter must be used to pass the quant_config.yaml quantization parameter file.
4. Advanced Guide to Model Conversion: Stage-Wise Generation of Edge Models¶
The SGS_converter tool is used for one-click generation of edge models. This section mainly introduces the step-by-step generation of edge models. The process is illustrated as follows:

It is mainly divided into three stages:
Model ——> ConvertTool tool ——> Floating Point Network Model
——> Calibrator tool ——> Fixed Point Network Model
——> Compiler tool ——> Offline Network Model (Edge Model)
The usage methods of the network generation tools for each stage will be introduced below.
4.1. Stage One: Using the ConvertTool to Generate Floating Point Network Models¶

This section introduces how to use the ConvertTool to convert the original model to an SGS floating point network model.
1. Tool location: SGS_IPU_Toolchain/Scripts/ConvertTool/ConvertTool.py.
2. Tool function: Converts models to SGS floating point network models.
3. Example of usage:
Taking the conversion of an ONNX framework model to a floating point network as an example:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/ConvertTool.py onnx \
--model_file ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.onnx \
--input_config ~/SGS_Models/onnx/onnx_yolov8s/input_config.ini \
--output_file ./onnx_yolov8s_float.sim \
--soc_version CHIP
The required parameter information for converting models from different platforms to SGS floating point network models varies slightly. To further understand the parameter information required for conversion from each platform (usage instructions for the parameters are provided below), execute:
python3 ConvertTool.py {platform} -h
For example, for the CAFFE framework:
python3 ConvertTool.py caffe -h
4.1.1 Converting ONNX Models¶
- Example of tool usage:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/ConvertTool.py onnx \
--model_file ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.onnx \
--input_config ~/SGS_Models/onnx/onnx_yolov8s/input_config.ini \
--output_file ./onnx_yolov8s_float.sim \
--soc_version CHIP
(1) Required Parameters:
- --model_file: Specify the path to the ONNX model file to be converted.
- --input_config: Path to the input_config.ini file.
- --output_file: Output path of the converted model.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- --input_shapes: ONNX model input shape. Dimensions are separated by commas ( , ); the number of shapes corresponds to the number of inputs, and multiple shapes are separated by colons ( : ). By default, the shape is read from the model information; if an input has a dynamic resolution, a specific shape must be specified.
- --skip_simplify: Skip optimizing the model with onnxsim. By default, the model is optimized with onnxsim; if this option is configured, optimization is skipped. Optimization is recommended.
Usage
- If the onnx model contains post-processing operators, please remove them first. Currently, only conversion of the backbone network is supported.
- If the onnx model is generated by another framework, be sure to disable the option for generating post-processing during model conversion.
- ONNX model clipping command:
  onnx.utils.extract_model(input_path, output_path, input_names, output_names, check_model=True)
  onnx.utils.extract_model parameter description:
  input_path (str | os.PathLike): Path to the original ONNX model.
  output_path (str | os.PathLike): Path to save the clipped ONNX model.
  input_names (list of str): Names of the input tensors for the clipped model.
  output_names (list of str): Names of the output tensors for the clipped model.
  check_model (bool): Whether to run a model check on the clipped model (default is True).
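For instance, a hedged sketch of clipping away a post-processing head (the model paths and tensor names below are hypothetical and must match your actual graph):
import onnx
import onnx.utils

# Keep only the backbone: everything from the graph input up to the last
# backbone tensor, dropping the post-processing operators behind it.
onnx.utils.extract_model(
    'yolov8s_full.onnx',                          # original model (hypothetical path)
    'yolov8s_backbone.onnx',                      # clipped model to convert with ConvertTool
    input_names=['images'],                       # graph input tensor name (hypothetical)
    output_names=['/model.22/Concat_output_0'],   # last backbone tensor (hypothetical)
    check_model=True,
)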
4.1.2 Converting CAFFE Models¶
- Example of tool usage:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/ConvertTool.py caffe \
--model_file ~/SGS_Models/caffe/caffe_resnet50_conv/caffe_resnet50_conv.prototxt \
--weight_file ~/SGS_Models/caffe/caffe_resnet50_conv/caffe_resnet50_conv.caffemodel \
--input_config ~/SGS_Models/caffe/caffe_resnet50_conv/input_config.ini \
--output_file ./resnet50_float.sim \
--soc_version CHIP
(1) Required Parameters:
- --model_file: Specify the path to the CAFFE model file to be converted.
- --weight_file: Specify the path to the CAFFE weight file to be converted.
- --input_config: Path to the input_config.ini file, which contains the input configuration information.
- --output_file: Output path of the converted model.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters: None.
4.1.3 Converting TFLITE Models¶
- Example of tool usage:
python3 ConvertTool.py tflite \
--model_file Debug_save_model_float.tflite \
--input_config input_config.ini \
--output_file save_model_float.sim \
--soc_version CHIP
(1) Required Parameters:
- --model_file: Specify the path to the model file to be converted. The input model must be in tflite format (a non-quantized model).
- --output_file: Specify the output model file. Flatbuffer format; should have a .sim suffix.
- --input_config: Path to the input_config.ini file. This file contains the configuration information for the input tensor.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters: None.
4.1.4 Converting TENSORFLOW_GRAPHDEF Models¶
- Example of tool usage:
python3 ConvertTool.py tensorflow_graphdef \
--graph_def_file ~/SGS_Models/tensorflow/resnet_v2_50/resnet_v2_50.pb \
--output_file ./resnet_v2_float.sim \
--input_shapes 1,299,299,3 \
--input_config ~/SGS_Models/tensorflow/resnet_v2_50/input_config.ini \
--soc_version CHIP
(1) Required Parameters:
- --graph_def_file: Specify the path to the input model to be converted. This is the path to the TensorFlow frozen graphdef in pb format.
- --output_file: Specify the name of the generated floating point network model file. Flatbuffer format; should have a .sim suffix.
- --input_shapes: Specify the shape of the network input Tensor. The format is NHWC, with dimensions separated by commas ( , ); the number of shapes must correspond to the number of inputs, and multiple shapes are separated by colons ( : ).
- --input_config: Path to the input_config.ini file. This file contains the configuration information for the input tensor.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- None.
4.2. Stage Two: Using the Calibrator Tool to Generate Fixed Point Network Models¶
4.2.1 Detailed Explanation of the Calibrator Tool¶

This section introduces how to use the Calibrator tool to convert SGS floating point network models to SGS fixed point network models.
1. Tool Location: SGS_IPU_Toolchain/Scripts/calibrator/calibrator.py.
2. Tool Function: Converts models to SGS fixed point network models.
3. Example of Usage:
Taking the ONNX floating point model converted in the previous section as an example:
python3 ~/SGS_IPU_Toolchain/Scripts/calibrator/calibrator.py \
--model_file ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s_float.sim \
--input_config ~/SGS_Models/onnx/onnx_yolov8s/input_config.ini \
-i ~/SGS_Models/resource/classify/ILSVRC2012_test_00000002.bmp \
-n ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.py \
--soc_version CHIP
4. Explanation of Calibrator Parameters
(1) Required Parameters
- -m or --model: Specify the path to the floating point network model file output by the ConvertTool.
- --input_config: Specify the path to the input_config.ini file.
- -i or --image: Input file path / input folder path / input path list file.
Usage
- When the -i parameter is passed as an input path list file, create a new input_list.txt with the following content:
  - For a single-input network model:
    /path/to/image_test/2007000364.jpg
    /path/to/image_test/2007000365.jpg
    /path/to/image_test/2007000366.jpg
  - For a multiple-input network model, separate the paths of each input with a comma ( , ):
    /path/to/image_test/2007000364.jpg,/path/to/image_test/ILSVRC2012_test_00000002.bmp
    /path/to/image_test/2007000365.jpg,/path/to/image_test/ILSVRC2012_test_00000003.bmp
    /path/to/image_test/2007000366.jpg,/path/to/image_test/ILSVRC2012_test_00000004.bmp
  Additional sets of data are written on subsequent lines; each line is treated as one input to the model. After completing input_list.txt, pass the -i parameter as: /path/to/input_list.txt
- When the -i/--image parameter is the path of a single image, the model must be a single-input model.
- When the -i/--image parameter is the path of an image folder, the model must be a single-input model; all data within the folder will be read.
- -n or --preprocess: Specify the path to the preprocessing Python file.
Usage
- For a multiple-input model, multiple preprocessing methods are needed; the number and order of the preprocessing Python file paths must match the number and order of the model inputs. For example:
  -n /path/to/preprocess1.py,/path/to/preprocess2.py or --preprocess /path/to/preprocess1.py,/path/to/preprocess2.py
- Use the same image preprocessing method as during training; each input's preprocessing method should be written in an independent Python file.
- --soc_version: IPU Toolchain chip.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters
- -o or --output: Specify the output path for the fixed point network model.
- --num_process: Specify the number of processes to run simultaneously (default is 10).
- --memory_saving: Use memory-saving methods to gather statistics.
- --quant_level: Choose the quantization level.
Usage
- Optional levels include:
  - L1: Quickly quantizes data using max-min comparison; fastest.
  - L2: Quickly quantizes weight data using comparison.
  - L3: Further analyzes statistics, approximating the original data distribution.
  - L4: Approximates the weight data distribution and suggests upgrading certain convolutions to 16-bit quantization.
  - L5: Uses high-precision data analysis to fit the original data distribution as closely as possible and suggests upgrading certain convolutions to 16-bit quantization.
- The default is quantization level L5; the higher the level, the higher the quantization accuracy, but the slower the quantization.
- Choosing L3, L4, or L5 automatically configures the quantization method for the convolutions based on the statistics. If a specific convolution quantization needs to be enforced, it can be configured in the input_config.ini file.
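For example, a hedged sketch (paths follow the earlier examples and the option values are illustrative) that converts the float model at the faster L4 level using 4 processes:
python3 ~/SGS_IPU_Toolchain/Scripts/calibrator/calibrator.py \
-m ./onnx_yolov8s_float.sim \
--input_config ~/SGS_Models/onnx/onnx_yolov8s/input_config.ini \
-i ~/SGS_Models/resource/classify/ILSVRC2012_test_00000002.bmp \
-n ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s.py \
-o ./onnx_yolov8s_fixed.sim \
--num_process 4 \
--quant_level L4 \
--soc_version CHIP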
4.2.2 Customizing Calibrator¶
1. Tool Names: calibrator_custom.calibrator and calibrator_custom.SIM_Calibrator
2. Tool Function: Provides a more convenient and flexible way to convert and quantize multi-input, multi-segment networks.
3. Detailed Usage of the calibrator_custom.calibrator Tool:
The current calibrator_custom.calibrator tool provides a complete set of API interfaces. Developers can call these standardized interfaces directly, flexibly choose different modules based on actual needs, and quickly build custom conversion models. The specific interface calling methods are detailed below. First, create a calibrator instance using the calibrator_custom.calibrator tool:
import calibrator_custom
calibrator_custom.set_soc_version('CHIP') # Set the CHIP for this conversion
model_path = './onnx_yolov8s_float.sim' # Floating point model path
input_config_path = './input_config.ini' # INI file path
calibrator = calibrator_custom.calibrator(model_path, input_config_path) # Create instance
Usage
- calibrator_custom.set_soc_version can only be called once to set the chip information for the conversion.
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
After instantiation, you can call the supported API interfaces of the calibrator_custom.calibrator tool. The available API interfaces include:
- (1) get_input_details
  Functionality: Gets detailed information about the network model inputs and returns it as a list.
  Example:
  input_details = calibrator.get_input_details()
  >>> print(input_details)
  [{'index': 0, 'shape': array([  1, 513, 513,   3], dtype=int32), 'name': 'sub_7', 'dtype': <class 'numpy.float32'>}]
- (2) get_output_details
  Functionality: Gets detailed information about the network model outputs and returns it as a list.
  Example:
  output_details = calibrator.get_output_details()
  >>> print(output_details)
  [{'index': 0, 'shape': array([  1, 257, 257,  30], dtype=int32), 'name': 'MobilenetV2/Conv/Conv2D', 'dtype': <class 'numpy.float32'>}]
- (3) set_input
  Functionality: Sets the input data for the network model.
  Example:
  calibrator.set_input(0, img_data)  # args: index, data
  The index comes from the return value of get_input_details(), and the data must be in numpy.ndarray format. Call set_input multiple times for multi-input models.
- (4) invoke
  Functionality: Runs the model once.
  Example:
  calibrator.invoke()
  The input data must be set with set_input before calling invoke.
- (5) get_output
  Functionality: Retrieves the output data from the model.
  Example:
  result = calibrator.get_output(0)  # args: index
  Call get_output multiple times for multi-output models.
- (6) get_tensor_details
  Functionality: Returns information about each tensor in the network model as a list.
  Example:
  tensor_details = calibrator.get_tensor_details()
  >>> print(tensor_details)
  [{'dtype': 'FLOAT32', 'name': 'MobilenetV2/Conv/Conv2D', 'qtype': 'INT16', 'shape': array([  1, 257, 257,  30], dtype=int32)},
   {'dtype': 'FLOAT32', 'name': 'MobilenetV2/Conv/Conv2D_bias', 'qtype': 'INT16', 'shape': array([ 2, 30], dtype=int32)},
   {'dtype': 'FLOAT32', 'name': 'MobilenetV2/Conv/weights/read', 'qtype': 'INT8', 'shape': array([30,  3,  3,  3], dtype=int32)},
   {'dtype': 'FLOAT32', 'name': 'sub_7', 'qtype': 'UINT8', 'shape': array([  1, 513, 513,   3], dtype=int32)}]
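Putting these interfaces together, here is a minimal sketch of one forward pass. It assumes the calibrator instance created above, a single-input single-output model, and an image_preprocess function like the ones in the preprocessing section; the test image path is hypothetical:
import numpy as np

# Prepare one input with the user's preprocessing script (norm=True for the float model)
img_data = image_preprocess('./test.jpg', norm=True)

input_details = calibrator.get_input_details()
output_details = calibrator.get_output_details()

calibrator.set_input(input_details[0]['index'], img_data.astype(np.float32))
calibrator.invoke()
result = calibrator.get_output(output_details[0]['index'])
print(result.shape)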
4. Rules for Importing Quantization Parameters with calibrator_custom.calibrator:

Compared to the regular calibrator, it requires additionally importing a quantization parameter file in JSON or Pickle format. A more detailed flowchart for importing quantization data is shown below:

Usage
① In order to maintain compatibility with the existing quantization process, the strategies used for previous quantizations are still applied.
② To obtain Float model tensor information: After converting the original model to an SGS_Float model, some operators may be merged and optimized, resulting in a significant difference in the computation graph compared to the original framework model. The content of the provided quantization file needs to be specifically adjusted based on the converted SGS_Float model, such as updating tensor names. To facilitate the modification of merged and optimized layers, you can use the get_tensor_details method of calibrator_custom.calibrator to obtain the basic information of each tensor, including name, shape, dtype, and qtype information.
③ To obtain Fixed model tensor information: The converted fixed model can use the get_tensor_details method of calibrator_custom.fixed_simulator to obtain basic tensor information, including the name, shape, min, max, quantization, and dtype information for each tensor.
④ After updating the original model's quantization file based on the tensor information, you can import the quantization data file.
⑤ After parsing the quantization file, existing quantization information will be matched with the imported quantization information. The imported quantization information will be prioritized, but it may be disregarded if found unreasonable during merging.
- (1) Quantization Model Parameter Export Tool Introduction
  The quantization model parameter export tool is located at SGS_IPU_Toolchain/Scripts/examples/save_quant_param.py.
  Usage example:
  python3 save_quant_param.py \
  -m fixed.sim \
  --output_mode {JSON,Pickle}
  Usage
  - The --output_mode parameter supports exporting quantization parameter data in JSON or Pickle format.
- (2) Explanation of Quantization Parameter Contents
  - Tensor name (name) [str]
  - Minimum and maximum values (min, max); the operator type determines how many values are provided [list]
  - Quantization bit depth (bit) [int]
  - Constant tensor data (data) (optional) [numpy.ndarray]
  [
      {
          "name": "FeatureExtractor/MobilenetV2/Conv2d_0/weights",
          "min": [-4.555312, -2.876907, -1.234419],
          "max": [7.364561, 3.960804, 6.0],
          "bit": 8
      },
      {...},
      ...
  ]
- (3) Explanation of Quantization Rules
  - Quantization Method for Conv2D
    ① Conv2D quantization covers Inputs, Weights, and Outputs, currently supporting both 8-bit and 16-bit quantization.
    ② For 8-bit quantization, the Conv2D input is UINT8 (equivalent to INT9's representation capability) and the Weights are INT8; for 16-bit quantization, both the Conv2D input and Weights are INT16.
    ③ The number of min and max values for Weights is determined by the number of kernels, while the number of min and max values for Inputs and Outputs is determined by the C dimension of the inputs and outputs.
    ④ Based on the statistics gathered for Inputs, Weights, and Outputs, the Weights are first quantized to fixed data stored inside the fixed network, and the input and output data are then quantized dynamically while the network runs.
  - Quantization Method for DepthwiseConv2D
    ① DepthwiseConv2D quantization covers Inputs, Weights, and Outputs, currently supporting both 8-bit and 16-bit quantization.
    ② For 8-bit quantization, the DepthwiseConv2D input is UINT8 (equivalent to INT9's representation capability) and the Weights are INT8; for 16-bit quantization, both the DepthwiseConv2D input and Weights are INT16.
    ③ The number of min and max values for Weights, Inputs, and Outputs is determined by the C dimension of the corresponding tensor.
    ④ Based on the statistics gathered for Inputs, Weights, and Outputs, the Weights are first quantized to fixed data stored inside the fixed network, and the input and output data are then quantized dynamically while the network runs.
  - Quantization Method for Other Operators
    ① The number of min and max values for quantizing other operators in the network is based on the C dimension, and only 16-bit quantization is supported.
    ② The data type of a Tensor in the fixed-point model can be determined from the qtype returned by the get_tensor_details method of calibrator_custom.calibrator.
5. Detailed Usage of the calibrator_custom.SIM_Calibrator Tool:
For converting multi-input, multi-segment networks in one pass, calibrator_custom.SIM_Calibrator is provided so that, after a simple definition, the networks can be converted uniformly. calibrator_custom.SIM_Calibrator is a pre-implemented class in which only the forward method is left unimplemented; simply implement this method to complete the conversion.
The following example, based on SGS_IPU_Toolchain/Scripts/examples/sim_calibrator.py, illustrates how to use calibrator_custom.SIM_Calibrator:
① Define the forward method:
import calibrator_custom

class Net(calibrator_custom.SIM_Calibrator):
    def __init__(self):
        super().__init__()
        self.model = calibrator_custom.calibrator(model_path, input_config)

    def forward(self, x):
        out_details = self.model.get_output_details()
        self.model.set_input(0, x)
        self.model.invoke()
        result_list = []
        for idx in range(len(out_details)):
            result = self.model.get_output(idx)
            result_list.append(result)
        return result_list
Usage
- The parameter of forward is the model input; if the model has multiple inputs, add further parameters to forward (see the sketch below).
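For example, in a hypothetical two-input model (input indices 0 and 1 are assumed to follow the order declared in input_config.ini), the forward method would become:
    def forward(self, x, y):
        out_details = self.model.get_output_details()
        self.model.set_input(0, x)  # first input tensor
        self.model.set_input(1, y)  # second input tensor
        self.model.invoke()
        return [self.model.get_output(idx) for idx in range(len(out_details))]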
② Create an instance of calibrator_custom.SIM_Calibrator:
net = Net()
③ Use the predefined preprocessing method from calibrator_custom.utils.image_preprocess_func to obtain img_gen:
preprocess_func = calibrator_custom.utils.image_preprocess_func(model_name)

def image_generator(folder_path, preprocess_func):
    images = [os.path.join(folder_path, img) for img in os.listdir(folder_path)]
    for image in images:
        img = preprocess_func(image)
        yield [img]

img_gen = image_generator('./images', preprocess_func)
④ Call the convert method of calibrator_custom.SIM_Calibrator:
net.convert(img_gen, fix_model=[out_model_path])
Usage
- The convert method requires two parameters: the image generator and a list of save paths for the fixed.sim model.
- Image generator (img_gen): generated in step ③; it organizes the sequence of input images for converting multi-input, multi-segment networks. If the model has multiple inputs, the generator should yield a list of several numpy.ndarray, in the input order defined by the forward method.
- fixed.sim model save-path list: if multiple models are defined in __init__, the fixed-model save paths must be listed in the order in which the models are defined in __init__.
Other optional parameters for the convert method (see the sketch below):
- num_process: number of processes, i.e. how many CPUs run simultaneously.
- quant_level: quantization level, one of [L1, L2, L3, L4, L5]; the default is L5.
- quant_param: import quantization parameters. If quantization parameters already exist for the corresponding model, they can be imported during model conversion.
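Putting these together — a hedged sketch, where the process count and quantization level are example values, and it is assumed that quant_param accepts a path to the JSON file exported earlier:
net = Net()
net.convert(img_gen,
            fix_model=[out_model_path],
            num_process=4,                    # calibrate on 4 CPUs in parallel
            quant_level='L4',                 # one of L1..L5 (default L5)
            quant_param='updated_quant_param.json')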
4.2.2.1 Custom Calibrator - Example of Segmenting Network Conversion¶
If the network contains unsupported layers, the complete network can be split into segments: after running the first segment, its output is fed into a custom implementation layer, and the custom layer's output is then used as the input to the second segment of the network.
Taking the Faster_RCNN network as an example:
The Proposal Layer in the Faster_RCNN network is unsupported, so the network needs to be split at this layer.
- 1) Network Splitting
Network splitting diagram:
(1) The Python Layer in the diagram represents the Proposal Layer. Modify the prototxt file to split the network at the Proposal Layer.
(2) The first segment of the network will have 3 outputs, where rpn_cls_prob_reshape and rpn_bbox_pred outputs will serve as inputs to the Proposal Layer, while conv5_3 and the output of the Proposal Layer will serve as inputs to the second segment of the network.
(3) The input of the first segment of the network is image data.
(4) The input for the second segment of the network is not image data; both inputs can be configured to use RAWDATA_S16_NHWC format. Once configured, convert the two networks into float.sim model files using ConvertTool.py.
Specific commands:
- First Segment Network:
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/ConvertTool.py caffe \
    --model_file test_stageone.prototxt \
    --weight_file VGG16_faster_rcnn_final.caffemodel \
    --input_arrays data \
    --output_arrays rpn_cls_prob_reshape,rpn_bbox_pred,conv5_3 \
    --input_config input_config.ini \
    --output_file faster_rcnn_main_float.sim
Dequantization for the conv5_3 output is set to FALSE because this output feeds directly into the second segment of the network: there is no need to convert it to float on the board, and it can be passed straight to the second segment.
[INPUT_CONFIG]
inputs=data;
input_formats=BGR;
quantizations=TRUE;
mean=122.7717:115.9465:102.9801;
std_value=1;
[OUTPUT_CONFIG]
outputs=rpn_cls_prob_reshape,rpn_bbox_pred,conv5_3;
dequantizations=TRUE,TRUE,FALSE;
- Second Segment Network:
The rois input of ROIPooling has dimension (N×5). When the subsequent network consists entirely of InnerProduct layers, N can be set to 300 (as shown in the diagram above). If the subsequent network contains convolutional layers, N can only be set to 1, and the second segment must be executed N times in a loop.
python3 ~/SGS_IPU_Toolchain/Scripts/ConvertTool/ConvertTool.py caffe \
    --model_file second_stage.prototxt \
    --weight_file VGG16_faster_rcnn_final.caffemodel \
    --input_arrays conv5_3,rois \
    --output_arrays cls_prob,bbox_pred \
    --input_config input_config_stage2.ini \
    --output_file faster_rcnn_stage2_float.sim
[INPUT_CONFIG]
inputs=conv5_3,rois;
input_formats=RAWDATA_F32_NHWC,RAWDATA_F32_NHWC;
quantizations=TRUE,TRUE;
[OUTPUT_CONFIG]
outputs=cls_prob,bbox_pred;
dequantizations=TRUE,TRUE;
- 2) Converting the Networks
The path SGS_IPU_Toolchain/Scripts/examples/caffe_faster_rcnn/faster_rcnn_calibrator.py contains a demo tool for converting the Faster_RCNN network; simply run this to convert both segments into fixed models.
(1) Command
python3 ~/SGS_IPU_Toolchain/Scripts/examples/caffe_faster_rcnn/faster_rcnn_calibrator.py \
-i ~/SGS_Models/resource/detection/voc_calibration_set32/ \
-m0 faster_rcnn_main_float.sim \
-m1 faster_rcnn_stage2_float.sim \
--input_config0 input_config.ini \
--input_config1 input_config_stage2.ini
- First, define the two segments of the network, and organize the execution of both network segments within the forward method:
class Net(calibrator_custom.SIM_Calibrator):
    def __init__(self, main_model_path, main_input_config, second_model_path, second_input_config):
        super().__init__()
        self.main_model = calibrator_custom.calibrator(main_model_path, main_input_config)
        self.second_model = calibrator_custom.calibrator(second_model_path, second_input_config)
        self.rpn = rpn.ProposalLayer()

    def forward(self, x):
        out_details = self.main_model.get_output_details()
        input_data, im_scale = fill_inputImg2main(x)
        self.main_model.set_input(0, input_data)
        self.main_model.invoke()
        result_list = []
        for idx, _ in enumerate(out_details):
            result = self.main_model.get_output(idx)
            result_list.append(result)
        im_info = np.array([x.shape[0], x.shape[1], im_scale]).reshape(1, 3)
        bottom = [result_list[0], result_list[1], im_info]
        roi = self.rpn.forward(bottom)
        out2_details = self.second_model.get_output_details()
        self.second_model.set_input(0, result_list[2])
        self.second_model.set_input(1, roi)
        self.second_model.invoke()
        second_result = []
        for idx, _ in enumerate(out2_details):
            result = self.second_model.get_output(idx)
            second_result.append(result)
        return second_result
In the Faster_RCNN network a custom rpn layer is used: the first and second outputs of the first segment, together with the image information, serve as inputs to the rpn layer, while the third output of the first segment and the rpn layer's roi output serve as inputs to the second segment of the network.
Finally, invoke the convert method of Net to generate the fixed models for both segments.
net = Net(main_model_path, main_input_config, second_model_path, second_input_config)
net.convert(img_gen, num_process=num_subsets, fix_model=[out_main_model, out_second_model])
- 3) Network Inference
The path SGS_IPU_Toolchain/Scripts/examples/caffe_faster_rcnn/faster_rcnn_simulator.py contains a demo tool for running the Faster_RCNN network.
(1) Command
python3 ~/SGS_IPU_Toolchain/Scripts/examples/caffe_faster_rcnn/faster_rcnn_simulator.py \
-i ~/SGS_Models/resource/detection/004545.jpg \
-m0 faster_rcnn_main_float.sim \
-m1 faster_rcnn_stage2_float.sim \
-t Float
(2) The method to run both segments of the model is similar to that of converting the networks.
class Net(calibrator_custom.SIM_Simulator):
    def __init__(self, main_model_path, second_model_path, phase):
        super().__init__()
        if phase == 'Float':
            self.main_model = calibrator_custom.float_simulator(main_model_path)
            self.second_model = calibrator_custom.float_simulator(second_model_path)
            self.norm = True
        elif phase == 'Fixed':
            self.main_model = calibrator_custom.fixed_simulator(main_model_path)
            self.second_model = calibrator_custom.fixed_simulator(second_model_path)
            self.norm = False
        else:
            self.main_model = calibrator_custom.offline_simulator(main_model_path)
            self.second_model = calibrator_custom.offline_simulator(second_model_path)
            self.norm = False
        self.rpn = rpn.ProposalLayer()

    def forward(self, x):
        # Run the first (main) segment of the network
        out_details = self.main_model.get_output_details()
        input_data, im_scale = fill_inputImg2main(x, norm=self.norm)
        self.main_model.set_input(0, input_data)
        self.main_model.invoke()
        result_list = []
        for idx, _ in enumerate(out_details):
            result = self.main_model.get_output(idx)
            result_list.append(result)
        # Feed the first two outputs plus the image info to the Proposal (rpn) layer
        im_info = np.array([x.shape[0], x.shape[1], im_scale]).reshape(1, 3)
        bottom = [result_list[0], result_list[1], im_info]
        roi = self.rpn.forward(bottom)
        # Run the second segment with conv5_3 and the rois as inputs
        out2_details = self.second_model.get_output_details()
        self.second_model.set_input(0, result_list[2])
        self.second_model.set_input(1, roi)
        self.second_model.invoke()
        second_result = []
        for idx, _ in enumerate(out2_details):
            result = self.second_model.get_output(idx)
            second_result.append(result)
        return second_result
When running the network, first create an instance of Net, then call it to execute:
net = Net(main_model_path, second_model_path, phase)
results = net(img_gen, num_process=num_subsets)
4.2.3 Detailed Explanation of the torch_calibrator Tool¶

1. Tool Location: SGS_IPU_Toolchain/Scripts/calibrator/torch_calibrator.py.
2. Tool Function: supplements the calibrator tool. It is suitable for the following scenarios:
- The model's parameters are relatively small and the calibrator tool does not achieve satisfactory accuracy.
- The model's parameters are large and 4-bit quantization is desired to reduce the model size.
3. Advantages of the Tool:
The torch_calibrator is a new-generation quantization tool developed on top of PyTorch. Compared with the older toolchain it has the following features:
- Supports GPUs: for algorithms at the same level, quantization runs faster in a GPU environment (requires the Nvidia® GPU Docker);
- Supports 4/8-bit mixed quantization algorithms, which under certain conditions compress the model further than 8-bit quantization and improve runtime efficiency;
- Supports more complex quantization calibration algorithms, further improving quantization accuracy; this is particularly effective for small models (e.g. the MobileNet, ShuffleNet, and NanoDet series).
4. Usage Example:
python3 torch_calibrator.py \
-i ilsvrc2012_calibration_set100 \
-m caffe_mobilenet_v2_float.sim \
--input_config input_config.ini \
--quant_config quant_config.yaml \
-n caffe_mobilenet_v2.py \
-q Q2 \
--soc_version CHIP
5. Parameter Description:
(1) Required Parameters
- -i, --image: path to the calibration dataset images, typically 100 training images.
- -m, --model: path to the floating-point network model file.
- --input_config: path to the input_config.ini file.
- --quant_config: path to the quantization-parameter file (YAML format); its configuration and usage are detailed later.
- -n, --preprocess: path to the model preprocessing file.
- -q, --q_mode: quantization option, described in detail later.
- --soc_version: IPU Toolchain chip model.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters
- -o, --output: path for model output.
- --cal_batchsize: batch size for the calibration dataset, generally equal to the number of samples in the dataset.
Usage
- If the calibration dataset is extremely large, or the model structure is such that inference cannot run with batchsize ≠ 1, set this parameter explicitly. The default is 100.
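For instance — a hedged sketch reusing the file names from the usage example above — a model that cannot batch its inputs could be calibrated one sample at a time:
python3 torch_calibrator.py \
    -i ilsvrc2012_calibration_set100 \
    -m caffe_mobilenet_v2_float.sim \
    --input_config input_config.ini \
    --quant_config quant_config.yaml \
    -n caffe_mobilenet_v2.py \
    -q Q2 \
    --soc_version CHIP \
    --cal_batchsize 1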
Detailed Explanation of the Quantization Parameter Configuration File quant_config.yaml
Compared with the calibrator tool, the torch_calibrator tool additionally requires a quantization-parameter file in YAML format.

File content display:
QUANT_CONFIG:
retrain_iter_num: 40
device: 'cuda:0'
mixed_precisions: [4, 8]
use_preset_min_max_bit: 0
mp_rate: 0.6
PRESET_MIN_MAX_BIT: # preset tensor min/max and bit
[
{name: "126", min: null, max: null, bit: 8},
{name: "layer1.0.conv1.weight", min: null, max: null, bit: 8}
]
File Content Description
- retrain_iter_num: number of retraining iterations per layer.
  If not specified, the default is 40; some lightweight models may need more iterations.
- device: specify GPU/CPU.
  If not specified, an idle GPU is searched for automatically; if no GPU is available, the CPU is used by default. To force CPU computation, simply enter 'cpu'.
- use_preset_min_max_bit: whether to use the manually preset min/max/bit information.
  If set to 1, the manually set information below is used; if set to 0 or not specified, it is ignored.
- mixed_precisions: mixed-quantization mode.
  Optional values are [4, 8] or [8, 16]. If not filled in, the default is [4, 8] in Q13 mode and [8, 16] in Q23 mode.
- mp_rate: compression rate specified for mixed quantization.
  When mixed_precisions is set to [4, 8], mp_rate should be in the range [0.25, 0.5) (excluding 0.5). When mixed_precisions is set to [8, 16], mp_rate should be in the range [0.5, 1]. If this parameter is not specified, a compression rate is recommended automatically. Together with mixed_precisions, this parameter produces a 4/8-bit or 8/16-bit model; it only takes effect in Q13 and Q23 modes.
  It is recommended to run Q13 or Q23 without configuring mp_rate at first, letting torch_calibrator recommend a compression rate. If the automatically recommended rate meets accuracy requirements, lower it in increments of 0.05 to improve model performance while accuracy requirements still hold; if it does not, raise it in increments of 0.05 to improve quantization accuracy.
- PRESET_MIN_MAX_BIT: presets min/max/bit information for specific tensors.
  Tensor names must match those in the float.sim model, and use_preset_min_max_bit must be set to 1 (see the example below).
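For instance — a hedged sketch reusing the tensor names from the file shown above, and assuming numeric min/max values are accepted where null leaves them to calibration:
QUANT_CONFIG:
    use_preset_min_max_bit: 1 # enable the presets below
    PRESET_MIN_MAX_BIT:
    [
        {name: "126", min: -6.0, max: 6.0, bit: 8},
        {name: "layer1.0.conv1.weight", min: null, max: null, bit: 8}
    ]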
Detailed Explanation of torch_calibrator Quantization Parameters -q / --q_mode
| Quantization Level | Quantization Features | Quantization Suboptions | Detailed Description |
|---|---|---|---|
| Q1 | Performs only min/max calibration and does not optimize convolution parameters, so overall speed is faster; suitable for environments without a GPU (a GPU, if present, accelerates calibration); accuracy is generally on par with calibrator quantization. | Q10 | Full 16-bit quantization; accuracy is essentially consistent with the float32 model. |
| | | Q1/Q11 | Full 8-bit fast quantization; accuracy is comparable to L2 mode. |
| | | Q12 | Adaptive 8-bit and 16-bit quantization; accuracy is comparable to L5 mode. |
| | | Q13 | 8/16-bit mixed quantization (4/8-bit can be selected via the configuration file; the default is 8/16-bit); suitable for models that need further compression. |
| Q2 | Optimizes the model layer by layer; brings significant accuracy gains at low bit widths, especially for lightweight models; overall accuracy surpasses Q1. | Q20 | Full 16-bit quantization; accuracy is essentially consistent with the float32 model. |
| | | Q21 | Full 8-bit quantization; better results are observable on some small models. |
| | | Q2/Q22 | Adaptive selection of the quantization method; currently the best-performing approach, generating fixed-point models with full 8-bit quantization or near-full-8-bit 8/16 mixed quantization. |
| | | Q23 | 8/16-bit mixed quantization (4/8-bit selectable via the configuration file; default 8/16-bit); compresses the model further, with both compression and accuracy superior to Q13. |
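For example — a hedged sketch carrying over the file names from the usage example above — selecting Q13 and configuring quant_config.yaml for 4/8-bit mixed quantization might look like:
QUANT_CONFIG:
    mixed_precisions: [4, 8] # the Q13 default, shown explicitly here
    # mp_rate is left unset so that torch_calibrator recommends a compression rate
python3 torch_calibrator.py \
    -i ilsvrc2012_calibration_set100 \
    -m caffe_mobilenet_v2_float.sim \
    --input_config input_config.ini \
    --quant_config quant_config.yaml \
    -n caffe_mobilenet_v2.py \
    -q Q13 \
    --soc_version CHIP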
4.3. Stage Three: Detailed Explanation of Using the Compiler Tool to Generate Offline Network Models¶

This section introduces how to use the Compiler tool to convert SGS fixed point models into SGS edge models.
1. Tool Location: SGS_IPU_Toolchain/Scripts/calibrator/compiler.py.
2. Tool Function: Converts the models to SGS edge models.
3. Usage Example (taking the conversion of an ONNX-framework model as an example):
python3 ~/SGS_IPU_Toolchain/Scripts/calibrator/compiler.py \
    -m ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s_fixed.sim \
    --soc_version CHIP
4. Parameter Description:
(1) Required Parameters
- ① -m, --model: path to the fixed network model file.
- ② --soc_version: IPU Toolchain chip model.
Usage
- Execute python3 SGS_IPU_Toolchain/DumpDebug/show_sdk_info.py to view which chips and version information IPU Toolchain specifically supports.
(2) Optional Parameters:
- ① -b, --batch: expected number of batches; the default is 1.
- ② --batch_mode: multi-batch mode.
Usage
- Optional values are n_buf / one_buf; the default is n_buf mode.
- n_buf: each batch has independent buffers for its input and output data. In this mode, the -b/--batch parameter specifies the expected maximum batch size; the Compiler optimizes for this maximum, and the generated offline model can run with any batch size not exceeding it.
- one_buf: the data of multiple batches must be stacked along the highest dimension and share a single buffer. In this mode, the -b/--batch parameter can specify several batch sizes separated by commas (,), up to a maximum of 10; the generated offline model can only run with the specified batch sizes.
- ③ -c, --category: category of the model.
Usage
- Optional values are Classification / Detection / Unknown.
- Classification: the model has 1 output; the top-5 scores, sorted from high to low, are output.
- Detection: the model has 4 outputs; the outputs are converted into bounding-box (bbox) positions and categories for the input image. Only IPU post-processing operators are supported [post-processing module]; other post-processing should use Unknown.
- Unknown: the model output fits neither of the above categories; all tensor values are output. Unknown is the default when converting a fixed network model to an offline network model.
- ④ -o, --output: path for model output.
Usage
- Specifies where the offline network model is written: if a folder is specified, the file is automatically named with the network-model file prefix followed by sgsimg.img;
- if a specific path and filename are specified, the offline network model is named accordingly;
- if this parameter is not specified, the offline network model is stored at the network model file path.
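Putting the optional parameters together — a hedged sketch reusing the yolov8s model from the example above, with the batch size and output path as assumed example values:
python3 ~/SGS_IPU_Toolchain/Scripts/calibrator/compiler.py \
    -m ~/SGS_Models/onnx/onnx_yolov8s/onnx_yolov8s_fixed.sim \
    --soc_version CHIP \
    -b 4 \
    --batch_mode n_buf \
    -c Unknown \
    -o ./onnx_yolov8s_sgsimg.img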