IPU User Guide


1. IPU


1.1. Introduction

IPU (Intelligence Processing Unit)

The MI IPU module accelerates AI model inference. Using the IPU involves two stages: training, converting, and simulating the model with the IPU SDK on the PC side, and loading and running the model with MI IPU on the chip side.

For the IPU SDK, refer to the documents in the IPU SDK directory. For an introduction to the MI IPU API and its use, refer to the MI IPU API document. This guide covers the common chip-side debugging methods for the IPU and the use of the SSTAR face/human algorithm demo.

*The Demo path mentioned in this article: sdk/verify/release_feature/source/dla/


1.2. Verifying model correctness on the chip side with MI_IPU

The IPU model is first simulated on the PC to obtain reference results. The results produced on the chip must then be checked against the simulation results. This can be done with dla_simulator.

Example:

./prog_dla_dla_simulator -i ./480x800.jpeg -m ./sypfa5.480302_fixed.sim_sgsimg.img -c Unknown -f BGRA

Note: If the input format is ARGB, set -f to BGRA, because the SSTAR platform stores ARGB data in reversed order. If the model is not a standard classification or detection model, set -c to Unknown.
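
The byte-order remark in the note above can be illustrated with a short sketch. It assumes that "reversed" means a 32-bit ARGB word stored little-endian, so that the bytes appear in memory as B, G, R, A. That reading is an assumption rather than something stated in this guide, and the function name is hypothetical.

```python
import struct

def argb_word_to_memory_bytes(a, r, g, b):
    """Pack one pixel as a 32-bit ARGB word and return its little-endian byte layout."""
    word = (a << 24) | (r << 16) | (g << 8) | b  # value 0xAARRGGBB
    return struct.pack("<I", word)               # little-endian: lowest byte first

pixel = argb_word_to_memory_bytes(a=0xFF, r=0x11, g=0x22, b=0x33)
print(pixel.hex())  # 332211ff, i.e. memory order B, G, R, A
```

Under this assumption, a buffer described as ARGB at the word level reads as BGRA byte by byte, which is why the -f option is given as BGRA.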

The run log prints tensor and model information along with the estimated time consumption, which can be used to evaluate performance such as frame rate.

The output directory contains the fixed-point output tensors computed on the board, similar to the result in the figure below. Compare them with the floating-point results simulated on the PC to confirm that the model accuracy is as expected.

Small differences between the fixed-point and floating-point results are normal.
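
One way to quantify the comparison is sketched below. It assumes the board output is an integer tensor that converts to floating point by multiplying with a single per-tensor scale; the scale, the function names, and the sample values are all hypothetical and do not come from the dla_simulator output format.

```python
import math

def dequantize(fixed_values, scale):
    """Convert fixed-point integers to floats (assumed rule: value * scale)."""
    return [v * scale for v in fixed_values]

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero tensors are treated as identical.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)

def max_abs_error(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def compare_outputs(board_fixed, scale, pc_float):
    """Return (cosine similarity, max absolute error) of board vs. PC results."""
    if not pc_float or len(board_fixed) != len(pc_float):
        raise ValueError("tensors must be non-empty and the same size")
    board_float = dequantize(board_fixed, scale)
    return cosine_similarity(board_float, pc_float), max_abs_error(board_float, pc_float)

# Hypothetical values: the last element differs by one quantization step.
similarity, error = compare_outputs([100, -200, 301], 0.01, [1.0, -2.0, 3.0])
print(f"cosine similarity = {similarity:.6f}, max abs error = {error:.4f}")
```

A cosine similarity very close to 1 together with an error on the order of one quantization step would be consistent with the small differences described above; the acceptable threshold depends on the model.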

To verify an nbatch model, use the dla_simulator_nbatch demo. It differs from the dla_simulator demo only in an additional -n parameter, which specifies the maximum number of simulations at a time.


1.3. Performance analysis through IPU Log

Using the IPU Log generated on the board together with the analysis tools provided by the IPU SDK, users can view the time consumed by each layer and optimize the algorithm model accordingly.

The MI IPU API document describes debug commands for the board-side IPU, including viewing and modifying the IPU clock and enabling IPU Log capture. To generate the IPU Log, first enable IPU Log capture with the corresponding command. The IPU SDK analysis tool also takes the current IPU clock as an input parameter; the clock can likewise be confirmed through these commands.

For details on how to use the analysis tool, see section 7.11, IPU Log Performance Analysis Tool, of the User Manual in the IPU SDK documentation. This section covers only how to generate the IPU Log on the board side, using the dla_show_img_info.zip demo. The following are examples of its use:

Two files, xxx_log_core0.bin and xxx_log_corectrl0.bin, will be generated in the specified path. For IPU log analysis, xxx_log_corectrl0.bin is optional and may not be generated, depending on the specific model; its absence does not affect the analysis.

Then use the analysis tools to convert the two bin files into JSON files, and open them in the Chrome browser to view the information.


1.4. MI_IPU usage statistics

MI_IPU is similar to a CPU in that it executes instructions. Unlike a CPU, however, there is no system that time-shares the IPU hardware, so the core runs at its full computing power whenever the IPU operates. Utilization is defined as the proportion of a given period during which the IPU core is working. Because there is a gap of several microseconds between consecutive calculations, spent mainly in upper-level software, IPU utilization never reaches 100%.
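
The definition above amounts to a simple ratio. The sketch below works through it with hypothetical numbers; the function names and the timing values are illustrative assumptions and are not taken from ipu_utilization.c.

```python
def ipu_utilization(busy_us, interval_us):
    """Percentage of the interval during which the IPU core was working."""
    if interval_us <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * busy_us / interval_us

def back_to_back_utilization(invoke_us, gap_us):
    """Utilization when calculations run continuously with a fixed software gap."""
    return ipu_utilization(invoke_us, invoke_us + gap_us)

# Core busy for 250 ms out of a 1 s statistics interval.
print(ipu_utilization(250_000, 1_000_000))  # 25.0

# Hypothetical 5 ms calculation with a 5 us software gap between calculations.
print(round(back_to_back_utilization(5_000, 5), 2))  # 99.9
```

The second case shows why the figure approaches but does not reach 100%: the software gap is always part of the interval.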

The following demo provides a usage statistics function, which can count how long the IPU core works in a period of time and calculate the percentage.

ipu_utilization.c

Compilation instructions:

aarch64-linux-gnu-gcc ipu_utilization.c -o Ipu_calculate

Run:

/mnt # ./Ipu_calculate
usage: ./Ipu_calculate -t time_interval How often to count and print, unit: seconds
/mnt # ./Ipu_calculate -t 1

Example results:

Because the current chip has only one IPU core, you only need to pay attention to the value of core0.

PS: To view the current IPU clock, run: cat /proc/mi_modules/mi_ipu/debug_hal/freq


1.5. How to maximize IPU utilization

In some scenarios, such as measuring PCB temperature, power consumption, and similar data, IPU utilization needs to be maximized.

Modify the dla_simulator demo code described in 1.2 as follows:

Simply let the IPU run Invoke continuously, then use the demo described in 1.4 to measure the current utilization; it should be close to 100%.
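
The idea of invoking back to back can be sketched generically as follows. This is not the dla_simulator modification itself: `invoke` is a hypothetical stand-in for one IPU inference call, and the fraction reported is host-side wall-clock time spent inside `invoke`, which only approximates the core utilization measured by the statistics demo.

```python
import time

def stress_loop(invoke, duration_s, clock=time.perf_counter):
    """Call invoke() back to back for duration_s seconds.

    Returns (number of calls, fraction of elapsed time spent inside invoke()).
    """
    start = clock()
    busy = 0.0
    count = 0
    while True:
        t0 = clock()
        if t0 - start >= duration_s:
            break
        invoke()                 # hypothetical: one inference on the IPU
        busy += clock() - t0     # time attributed to this call
        count += 1
    elapsed = clock() - start
    return count, (busy / elapsed if elapsed > 0 else 0.0)

if __name__ == "__main__":
    # Stand-in workload: sleep 2 ms in place of a real inference call.
    calls, fraction = stress_loop(lambda: time.sleep(0.002), duration_s=0.1)
    print(f"{calls} calls, busy fraction {fraction:.2%}")
```

Keeping the loop body free of any other work (no file I/O, no printing per iteration) is what keeps the gap between calls small.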


1.6. Learn more about IPU usage & Model generation

For more information about using the IPU, please refer to the IPU Online Documentation.


2. IPU algorithm


| Algorithm | Algorithm library name | Function | Detail link |
|---|---|---|---|
| Attribute recognition algorithms | libsgsalgo_attr | Attribute recognition algorithms include face attribute recognition, facial expression recognition, and vehicle attribute recognition. | Attribute Recognition |
| Detection Algorithm | libsgsalgo_det | The main detection labels are: face detection (SYFL); fire and smoke detection; person + vehicle + pet + head + face (SD). | Detection Algorithm |
| Detection and Tracking Algorithm | libsgsalgo_dt | This algorithm library implements the following basic functions: human detection, face detection, vehicle detection, non-motorized vehicle detection, pet detection, and tracking of detected targets (human, face, vehicle, non-motorized vehicle, pets, etc.). | Detection and Tracking Algorithm |
| Face Recognition Algorithm | libsgsalgo_fr | Face recognition primarily distinguishes between human faces and determines whether a captured face belongs to a whitelisted individual. The entire algorithm includes face detection, face attribute recognition, facial expression recognition, facial landmark detection, face filtering, face tracking, face alignment, feature extraction, and face comparison. | Face Recognition Algorithm |
| License Plate Recognition Algorithm | libsgsalgo_lpr | License Plate Recognition (LPR) includes the following functions: license plate detection, license plate number recognition, license plate color recognition, and license plate type recognition. | License Plate Recognition Algorithm |
| Retrieval Text-to-Image Search Algorithm Description | libsgsalgo_ret | Text-to-image search (Retrieval, Ret) retrieves image content using text and can be used in conjunction with detection + tracking algorithms to search for attributes of detected targets. | Retrieval Text-to-Image Search Algorithm Description |
| KWS Algorithm | libsgsalgo_kws | Voice Wake-up (Keywords Spotting, KWS) is an algorithm for detecting whether a specified wake word exists in the audio stream. | KWS Algorithm |
| SED Algorithm | libsgsalgo_sed | Sound Event Detection (SED) is an algorithm for detecting the presence of specific sound events, currently supporting detection of children's crying, coughing, and glass breaking. | Sound Event Detection Algorithm |