Image Classification

1 Overview

1.1 Background Introduction

Deep learning has achieved tremendous success in image classification, and many classic neural network architectures (such as ResNet50, MobileNetV2, and MobileOne) are widely applied in practical scenarios. This open-source classification example selects three classic models: MobileNetV2, MobileOne, and MobileViT.

For details on MobileNetV2, you can visit the official link:

https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md

For MobileViT, reference material is limited, so refer directly to the paper:

https://arxiv.org/pdf/2110.02178

For details on MobileOne, please visit the official link:

https://github.com/apple/ml-mobileone

The model download links are as follows:

[mobilenetv2](https://download.pytorch.org/models/mobilenet_v2-b0353104.pth), [mobilevit-s](https://docs-assets.developer.apple.com/ml-research/models/cvnets/classification/mobilevit_s.pt), [mobileone-s](https://docs-assets.developer.apple.com/ml-research/datasets/mobileone/mobileone_s1_unfused.pth.tar)
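
For example, the weights can be fetched from the command line (wget shown for illustration; any HTTP client works):

    $wget https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
    $wget https://docs-assets.developer.apple.com/ml-research/models/cvnets/classification/mobilevit_s.pt
    $wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileone/mobileone_s1_unfused.pth.tar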

1.2 Usage Instructions

The Linux SDK-alkaid includes pre-converted offline models and board examples by default. The relevant file paths are as follows:

  • Board example program path:

    Linux_SDK/sdk/verify/opendla/source/classification
    
  • Board offline model paths:

    Linux_SDK/project/board/${chip}/dla_file/ipu_open_models/classification/mobilenet_v2_224.img
    Linux_SDK/project/board/${chip}/dla_file/ipu_open_models/classification/mobilevit_s_256.img
    Linux_SDK/project/board/${chip}/dla_file/ipu_open_models/classification/mobileone_s1_224.img
    
  • Board test image path:

    Linux_SDK/sdk/verify/opendla/source/resource/apple.jpg
    

Users who do not need to convert models can skip directly to section 3.

2 Model Conversion

2.1 ONNX Model Conversion

The model conversion process is the same for all three classification models; the following details the steps using MobileNetV2 as an example:

  • Python environment setup:

    $conda create -n classification python==3.10
    $conda activate classification
    $git clone https://github.com/WZMIAOMIAO/deep-learning-for-image-processing.git
    $cd deep-learning-for-image-processing/pytorch_classification/Test6_mobilenet
    

    Note: The provided Python environment setup is for reference only; please refer to the official source code running tutorial for the specific setup process: https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master

  • Model Testing:

    • Create the opendla directory and place the downloaded model there, then write the model testing script infer.py:

      import os
      import json
      
      import torch
      from PIL import Image
      from torchvision import transforms
      import matplotlib.pyplot as plt
      
      from model_v2 import MobileNetV2
      
      def main():
          device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
      
          data_transform = transforms.Compose(
              [transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
      
          # load image
          img_path = "./apple.jpg"
          assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
          img = Image.open(img_path)
          plt.imshow(img)
          # [N, C, H, W]
          img = data_transform(img)
          # expand batch dimension
          img = torch.unsqueeze(img, dim=0)
      
          # create model
          model = MobileNetV2(num_classes=1000).to(device)  # the downloaded ImageNet-pretrained weights have 1000 classes
          # load model weights
          model_weight_path = "./opendla/mobilenet_v2-b0353104.pth"
          model.load_state_dict(torch.load(model_weight_path, map_location=device))
          model.eval()
          with torch.no_grad():
              # predict class
              output = torch.squeeze(model(img.to(device))).cpu()
              predict = torch.softmax(output, dim=0)
              predict_cla = torch.argmax(predict).numpy()
              print("class id", predict_cla)
      
      if __name__ == '__main__':
          main()
      
    • Run the model testing script to ensure the classification environment is configured correctly: $python infer.py

  • Model Export:

    • Install the required packages:

      $pip install onnx -i https://pypi.tuna.tsinghua.edu.cn/simple
      $pip install onnx-simplifier -i https://pypi.tuna.tsinghua.edu.cn/simple
      
    • Write the model conversion script export.py:

      import os
      import json
      
      import torch
      from PIL import Image
      from torchvision import transforms
      import matplotlib.pyplot as plt
      
      from model_v2 import MobileNetV2
      
      def main():
          device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
      
          data_transform = transforms.Compose(
              [transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
      
          # load image
          img_path = "./apple.jpg"
          assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
          img = Image.open(img_path)
          plt.imshow(img)
          # [N, C, H, W]
          img = data_transform(img)
          # expand batch dimension
          img = torch.unsqueeze(img, dim=0)
      
          # create model
          model = MobileNetV2(num_classes=1000).to(device)  # the downloaded ImageNet-pretrained weights have 1000 classes
          # load model weights
          model_weight_path = "./opendla/mobilenet_v2-b0353104.pth"
          model.load_state_dict(torch.load(model_weight_path, map_location=device))
          model.eval()
          torch.onnx.export(
              model,
              img.to(device),
              "./opendla/mobilenetv2.onnx",
              opset_version=13,
              input_names=['images'],
              output_names=['output'],
              do_constant_folding=False
          )
      
      if __name__ == '__main__':
          main()
      
    • Run the model conversion script; it will generate the mobilenetv2.onnx model in the opendla directory: $python export.py

    • Optimize the graph structure: $python -m onnxsim opendla/mobilenetv2.onnx opendla/mobilenetv2_sim.onnx
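
    • Optionally, sanity-check the simplified model before offline conversion. A minimal sketch using the onnx package installed above:

      import onnx

      # load and structurally validate the simplified graph
      model = onnx.load("opendla/mobilenetv2_sim.onnx")
      onnx.checker.check_model(model)

      # confirm the input/output names referenced later in input_config.ini
      print([i.name for i in model.graph.input])    # expected: ['images']
      print([o.name for o in model.graph.output])   # expected: ['output']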

2.2 Offline Model Conversion

2.2.1 Pre & Post-Processing Instructions

  • Pre-processing

    The input of the successfully converted mobilenetv2.onnx model is a (1, 3, 224, 224) image tensor. In addition, the pixel values need to be normalized to the range [0, 1].

  • Post-processing

    The classification model needs no dedicated post-processing network: once the model output is obtained, the final classification result follows from softmax and argmax, as sketched below. The successfully converted mobilenetv2_sim.onnx model has a single output tensor containing one score per class.
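
    For illustration, here is a minimal sketch (assuming the onnxruntime and Pillow packages are installed) that runs the ONNX model and applies this softmax/argmax post-processing:

      import numpy as np
      import onnxruntime as ort
      from PIL import Image

      # pre-processing: 224x224 RGB input scaled to [0, 1], then normalized
      # with the same ImageNet mean/std used in infer.py
      img = Image.open("apple.jpg").convert("RGB").resize((224, 224))
      x = np.asarray(img, dtype=np.float32) / 255.0
      x = (x - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
      x = x.transpose(2, 0, 1)[None].astype(np.float32)   # (1, 3, 224, 224), NCHW

      sess = ort.InferenceSession("opendla/mobilenetv2_sim.onnx")
      logits = sess.run(["output"], {"images": x})[0][0]

      # post-processing: softmax + argmax
      probs = np.exp(logits - logits.max())
      probs /= probs.sum()
      print("class id:", int(probs.argmax()), "score:", float(probs.max()))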

2.2.2 Offline Model Conversion Process

Note:

  1) OpenDLAModel corresponds to the smodel files extracted from the compressed package image-dev_model_convert.tar.
  2) The conversion command must be run in a Docker environment; please load the SGS Docker environment according to the Docker Development Environment Tutorial.

  • Copy the ONNX model to the conversion code directory:

    $cp opendla/mobilenetv2_sim.onnx OpenDLAModel/classification/onnx
    
  • Conversion command:

    $cd IPU_SDK_Release/docker
    $bash run_docker.sh
    # Enter the OpenDLAModel directory in the Docker environment
    $cd /work/SGS_XXX/OpenDLAModel
    $bash convert.sh -a classification/mobilenetv2 -c config/classification.cfg -p SGS_IPU_Toolchain(absolute path) -s false
    
  • Final generated model locations:

    output/${chip}_${time}/mobilenet_v2_224.img
    output/${chip}_${time}/mobilenet_v2_224_fixed.sim
    output/${chip}_${time}/mobilenet_v2_224_float.sim
    

2.2.3 Key Script Parameter Analysis

  • input_config.ini

    [INPUT_CONFIG]
    inputs = images;                   # ONNX input node name; separate multiple names with commas if necessary
    training_input_formats = RGB;      # Input format used during model training, usually RGB
    input_formats = BGRA;              # Board-side input format; choose BGRA or YUV_NV12 as appropriate
    quantizations = TRUE;              # Enable input quantization; do not modify
    mean_red = 123.68;                 # Pre-processing mean for the red channel; configure to match the model
    mean_green = 116.28;               # Pre-processing mean for the green channel; configure to match the model
    mean_blue = 103.53;                # Pre-processing mean for the blue channel; configure to match the model
    std_value = 58.395:57.12:57.375;   # Pre-processing standard deviations; configure to match the model

    [OUTPUT_CONFIG]
    outputs = output;                  # ONNX output node name; separate multiple names with commas if necessary
    dequantizations = FALSE;           # Whether to enable dequantization; TRUE is recommended.
                                       # FALSE produces int16 output; TRUE produces float32 output.

  • classification.cfg

    [CLASSIFICATION]
    CHIP_LIST=pcupid                      # Platform name; must match the board platform, otherwise the model cannot run
    Model_LIST=mobilenetv2_sim            # Input ONNX model name
    INPUT_SIZE_LIST=224x224               # Model input resolution
    INPUT_INI_LIST=input_config.ini       # Configuration file
    CLASS_NUM_LIST=0                      # Just fill in 0
    SAVE_NAME_LIST=mobilenet_v2_224.img   # Output model name
    QUANT_DATA_PATH=quant_data            # Path to the quantization images
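
The mean/std values in the input_config.ini template are the ImageNet normalization constants from infer.py, rescaled from the [0, 1] range to the [0, 255] pixel range; a quick check:

    # ImageNet mean/std from infer.py, rescaled to the 0-255 pixel range
    means = [m * 255 for m in (0.485, 0.456, 0.406)]   # ≈ [123.68, 116.28, 103.53]
    stds = [s * 255 for s in (0.229, 0.224, 0.225)]    # ≈ [58.395, 57.12, 57.375]
    print(means, stds)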

2.3 Model Simulation

  • Obtain float/fixed/offline model output:

    $bash convert.sh -a classification/mobilenetv2 -c config/classification.cfg -p SGS_IPU_Toolchain(absolute path) -s true
    

    After executing the above command, the output tensor of the float model is saved by default to a txt file under classification/mobilenetv2/log/output. The classification/mobilenetv2/convert.sh script also provides simulation examples for the fixed and offline models; uncomment the corresponding code blocks to obtain their outputs.

  • Model Accuracy Comparison

    Keeping the same input as above, enter the environment set up in section 2.1 and add a print statement at line 40 of the deep-learning-for-image-processing/pytorch_classification/Test6_mobilenet/infer.py file:

    print(output)
    

    This prints the output tensor of the corresponding PyTorch model node, which can then be compared against the float, fixed, and offline model outputs; a short comparison sketch follows. Note that the output layout of the original model is NCHW, while the float/fixed/offline model outputs are NHWC.
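
    A minimal comparison sketch (the file names are placeholders: point them at the PyTorch tensor printed above and at the txt file produced under classification/mobilenetv2/log/output):

      import numpy as np

      # flattened output vectors of the PyTorch model and the float model
      pytorch_out = np.loadtxt("pytorch_output.txt").ravel()
      float_out = np.loadtxt("float_output.txt").ravel()

      # for the final (1, num_classes) classifier output the NCHW and NHWC
      # layouts coincide; 4-D feature maps need a transpose before flattening
      cos = np.dot(pytorch_out, float_out) / (
          np.linalg.norm(pytorch_out) * np.linalg.norm(float_out))
      print("cosine similarity:", cos)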

3 Board Deployment

3.1 Program Compilation

Before compiling the board example program, select the defconfig that matches the board model (nand/nor/emmc, DDR model, etc.) and perform a full SDK build. For details, please refer to the Alkaid SDK SIGDOC "Development Environment Setup" document.

  • Compile the board classification example:

    $cd sdk/verify/opendla
    $make clean && make source/classification -j8
    
  • Final executable file address:

    sdk/verify/opendla/out/${AARCH}/app/prog_classification
    

3.2 Running Files

When running the program, the following files need to be copied to the board:

  • prog_classification
  • apple.jpg
  • mobilenet_v2_224.img

3.3 Running Instructions

  • Usage: ./prog_classification -i image -m model [-t threshold]
  • Required Input:

    • image: path to a single image or to a folder of images
    • model: Path to the offline model to be tested
  • Optional Input:

    • threshold: score threshold (0.0~1.0, default is 0.5)
  • Typical Output:

    ./prog_classification -i resource/retrieval_library/000000577364.jpg -m models/mobilenet_v2_224.img
    
        inputs: resource/retrieval_library/000000577364.jpg
        model path: models/mobilenet_v2_224.img
        threshold: 0.500000
        client [789] connected, module:ipu
        found 1 images!
        the input image: resource/000000577364.jpg
        fillbuffer processing...
        net input width: 224, net input height: 224
        num classes: 1000
    
        score: 0.776115
    
        model invoke time: 4.095000 ms
        post process time: 0.571000 ms
        class_id: 385
        ------shutdown IPU0------
        client [789] disconnected, module:ipu