TransformerLM

1 Overview

1.1 Background Introduction

TransformerLM is typically used as a language-model scorer in ASR to re-score the hypotheses predicted by the acoustic model. The model comes from the FunASR open-source repository; for details, refer to:

https://github.com/modelscope/FunASR/blob/v0.3.0/funasr/bin/lm_inference.py

The model download address is:

https://www.modelscope.cn/models/iic/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/files

1.2 Usage Instructions

The Linux SDK-alkaid comes with pre-converted offline models and board-side examples by default. The relevant file paths are as follows:

  • Board-side example program path: Linux_SDK/sdk/verify/opendla/source/llm/transformerlm

  • Board-side offline model path: Linux_SDK/project/board/${chip}/dla_file/ipu_open_models/llm/lm_100sim.img

  • Board-side test dictionary path: Linux_SDK/sdk/verify/opendla/source/resource/units_asr_punc_lm.txt

If the user does not need to convert the model, they can directly skip to section 3.

2 Model Conversion

2.1 ONNX Model Conversion

  • Setting up the Python environment

    $conda create -n funasr python==3.10
    $conda activate funasr
    $pip install "modelscope[audio_asr]" --upgrade -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
    $git clone https://github.com/modelscope/FunASR.git && cd FunASR
    $pip install --editable ./ -i https://mirrors.aliyun.com/pypi/simple/
    

    Note: The Python environment setup above is for reference only; for the exact installation steps, please follow the official installation guide:

    https://github.com/modelscope/FunASR/tree/v0.3.0?tab=readme-ov-file#installation
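
    As a quick, optional sanity check that the environment is usable (not part of the official tutorial), the installed packages can be imported directly:

      # Optional environment sanity check; the exact versions will differ per install.
      import torch
      import onnx
      import onnxruntime
      import funasr  # installed in editable mode from the cloned FunASR repo

      print("torch:", torch.__version__)
      print("onnx:", onnx.__version__)
      print("onnxruntime:", onnxruntime.__version__)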
    
  • Model testing

    • Write the model testing script funasr/bin/predict.py

      from modelscope.pipelines import pipeline
      from modelscope.utils.constant import Tasks
      
      inference_pipeline = pipeline(
          task=Tasks.language_score_prediction,
          model='./speech_transformer_lm_zh-cn',
          output_dir='./tmp/')

      rec_result = inference_pipeline(text_in='hello 大 家 好 呀')
      print(rec_result)
      
    • Run the model testing script to ensure the funasr environment is configured correctly

      # Place the model file speech_transformer_lm_zh-cn-common-vocab8404-pytorch in the current directory
      $ mv speech_transformer_lm_zh-cn-common-vocab8404-pytorch speech_transformer_lm_zh-cn
      $ python ./funasr/bin/predict.py
      
  • Model export

    • Write the model conversion script funasr/bin/lm_export.py:

      import os
      import sys
      sys.path.append(os.getcwd())
      import logging
      import argparse
      import torch
      import onnx
      import onnxsim
      import onnxruntime as ort
      
      from funasr.torch_utils.initialize import initialize
      from funasr.train.class_choices import ClassChoices
      
      from funasr.lm.espnet_model import ESPnetLanguageModel
      from funasr.lm.seq_rnn_lm import SequentialRNNLM
      from funasr.lm.transformer_lm import TransformerLM
      from funasr.lm.abs_model import AbsLM
      from funasr.modules.mask import subsequent_mask
      from funasr.modules.nets_utils import make_pad_mask
      import torch.nn.functional as F
      
      import numpy as np
      from funasr.tasks.lm import LMTask
      from funasr.torch_utils.forward_adaptor import ForwardAdaptor
      
      def load_vocab(vocab_path, extra_word_list=[]):
          # Map every token (one per line in tokens.txt) to its index;
          # optional extra words occupy the first n indices.
          n = len(extra_word_list)
          with open(vocab_path, encoding='utf-8') as vf:
              vocab = {word.strip(): i + n for i, word in enumerate(vf)}
      
          for i, word in enumerate(extra_word_list):
              vocab[word] = i
      
          return vocab
      
      def softmaxcrossentropy_c(  # type: ignore
          x, target, weight=None, reduction="None"
      ):
          # Per-position negative log-likelihood computed in numpy from the exported
          # model's logits; used below to sanity-check the ONNX output.
          input_shape = x.shape
          max_x = np.max(x, axis=1, keepdims=True).astype(np.float64)
          exp_x = np.exp(x - max_x)
          p = exp_x / np.sum(exp_x, axis=1, keepdims=True)
          inp = np.log(p)
      
          N = input_shape[0]
          neg_gather_element_input = np.zeros((N), dtype=x.dtype)
      
          inp = np.squeeze(inp)
          target = np.squeeze(target)
          for i in range(N):
              index = target[i]
              neg_gather_element_input[i] = -inp[i, index]
      
          return neg_gather_element_input
      
      if __name__ == '__main__':
          # 1. Build Model
          model, train_args = LMTask.build_model_from_file(
              './speech_transformer_lm_zh-cn/lm.yaml', './speech_transformer_lm_zh-cn/lm.pb', 'cpu')
          # Wrap the model so that its "export" forward pass is the one traced by torch.onnx.export
          wrapped_model = ForwardAdaptor(model, "export")
      
          tmp_input = "梁家人出烟的电视剧有什么"
          token_dict = load_vocab('./speech_transformer_lm_zh-cn/tokens.txt')
          x_tmp = []
          for char in tmp_input:
              if char in token_dict.keys():
                  x_tmp.append(token_dict[char])
              else:
                  x_tmp.append(token_dict["<unk>"])
          x_tmp = torch.tensor([x_tmp], dtype=torch.int64)
          # Zero-pad the token sequence to a fixed length of 100 (the exported model uses a static input shape)
          x_tmp = torch.cat([x_tmp, torch.zeros((1, 100 - x_tmp.shape[1]), dtype=int)], dim=1)
      
          wrapped_model.eval()
      
          onnx_path = './speech_transformer_lm_zh-cn/lm_1x100.onnx'
          torch.onnx.export(
                  wrapped_model,
                  (x_tmp),
                  onnx_path,
                  export_params=True,
                  opset_version=14,
                  input_names=["token"],
                  output_names=["probs", 'x_lengths'],
                  verbose=False,
              )
          model_onnx = onnx.load(onnx_path)  # load onnx model
          onnx.checker.check_model(model_onnx)  # check onnx model
          model_onnx, check = onnxsim.simplify(model_onnx)
          onnx.save(model_onnx, onnx_path.replace('lm_1x100','lm_1x100_sim'))
          ort_sess = ort.InferenceSession(onnx_path)
      
          input_length = 100
          input_list = [
                  "梁家人出烟的电视剧有什么"
              ]
          token_dict = load_vocab('./speech_transformer_lm_zh-cn/tokens.txt')
          # Directory for the quantization inputs consumed later by the offline conversion;
          # np.save does not create missing directories, so create it up front.
          os.makedirs('./speech_transformer_lm_zh-cn/lmx100', exist_ok=True)
          count = 0
      
          for sentence in input_list:
              x = []
              for char in sentence:
                  if char in token_dict.keys():
                      x.append(token_dict[char])
                  else:
                      x.append(token_dict["<unk>"])
              src_len = len(x)
              x = torch.tensor([x], dtype=torch.int64)
              x = torch.cat([x,torch.zeros((1, 100 - x.shape[1]), dtype=int)], dim=1)
              np.save("./speech_transformer_lm_zh-cn/lmx100/lm_input_%d.npy"%(count), x)
              count+=1
              ort_inputs = {
                      ort_sess.get_inputs()[0].name: x.numpy(),
                  }
              ort_outs = ort_sess.run(None, ort_inputs)
              logits = ort_outs[0]
      
              print("logits: ", np.array(logits[0]).shape)
              print("logits: ", logits[0][0][:10])
      
              onnx_nll = softmaxcrossentropy_c(logits[0], ort_outs[1].flatten())
              print("onnx nll", onnx_nll)
      
              x_lens = torch.full((1,), fill_value=src_len, dtype=torch.int64)
              # Only the first src_len + 1 positions contribute to the average NLL;
              # the zero-padded tail is masked out.
              mask_ = np.zeros(input_length+1)
              mask_[np.arange(x_lens + 1)] = 1

              onnx_nll = np.array(onnx_nll * mask_)
              onnx_nll = np.sum(onnx_nll)
              onnx_nll = onnx_nll / (x_lens + 1)
              print("onnx_nll ", onnx_nll)
      
    • Run the model conversion script; it generates lm_1x100.onnx and the simplified lm_1x100_sim.onnx in the ./speech_transformer_lm_zh-cn directory, together with the quantization inputs under lmx100/

      $python ./funasr/bin/lm_export.py
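
    • Check the exported model (optional): before the offline conversion it can be useful to confirm the exported model's input/output names and shapes, since the input name token and output name probs are referenced later in input_config.ini. A minimal check with onnxruntime, assuming the export path used above:

      import onnxruntime as ort

      # Print the I/O signature of the exported language model (sanity check only)
      sess = ort.InferenceSession('./speech_transformer_lm_zh-cn/lm_1x100.onnx')
      for t in sess.get_inputs():
          print("input :", t.name, t.shape, t.type)
      for t in sess.get_outputs():
          print("output:", t.name, t.shape, t.type)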
      

2.2 Offline Model Conversion

2.2.1 Pre & Post Processing Instructions

  • Preprocessing: The input to the language model is the output of the acoustic model, i.e. the index of each predicted token in the dictionary; this corresponds to the token input of the converted lm_1x100_sim.onnx model.
  • Postprocessing: The language model has no postprocessing of its own. Its output scores are typically multiplied by a coefficient and combined with the acoustic model scores as a weighted sum, and the combined scores are then decoded with greedy search (see the sketch after this list).
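
The sketch below only illustrates this weighted re-scoring idea; it is not the FunASR or board-side implementation, and the names am_scores, lm_scores, and lm_weight are placeholders.

    import numpy as np

    # Illustrative shallow-fusion style re-scoring (placeholder names, not the SDK code):
    # am_scores, lm_scores: per-step log-scores over the vocabulary, shape (T, vocab_size)
    # lm_weight: coefficient applied to the language model scores
    def rescore_greedy(am_scores, lm_scores, lm_weight=0.3):
        combined = am_scores + lm_weight * lm_scores   # weighted sum of the two score streams
        return np.argmax(combined, axis=-1)            # greedy decoding over the combined scores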

2.2.2 Offline Model Conversion Process

Note: 1) OpenDLAModel corresponds to the model conversion files extracted from the compressed package image-dev_model_convert.tar. 2) The conversion command must be run inside the Docker environment; load the SGS Docker environment according to the Docker development environment tutorial first.

  • Copy the ONNX model to the conversion code directory

    $cp speech_transformer_lm_zh-cn/lm_1x100.onnx OpenDLAModel/llm/transformerlm/onnx

  • Conversion command

    $cd IPU_SDK_Release/docker
    $bash run_docker.sh
    # Enter the OpenDLAModel directory inside the docker environment
    $cd /work/SGS_XXX/OpenDLAModel
    $bash convert.sh -a llm/transformerlm -c config/llm_transformerlm.cfg -p SGS_IPU_Toolchain (absolute path) -s false

  • Final generated model addresses

    output/{chip}_/lm_100sim.img
    output/{chip}_/lm_100sim_fixed.sim
    output/{chip}_/lm_100sim_float.sim

2.2.3 Key Script Parameter Analysis

-   input_config.ini
        [INPUT_CONFIG]
        inputs=token;                       # ONNX input node name, separate with commas if there are multiple;
        input_formats=RAWDATA_S16_NHWC;     # Board input format, can choose based on the ONNX input format, e.g., float: RAWDATA_F32_NHWC, int32: RAWDATA_S16_NHWC;
        quantizations=TRUE;                 # Enable input quantization, no need to change;
        [OUTPUT_CONFIG]
        outputs=probs;                      # ONNX output node name, separate with commas if there are multiple;
        dequantizations=TRUE;               # Whether to enable dequantization; fill according to actual needs (TRUE is recommended). If FALSE, the output is int16; if TRUE, the output is float32;
-   llm_transformerlm.cfg
        [TRANSFORMERLM]
        CHIP_LIST=pcupid                    # Platform name, must match board platform, otherwise the model will not run
        Model_LIST=lm_1x100                 # Input ONNX model name
        INPUT_SIZE_LIST=0                   # Model input resolution, fill in 0 here
        INPUT_INI_LIST=input_config.ini     # Configuration file
        CLASS_NUM_LIST=0                    # Just fill in 0
        SAVE_NAME_LIST=lm_100sim.img        # Output model name
        QUANT_DATA_PATH=image_list.txt      # Quantization data path
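
QUANT_DATA_PATH points to a plain-text list of quantization (calibration) inputs; a ready-made image_list.txt ships with the conversion code. If you regenerate the calibration data yourself, a small helper like the hypothetical sketch below could rebuild the list from the lm_input_*.npy files saved by lm_export.py. It assumes one file path per line, which should be verified against the shipped image_list.txt.

    import glob

    # Hypothetical helper: collect the quantization inputs produced by lm_export.py
    # and write them to image_list.txt, one path per line (the format is an assumption).
    paths = sorted(glob.glob('./speech_transformer_lm_zh-cn/lmx100/lm_input_*.npy'))
    with open('image_list.txt', 'w') as f:
        f.write('\n'.join(paths) + '\n')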

2.3 Model Simulation

  • Get float/fixed/offline model output

    $bash convert.sh -a llm/transformerlm -c config/llm_transformerlm.cfg -p SGS_IPU_Toolchain (absolute path) -s true

    After executing the above command, the tensor output of the float model is saved by default as a txt file under llm/transformerlm/log/output. The llm/transformerlm/convert.sh script also provides simulation examples for the fixed and offline models; uncomment the corresponding code blocks to obtain their outputs at runtime.

  • Model Accuracy Comparison

    With the same input as above, enter the environment built in section 2.1 and add a print statement after line 175 of the FunASR/funasr/bin/lm_inference.py script: print(nll). This gives the output tensor of the corresponding PyTorch model node, which can then be compared against the float, fixed, and offline model outputs (a comparison sketch follows below). Note that the output layout of the original model is NCHW, while the float/fixed/offline models output NHWC.
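
    A minimal comparison sketch, assuming the PyTorch nll tensor was saved to pytorch_nll.npy; the float-model dump file name under llm/transformerlm/log/output is a placeholder to adapt to your run:

      import numpy as np

      # Hypothetical accuracy comparison; the file names below are placeholders.
      pytorch_out = np.load('pytorch_nll.npy').astype(np.float32).flatten()
      float_out = np.loadtxt('llm/transformerlm/log/output/output_tensor.txt').astype(np.float32).flatten()  # placeholder file name

      # If the dumped tensor is NHWC while the PyTorch tensor is NCHW, transpose before flattening.
      print("max abs diff:", np.max(np.abs(pytorch_out - float_out)))
      cos = np.dot(pytorch_out, float_out) / (np.linalg.norm(pytorch_out) * np.linalg.norm(float_out))
      print("cosine similarity:", cos)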

3 Board-Side Deployment

3.1 Program Compilation

Before compiling the example program, you first need to select the defconfig for the SDK full-package build according to the board (nand/nor/emmc, DDR model, etc.); refer to the alkaid SDK sigdoc document titled "Development Environment Setup."

  • Compile the board-side transformerlm example

    $cd sdk/verify/opendla
    $make clean && make source/llm/transformerlm -j8

  • Final generated executable file address

    sdk/verify/opendla/out/${AARCH}/app/prog_llm_transformerlm

3.2 Running Files

When running the program, the following files need to be copied to the board:

  • prog_llm_transformerlm
  • lm_100sim.img
  • units_asr_punc_lm.txt (test dictionary)

3.3 Running Instructions

  • Usage: ./prog_llm_transformerlm model dict
  • Required Input:

    • model: offline model path
    • dict: dictionary file path
  • Typical Output:

    ./prog_llm_transformerlm models/lm_100sim.img resource/units_asr_punc_lm.txt
    
        client [801] connected, module:ipu
        load dict...
        invoke start ...
        model invoke time: 60.463000 ms
        lm score: -3.692400
    
        ------shutdown IPU0------
        client [801] disconnected, module:ipu