TransformerLM
1 Overview
1.1 Background Introduction
TransformerLM is typically used as a language-model scorer in ASR, re-scoring the hypotheses predicted by the acoustic model. The model comes from the FunASR open-source repository; for details, refer to:
https://github.com/modelscope/FunASR/blob/v0.3.0/funasr/bin/lm_inference.py
The model download address is:
https://www.modelscope.cn/models/iic/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/files
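If you prefer to fetch the model programmatically, here is a minimal sketch using the ModelScope hub API (the model ID matches the link above; the printed path is ModelScope's local cache directory):

```python
# Minimal sketch: download the LM weights via the ModelScope hub API.
from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download('iic/speech_transformer_lm_zh-cn-common-vocab8404-pytorch')
print(model_dir)  # local directory containing lm.yaml, lm.pb, tokens.txt, ...
```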
1.2 Usage Instructions
The Linux SDK-alkaid ships with pre-converted offline models and board-side examples by default. The relevant file paths are as follows:

- Board-side example program path: `Linux_SDK/sdk/verify/opendla/source/llm/transformerlm`
- Board-side offline model path: `Linux_SDK/project/board/${chip}/dla_file/ipu_open_models/llm/lm_100sim.img`
- Board-side test dictionary path: `Linux_SDK/sdk/verify/opendla/source/resource/units_asr_punc_lm.txt`
If you do not need to convert the model yourself, skip directly to section 3.
2 Model Conversion
2.1 ONNX Model Conversion
- Set up the Python environment (a quick import check follows the commands below):

```
$ conda create -n funasr python==3.10
$ conda activate funasr
$ pip install "modelscope[audio_asr]" --upgrade -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
$ git clone https://github.com/alibaba/FunASR.git && cd FunASR
$ pip install --editable ./ -i https://mirrors.aliyun.com/pypi/simple/
```

Note: the Python environment setup given here is for reference only; for the exact setup process, refer to the official installation tutorial:
https://github.com/modelscope/FunASR/tree/v0.3.0?tab=readme-ov-file#installation
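Before moving on, a trivial import check (not part of the official tutorial) confirms the environment is usable:

```python
# Sanity check: these imports should all succeed inside the funasr conda env.
# onnx/onnxsim/onnxruntime are needed later by lm_export.py; if any of these
# imports fail, pip install the missing package.
import torch
import onnx
import onnxruntime
import funasr
import modelscope
print('environment OK, torch', torch.__version__)
```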
- Model testing

  - Write the model testing script `funasr/bin/predict.py`:

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipline = pipeline(
    task=Tasks.language_score_prediction,
    model='./speech_transformer_lm_zh-cn',
    output_dir='./tmp/')
rec_result = inference_pipline(text_in='hello 大 家 好 呀')
print(rec_result)
```

  - Run the model testing script to make sure the funasr environment is configured correctly (a tokenization note follows this step):

```
# Place the model directory speech_transformer_lm_zh-cn-common-vocab8404-pytorch in the current directory first
$ mv speech_transformer_lm_zh-cn-common-vocab8404-pytorch speech_transformer_lm_zh-cn
$ python ./funasr/bin/predict.py
```
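The pipeline expects text that is already split into space-separated tokens, as in the 'hello 大 家 好 呀' example above. A hypothetical helper (not part of FunASR) that produces this form from a raw sentence:

```python
def to_spaced_tokens(sentence: str) -> str:
    """Split a sentence into space-separated tokens: one token per CJK
    character, whole words kept for ASCII runs (illustrative assumption)."""
    tokens, ascii_buf = [], []
    for ch in sentence:
        if ch.isascii() and not ch.isspace():
            ascii_buf.append(ch)
        else:
            if ascii_buf:
                tokens.append(''.join(ascii_buf))
                ascii_buf = []
            if not ch.isspace():
                tokens.append(ch)
    if ascii_buf:
        tokens.append(''.join(ascii_buf))
    return ' '.join(tokens)

print(to_spaced_tokens('hello大家好呀'))  # -> 'hello 大 家 好 呀'
```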
- Model export

  - Write the model conversion script `funasr/bin/lm_export.py`:

```python
import os
import sys
sys.path.append(os.getcwd())
import logging
import argparse
import torch
import onnx
import onnxsim
import onnxruntime as ort
from funasr.torch_utils.initialize import initialize
from funasr.train.class_choices import ClassChoices
from funasr.lm.espnet_model import ESPnetLanguageModel
from funasr.lm.seq_rnn_lm import SequentialRNNLM
from funasr.lm.transformer_lm import TransformerLM
from funasr.lm.abs_model import AbsLM
from funasr.modules.mask import subsequent_mask
from funasr.modules.nets_utils import make_pad_mask
import torch.nn.functional as F
import numpy as np
from funasr.tasks.lm import LMTask
from funasr.torch_utils.forward_adaptor import ForwardAdaptor


def load_vocab(vocab_path, extra_word_list=[]):
    # Map each token in tokens.txt to its index (extra words, if any, come first).
    n = len(extra_word_list)
    with open(vocab_path, encoding='utf-8') as vf:
        vocab = {word.strip(): i + n for i, word in enumerate(vf)}
    for i, word in enumerate(extra_word_list):
        vocab[word] = i
    return vocab


def softmaxcrossentropy_c(  # type: ignore
        x, target, weight=None, reduction="None"):
    # Numerically stable per-position negative log-likelihood of `target` under `x`.
    input_shape = x.shape
    max_x = np.max(x, axis=1, keepdims=True).astype(np.float64)
    exp_x = np.exp(x - max_x)
    p = exp_x / np.sum(exp_x, axis=1, keepdims=True)
    inp = np.log(p)
    N = input_shape[0]
    neg_gather_element_input = np.zeros((N), dtype=x.dtype)
    inp = np.squeeze(inp)
    target = np.squeeze(target)
    for i in range(N):
        index = target[i]
        neg_gather_element_input[i] = -inp[i, index]
    return neg_gather_element_input


if __name__ == '__main__':
    # 1. Build the model and wrap it so its export-oriented forward is used.
    model, train_args = LMTask.build_model_from_file(
        './speech_transformer_lm_zh-cn/lm.yaml',
        './speech_transformer_lm_zh-cn/lm.pb',
        'cpu')
    wrapped_model = ForwardAdaptor(model, "export")

    # 2. Build a dummy input: token ids, zero-padded to a fixed length of 100.
    tmp_input = "梁家人出烟的电视剧有什么"
    token_dict = load_vocab('./speech_transformer_lm_zh-cn/tokens.txt')
    x_tmp = []
    for char in tmp_input:
        if char in token_dict.keys():
            x_tmp.append(token_dict[char])
        else:
            x_tmp.append(token_dict["<unk>"])
    x_tmp = torch.tensor([x_tmp], dtype=torch.int64)
    x_tmp = torch.cat([x_tmp, torch.zeros((1, 100 - x_tmp.shape[1]), dtype=int)], dim=1)

    # 3. Export to ONNX, then check and simplify the graph.
    wrapped_model.eval()
    onnx_path = './speech_transformer_lm_zh-cn/lm_1x100.onnx'
    torch.onnx.export(
        wrapped_model,
        (x_tmp),
        onnx_path,
        export_params=True,
        opset_version=14,
        input_names=["token"],
        output_names=["probs", 'x_lengths'],
        verbose=False,
    )
    model_onnx = onnx.load(onnx_path)     # load onnx model
    onnx.checker.check_model(model_onnx)  # check onnx model
    model_onnx, check = onnxsim.simplify(model_onnx)
    onnx.save(model_onnx, onnx_path.replace('lm_1x100', 'lm_1x100_sim'))

    # 4. Run the exported model with onnxruntime and compute the sentence nll.
    ort_sess = ort.InferenceSession(onnx_path)
    input_length = 100
    input_list = [
        "梁家人出烟的电视剧有什么"
    ]
    token_dict = load_vocab('./speech_transformer_lm_zh-cn/tokens.txt')
    count = 0
    for sentence in input_list:
        x = []
        for char in sentence:
            if char in token_dict.keys():
                x.append(token_dict[char])
            else:
                x.append(token_dict["<unk>"])
        src_len = len(x)
        x = torch.tensor([x], dtype=torch.int64)
        x = torch.cat([x, torch.zeros((1, 100 - x.shape[1]), dtype=int)], dim=1)
        np.save("./speech_transformer_lm_zh-cn/lmx100/lm_input_%d.npy" % (count), x)
        count += 1
        ort_inputs = {
            ort_sess.get_inputs()[0].name: x.numpy(),
        }
        ort_outs = ort_sess.run(None, ort_inputs)
        logits = ort_outs[0]
        print("logits: ", np.array(logits[0]).shape)
        print("logits: ", logits[0][0][:10])
        onnx_nll = softmaxcrossentropy_c(logits[0], ort_outs[1].flatten())
        print("onnx nll", onnx_nll)
        # Mask out padding positions, then normalize by sequence length + 1.
        x_lens = torch.full((1,), fill_value=src_len, dtype=torch.int64)
        mask_ = np.zeros(input_length + 1)
        mask_[np.arange(x_lens + 1)] = 1
        onnx_nll = np.array(onnx_nll * mask_)
        onnx_nll = np.sum(onnx_nll)
        onnx_nll = onnx_nll / (x_lens + 1)
        print("onnx_nll ", onnx_nll)
```
  - Run the model conversion script; it generates `lm_1x100.onnx` (and the simplified `lm_1x100_sim.onnx`) in the `./speech_transformer_lm_zh-cn` directory. A signature check follows below.

```
$ python ./funasr/bin/lm_export.py
```
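To confirm the exported model's input/output signature without a graph viewer, here is a small inspection sketch (paths follow the export script above):

```python
# Print the name, element type, and shape of every graph input/output.
import onnx

model = onnx.load('./speech_transformer_lm_zh-cn/lm_1x100.onnx')
for tensor in list(model.graph.input) + list(model.graph.output):
    shape = [d.dim_value or d.dim_param
             for d in tensor.type.tensor_type.shape.dim]
    dtype = onnx.TensorProto.DataType.Name(tensor.type.tensor_type.elem_type)
    print(tensor.name, dtype, shape)
```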
2.2 Offline Model Conversion
2.2.1 Pre & Post Processing Instructions
- Preprocessing

  The input to the language model is the output of the acoustic model, i.e. index values into the dictionary. The input information of the successfully converted `lm_1x100_sim.onnx` model is shown in the image below:

  
- Postprocessing

  The language model has no postprocessing operations. Its output scores are typically multiplied by a coefficient and added to the scores from the acoustic model, and the weighted result is then decoded with greedy search; a minimal sketch of this fusion follows below.
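For illustration, a minimal sketch of this weighting (shallow fusion) followed by greedy search; `lm_weight` and the score shapes are illustrative assumptions, not values taken from the SDK:

```python
# Sketch: fuse AM and LM scores, then decode greedily.
import numpy as np

def rescore_and_decode(am_log_probs, lm_log_probs, lm_weight=0.3):
    """am_log_probs, lm_log_probs: (T, vocab) per-step log-probabilities."""
    fused = am_log_probs + lm_weight * lm_log_probs  # weighted score fusion
    return np.argmax(fused, axis=-1)                 # greedy search

# Example with random scores; vocab size 8404 as in the model name.
T, V = 10, 8404
tokens = rescore_and_decode(np.log(np.random.dirichlet(np.ones(V), T)),
                            np.log(np.random.dirichlet(np.ones(V), T)))
print(tokens.shape)  # (10,)
```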
2.2.2 Offline Model Conversion Process
Note:
1) `OpenDLAModel` corresponds to the model conversion files extracted from the `image-dev_model_convert.tar` archive.
2) The conversion command must be run in the Docker environment; first load the SGS Docker environment by following the Docker development environment tutorial.
- Copy the ONNX model into the conversion code directory:

```
$ cp speech_transformer_lm_zh-cn/lm_1x100.onnx OpenDLAModel/llm/transformerlm/onnx
```
- Conversion command:

```
$ cd IPU_SDK_Release/docker
$ bash run_docker.sh
# Enter the OpenDLAModel directory inside the Docker environment
$ cd /work/SGS_XXX/OpenDLAModel
$ bash convert.sh -a llm/transformerlm -c config/llm_transformerlm.cfg -p SGS_IPU_Toolchain (absolute path) -s false
```
- Final generated model paths:

```
output/{chip}_/lm_100sim.img
output/{chip}_/lm_100sim_fixed.sim
output/{chip}_/lm_100sim_float.sim
```
2.2.3 Key Script Parameter Analysis
- `input_config.ini`

```ini
[INPUT_CONFIG]
inputs=token;                    # ONNX input node name; separate multiple names with commas
input_formats=RAWDATA_S16_NHWC;  # Board-side input format, chosen to match the ONNX input, e.g. float: RAWDATA_F32_NHWC, int16: RAWDATA_S16_NHWC
quantizations=TRUE;              # Enable input quantization; no need to change

[OUTPUT_CONFIG]
outputs=probs;                   # ONNX output node name; separate multiple names with commas
dequantizations=TRUE;            # Enable dequantization as needed; TRUE is recommended (FALSE outputs int16, TRUE outputs float32)
```
- `llm_transformerlm.cfg`

```ini
[TRANSFORMERLM]
CHIP_LIST=pcupid                # Platform name; must match the board platform, otherwise the model will not run
Model_LIST=lm_1x100             # Input ONNX model name
INPUT_SIZE_LIST=0               # Model input resolution; fill in 0 here
INPUT_INI_LIST=input_config.ini # Input configuration file
CLASS_NUM_LIST=0                # Not used; just fill in 0
SAVE_NAME_LIST=lm_100sim.img    # Output model name
QUANT_DATA_PATH=image_list.txt  # Quantization data path (see the sketch below)
```
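The quantization data referenced by `QUANT_DATA_PATH` can be built from the `.npy` token inputs saved by `lm_export.py` above. A sketch that writes `image_list.txt` follows; that this one-path-per-line layout is accepted as quantization data is an assumption, so check the SGS toolchain documentation for the exact expected format:

```python
# Collect the .npy token inputs dumped by lm_export.py into image_list.txt.
import glob

paths = sorted(glob.glob('./speech_transformer_lm_zh-cn/lmx100/lm_input_*.npy'))
with open('image_list.txt', 'w') as f:
    f.write('\n'.join(paths) + '\n')
```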
2.3 Model Simulation
- Get the float/fixed/offline model outputs:

```
$ bash convert.sh -a llm/transformerlm -c config/llm_transformerlm.cfg -p SGS_IPU_Toolchain (absolute path) -s true
```

After the command finishes, the tensor output of the `float` model is saved by default to a txt file under `llm/transformerlm/log/output`. The `llm/transformerlm/convert.sh` script also contains simulation examples for the `fixed` and `offline` models; uncomment the corresponding code blocks to obtain their outputs as well.
- Model Accuracy Comparison

  With the same input as above, enter the environment built in section 2.1 and add a print statement after line 175 of the `FunASR/funasr/bin/lm_inference.py` script:

```
print(nll)
```

  This prints the output tensor of the corresponding PyTorch model node, which can then be compared against the float, fixed, and offline model outputs; a comparison sketch follows below. Also note that the original model's output layout is `NCHW`, while the float/fixed/offline models output `NHWC`.
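A sketch of such a comparison, accounting for the NCHW vs NHWC layout difference; the file names are placeholders and a 4-D layout is assumed purely for illustration:

```python
# Compare a PyTorch dump with a float-simulation dump via cosine similarity.
import numpy as np

pt = np.load('pytorch_output.npy')     # hypothetical PyTorch dump, NCHW
sim = np.loadtxt('float_output.txt')   # hypothetical float-sim dump, NHWC
pt_nhwc = pt.transpose(0, 2, 3, 1)     # NCHW -> NHWC before comparing
sim = sim.reshape(pt_nhwc.shape)
cos = np.dot(pt_nhwc.ravel(), sim.ravel()) / (
    np.linalg.norm(pt_nhwc) * np.linalg.norm(sim))
print('cosine similarity:', cos)
```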
3 Board-Side Deployment
3.1 Program Compilation
Before compiling the example program, first select the defconfig for the SDK full-package build according to the board (nand/nor/emmc, DDR model, etc.); refer to the alkaid SDK sigdoc document "Development Environment Setup."
- Compile the board-side transformerlm example:

```
$ cd sdk/verify/opendla
$ make clean && make source/llm/transformerlm -j8
```
- Final generated executable path: `sdk/verify/opendla/out/${AARCH}/app/prog_llm_transformerlm`
3.2 Running Files
When running the program, copy the following files to the board:
- prog_llm_transformerlm
- lm_100sim.img
- units_asr_punc_lm.txt (the test dictionary; see section 1.2)
3.3 Running Instructions
- Usage:

```
./prog_llm_transformerlm model dict
```

- Required inputs:
  - model: offline model path
  - dict: dictionary path

- Typical output:

```
./prog_llm_transformerlm models/lm_100sim.img resource/units_asr_punc_lm.txt
client [801] connected, module:ipu
load dict...
invoke start ...
model invoke time: 60.463000 ms
lm score: -3.692400
------shutdown IPU0------
client [801] disconnected, module:ipu
```
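As a side note: if the printed lm score is the length-normalized negative log-likelihood computed as in `lm_export.py`, with the sign flipped (an assumption about this example's conventions), it maps to perplexity as follows:

```python
# Perplexity under the stated assumption about the score's sign convention.
import math

lm_score = -3.692400        # value from the run above
print(math.exp(-lm_score))  # ~40.1
```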