TransformerLM
1 Overview
1.1 Background Introduction
TransformerLM is typically used as a language-model scorer in ASR, re-scoring the hypotheses predicted by the acoustic model. The model comes from the FunASR open-source repository; for details, refer to:
https://github.com/modelscope/FunASR/blob/v0.3.0/funasr/bin/lm_inference.py
The model download address is:
https://www.modelscope.cn/models/iic/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/files
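If you prefer to fetch the model programmatically, here is a minimal sketch using the ModelScope hub API (the model ID matches the link above; the printed path is ModelScope's local cache directory):

```python
# Minimal sketch: download the LM weights via the ModelScope hub API.
from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download('iic/speech_transformer_lm_zh-cn-common-vocab8404-pytorch')
print(model_dir)  # local directory containing lm.yaml, lm.pb, tokens.txt, ...
```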
1.2 Usage Instructions
The Linux SDK-alkaid ships with pre-converted offline models and board-side examples by default. The relevant file paths are as follows:

- Board-side example program path: `Linux_SDK/sdk/verify/opendla/source/llm/transformerlm`
- Board-side offline model path: `Linux_SDK/project/board/${chip}/dla_file/ipu_open_models/llm/lm_100sim.img`
- Board-side test dictionary path: `Linux_SDK/sdk/verify/opendla/source/resource/units_asr_punc_lm.txt`
If you do not need to convert the model yourself, skip directly to section 3.
2 Model Conversion
2.1 ONNX Model Conversion
- Set up the Python environment (a quick import check follows the commands below):

```
$ conda create -n funasr python==3.10
$ conda activate funasr
$ pip install "modelscope[audio_asr]" --upgrade -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
$ git clone https://github.com/alibaba/FunASR.git && cd FunASR
$ pip install --editable ./ -i https://mirrors.aliyun.com/pypi/simple/
```

Note: the Python environment setup given here is for reference only; for the exact setup process, refer to the official installation tutorial:
https://github.com/modelscope/FunASR/tree/v0.3.0?tab=readme-ov-file#installation
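Before moving on, a trivial import check (not part of the official tutorial) confirms the environment is usable:

```python
# Sanity check: these imports should all succeed inside the funasr conda env.
# onnx/onnxsim/onnxruntime are needed later by lm_export.py; if any of these
# imports fail, pip install the missing package.
import torch
import onnx
import onnxruntime
import funasr
import modelscope
print('environment OK, torch', torch.__version__)
```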
- Model testing

  - Write the model testing script `funasr/bin/predict.py`:

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipline = pipeline(
    task=Tasks.language_score_prediction,
    model='./speech_transformer_lm_zh-cn',
    output_dir='./tmp/')
rec_result = inference_pipline(text_in='hello 大 家 好 呀')
print(rec_result)
```

  - Run the model testing script to make sure the funasr environment is configured correctly (a tokenization note follows this step):

```
# Place the model directory speech_transformer_lm_zh-cn-common-vocab8404-pytorch in the current directory first
$ mv speech_transformer_lm_zh-cn-common-vocab8404-pytorch speech_transformer_lm_zh-cn
$ python ./funasr/bin/predict.py
```
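The pipeline expects text that is already split into space-separated tokens, as in the 'hello 大 家 好 呀' example above. A hypothetical helper (not part of FunASR) that produces this form from a raw sentence:

```python
def to_spaced_tokens(sentence: str) -> str:
    """Split a sentence into space-separated tokens: one token per CJK
    character, whole words kept for ASCII runs (illustrative assumption)."""
    tokens, ascii_buf = [], []
    for ch in sentence:
        if ch.isascii() and not ch.isspace():
            ascii_buf.append(ch)
        else:
            if ascii_buf:
                tokens.append(''.join(ascii_buf))
                ascii_buf = []
            if not ch.isspace():
                tokens.append(ch)
    if ascii_buf:
        tokens.append(''.join(ascii_buf))
    return ' '.join(tokens)

print(to_spaced_tokens('hello大家好呀'))  # -> 'hello 大 家 好 呀'
```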
- Model export

  - Write the model conversion script `funasr/bin/lm_export.py`:

```python
import os
import sys
sys.path.append(os.getcwd())
import logging
import argparse
import torch
import onnx
import onnxsim
import onnxruntime as ort
from funasr.torch_utils.initialize import initialize
from funasr.train.class_choices import ClassChoices
from funasr.lm.espnet_model import ESPnetLanguageModel
from funasr.lm.seq_rnn_lm import SequentialRNNLM
from funasr.lm.transformer_lm import TransformerLM
from funasr.lm.abs_model import AbsLM
from funasr.modules.mask import subsequent_mask
from funasr.modules.nets_utils import make_pad_mask
import torch.nn.functional as F
import numpy as np
from funasr.tasks.lm import LMTask
from funasr.torch_utils.forward_adaptor import ForwardAdaptor


def load_vocab(vocab_path, extra_word_list=[]):
    # Map each token in tokens.txt to its index (extra words, if any, come first).
    n = len(extra_word_list)
    with open(vocab_path, encoding='utf-8') as vf:
        vocab = {word.strip(): i + n for i, word in enumerate(vf)}
    for i, word in enumerate(extra_word_list):
        vocab[word] = i
    return vocab


def softmaxcrossentropy_c(  # type: ignore
        x, target, weight=None, reduction="None"):
    # Numerically stable per-position negative log-likelihood of `target` under `x`.
    input_shape = x.shape
    max_x = np.max(x, axis=1, keepdims=True).astype(np.float64)
    exp_x = np.exp(x - max_x)
    p = exp_x / np.sum(exp_x, axis=1, keepdims=True)
    inp = np.log(p)
    N = input_shape[0]
    neg_gather_element_input = np.zeros((N), dtype=x.dtype)
    inp = np.squeeze(inp)
    target = np.squeeze(target)
    for i in range(N):
        index = target[i]
        neg_gather_element_input[i] = -inp[i, index]
    return neg_gather_element_input


if __name__ == '__main__':
    # 1. Build the model and wrap it so its export-oriented forward is used.
    model, train_args = LMTask.build_model_from_file(
        './speech_transformer_lm_zh-cn/lm.yaml',
        './speech_transformer_lm_zh-cn/lm.pb',
        'cpu')
    wrapped_model = ForwardAdaptor(model, "export")

    # 2. Build a dummy input: token ids, zero-padded to a fixed length of 100.
    tmp_input = "梁家人出烟的电视剧有什么"
    token_dict = load_vocab('./speech_transformer_lm_zh-cn/tokens.txt')
    x_tmp = []
    for char in tmp_input:
        if char in token_dict.keys():
            x_tmp.append(token_dict[char])
        else:
            x_tmp.append(token_dict["<unk>"])
    x_tmp = torch.tensor([x_tmp], dtype=torch.int64)
    x_tmp = torch.cat([x_tmp, torch.zeros((1, 100 - x_tmp.shape[1]), dtype=int)], dim=1)

    # 3. Export to ONNX, then check and simplify the graph.
    wrapped_model.eval()
    onnx_path = './speech_transformer_lm_zh-cn/lm_1x100.onnx'
    torch.onnx.export(
        wrapped_model,
        (x_tmp),
        onnx_path,
        export_params=True,
        opset_version=14,
        input_names=["token"],
        output_names=["probs", 'x_lengths'],
        verbose=False,
    )
    model_onnx = onnx.load(onnx_path)     # load onnx model
    onnx.checker.check_model(model_onnx)  # check onnx model
    model_onnx, check = onnxsim.simplify(model_onnx)
    onnx.save(model_onnx, onnx_path.replace('lm_1x100', 'lm_1x100_sim'))

    # 4. Run the exported model with onnxruntime and compute the sentence nll.
    ort_sess = ort.InferenceSession(onnx_path)
    input_length = 100
    input_list = [
        "梁家人出烟的电视剧有什么"
    ]
    token_dict = load_vocab('./speech_transformer_lm_zh-cn/tokens.txt')
    count = 0
    for sentence in input_list:
        x = []
        for char in sentence:
            if char in token_dict.keys():
                x.append(token_dict[char])
            else:
                x.append(token_dict["<unk>"])
        src_len = len(x)
        x = torch.tensor([x], dtype=torch.int64)
        x = torch.cat([x, torch.zeros((1, 100 - x.shape[1]), dtype=int)], dim=1)
        np.save("./speech_transformer_lm_zh-cn/lmx100/lm_input_%d.npy" % (count), x)
        count += 1
        ort_inputs = {
            ort_sess.get_inputs()[0].name: x.numpy(),
        }
        ort_outs = ort_sess.run(None, ort_inputs)
        logits = ort_outs[0]
        print("logits: ", np.array(logits[0]).shape)
        print("logits: ", logits[0][0][:10])
        onnx_nll = softmaxcrossentropy_c(logits[0], ort_outs[1].flatten())
        print("onnx nll", onnx_nll)
        # Mask out padding positions, then normalize by sequence length + 1.
        x_lens = torch.full((1,), fill_value=src_len, dtype=torch.int64)
        mask_ = np.zeros(input_length + 1)
        mask_[np.arange(x_lens + 1)] = 1
        onnx_nll = np.array(onnx_nll * mask_)
        onnx_nll = np.sum(onnx_nll)
        onnx_nll = onnx_nll / (x_lens + 1)
        print("onnx_nll ", onnx_nll)
```
  - Run the model conversion script; it generates `lm_1x100.onnx` (and the simplified `lm_1x100_sim.onnx`) in the `./speech_transformer_lm_zh-cn` directory. A signature check follows below.

```
$ python ./funasr/bin/lm_export.py
```
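To confirm the exported model's input/output signature without a graph viewer, here is a small inspection sketch (paths follow the export script above):

```python
# Print the name, element type, and shape of every graph input/output.
import onnx

model = onnx.load('./speech_transformer_lm_zh-cn/lm_1x100.onnx')
for tensor in list(model.graph.input) + list(model.graph.output):
    shape = [d.dim_value or d.dim_param
             for d in tensor.type.tensor_type.shape.dim]
    dtype = onnx.TensorProto.DataType.Name(tensor.type.tensor_type.elem_type)
    print(tensor.name, dtype, shape)
```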
2.2 Offline Model Conversion
2.2.1 Pre & Post Processing Instructions
- Preprocessing

  The input to the language model is the output of the acoustic model, i.e. index values into the dictionary. The input information of the successfully converted `lm_1x100_sim.onnx` model is shown in the image below:

  
- Postprocessing

  The language model has no postprocessing operations. Its output scores are typically multiplied by a coefficient and added to the scores from the acoustic model, and the weighted result is then decoded with greedy search; a minimal sketch of this fusion follows below.
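For illustration, a minimal sketch of this weighting (shallow fusion) followed by greedy search; `lm_weight` and the score shapes are illustrative assumptions, not values taken from the SDK:

```python
# Sketch: fuse AM and LM scores, then decode greedily.
import numpy as np

def rescore_and_decode(am_log_probs, lm_log_probs, lm_weight=0.3):
    """am_log_probs, lm_log_probs: (T, vocab) per-step log-probabilities."""
    fused = am_log_probs + lm_weight * lm_log_probs  # weighted score fusion
    return np.argmax(fused, axis=-1)                 # greedy search

# Example with random scores; vocab size 8404 as in the model name.
T, V = 10, 8404
tokens = rescore_and_decode(np.log(np.random.dirichlet(np.ones(V), T)),
                            np.log(np.random.dirichlet(np.ones(V), T)))
print(tokens.shape)  # (10,)
```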
2.2.2 Offline Model Conversion Process
Note:
1) `OpenDLAModel` corresponds to the model conversion files extracted from the `image-dev_model_convert.tar` archive.
2) The conversion command must be run in the Docker environment; first load the SGS Docker environment by following the Docker development environment tutorial.
- Copy the ONNX model into the conversion code directory:

```
$ cp speech_transformer_lm_zh-cn/lm_1x100.onnx OpenDLAModel/llm/transformerlm/onnx
```
- Conversion command:

```
$ cd IPU_SDK_Release/docker
$ bash run_docker.sh
# Enter the OpenDLAModel directory inside the Docker environment
$ cd /work/SGS_XXX/OpenDLAModel
$ bash convert.sh -a llm/transformerlm -c config/llm_transformerlm.cfg -p SGS_IPU_Toolchain (absolute path) -s false
```
- Final generated model paths:

```
output/{chip}_/lm_100sim.img
output/{chip}_/lm_100sim_fixed.sim
output/{chip}_/lm_100sim_float.sim
```
2.2.3 Key Script Parameter Analysis
- `input_config.ini`

```ini
[INPUT_CONFIG]
inputs=token;                    # ONNX input node name; separate multiple names with commas
input_formats=RAWDATA_S16_NHWC;  # Board-side input format, chosen to match the ONNX input, e.g. float: RAWDATA_F32_NHWC, int16: RAWDATA_S16_NHWC
quantizations=TRUE;              # Enable input quantization; no need to change

[OUTPUT_CONFIG]
outputs=probs;                   # ONNX output node name; separate multiple names with commas
dequantizations=TRUE;            # Enable dequantization as needed; TRUE is recommended (FALSE outputs int16, TRUE outputs float32)
```
- `llm_transformerlm.cfg`

```ini
[TRANSFORMERLM]
CHIP_LIST=pcupid                # Platform name; must match the board platform, otherwise the model will not run
Model_LIST=lm_1x100             # Input ONNX model name
INPUT_SIZE_LIST=0               # Model input resolution; fill in 0 here
INPUT_INI_LIST=input_config.ini # Input configuration file
CLASS_NUM_LIST=0                # Not used; just fill in 0
SAVE_NAME_LIST=lm_100sim.img    # Output model name
QUANT_DATA_PATH=image_list.txt  # Quantization data path (see the sketch below)
```
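The quantization data referenced by `QUANT_DATA_PATH` can be built from the `.npy` token inputs saved by `lm_export.py` above. A sketch that writes `image_list.txt` follows; that this one-path-per-line layout is accepted as quantization data is an assumption, so check the SGS toolchain documentation for the exact expected format:

```python
# Collect the .npy token inputs dumped by lm_export.py into image_list.txt.
import glob

paths = sorted(glob.glob('./speech_transformer_lm_zh-cn/lmx100/lm_input_*.npy'))
with open('image_list.txt', 'w') as f:
    f.write('\n'.join(paths) + '\n')
```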
2.3 Model Simulation
- Get the float/fixed/offline model outputs:

```
$ bash convert.sh -a llm/transformerlm -c config/llm_transformerlm.cfg -p SGS_IPU_Toolchain (absolute path) -s true
```

After the command finishes, the tensor output of the `float` model is saved by default to a txt file under `llm/transformerlm/log/output`. The `llm/transformerlm/convert.sh` script also contains simulation examples for the `fixed` and `offline` models; uncomment the corresponding code blocks to obtain their outputs as well.
- Model Accuracy Comparison

  With the same input as above, enter the environment built in section 2.1 and add a print statement after line 175 of the `FunASR/funasr/bin/lm_inference.py` script:

```
print(nll)
```

  This prints the output tensor of the corresponding PyTorch model node, which can then be compared against the float, fixed, and offline model outputs; a comparison sketch follows below. Also note that the original model's output layout is `NCHW`, while the float/fixed/offline models output `NHWC`.
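A sketch of such a comparison, accounting for the NCHW vs NHWC layout difference; the file names are placeholders and a 4-D layout is assumed purely for illustration:

```python
# Compare a PyTorch dump with a float-simulation dump via cosine similarity.
import numpy as np

pt = np.load('pytorch_output.npy')     # hypothetical PyTorch dump, NCHW
sim = np.loadtxt('float_output.txt')   # hypothetical float-sim dump, NHWC
pt_nhwc = pt.transpose(0, 2, 3, 1)     # NCHW -> NHWC before comparing
sim = sim.reshape(pt_nhwc.shape)
cos = np.dot(pt_nhwc.ravel(), sim.ravel()) / (
    np.linalg.norm(pt_nhwc) * np.linalg.norm(sim))
print('cosine similarity:', cos)
```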
3 Board-Side Deployment
3.1 Program Compilation
Before compiling the example program, first select the defconfig for the SDK full-package build according to the board (nand/nor/emmc, DDR model, etc.); refer to the alkaid SDK sigdoc document "Development Environment Setup."
- Compile the board-side transformerlm example:

```
$ cd sdk/verify/opendla
$ make clean && make source/llm/transformerlm -j8
```
- Final generated executable path: `sdk/verify/opendla/out/${AARCH}/app/prog_llm_transformerlm`
3.2 Running Files
When running the program, copy the following files to the board:
- prog_llm_transformerlm
- lm_100sim.img
- units_asr_punc_lm.txt (the test dictionary; see section 1.2)
3.3 Running Instructions
- Usage:

```
./prog_llm_transformerlm model dict
```

- Required inputs:
  - model: offline model path
  - dict: dictionary path

- Typical output:

```
./prog_llm_transformerlm models/lm_100sim.img resource/units_asr_punc_lm.txt
client [801] connected, module:ipu
load dict...
invoke start ...
model invoke time: 60.463000 ms
lm score: -3.692400
------shutdown IPU0------
client [801] disconnected, module:ipu
```
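As a side note: if the printed lm score is the length-normalized negative log-likelihood computed as in `lm_export.py`, with the sign flipped (an assumption about this example's conventions), it maps to perplexity as follows:

```python
# Perplexity under the stated assumption about the score's sign convention.
import math

lm_score = -3.692400        # value from the run above
print(math.exp(-lm_score))  # ~40.1
```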