RISC-V Debug SOP

REVISION HISTORY

Revision No.  Description      Date
1.0           Initial release  12/16/2024

1. How to Troubleshoot a RISC-V That Fails to Boot

If RISC-V produces no log output at all, first confirm that its image was loaded successfully, using the steps below.

1.1. Confirm the RISC-V Load Address

The image load address is set in the build configuration file, for example in mak/options_pcupid_riscv_isw.mak:

# Feature_Name = uImage load address
# Description = define uImage load address
# Option_Selection = ADDRESS
CONFIG_RTOS_LOAD_ADDR = 0x26800000

This means the RISC-V image is loaded from the storage medium (e.g., flash or eMMC) into DDR at address 0x26800000.

1.2. Determine the Code Section Size

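The RISC-V image contains a code section (.text) and a data section (.data). The code section holds the program's instructions and does not change at run time; the data section holds initialized global variables and is modified at run time. Therefore, to confirm that the image was loaded correctly, it is enough to compare the code section. Its size can be obtained on the build server with the following command: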

readelf -S build/pcupid_riscv_isw/out/pcupid_riscv_isw.elf

There are 28 section headers, starting at offset 0x2292a4:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] XRAM              PROGBITS        10000000 061fcc 000000 00   W  0   0  1
  [ 2] .text             PROGBITS        10000000 001000 041148 00  AX  0   0 256
  [ 3] .rodata           PROGBITS        10041148 042148 01e780 00   A  0   0  8
  [ 4] PREMAIN_INITCALL  PROGBITS        1005f8c8 061fcc 000000 00   W  0   0  1
  [ 5] NORM_INITCALL     PROGBITS        1005f8c8 0608c8 000028 00  WA  0   0  4
  [ 6] APPLICATION_INITC PROGBITS        1005f8f0 0608f0 00001c 00  WA  0   0  4
  [ 7] XRAM0             PROGBITS        1005f90c 06090c 000004 00  WA  0   0  4
  [ 8] .cli_cmd_list     PROGBITS        1005f910 060910 0002a0 00  WA  0   0  4
  [ 9] .cam_dev_list     PROGBITS        1005fbb0 061fcc 000000 00   W  0   0  1
  [10] .data             PROGBITS        1005fbb0 060bb0 001414 00  WA  0   0  8
  [11] RW_STATICBOOT     PROGBITS        10060fc4 061fc4 000008 00  WA  0   0  1
  [12] DEBUG_AREA        PROGBITS        10060fcc 061fcc 000000 00   W  0   0  1
  [13] .bss              NOBITS          10061000 061fcc 00f470 00  WA  0   0 64
  [14] .sys_stack        NOBITS          10070470 061fcc 000500 00  WA  0   0  1
  [15] .data2            NOBITS          101b8000 062000 008000 00  WA  0   0  1
  [16] .debug_info       PROGBITS        00000000 061fcc 0cc52a 00      0   0  1
  [17] .debug_abbrev     PROGBITS        00000000 12e4f6 019483 00      0   0  1
  [18] .debug_loc        PROGBITS        00000000 147979 0475fc 00      0   0  1
  [19] .debug_aranges    PROGBITS        00000000 18ef78 004590 00      0   0  8
  [20] .debug_line       PROGBITS        00000000 193508 055e97 00      0   0  1
  [21] .debug_str        PROGBITS        00000000 1e939f 01d66b 01  MS  0   0  1
  [22] .comment          PROGBITS        00000000 206a0a 000011 01  MS  0   0  1
  [23] .debug_frame      PROGBITS        00000000 206a1c 00f1a0 00      0   0  4
  [24] .debug_ranges     PROGBITS        00000000 215bbc 000ae0 00      0   0  1
  [25] .symtab           SYMTAB          00000000 21669c 009070 10     26 926  4
  [26] .strtab           STRTAB          00000000 21f70c 009a6f 00      0   0  1
  [27] .shstrtab         STRTAB          00000000 22917b 000129 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

The output shows that the .text section is 0x41148 bytes long.
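If readelf is not at hand, the .text size can also be read straight from the ELF section header table. The following is a minimal sketch, assuming a 32-bit little-endian ELF (as the 8-digit addresses above suggest) read on a little-endian Linux host with glibc's <elf.h>; the function names elf32_section_size and elf32_file_section_size are hypothetical.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return sh_size of the named section in an in-memory ELF32 image,
 * or -1 if the image is malformed or the section is absent.
 * Assumes a little-endian ELF read on a little-endian host. */
long elf32_section_size(const unsigned char *buf, size_t len, const char *name)
{
    Elf32_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32 || eh.e_ident[EI_DATA] != ELFDATA2LSB)
        return -1;
    if (eh.e_shentsize != sizeof(Elf32_Shdr) || eh.e_shstrndx >= eh.e_shnum)
        return -1;
    if ((uint64_t)eh.e_shoff + (uint64_t)eh.e_shnum * sizeof(Elf32_Shdr) > len)
        return -1;

    /* Header of the section-name string table (.shstrtab). */
    Elf32_Shdr strsh;
    memcpy(&strsh, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof(Elf32_Shdr),
           sizeof strsh);
    if ((uint64_t)strsh.sh_offset + strsh.sh_size > len)
        return -1;
    const char *names = (const char *)(buf + strsh.sh_offset);

    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf32_Shdr sh;
        memcpy(&sh, buf + eh.e_shoff + (size_t)i * sizeof(Elf32_Shdr), sizeof sh);
        if (sh.sh_name >= strsh.sh_size)
            continue;
        const char *s = names + sh.sh_name;
        size_t room = strsh.sh_size - sh.sh_name;
        /* Only compare names that are NUL-terminated inside .shstrtab. */
        if (memchr(s, '\0', room) != NULL && strcmp(s, name) == 0)
            return (long)sh.sh_size;
    }
    return -1;
}

/* Load a whole ELF file and look up one section size; -1 on any error. */
long elf32_file_section_size(const char *path, const char *name)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long n = ftell(f);
    if (n <= 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    unsigned char *buf = malloc((size_t)n);
    if (!buf) {
        fclose(f);
        return -1;
    }
    size_t got = fread(buf, 1, (size_t)n, f);
    fclose(f);
    long size = (got == (size_t)n) ? elf32_section_size(buf, got, name) : -1;
    free(buf);
    return size;
}
```

For example, elf32_file_section_size("build/pcupid_riscv_isw/out/pcupid_riscv_isw.elf", ".text") reads the same sh_size field that readelf prints in its Size column, so for the ELF shown above it would give 0x41148.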

1.3. Verify That the RISC-V Image Was Loaded Correctly

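The RISC-V image is loaded during the IPL_CUST stage, so it can be checked from the U-Boot command line. Enter the U-Boot command line, configure the network, and then use tftpput to dump the DDR contents, starting at the load address and covering the .text size, to the TFTP server: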

SigmaStar # tftpput 0x26800000 0x41148 riscvfw_dump.bin
Using sstar_emac device
TFTP to server 10.21.2.10; our IP address is 10.24.16.137; sending through gateway 10.24.16.254
Filename 'riscvfw_dump.bin'.
Save address: 0x26800000
Save size:    0x41148
Saving: T ##################
         43 KiB/s
done
Bytes transferred = 266568 (41148 hex)
SigmaStar #

Compare the dumped file with the flashed image using a binary comparison tool and check whether their contents match.
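The comparison can also be scripted. Below is a minimal sketch that compares up to a given number of bytes of two files and reports the first differing offset; the function name and the image file name in the usage note are hypothetical. Because the dump above starts at 0x26800000, a difference at offset X in the dump means DDR address 0x26800000 + X holds unexpected data.

```c
#include <stdio.h>

/* Compare up to `limit` bytes of two files.
 * Returns -1 if the compared ranges are identical, the byte offset of the
 * first difference otherwise, or -2 if either file cannot be opened.
 * If exactly one file ends before `limit`, that position counts as a
 * difference; if both end at the same point, the files compare equal. */
long first_diff_offset(const char *path_a, const char *path_b, long limit)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long result = -2;
    if (a && b) {
        result = -1;
        for (long off = 0; off < limit; off++) {
            int ca = fgetc(a);
            int cb = fgetc(b);
            if (ca != cb) {        /* includes one file hitting EOF early */
                result = off;
                break;
            }
            if (ca == EOF)         /* both files ended at the same point */
                break;
        }
    }
    if (a)
        fclose(a);
    if (b)
        fclose(b);
    return result;
}
```

A call such as first_diff_offset("riscvfw_dump.bin", "riscv_image.bin", 0x41148), where riscv_image.bin stands for the flashed image, returns -1 when the code section matches.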

2. Check Whether RISC-V Is Running

Read the following register with a register access tool (e.g., system tool or riu_r):

Bank    Offset  Bits   Description
0x1E    0x35    [0]    0: disabled, 1: enabled
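The tool returns the whole register value, of which only bit [0] matters here. A small bit-field helper, as a sketch assuming a 16-bit register value; reg_field and riscv_enabled are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract bits [hi:lo] (inclusive) from a 16-bit register value. */
static inline uint16_t reg_field(uint16_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint16_t mask = (uint16_t)((1u << width) - 1u);
    return (uint16_t)((value >> lo) & mask);
}

/* Bank 0x1E, offset 0x35, bit [0]: 1 means RISC-V is enabled. */
static inline bool riscv_enabled(uint16_t bank1e_off35)
{
    return reg_field(bank1e_off35, 0, 0) == 1;
}
```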

3. Check Whether RISC-V Is in TCM Mode

Read the following register with a register access tool (e.g., system tool or riu_r):

Bank    Offset  Bits    Description
0x802   0x18    [2:0]   3'b000: I-cache mode
                        3'b001: reserved
                        3'b010: TCM mode
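Bits [2:0] select the memory mode. A sketch that decodes them, assuming a 16-bit value as read with the tool; values 3'b011 to 3'b111 are not described in the table above, so the sketch reports them as unlisted rather than guessing. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum riscv_mem_mode {
    RISCV_MODE_ICACHE   = 0,   /* 3'b000 */
    RISCV_MODE_RESERVED = 1,   /* 3'b001 */
    RISCV_MODE_TCM      = 2,   /* 3'b010 */
    RISCV_MODE_UNLISTED = -1   /* 3'b011..3'b111: not in the table */
};

/* Decode bits [2:0] of bank 0x802, offset 0x18. */
enum riscv_mem_mode riscv_mem_mode_of(uint16_t reg)
{
    unsigned field = reg & 0x7u;
    return field <= 2 ? (enum riscv_mem_mode)field : RISCV_MODE_UNLISTED;
}

bool riscv_in_tcm_mode(uint16_t reg)
{
    return riscv_mem_mode_of(reg) == RISCV_MODE_TCM;
}
```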

4. How to View Thread Priorities and Per-Thread CPU Load on RISC-V

Use the CLI command taskstat. It measures the CPU load of every thread during the 1 s after the command is entered.

ID  PRIO  STAT  CPU    STACK USAGE  NAME            HANDLER
1   111   B     0.0    776/3072     SYS_CUST        0x1002a250
2   2     B     0.0    496/2048     CONSOLE         0x1002c6b8
3   2     X     0.0    1320/4096    MENU            0x1002a268
4   0     R     99.9   144/2044     IDLE            0x00000000
5   127   B     0.0    192/2040     Tmr Svc         0x1003919c
6   99    B     0.0    648/1528     NonSecureWorld  0x1002e900
7   64    B     0.0    776/2040     rpmsg_dualos    0x1001b194
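When taskstat output is captured to a log, a parser makes it easier to scan many samples for a non-IDLE task whose CPU column stays high or whose stack is close to full. The sketch below assumes whitespace-separated columns as shown above; because NAME can contain a space ("Tmr Svc"), it takes NAME as everything between STACK USAGE and the last token (HANDLER). The struct and function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct task_row {
    int id, prio;
    char state;
    double cpu;               /* percent over the 1 s window */
    int stack_used, stack_size;
    char name[32];
    unsigned long handler;
};

/* Parse one data row of taskstat output. Returns 0 on success, -1 otherwise. */
int parse_taskstat_row(const char *line, struct task_row *out)
{
    int consumed = 0;
    if (sscanf(line, "%d %d %c %lf %d/%d %n", &out->id, &out->prio, &out->state,
               &out->cpu, &out->stack_used, &out->stack_size, &consumed) != 6)
        return -1;
    const char *rest = line + consumed;
    size_t len = strlen(rest);
    while (len > 0 && isspace((unsigned char)rest[len - 1]))
        len--;                                /* drop trailing whitespace/newline */
    size_t split = len;
    while (split > 0 && !isspace((unsigned char)rest[split - 1]))
        split--;                              /* start of the HANDLER token */
    if (split == 0 || sscanf(rest + split, "%lx", &out->handler) != 1)
        return -1;
    size_t n = split;
    while (n > 0 && isspace((unsigned char)rest[n - 1]))
        n--;                                  /* end of NAME */
    if (n == 0 || n >= sizeof out->name)
        return -1;
    memcpy(out->name, rest, n);
    out->name[n] = '\0';
    return 0;
}

/* Percentage of the task's stack in use (rounded down), or -1 if unknown. */
int stack_usage_percent(const struct task_row *r)
{
    return r->stack_size > 0 ? (r->stack_used * 100) / r->stack_size : -1;
}
```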

5. How to View the Status of All Interrupts on RISC-V

Use the CLI command intrstat to get each interrupt's trigger count, handling time (maximum and average), and registered name:

INT  Count  MaxTimeUs  AvgTimeUs   Handler     DevId       Affinity    Name
162  1      7          14          0x10016000  0x10072740  0x00000001  pwm_group3
365  13     22         22          0x1001B0B0  0x00000000  0x00000001  RPMSG_L2R
105  0      0          4294967297  0x1000664C  0x1006230C  0x00000001  bdma7
105  0      0          4294967297  0x1000664C  0x1006228C  0x00000001  bdma6
105  0      0          4294967297  0x1000664C  0x1006220C  0x00000001  bdma5
105  0      0          4294967297  0x1000664C  0x1006218C  0x00000001  bdma4
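Rows whose Count is 0 have never fired, so their AvgTimeUs value (4294967297 in the sample) does not appear to be meaningful and is best ignored. A sketch that parses intrstat rows and picks the interrupt with the most triggers, skipping rows that never fired; it assumes the whitespace-separated layout shown above, and the struct and function names are hypothetical.

```c
#include <stdio.h>

struct irq_row {
    unsigned irq;
    unsigned long count;
    unsigned long max_us;
    unsigned long long avg_us;   /* wide enough for values like 4294967297 */
    unsigned long handler, dev_id, affinity;
    char name[32];
};

/* Parse one data row of intrstat output. Returns 0 on success, -1 otherwise. */
int parse_intrstat_row(const char *line, struct irq_row *r)
{
    return sscanf(line, "%u %lu %lu %llu %lx %lx %lx %31s", &r->irq, &r->count,
                  &r->max_us, &r->avg_us, &r->handler, &r->dev_id,
                  &r->affinity, r->name) == 8 ? 0 : -1;
}

/* Index of the row with the highest trigger count, or -1 if none fired. */
int busiest_irq(const struct irq_row *rows, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (rows[i].count == 0)
            continue;            /* never fired: skip its AvgTimeUs */
        if (best < 0 || rows[i].count > rows[best].count)
            best = i;
    }
    return best;
}
```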

6. How to Debug a Hung RISC-V Serial Console That Accepts No Input

6.1. Cause

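Serial console input depends on two tasks, CONSOLE and MENU, which run at priority 2 by default. When higher-priority tasks take all of the RISC-V CPU time, these two tasks never get to run and the console stops accepting input.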

6.2. Common Problems

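1) Faulty customer code: a time-consuming API is called inside an rtos_application_initcall function. Initcalls run at the priority of SYS_CUST (111), so a long-running initcall starves CONSOLE and MENU.

2) A thread created by the customer with CamOsThreadCreate polls in a while(1) { ... CamOsMsDelay(); } loop. CamOsMsDelay does not release the CPU, so the loop never yields; use CamOsMsSleep instead.

3) System IRQs take up all of the RISC-V CPU time. This is usually either a bug in the public SDK, or a customer interrupt handler that does time-consuming work while interrupts arrive so often that the CPU cannot keep up.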
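The difference behind item 2) can be illustrated on a Linux host with POSIX analogs. This is an analogy only, not the CamOs implementation: busy_delay_ms stands in for a delay that keeps the CPU (as CamOsMsDelay is described above), and sleep_ms stands in for a blocking sleep such as CamOsMsSleep. The sketch measures how much CPU time each one burns; all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Read the given clock in seconds. */
static double clock_s(clockid_t clk)
{
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Non-yielding delay: spin until `ms` milliseconds of wall time pass.
 * The calling thread occupies the CPU the whole time. */
void busy_delay_ms(long ms)
{
    double end = clock_s(CLOCK_MONOTONIC) + (double)ms / 1000.0;
    while (clock_s(CLOCK_MONOTONIC) < end)
        ;   /* spin */
}

/* Blocking sleep: the scheduler can run other threads in the meantime. */
void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;   /* interrupted by a signal: sleep for the remaining time */
}

/* CPU time, in seconds, that this process spends inside fn(ms). */
double cpu_cost_s(void (*fn)(long), long ms)
{
    double start = clock_s(CLOCK_PROCESS_CPUTIME_ID);
    fn(ms);
    return clock_s(CLOCK_PROCESS_CPUTIME_ID) - start;
}
```

On a host, cpu_cost_s(busy_delay_ms, 30) reports roughly the full 30 ms of CPU time, while cpu_cost_s(sleep_ms, 30) reports almost none; on the RTOS, the same contrast decides whether CONSOLE and MENU ever get scheduled.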

6.3. Troubleshooting Method

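Read the RISC-V PC (program counter) about 10 times in a row to see what RISC-V is currently executing, then resolve each PC value to a source location with addr2line.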
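With about 10 samples, the PC value that recurs most often usually points at the code that is holding the CPU. A sketch that picks the most frequent sample, assuming the samples have already been collected as 32-bit values (for example from riux32_r); the function name is hypothetical. A busy loop spans several addresses, so resolving all samples with addr2line is still worthwhile.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the most frequent PC among n samples and store its count in *hits.
 * On a tie, the value that appears first in the sample order wins.
 * O(n^2), which is fine for ~10 samples. */
uint32_t most_frequent_pc(const uint32_t *pcs, size_t n, size_t *hits)
{
    uint32_t best_pc = 0;
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        for (size_t j = 0; j < n; j++)
            if (pcs[j] == pcs[i])
                c++;
        if (c > best) {
            best = c;
            best_pc = pcs[i];
        }
    }
    if (hits)
        *hits = best;
    return best_pc;
}
```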

6.3.1. Reading the PC

1) If the Arm console still works, use the following command:

/customer # ./riux32_r 0x803 0x1
BANK:0x0803 16bit-offset 0x01
0x1002D630

2) If the Arm console is also hung, read the register with SStarSystemTool over the debug UART: select the X32 register tab, enter bank 0x803, and read the value at offset 0x1.

6.3.2. Resolving the PC

1) Locate the RISC-V ELF file built for the environment that shows the problem:

MounRiver build: rtk\proj\obj\PCUPID.elf; Linux build: rtk/proj/build/pcupid_riscv_isw/out/pcupid_riscv_isw.elf

2) Resolve the PC on the Linux command line with addr2line; several addresses can be resolved in one call:

riscv64-unknown-elf-addr2line -e xxx.elf 0x1002D630 0x1002D630
or
addr2line -e xxx.elf 0x1002D630 0x1002D630

7. How to Troubleshoot RISC-V Exceptions

Exceptions fall into the following 4 types:

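  • DATA ABORT: a load or store to an illegal memory address. Check whether the code that obtains the address is wrong.
  • UNDEFINED INSTRUCTION: the CPU executed an instruction it cannot recognize. Check whether a function pointer is wrong or whether the memory holding the function has been corrupted.
  • PREFETCH ABORT: the CPU fetched an instruction from an illegal memory address. Check whether a function pointer is wrong.
  • SYSTEM ASSERT: the code deliberately triggered the exception (an assert or panic).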

The exception output has 3 parts:

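  • Exception register info: the main RISC-V CPU status registers
  • Exception type: the cause of the exception
  • Panic message: the backtrace recorded when the exception occurred, and the exact location where the assert was triggered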

Example exception output:

Exception without dump info
Exception type: SYSTEM ASSERT (240), Param: 0x100425c4
Panic at 0x10004d90 (unknown symbol)
Panic message: Test Assert
Call Stack Backtrace Begin:
    #0  0x10004d90 (unknown symbol)
    #1  0x1000550a (unknown symbol)
    #2  0x10025038 (unknown symbol)
    #3  0x1002313c (unknown symbol)
    #4  0x1002c5e6 (unknown symbol)
    #5  0x1002316a (unknown symbol)
    #6  0x1002c5e6 (unknown symbol)
    #7  0x1003885a (unknown symbol)
    #8  0x10038854 (unknown symbol)
    #9  0x1002acd8 (unknown symbol)
    #10  0x1002b34e (unknown symbol)
    #11  0x1002be12 (unknown symbol)
    #12  0x1000311c (unknown symbol)

Resolve the addresses in the panic backtrace with the PC-resolution method described in section 6.3.2:

aarch64-linux-gnu-addr2line -e pcupid_riscv_isw.elf 0x10004d90 0x1000550a 0x10025038 0x1002313c 0x1002c5e6 0x1002316a 0x1002c5e6 0x1003885a 0x10038854 0x1002acd8 0x1002b34e 0x1002be12 0x1000311c
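Typing a long backtrace by hand is error-prone. The sketch below scans the panic log for lines of the form "#N 0x...", as in the example above, and builds the addr2line command line; the function name and buffer handling are hypothetical, and the addr2line binary and ELF path are supplied by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Write "<addr2line> -e <elf>" followed by every backtrace address found in
 * `text` (lines like "#3  0x1002313c (unknown symbol)") into `cmd`.
 * Returns the number of addresses, or -1 if `cmd` is too small. */
int build_addr2line_cmd(const char *text, const char *addr2line, const char *elf,
                        char *cmd, size_t cap)
{
    int len = snprintf(cmd, cap, "%s -e %s", addr2line, elf);
    if (len < 0 || (size_t)len >= cap)
        return -1;
    int count = 0;
    for (const char *p = strchr(text, '#'); p; p = strchr(p + 1, '#')) {
        unsigned frame;
        unsigned long addr;
        if (sscanf(p, "#%u %lx", &frame, &addr) != 2)
            continue;                 /* '#' not followed by a frame entry */
        int w = snprintf(cmd + len, cap - (size_t)len, " 0x%lx", addr);
        if (w < 0 || (size_t)w >= cap - (size_t)len)
            return -1;
        len += w;
        count++;
    }
    return count;
}
```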

Looking up the resolved backtrace in the source shows that line 63 of riscv_gpio.c calls CamOsPanic("Test Assert"):

/home/beck.zhang/5_2.3.0_p3p/riscv/kernel/rtk/proj/build/pcupid_riscv_isw/out/riscv_gpio.c:63

…………………………
/home/beck.zhang/5_2.3.0_p3p/riscv/kernel/rtk/proj/build/pcupid_riscv_isw/out/core_state.c:160