Linux Kernel Oops Analysis

    Posted on 2021-8-30 15:05:36

     

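    0. Common approaches to analyzing Linux kernel exceptions

    1. Check whether the faulting address is close to 0. If so, suspect a NULL pointer dereference.
    2. Check whether the faulting address lies in an iomem-mapped region. If so, suspect a device or bus access fault, for example an address access error caused by a PCI fault.
    3. Check whether the faulting address is close to a stack. If it is adjacent, consider whether the stack has been overwritten.
    4. Compare the stack dumps printed by several mechanisms (delayed reset, NMI watchdog, and so on) to see whether the PC is still advancing. This tells you whether the system is deadlocked.
    5. Use SysRq to tell a real hang from an apparent one.
    6. Disassemble to find the C code and function where the exception occurred, then check whether the open-source community already has a patch for it.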
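
The first three checks can be sketched as a small triage helper. This is only an illustration: the function name `classify_fault_address`, the 4 KiB "near zero" window, the default 8 KiB stack size, and the `iomem_ranges` parameter are assumptions made for this example, not values defined by the kernel or by this post. The addresses in the usage lines come from the two Oops dumps analyzed below.

```python
def classify_fault_address(addr, stack_base, stack_size=8192,
                           iomem_ranges=(), null_window=4096):
    """Return a coarse hint about what a faulting address suggests."""
    # Check 1: an address in the first page is almost always NULL + offset.
    if addr < null_window:
        return "null-deref"
    # Check 2: an address inside a known I/O mapping points at a device
    # or bus fault. The ranges must be supplied by the caller.
    for start, end in iomem_ranges:
        if start <= addr < end:
            return "iomem"
    # Check 3: an address within one stack size of the kernel stack
    # suggests an overflow or a neighbour overwriting the stack.
    if stack_base - stack_size <= addr < stack_base + 2 * stack_size:
        return "near-stack"
    return "unknown"


# PowerPC example: DAR and THREAD values from the first dump.
print(classify_fault_address(0x36FEF31E, stack_base=0xCE282000))
# MIPS example: BadVA and thread_info address from the second dump.
print(classify_fault_address(0x8, stack_base=0xC000000595054000,
                             stack_size=16384))
```

A result of "unknown" does not mean the address is valid. It only means none of the three quick checks matched, so the analysis has to continue with the register and call stack information.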

    The sections below walk through the analysis in detail using two examples, one on PowerPC and one on MIPS64.

    1. Kernel exception analysis on a small PowerPC system

    1.1  Exception output

     

    Unable to handle kernel paging request for data at address 0x36fef31e
    Faulting instruction address: 0xc0088b8c
    Oops: Kernel access of bad area, sig: 11 [#1]
    PREEMPT SMP NR_CPUS=2
    Modules linked in: ossmod tipc ohci_hcd ehci_hcd cmm uart1655x bcm334 bootflash mtdchar bsp_flash_init boardctrl 85xx_debug util
    NIP: C0088B8C LR: C0088CF8 CTR: 00000000
    REGS: ce283e20 TRAP: 0300 Not tainted (2.6.21.7-EMBSYS-CGEL-3.04.10.P6.F5)
    MSR: 00021000 <ME> CR: 22004222 XER: 00000000
    DAR: 36FEF31E, DSISR: 00800000
    TASK = cffdf180[26] 'events/1' THREAD: ce282000 CPU: 1
    GPR00: 00100100 CE283ED0 CFFDF180 CF528000 C09EA500 EFFEAD20 CF5188A0 00000000
    GPR08: CF5188BC 00200200 36FEF31E D1FD7F9E 22004222 1010DA44 00000290 00000000
    GPR16: 1011C858 100147F4 BF9BC9C4 10100000 00000001 C0460000 C06454CC 00000000
    GPR24: C0640000 CE282000 C0640000 00000005 00000000 00000000 EFFE8EC0 CFFED958
    NIP [C0088B8C] free_block+0xc4/0x16c
    LR [C0088CF8] drain_array+0xc4/0x100
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Call Trace:
    [CE283ED0] [C06ABEC0] 0xc06abec0(unreliable)
    [CE283EF0] [C0088CF8] drain_array+0xc4/0x100
    [CE283F10] [C008A70C] cache_reap+0x94/0x13c
    [CE283F30] [C003DA2C] run_workqueue+0xc4/0x198
    [CE283F60] [C003E6D4] worker_thread+0x130/0x154
    [CE283FB0] [C0042E80] kthread+0xd4/0x110
    [CE283FF0] [C0011A70] original_kernel_thread+0x44/0x60

    Instruction dump:
    5400cffe 0f000000 80c4001c 7d1cf214 3c000010 3d200020 80a8001c 60000100
    81660000 61290200 81460004 3906001c <916a0000> 914b0004 90060000 91260004
    ------------[ cut here ]------------
    Badness at c0011e4c [verbose debug info unavailable]
    Call Trace:
    [CE283C50] [C00080BC] show_stack+0x3c/0x1a0 (unreliable)
    [CE283C80] [C018EA28] report_bug+0xb0/0xb8
    [CE283C90] [C000EC94] program_check_exception+0xcc/0x4f8
    [CE283CD0] [C0010BE4] ret_from_except_full+0x0/0x4c
    [CE283D90] [C0640000] 0xc0640000
    [CE283DD0] [C000E61C] die+0x1f0/0x27c
    [CE283E00] [C0014B18] bad_page_fault+0x98/0xe8
    [CE283E10] [C0010A88] handle_page_fault+0x7c/0x80
    [CE283ED0] [C06ABEC0] 0xc06abec0
    [CE283EF0] [C0088CF8] drain_array+0xc4/0x100
    [CE283F10] [C008A70C] cache_reap+0x94/0x13c
    [CE283F30] [C003DA2C] run_workqueue+0xc4/0x198
    [CE283F60] [C003E6D4] worker_thread+0x130/0x154
    [CE283FB0] [C0042E80] kthread+0xd4/0x110
    [CE283FF0] [C0011A70] original_kernel_thread+0x44/0x60

    1.2  Oops analysis

     Oops: Kernel access of bad area, sig: 11 [#1]   

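    Exception classes

    Oops: an exception raised by an instruction executing in kernel mode.

    BUG: the kernel has detected a logic error (similar to an assert) that affects its ability to keep running.

    WARNING: similar to BUG, but it does not affect the kernel's ability to keep running.

    PANIC: similar to BUG, but the system cannot continue. It halts or reboots.

    SOFTLOCK (soft lockup): tasks have not been scheduled for a long time.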

     

    Exception signals

    Signal      Code  Default Action  Description
    SIGABRT     6     A               Process abort signal.
    SIGALRM     14    T               Alarm clock.
    SIGBUS      10    A               Access to an undefined portion of a memory object.
    SIGCHLD     18    I               Child process terminated, stopped, or continued.
    SIGCONT     25    C               Continue executing, if stopped.
    SIGFPE      8     A               Erroneous arithmetic operation.
    SIGHUP      1     T               Hangup.
    SIGILL      4     A               Illegal instruction.
    SIGINT      2     T               Terminal interrupt signal.
    SIGKILL     9     T               Kill (cannot be caught or ignored).
    SIGPIPE     13    T               Write on a pipe with no one to read it.
    SIGQUIT     3     A               Terminal quit signal.
    SIGSEGV     11    A               Invalid memory reference.
    SIGSTOP     23    S               Stop executing (cannot be caught or ignored).
    SIGTERM     15    T               Termination signal.
    SIGTSTP     24    S               Terminal stop signal.
    SIGTTIN     26    S               Background process attempting read.
    SIGTTOU     27    S               Background process attempting write.
    SIGUSR1     16    T               User-defined signal 1.
    SIGUSR2     17    T               User-defined signal 2.
    SIGPOLL     22    T               Pollable event.
    SIGPROF     29    T               Profiling timer expired.
    SIGSYS      12    A               Bad system call.
    SIGTRAP     5     A               Trace/breakpoint trap.
    SIGURG      21    I               High bandwidth data is available at a socket.
    SIGVTALRM   28    T               Virtual timer expired.
    SIGXCPU     30    A               CPU time limit exceeded.
    SIGXFSZ     31    A               File size limit exceeded.

    Signal numbers are architecture-dependent. The codes listed here follow the SVR4-style numbering that Linux uses on MIPS. On PowerPC Linux some of them differ (for example, SIGBUS is 7), although SIGSEGV is 11 on both.

    Default Actions:

    T - Abnormal termination of the process. The process is terminated with all the consequences of _exit() except that the status made available to wait() and waitpid() indicates abnormal termination by the specified signal.

    A - Abnormal termination of the process. Additionally, implementation-defined abnormal termination actions, such as creation of a core file, may occur.

    I - Ignore the signal.

    S - Stop the process.

    C - Continue the process, if it is stopped; otherwise, ignore the signal.

     

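    For the PowerPC e500 core specifically, exceptions map to signals as follows:

    [Figure not preserved in the source: e500 exception-to-signal mapping]

    So a task accessed an address outside its valid virtual address space, and the kernel reported SIGSEGV (segmentation fault).

    Which task was it? The TASK line discussed in section 1.3 answers this.

    Other fields

    #1 is die_counter, the number of Oopses so far. When there are several Oopses, analyze the first one, because later Oopses may simply be the first error propagating.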
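
When there are many dumps to go through, the signal number and die_counter can be pulled out of the header line mechanically. The sketch below is a minimal example that assumes the header format shown in this post (`Oops: <reason>, sig: <n> [#<count>]`). The helper name `parse_oops_header` is made up for illustration, and other architectures or kernel versions print this line differently.

```python
import re

# Matches the PowerPC-style header line quoted in this post.
OOPS_RE = re.compile(
    r"Oops: (?P<reason>.+?), sig: (?P<sig>\d+) \[#(?P<count>\d+)\]"
)


def parse_oops_header(line):
    """Return reason, signal number, and die_counter, or None if no match."""
    m = OOPS_RE.search(line)
    if m is None:
        return None
    return {
        "reason": m.group("reason"),
        "sig": int(m.group("sig")),
        "count": int(m.group("count")),
    }


header = parse_oops_header("Oops: Kernel access of bad area, sig: 11 [#1]")
print(header)
```

A `count` of 1 identifies the first Oops, which is the one to analyze first.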

     

    1.3  Register analysis

    NIP: C0088B8C LR: C0088CF8 CTR: 00000000 

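    This line lists three registers. NIP is the next instruction pointer. Here its value is the address of the instruction that faulted.

    LR is the link register. It holds the return address saved by the most recent branch-and-link instruction, which is normally the address in the caller to which the current function will return.

    CTR is the count register, used by loop (branch-on-count) instructions.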

    REGS: ce283e20 TRAP: 0300   Not tainted  (2.6.21.7-EMBSYS-CGEL-3.04.10.P6.F5)  

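    TRAP: the exception vector, which identifies the exception type. The value 0300 here is a Data Access exception. REGS: the address of the pt_regs structure saved on the kernel stack. pt_regs holds the minimal state that must be saved on kernel entry, for example on every system call, interrupt, trap, or fault.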

    0x100:    "(System Reset)"
    0x200:    "(Machine Check)"
    0x300:    "(Data Access)"
    0x380:    "(Data SLB Access)"
    0x400:    "(Instruction Access)"
    0x480:    "(Instruction SLB Access)"
    0x500:    "(Hardware Interrupt)"
    0x600:    "(Alignment)"
    0x700:    "(Program Check)"
    0x800:    "(FPU Unavailable)"
    0x900:    "(Decrementer)"
    0xc00:    "(System Call)"
    0xd00:    "(Single Step)"
    0xf00:    "(Performance Monitor)"
    0xf20:    "(Altivec Unavailable)"
    0x1300:   "(Instruction Breakpoint)"

    For details, see section 5.7, "Interrupt Definitions", of the PowerPC™ e500 Core Family Reference Manual.

     

    tainted: the kernel taint state, set by add_taint. The flags have the following meanings:

    *  'P' - Proprietary module has been loaded.

     *  'F' - Module has been forcibly loaded.

     *  'S' - SMP with CPUs not designed for SMP.

     *  'R' - User forced a module unload.

     *  'M' - System experienced a machine check exception.

     *  'B' - System has hit bad_page.

     *  'U' - Userspace-defined naughtiness.

     *  'D' - Kernel has oopsed before

     *  'A' - ACPI table overridden.

     *  'W' - Taint on warning.

     *  'C' - modules from drivers/staging are loaded.

     

    MSR: 00021000 <ME>  CR: 22004222  XER: 00000000  

    DAR: 36FEF31E, DSISR: 00800000

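    MSR is the machine state register.

    CR is the condition register.

    XER is the integer exception register.

    DAR is the data address register. It holds the address whose access caused the exception. On e500 the equivalent register is the Data Exception Address Register (DEAR).

    DSISR is the Data Storage Interrupt Status Register. It records why the memory access faulted. On e500 the equivalent register is the Exception Syndrome Register (ESR). The value 0x00800000 is the store-operation (ST) bit, which is set when a store instruction causes an alignment, data storage, or data TLB error exception.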
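
The ESR value can be decoded bit by bit. The following is a minimal sketch with a partial bit table. The masks are written from memory of the e500 ESR layout, so treat them as assumptions and check them against the reference manual before relying on them. The helper name `decode_esr` is made up for this example.

```python
# Partial table of e500 ESR bits (assumed layout, not exhaustive).
ESR_BITS = {
    0x08000000: "PIL (illegal instruction)",
    0x04000000: "PPR (privileged instruction)",
    0x02000000: "PTR (trap)",
    0x00800000: "ST (store operation)",
    0x00200000: "DLK (data cache locking)",
    0x00100000: "ILK (instruction cache locking)",
    0x00020000: "BO (byte ordering)",
}


def decode_esr(value):
    """Return the names of all known ESR bits set in value."""
    return [name for mask, name in ESR_BITS.items() if value & mask]


# DSISR value printed in the Oops above.
print(decode_esr(0x00800000))
```

For this dump only the ST bit is set, which agrees with the faulting access being a store.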

     

    TASK = cffdf180[26] 'events/1' THREAD: ce282000 CPU: 1

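    cffdf180: address of the task's task_struct.

    26: PID.

    events/1: task name.

    THREAD: address of the task's thread_info, which sits at the base (lowest address) of its kernel stack.

    CPU: the CPU the task was running on.

    So the current task, 'events/1', is the one that hit the SIGSEGV.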

     

    GPR00: 00100100 CE283ED0 CFFDF180 CF528000 C09EA500 EFFEAD20 CF5188A0 00000000

    GPR08: CF5188BC 00200200 36FEF31E D1FD7F9E 22004222 1010DA44 00000290 00000000                                                                     

    GPR16: 1011C858 100147F4 BF9BC9C4 10100000 00000001 C0460000 C06454CC 00000000                                                     

    GPR24: C0640000 CE282000 C0640000 00000005 00000000 00000000 EFFE8EC0 CFFED958

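    The PowerPC ABI assigns the registers as follows:

      (1) GPR0: volatile. The ABI does not make it available to ordinary code for general use. GCC uses it to hold the LR value, and Linux on PowerPC passes the system call number in it.

      (2) GPR1: dedicated. The ABI uses it as the stack pointer (top of stack).

      (3) GPR2: dedicated. Ordinary code does not use it. Linux on PowerPC keeps the address of the current task's descriptor in it, and in this dump GPR2 (CFFDF180) indeed equals the TASK address.

      (4) GPR3-GPR4: volatile. The ABI uses them for function return values and for passing arguments.

      (5) GPR5-GPR10: volatile. Together with GPR3 and GPR4 they form the eight argument registers. When a function has more than eight arguments, the rest are passed on the stack.

      (6) GPR11-GPR12: volatile. Linux on PowerPC sometimes uses them for temporaries. Compiled code may also use them as scratch registers: the instruction dump in this Oops references r11.

      (7) GPR13: dedicated. The ABI uses it as the base pointer of the small data area (sdata). Linux on PowerPC uses it for temporaries during system initialization. Under certain rules GCC places frequently used data in the sdata or sbss sections. Data in sdata or sbss is accessed through a different mechanism than data in data or bss, and access to sdata is faster.

      (8) GPR14-GPR31: nonvolatile. The ABI uses them for local variables, and application code may use them freely.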

     

    1.4  Call stack analysis

    Call chain

     

    NIP [C0088B8C] free_block+0xc4/0x16c

    LR [C0088CF8] drain_array+0xc4/0x100

    Call Trace:

    [CE283ED0] [C06ABEC0] 0xc06abec0(unreliable)

    [CE283EF0] [C0088CF8] drain_array+0xc4/0x100

    [CE283F10] [C008A70C] cache_reap+0x94/0x13c

    [CE283F30] [C003DA2C] run_workqueue+0xc4/0x198

    [CE283F60] [C003E6D4] worker_thread+0x130/0x154

    [CE283FB0] [C0042E80] kthread+0xd4/0x110

    [CE283FF0] [C0011A70] original_kernel_thread+0x44/0x60

    Instruction dump:

    5400cffe 0f000000 80c4001c 7d1cf214 3c000010 3d200020 80a8001c 60000100

    81660000 61290200 81460004 3906001c <916a0000> 914b0004 90060000 91260004


    [CE283FB0] [C0042E80] kthread+0xd4/0x110
     

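    CE283FB0: stack frame address.

    C0042E80: the LR value saved on the stack, that is, the function's return address.

    kthread: function name.

    0xd4/0x110: offset of that address from the start of the function / total size of the function.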
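
Subtracting the offset from the saved address gives the start address of the function, which is handy when searching a disassembly listing. The sketch below parses one trace line in the PowerPC format shown above. The helper name `parse_trace_line` is an assumption for this example, and the regular expression only covers this one output format.

```python
import re

# Matches lines such as: [CE283FB0] [C0042E80] kthread+0xd4/0x110
TRACE_RE = re.compile(
    r"\[(?P<sp>[0-9A-Fa-f]+)\] \[(?P<lr>[0-9A-Fa-f]+)\] "
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace_line(line):
    """Split a call trace entry into its fields, or return None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None  # e.g. entries with no resolved symbol
    lr = int(m.group("lr"), 16)
    offset = int(m.group("off"), 16)
    return {
        "sp": int(m.group("sp"), 16),
        "lr": lr,
        "symbol": m.group("sym"),
        "offset": offset,
        "size": int(m.group("size"), 16),
        "func_start": lr - offset,  # address of the function's first instruction
    }


entry = parse_trace_line("[CE283FB0] [C0042E80] kthread+0xd4/0x110")
print(entry["symbol"], hex(entry["func_start"]))
```

Entries without a resolved symbol, such as `0xc06abec0(unreliable)`, do not match and return None.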

     

    static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects, int node)

     

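    The call stack shows that the kernel faulted in free_block, called from drain_array. Comparing the free_block prototype with the argument registers GPR3-GPR6 (CF528000 C09EA500 EFFEAD20 CF5188A0), the values in the int nr_objects and int node positions are clearly not plausible integers, which suggests that the call stack may have been overwritten. Treat this only as a hint: GPR3-GPR6 are volatile, so by the time of the faulting instruction they may no longer hold the original arguments.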

     

    Instruction codes

    Instruction dump:
    5400cffe 0f000000 80c4001c 7d1cf214 3c000010 3d200020 80a8001c 60000100
    81660000 61290200 81460004 3906001c <916a0000> 914b0004 90060000 91260004

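    Instruction dump prints the instruction words around NIP. The word in angle brackets, <916a0000>, is the instruction at NIP.

    Locating the fault by disassembly

    objdump -dS vmlinux > /tmp/kernel.s

    Search the listing for <916a0000> at the NIP address to find the corresponding C code and determine exactly which C statement faulted.

    Here vmlinux must be a kernel image built with debug information, from the same version as the kernel that faulted.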
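
Before opening the full listing, the faulting word itself can be decoded by hand. The sketch below decodes only a handful of PowerPC D-form instructions (lwz, stw, addi, addis, ori), which is enough for the words in this dump. It is not a general disassembler, and the helper names are made up for this example.

```python
def _fields(word):
    """Split a 32-bit D-form instruction into opcode, RT/RS, RA, and immediate."""
    opcode = word >> 26
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    simm = imm - 0x10000 if imm & 0x8000 else imm  # sign-extended form
    return opcode, rt, ra, imm, simm


def decode_ppc_dform(word):
    """Return assembly text for a few D-form opcodes, or None."""
    opcode, rt, ra, imm, simm = _fields(word)
    if opcode == 32:
        return f"lwz r{rt},{simm}(r{ra})"
    if opcode == 36:
        return f"stw r{rt},{simm}(r{ra})"
    if opcode == 14:
        return f"addi r{rt},r{ra},{simm}"
    if opcode == 15:
        return f"addis r{rt},r{ra},{simm}"
    if opcode == 24:
        return f"ori r{ra},r{rt},{imm}"  # for ori the RA field is the destination
    return None


def effective_address(word, gprs):
    """Compute base register + displacement for a D-form load or store."""
    _, _, ra, _, simm = _fields(word)
    base = 0 if ra == 0 else gprs[ra]  # RA = 0 means a literal zero base
    return (base + simm) & 0xFFFFFFFF


gprs = {6: 0xCF5188A0, 10: 0x36FEF31E, 11: 0xD1FD7F9E}  # from the GPR dump
print(decode_ppc_dform(0x916A0000))
print(hex(effective_address(0x916A0000, gprs)))
```

The faulting word is a store through r10, and r10 holds 36FEF31E, the same value as DAR. The neighbouring words load r11 and r10 from offsets 0 and 4 of r6 and build the constants 0x00100100 and 0x00200200 in r0 and r9, which are the kernel's list poison values. This looks like an inlined list unlink operating on a corrupted list entry, but that reading is an inference from the dump and should be confirmed against the disassembly listing.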

     

    2. Kernel exception analysis on a small MIPS system

     

    2.1  Exception output

    0:Oops[#1]:

      0:Cpu 0

      0:Show thread info from vcpu 0

      0: VCPU   Stack bottom      Task                  Ti at

      0:  0    c000000595057fe0    swapper              c000000595054000

      0:Thread info( c000000595054000 ):

      0:    Process swapper (pid: 1)

      0:  exec_domain ffffffffc0f299b0

      0:  flags 100000

      0:  tp_value 0

      0:  cpu 0

      0:  preempt_count 2

      0:  regs (null)

      0:STACK_END_MAGIC at va( c000000595054068 ): 57AC6E9D( =? 57AC6E9D)

      0:

      0:$ 0   :  0: 0000000000000000  0: 0000000000000000  0: 0000000000000000  0: 0000000000000001  0:

      0:$ 4   :  0: 0000000000000000  0: 0000000000000000  0: ffffffffffffffff  0: 0000000000002976  0:

      0:$ 8   :  0: 0000000000007fff  0: 000000000000000a  0: 5f73746172747570  0: 000000000000006c  0:

      0:$12   :  0: 0000000000000068  0: 000000000000004c  0: ffffffffc10bc384  0: c000000593338000  0:

      0:$16   :  0: 0000000000000000  0: ffffffffc10e42b8  0: ffffffffc10e0000  0: ffffffffc10e0000  0:

      0:$20   :  0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:

      0:$24   :  0: 0000000000000006  0: ffffffffc06501a8  0:                   0:                   0:

      0:$28   :  0: c000000595054000  0: c000000595057c88  0: 0000000000000000  0: ffffffffc087bf40  0:

      0:Hi    : 0000000000000000

      0:Lo    : 0000000000000000 

    0:epc   : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190

      0:    Not tainted

      0:ra    : ffffffffc087bf40 _init+0x3e8/0x480

      0:Status: 5400ffe3      0:KX   0:SX   0:UX   0:KERNEL   0:EXL   0:IE   0:

      0:Cause : 00800008

      0:BadVA : 0000000000000008

      0:PrId  : 000c1102 (XLP316   A2  )

      0:<d>Modules linked in:  0:

      0:Process swapper (pid: 1, threadinfo=c000000595054000, task=c000000595053898, tls=0000000000000000)

      0:Stack :  0: ffffffffffffffff  0: ffffffffc10e0000  0: c000000595193240  0: 0000000000000001  0:

             0: ffffffffc104365c  0: ffffffffc087bf40  0: 000001fac104365c  0: ffffffffc087cb30  0:

             0: ffffffffc087c3a8  0: 0000000000000000  0: ffffffffc0f4a778  0: c000000595193000  0:

             0: c000000595193240  0: 0000000000000001  0: ffffffffc10e0000  0: c000000595193240  0:

             0: 0000000000000001  0: ffffffffc104365c  0: 0000000000000000  0: 0000000000000080  0:

             0: 0000000000000080  0: ffffffffc1043c44  0: 00008a17bc300000  0: ffffffffc10e0000  0:

             0: c00000059333dd40  0: 0000000000000000  0: 3800000000000000  0: 0000000000000000  0:

             0: 000000009333dd40  0: ffffffffc1043638  0: 000000005400ffe0  0: ffffffffbfff00fe  0:

             0: ffffffffc1070000  0: ffffffffc1063200  0: 0000000000000001  0: ffffffffc104365c  0:

             0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:

             0: ...  0:

      0:Call Trace: [jiffies: 0xfffff79f]

      0:[<ffffffffc087c4b4>] _bcore_cleanup+0x34/0x190

      0:[<ffffffffc087bf40>] _init+0x3e8/0x480

      0:[<ffffffffc1043c44>] bcmxgs_init_module+0x5e8/0xc00

      0:[<ffffffffc060eebc>] do_one_initcall+0x3c/0x1a0

      0:[<ffffffffc102cc04>] kernel_init+0x220/0x2b8

      0:[<ffffffffc062c730>] kernel_thread_helper+0x10/0x20

      0:

      0:

    Code:  0: ffbf0028   0: 0000802d   0: 663142b8   0:<dc420008>  0: 0040f809   0: 00000000   0: 0202102a   0: 1040001d   0: 00000000

     0:

      0:<4>Disabling lock debugging due to kernel taint


     
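    2.2  Exception signals

    The relationship between exceptions and signals:

    [Figure not preserved in the source: MIPS exception-to-signal mapping]

    2.3  Thread information analysis

    0:Cpu 0: both zeros are the ID of the current CPU core.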

      0:Show thread info from vcpu 0

      0: VCPU   Stack bottom      Task                  Ti at

      0:  0    c000000595057fe0    swapper              c000000595054000

    VCPU: CPU core.

    Stack bottom: stack bottom pointer.

    Task: thread name.

    Ti at: pointer to the thread's thread_info structure.

     0:Thread info( c000000595054000 ):

      0:    Process swapper (pid: 1)

      0:  exec_domain ffffffffc0f299b0

      0:  flags 100000

      0:  tp_value 0

      0:  cpu 0

      0:  preempt_count 2

      0:  regs (null)

      0:STACK_END_MAGIC at va( c000000595054068 ): 57AC6E9D( =? 57AC6E9D) 


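    Thread info( c000000595054000 ): information about the thread that raised the exception. The fields below it are members of the thread_info structure:

    flags: thread flag bits, defined by the TIF_* list below. The value here, 0x100000, has only bit 20 set, which is TIF_FIXADE: address errors for this thread are fixed up in software.

    preempt_count: the preemption counter. When it is 0, the kernel can safely preempt this thread. A nonzero value means preemption is currently disabled, for example because the thread holds a spinlock or is running in interrupt context, so it cannot give up the CPU.

    STACK_END_MAGIC: a magic number at the end of the stack that helps determine whether the stack has been overwritten. Here the stored value matches the expected 57AC6E9D, so the end of the stack is intact.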

    #define TIF_SIGPENDING		1	/* signal pending */
    #define TIF_NEED_RESCHED	2	/* rescheduling necessary */
    #define TIF_SYSCALL_AUDIT	3	/* syscall auditing active */
    #define TIF_SECCOMP		4	/* secure computing */
    #define TIF_NOTIFY_RESUME	5	/* callback before returning to user */
    #define TIF_RESTORE_SIGMASK	9	/* restore signal mask in do_signal() */
    #define TIF_USEDFPU		16	/* FPU was used by this task this quantum (SMP) */
    #define TIF_POLLING_NRFLAG	17	/* true if poll_idle() is polling TIF_NEED_RESCHED */
    #define TIF_MEMDIE		18
    #define TIF_FREEZE		19
    #define TIF_FIXADE		20	/* Fix address errors in software */
    #define TIF_LOGADE		21	/* Log address errors to syslog */
    #define TIF_32BIT_REGS		22	/* also implies 16/32 fprs */
    #define TIF_32BIT_ADDR		23	/* 32-bit address space (o32/n32) */
    #define TIF_FPUBOUND		24	/* thread bound to FPU-full CPU set */
    #define TIF_LOAD_WATCH		25	/* If set, load watch registers */
    #define TIF_XKPHYS_MEM_EN	26
    #define TIF_XKPHYS_IO_EN	27
    #define TIF_SYSCALL_TRACE	31	/* syscall trace active */
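
The flags word can be decoded against the TIF_* list mechanically. The sketch below copies the bit numbers from the defines above and assumes the dump prints flags in hexadecimal. The helper name `decode_thread_flags` is made up for this example.

```python
# Bit numbers copied from the TIF_* defines listed above.
TIF_BITS = {
    1: "TIF_SIGPENDING",
    2: "TIF_NEED_RESCHED",
    3: "TIF_SYSCALL_AUDIT",
    4: "TIF_SECCOMP",
    5: "TIF_NOTIFY_RESUME",
    9: "TIF_RESTORE_SIGMASK",
    16: "TIF_USEDFPU",
    17: "TIF_POLLING_NRFLAG",
    18: "TIF_MEMDIE",
    19: "TIF_FREEZE",
    20: "TIF_FIXADE",
    21: "TIF_LOGADE",
    22: "TIF_32BIT_REGS",
    23: "TIF_32BIT_ADDR",
    24: "TIF_FPUBOUND",
    25: "TIF_LOAD_WATCH",
    26: "TIF_XKPHYS_MEM_EN",
    27: "TIF_XKPHYS_IO_EN",
    31: "TIF_SYSCALL_TRACE",
}


def decode_thread_flags(flags):
    """Return the names of the TIF_* bits set in flags, lowest bit first."""
    return [name for bit, name in sorted(TIF_BITS.items())
            if flags & (1 << bit)]


# "flags 100000" from the dump, read as hexadecimal.
print(decode_thread_flags(int("100000", 16)))
```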
    

     

    2.4  Register analysis

      0:$ 0   :  0: 0000000000000000  0: 0000000000000000  0: 0000000000000000  0: 0000000000000001  0:

      0:$ 4   :  0: 0000000000000000  0: 0000000000000000  0: ffffffffffffffff  0: 0000000000002976  0:

      0:$ 8   :  0: 0000000000007fff  0: 000000000000000a  0: 5f73746172747570  0: 000000000000006c  0:

      0:$12   :  0: 0000000000000068  0: 000000000000004c  0: ffffffffc10bc384  0: c000000593338000  0:

      0:$16   :  0: 0000000000000000  0: ffffffffc10e42b8  0: ffffffffc10e0000  0: ffffffffc10e0000  0:

      0:$20   :  0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:

      0:$24   :  0: 0000000000000006  0: ffffffffc06501a8  0:                   0:                   0:

      0:$28   :  0: c000000595054000  0: c000000595057c88  0: 0000000000000000  0: ffffffffc087bf40  0:

      0:Hi    : 0000000000000000

      0:Lo    : 0000000000000000

      0:epc   : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190

      0:    Not tainted

      0:ra    : ffffffffc087bf40 _init+0x3e8/0x480

      0:Status: 5400ffe3      0:KX   0:SX   0:UX   0:KERNEL   0:EXL   0:IE   0:

      0:Cause : 00800008

      0:BadVA : 0000000000000008

      0:PrId  : 000c1102 (XLP316   A2  )

     

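    The MIPS core has four register groups: GP, COP0, COP1, and COP2.

    The most important COP0 registers are:

    Status: the CP0 status register (cp0_status). EXL indicates that the CPU is at exception level. For details, see page 193 of reference 6.7.

    Cause: 00800008 indicates a TLB exception (load or instruction fetch).

    BadVA: the virtual address that caused the exception, for example on an address error, TLB invalid, or TLB modified exception.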
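
The exception type is held in the ExcCode field of Cause, bits 6..2. The sketch below extracts it and looks it up in a partial table of the standard MIPS exception codes. The helper name `decode_cause` is an assumption for this example, and codes outside the table are reported as unknown.

```python
# Partial table of standard MIPS ExcCode values.
EXC_CODES = {
    0: "Int (interrupt)",
    1: "Mod (TLB modified)",
    2: "TLBL (TLB exception, load or instruction fetch)",
    3: "TLBS (TLB exception, store)",
    4: "AdEL (address error, load or instruction fetch)",
    5: "AdES (address error, store)",
    6: "IBE (bus error, instruction fetch)",
    7: "DBE (bus error, data load or store)",
    8: "Sys (syscall)",
    9: "Bp (breakpoint)",
    10: "RI (reserved instruction)",
    11: "CpU (coprocessor unusable)",
    12: "Ov (arithmetic overflow)",
    13: "Tr (trap)",
}


def decode_cause(cause):
    """Return (ExcCode, description) for a CP0 Cause register value."""
    exc_code = (cause >> 2) & 0x1F  # ExcCode occupies bits 6..2
    return exc_code, EXC_CODES.get(exc_code, "unknown")


print(decode_cause(0x00800008))
```

For the value in this dump the field is 2, a TLB miss or invalid entry on a load or instruction fetch, which matches the description above.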

    2.5  Call stack analysis

    0:Process swapper (pid: 1, threadinfo=c000000595054000, task=c000000595053898, tls=0000000000000000)

      0:Stack :  0: ffffffffffffffff  0: ffffffffc10e0000  0: c000000595193240  0: 0000000000000001  0:

             0: ffffffffc104365c  0: ffffffffc087bf40  0: 000001fac104365c  0: ffffffffc087cb30  0:

             0: ffffffffc087c3a8  0: 0000000000000000  0: ffffffffc0f4a778  0: c000000595193000  0:

             0: c000000595193240  0: 0000000000000001  0: ffffffffc10e0000  0: c000000595193240  0:

             0: 0000000000000001  0: ffffffffc104365c  0: 0000000000000000  0: 0000000000000080  0:

             0: 0000000000000080  0: ffffffffc1043c44  0: 00008a17bc300000  0: ffffffffc10e0000  0:

             0: c00000059333dd40  0: 0000000000000000  0: 3800000000000000  0: 0000000000000000  0:

             0: 000000009333dd40  0: ffffffffc1043638  0: 000000005400ffe0  0: ffffffffbfff00fe  0:

             0: ffffffffc1070000  0: ffffffffc1063200  0: 0000000000000001  0: ffffffffc104365c  0:

             0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:

             0: ...  0:

      0:Call Trace: [jiffies: 0xfffff79f]

      0:[<ffffffffc087c4b4>] _bcore_cleanup+0x34/0x190

      0:[<ffffffffc087bf40>] _init+0x3e8/0x480

      0:[<ffffffffc1043c44>] bcmxgs_init_module+0x5e8/0xc00

      0:[<ffffffffc060eebc>] do_one_initcall+0x3c/0x1a0

      0:[<ffffffffc102cc04>] kernel_init+0x220/0x2b8

      0:[<ffffffffc062c730>] kernel_thread_helper+0x10/0x20

      0:

      0:

    Code:  0: ffbf0028   0: 0000802d   0: 663142b8   0:<dc420008>  0: 0040f809   0: 00000000   0: 0202102a   0: 1040001d   0: 00000000

     0:


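    Call Trace: the call stack of the faulting thread.
    Stack: the raw stack contents of the faulting thread.

    Code: the instruction words around the fault. 0:<dc420008> is the instruction at epc, which corresponds to (epc   : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190). The exact source line has to be located by disassembly.

    The disassembly procedure is the same as for PowerPC.

     

    The analysis shows that the exception was caused by an access to the invalid address BadVA : 0000000000000008. Looking at the code of _bcore_cleanup, the bde pointer had not been initialized and was NULL, so the address of bde->num_devices was exactly 0000000000000008, which triggered the exception.

    The faulting code is: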

    _bcore_cleanup(void)

    {

        for (unit = 0; unit < bde->num_devices(BDE_ALL_DEVICES); unit++)
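
The NULL pointer conclusion can be cross-checked against the instruction word at epc. The sketch below decodes a few MIPS I-type instructions (ld, lw, sd, sw, daddiu) and computes the effective address from the register dump. It is a minimal illustration with made-up helper names, not a full disassembler.

```python
# Opcode numbers for the few I-type instructions handled here.
MEM_OPS = {35: "lw", 43: "sw", 55: "ld", 63: "sd"}
MASK64 = 0xFFFFFFFFFFFFFFFF


def _fields(word):
    """Split a 32-bit I-type instruction into opcode, rs, rt, signed immediate."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    simm = imm - 0x10000 if imm & 0x8000 else imm
    return opcode, rs, rt, simm


def decode_mips_itype(word):
    """Return assembly text for a few I-type opcodes, or None."""
    opcode, rs, rt, simm = _fields(word)
    if opcode in MEM_OPS:
        return f"{MEM_OPS[opcode]} ${rt},{simm}(${rs})"
    if opcode == 25:
        return f"daddiu ${rt},${rs},{simm}"
    return None


def effective_address(word, regs):
    """Compute base register + offset for an I-type load or store."""
    _, rs, _, simm = _fields(word)
    return (regs[rs] + simm) & MASK64


regs = {2: 0x0}  # $2 from the register dump
print(decode_mips_itype(0xDC420008))
print(hex(effective_address(0xDC420008, regs)))
```

The word at epc is a 64-bit load from offset 8 of $2, and $2 is 0 in the register dump, so the computed address is 8, the same as BadVA. This is consistent with bde being NULL and num_devices sitting at offset 8 of the structure.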

     

    6. References

    6.1   http://en.wikipedia.org/wiki/Unix_signal

    6.2   http://www.powerlinuxchina.net/club/viewthread.php?tid=981

    6.3   PowerPC™ e500 Application Binary Interface User's Guide

    6.4   PowerPC™ e500 Core Family Reference Manual

    6.5   MPC8572E PowerQUICC™ III Integrated Host Processor Family Reference Manual

    6.6   SYSTEM V APPLICATION BINARY INTERFACE, MIPS RISC Processor Supplement

    6.7   XLP 300-/300-Lite-Series-Processor Programmer's Register Reference Guide

    6.8   http://blog.chinaunix.net/uid-16459552-id-3459993.html

    6.9   http://blog.chinaunix.net/uid-16459552-id-3257539.html

    6.10  http://www.linuxspy.info/2249/tainted-kernel/

     
