Java自学者论坛

 找回密码
 立即注册

手机号码,快捷登录

恭喜Java自学者论坛(https://www.javazxz.com)已经为数万Java学习者服务超过8年了!积累会员资料超过10000G+
成为本站VIP会员,下载本站10000G+会员资源,会员资料板块,购买链接:点击进入购买VIP会员

JAVA高级面试进阶训练营视频教程

Java架构师系统进阶VIP课程

分布式高可用全栈开发微服务教程Go语言视频零基础入门到精通Java架构师3期(课件+源码)
Java开发全终端实战租房项目视频教程SpringBoot2.X入门到高级使用教程大数据培训第六期全套视频教程深度学习(CNN RNN GAN)算法原理Java亿级流量电商系统视频教程
互联网架构师视频教程年薪50万Spark2.0从入门到精通年薪50万!人工智能学习路线教程年薪50万大数据入门到精通学习路线年薪50万机器学习入门到精通教程
仿小米商城类app和小程序视频教程深度学习数据分析基础到实战最新黑马javaEE2.1就业课程从 0到JVM实战高手教程MySQL入门到精通教程
查看: 5150|回复: 0

Linux由于物理节点故障导致的异常重启-Case1

[复制链接]
  • TA的每日心情
    奋斗
    2024-11-24 15:47
  • 签到天数: 804 天

    [LV.10]以坛为家III

    2053

    主题

    2111

    帖子

    72万

    积分

    管理员

    Rank: 9Rank: 9Rank: 9

    积分
    726782
    发表于 2021-4-11 14:25:31 | 显示全部楼层 |阅读模式

     

    问题描述:Linux VM异常重启,需要排查问题原因

     

    排查结果:

     

    查询Messages日志获取到的信息

     

    虚拟机内核版本:

    Jun 20 03:34:51 test01 kernel: Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 31 21:57:07 UTC 2016

     

    查看到虚拟机重启时间约为:2019/6/20 03:34 CST

    Jun 20 03:31:04 test01 kernel: hv_utils: Shutdown request received - graceful shutdown initiated
    Jun 20 03:31:04 test01 init: tty (/dev/tty1) main process (2703) killed by TERM signal
    Jun 20 03:31:04 test01 init: tty (/dev/tty2) main process (2705) killed by TERM signal
    Jun 20 03:31:04 test01 init: tty (/dev/tty3) main process (2707) killed by TERM signal
    Jun 20 03:31:04 test01 init: tty (/dev/tty4) main process (2709) killed by TERM signal
    Jun 20 03:31:04 test01 init: tty (/dev/tty5) main process (2711) killed by TERM signal
    Jun 20 03:31:04 test01 init: tty (/dev/tty6) main process (2713) killed by TERM signal
    Jun 20 03:31:04 test01 init: serial (ttyS0) main process (2723) killed by TERM signal
    Jun 20 03:31:05 test01 abrtd: Got signal 15, exiting
    Jun 20 03:31:10 test01 dnsmasq[1710]: exiting on receipt of SIGTERM
    Jun 20 03:31:13 test01 acpid: exiting
    Jun 20 03:31:13 test01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
    Jun 20 03:31:13 test01 init: Disconnected from system bus
    Jun 20 03:31:14 test01 auditd[1424]: The audit daemon is exiting.
    Jun 20 03:31:14 test01 kernel: type=1305 audit(1560972674.940:131977): audit_pid=0 old=1424 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
    Jun 20 03:31:15 test01 kernel: type=1305 audit(1560972675.039:131978): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditctl_t:s0 res=1
    Jun 20 03:31:15 test01 kernel: Kernel logging (proc) stopped.
    Jun 20 03:31:15 test01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1458" x-info="http://www.rsyslog.com"] exiting on signal 15.
    Jun 20 03:34:51 test01 kernel: imklog 5.8.10, log source = /proc/kmsg started. Jun 20 03:34:51 test01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1457" x-info="http://www.rsyslog.com"] start Jun 20 03:34:51 test01 kernel: Initializing cgroup subsys cpuset Jun 20 03:34:51 test01 kernel: Initializing cgroup subsys cpu
    Jun 20 03:34:51 test01 kernel: Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 31 21:57:07 UTC 2016
    Jun 20 03:34:51 test01 kernel: Command line: ro root=UUID=adc76f7c-fef6-4075-941e-e7ce50fb3e50 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 rd_NO_LVM rd_NO_DM
    Jun 20 03:34:51 test01 kernel: KERNEL supported cpus:
    Jun 20 03:34:51 test01 kernel:  Intel GenuineIntel
    Jun 20 03:34:51 test01 kernel:  AMD AuthenticAMD
    Jun 20 03:34:51 test01 kernel:  Centaur CentaurHauls

     

    查询到虚拟机重启之前出现了Kernel Panic的错误,提示:INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds. 备注:jdb2进程是一个文件系统的写journal的进程

    Jun 19 00:27:01 test01 auditd[1424]: Audit daemon rotating log files
    Jun 20 03:28:27 test01 kernel: INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds.
    Jun 20 03:28:27 test01 kernel:      Not tainted 2.6.32-642.1.1.el6.x86_64 #1
    Jun 20 03:28:27 test01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jun 20 03:28:27 test01 kernel: jbd2/sda1-8   D 0000000000000000     0   540      2 0x00000000
    Jun 20 03:28:27 test01 kernel: ffff880433257b30 0000000000000046 0000000000000000 ffff880433336d80
    Jun 20 03:28:27 test01 kernel: 000000002a5c72b4 ffffffffa307745d ffff880433257ae0 ffff880433257ad0
    Jun 20 03:28:27 test01 kernel: ffffffffa0045958 0000000000000000 ffff8804316505f8 ffff880433257fd8
    Jun 20 03:28:27 test01 kernel: Call Trace:
    Jun 20 03:28:27 test01 kernel: [<ffffffffa0045958>] ? read_hv_clock_tsc+0x38/0x80 [hv_vmbus]
    Jun 20 03:28:27 test01 kernel: [<ffffffff8112e610>] ? sync_page+0x0/0x50
    Jun 20 03:28:27 test01 kernel: [<ffffffff81547db3>] io_schedule+0x73/0xc0
    Jun 20 03:28:27 test01 kernel: [<ffffffff8112e64d>] sync_page+0x3d/0x50
    Jun 20 03:28:27 test01 kernel: [<ffffffff8154889f>] __wait_on_bit+0x5f/0x90
    Jun 20 03:28:27 test01 kernel: [<ffffffff8112e883>] wait_on_page_bit+0x73/0x80
    Jun 20 03:28:27 test01 kernel: [<ffffffff810a6b40>] ? wake_bit_function+0x0/0x50
    Jun 20 03:28:27 test01 kernel: [<ffffffff811449c5>] ? pagevec_lookup_tag+0x25/0x40
    Jun 20 03:28:27 test01 kernel: [<ffffffff8112ecab>] wait_on_page_writeback_range+0xfb/0x190
    Jun 20 03:28:27 test01 kernel: [<ffffffff8112ed6f>] filemap_fdatawait+0x2f/0x40
    Jun 20 03:28:27 test01 kernel: [<ffffffffa00bfe59>] jbd2_journal_commit_transaction+0x7e9/0x14f0 [jbd2]
    Jun 20 03:28:27 test01 kernel: [<ffffffff8108fd4b>] ? try_to_del_timer_sync+0x7b/0xe0
    Jun 20 03:28:27 test01 kernel: [<ffffffffa00c5a38>] kjournald2+0xb8/0x220 [jbd2]
    Jun 20 03:28:27 test01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
    Jun 20 03:28:27 test01 kernel: [<ffffffffa00c5980>] ? kjournald2+0x0/0x220 [jbd2]
    Jun 20 03:28:27 test01 kernel: [<ffffffff810a662e>] kthread+0x9e/0xc0
    Jun 20 03:28:27 test01 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
    Jun 20 03:28:27 test01 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
    Jun 20 03:28:27 test01 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

     

    查询了secure日志,仅仅发现有关机信号的记录,没有更加详细的信息了  备注:SIGTERM 15 A 终止信号 

    Jun 20 03:31:05 test01 sshd[1811]: Received signal 15; terminating.
    Jun 20 03:34:59 test01 sshd[1808]: Server listening on 0.0.0.0 port 22.
    Jun 20 03:34:59 test01 sshd[1808]: Server listening on :: port 22.

     

    查询了Linux问题虚拟机所在的物理节点,发现早在6/20 03:10 CST左右,物理节点已经出现了系统故障,虚拟机被自动迁移至其他可用节点,此过程造成了虚拟机的自动重启

     

    总结:在某些情况下,服务器系统日志出现了call trace的报错,也不一定是操作系统层面引起的问题,还可能是物理节点(针对虚拟机)或硬件故障。

     

    哎...今天够累的,签到来了1...
    回复

    使用道具 举报

    您需要登录后才可以回帖 登录 | 立即注册

    本版积分规则

    QQ|手机版|小黑屋|Java自学者论坛 ( 声明:本站文章及资料整理自互联网,用于Java自学者交流学习使用,对资料版权不负任何法律责任,若有侵权请及时联系客服屏蔽删除 )

    GMT+8, 2024-12-23 00:50 , Processed in 0.071685 second(s), 29 queries .

    Powered by Discuz! X3.4

    Copyright © 2001-2021, Tencent Cloud.

    快速回复 返回顶部 返回列表