记录linux tty的一次软锁排查
本过程参照了某大侠的https://github.com/w-simon/debug/blob/master/tty_lock_cause_sytemd_hung ,
当第二次出现的时候,还是排查了一段时间,所以记录下来很有必要。
出现carsh,敲入命令如下:
crash> bt
PID: 357 TASK: ffff881fd2096780 CPU: 21 COMMAND: "khungtaskd"
#0 [ffff881fcf2d7cb8] machine_kexec at ffffffff81051c5b
#1 [ffff881fcf2d7d18] crash_kexec at ffffffff810f3ec2
#2 [ffff881fcf2d7de8] panic at ffffffff816326d1
#3 [ffff881fcf2d7e68] watchdog at ffffffff8111c8fe
#4 [ffff881fcf2d7ec8] kthread at ffffffff810a661f
#5 [ffff881fcf2d7f50] ret_from_fork at ffffffff81649858
这个是因为1200s(默认值是120s,我们把它改成1200,/proc/sys/kernel/hung_task_timeout_secs)没有调度更新,导致狗进行的panic。
craash中敲入log查看日志:
[935516.524086] INFO: task bash:3857 blocked for more than 1200 seconds.
[935516.524120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[935516.524150] bash D ffff881fffc947c0 0 3857 39198 0x00000082
[935516.524154] ffff883ed26e3cb0 0000000000000046 ffff883ed26e3fd8 ffff883ed26e3fd8
[935516.524160] ffff883ed26e3fd8 00000000000147c0 ffff881fd3170000 ffff883e7f0d3980
[935516.524163] 7fffffffffffffff ffff8803ba0da278 0000000000000001 ffff8803ba0da270
[935516.524167] Call Trace:
[935516.524178] [<ffffffff8163e879>] schedule+0x29/0x70
[935516.524182] [<ffffffff8163c329>] schedule_timeout+0x209/0x2d0
[935516.524188] [<ffffffff810b6a15>] ? check_preempt_curr+0x75/0xa0
[935516.524190] [<ffffffff810b6a7c>] ? ttwu_do_wakeup+0x3c/0xd0
[935516.524193] [<ffffffff810b6d3d>] ? ttwu_do_activate.constprop.85+0x5d/0x70
[935516.524196] [<ffffffff8164061a>] ldsem_down_write+0xea/0x255
[935516.524198] [<ffffffff81640cd8>] tty_ldisc_lock_pair_timeout+0x88/0x120
[935516.524202] [<ffffffff813b8dc4>] tty_ldisc_hangup+0xc4/0x220
[935516.524208] [<ffffffff813b0594>] __tty_hangup+0x344/0x490
[935516.524211] [<ffffffff813b1739>] disassociate_ctty+0x69/0x2e0
[935516.524216] [<ffffffff81081a97>] do_exit+0x777/0xa60
[935516.524221] [<ffffffff81644e1d>] ? __do_page_fault+0x16d/0x450
[935516.524224] [<ffffffff81090931>] ? __set_task_blocked+0x41/0xa0
[935516.524227] [<ffffffff81081dff>] do_group_exit+0x3f/0xa0
[935516.524230] [<ffffffff81081e74>] SyS_exit_group+0x14/0x20
[935516.524233] [<ffffffff81649909>] system_call_fastpath+0x16/0x1b
根据堆栈分析,是因为ldisc的锁拿不到,然后把所有的堆栈打一遍,看谁拿着锁不释放。
>crash bt >>each_bt.txt
然后搜索tty的锁涉及的堆栈:
# grep tty -A 10 -B 5 each_bt.txt
PID: 3857 TASK: ffff883e7f0d3980 CPU: 4 COMMAND: "bash"
#0 [ffff883ed26e3c50] __schedule at ffffffff8163df9b
#1 [ffff883ed26e3cb8] schedule at ffffffff8163e879
#2 [ffff883ed26e3cc8] schedule_timeout at ffffffff8163c329
#3 [ffff883ed26e3d78] ldsem_down_write at ffffffff8164061a
#4 [ffff883ed26e3de0] tty_ldisc_lock_pair_timeout at ffffffff81640cd8
#5 [ffff883ed26e3e10] tty_ldisc_hangup at ffffffff813b8dc4
#6 [ffff883ed26e3e38] __tty_hangup at ffffffff813b0594
#7 [ffff883ed26e3e88] disassociate_ctty at ffffffff813b1739
#8 [ffff883ed26e3eb0] do_exit at ffffffff81081a97
#9 [ffff883ed26e3f40] do_group_exit at ffffffff81081dff
#10 [ffff883ed26e3f70] sys_exit_group at ffffffff81081e74
#11 [ffff883ed26e3f80] system_call_fastpath at ffffffff81649909
RIP: 00007f4eca34f9c9 RSP: 00007ffefaa3c940 RFLAGS: 00010246
RAX: 00000000000000e7 RBX: ffffffff81649909 RCX: 00007f4eca07e940
RDX: 000000000000007f RSI: 0000000000000000 RDI: 000000000000007f
RBP: 00007f4eca647840 R8: 000000000000003c R9: 00000000000000e7
R10: ffffffffffffff80 R11: 0000000000000246 R12: ffffffff81081e74
R13: ffff883ed26e3f78 R14: 0000000000000000 R15: 00007f4eca64ce80
--
RBP: 00007ff4df7fdd08 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00000000000000aa
R13: 0000000000000720 R14: 00000000ffffffee R15: 00000000000000c4
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
PID: 4362 TASK: ffff881fc62d8b80 CPU: 3 COMMAND: "agetty"
#0 [ffff881fcf817ca0] __schedule at ffffffff8163df9b
#1 [ffff881fcf817d08] schedule at ffffffff8163e879
#2 [ffff881fcf817d18] schedule_timeout at ffffffff8163c295
RIP: 00007f22590e7c30 RSP: 00007ffebd732a78 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff81649909 RCX: ffffffffffffffff
RDX: 0000000000000001 RSI: 00007ffebd732e60 RDI: 0000000000000000
RBP: 00007ffebd732e70 R8: 00000000025bb700 R9: 00000000025bb770
R10: 0000000000000005 R11: 0000000000000246 R12: 00007ffebd7336b0
R13: 000000000000000a R14: 00000000006083a0 R15: 00007ffebd7331c0
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
--
PID: 10910 TASK: ffff883e8a486780 CPU: 46 COMMAND: "ConsoleOutTask"
#0 [ffff883fcf677cb0] __schedule at ffffffff8163df9b
#1 [ffff883fcf677d18] schedule at ffffffff8163e879
#2 [ffff883fcf677d28] schedule_timeout at ffffffff8163c329
#3 [ffff883fcf677dd8] ldsem_down_read at ffffffff816403d8
#4 [ffff883fcf677e48] tty_ldisc_ref_wait at ffffffff813b83a3
#5 [ffff883fcf677e60] tty_write at ffffffff813b0b31
#6 [ffff883fcf677ec0] redirected_tty_write at ffffffff813b0de5
#7 [ffff883fcf677ef8] vfs_write at ffffffff811e0b3d
#8 [ffff883fcf677f38] sys_write at ffffffff811e15df
#9 [ffff883fcf677f80] system_call_fastpath at ffffffff81649909
RIP: 00007fcea73d543d RSP: 00007fcea9cb1d28 RFLAGS: 00000206
RAX: 0000000000000001 RBX: ffffffff81649909 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 00007fcea9cb1d70 RDI: 0000000000000003
RBP: 00007fcea9cb1d60 R8: 0000000001c31070 R9: 0000000000000000
R10: 0000000001c31090 R11: 0000000000000293 R12: 0000000000000003
R13: 00007fcea9cb1d70 R14: 0000000000000083 R15: 0000000000000083
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
--
PID: 20618 TASK: ffff883f89e05c00 CPU: 9 COMMAND: "ConsoleOutTask"
#0 [ffff882114463cb0] __schedule at ffffffff8163df9b
#1 [ffff882114463d18] schedule at ffffffff8163e879
#2 [ffff882114463d28] schedule_timeout at ffffffff8163c329
#3 [ffff882114463dd8] ldsem_down_read at ffffffff816403d8
#4 [ffff882114463e48] tty_ldisc_ref_wait at ffffffff813b83a3
#5 [ffff882114463e60] tty_write at ffffffff813b0b31
#6 [ffff882114463ec0] redirected_tty_write at ffffffff813b0de5
#7 [ffff882114463ef8] vfs_write at ffffffff811e0b3d
#8 [ffff882114463f38] sys_write at ffffffff811e15df
#9 [ffff882114463f80] system_call_fastpath at ffffffff81649909
RIP: 00007f6dfecd643d RSP: 00007f6e01aaed30 RFLAGS: 00000293
RAX: 0000000000000001 RBX: ffffffff81649909 RCX: 00000000006d6f00
RDX: 0000000000000084 RSI: 00007f6e01aaed70 RDI: 0000000000000005
RBP: 00007f6e01aaed60 R8: 00000000006d3c00 R9: 0000000000000000
R10: 00000000006d3c20 R11: 0000000000000293 R12: 0000000000000005
R13: 00007f6e01aaed70 R14: 0000000000000084 R15: 0000000000000084
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
--
RBP: 0000000000000041 R8: 00007f63a2defe70 R9: 00000000000063bf
R10: 00007f64200010c8 R11: 0000000000003293 R12: 000000000d616e40
R13: 00007f63a2df0700 R14: 000000000d616e41 R15: 0000000000000000
ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
PID: 25548 TASK: ffff883fcefa8b80 CPU: 37 COMMAND: "agetty"
#0 [ffff88213fa1bca0] __schedule at ffffffff8163df9b
#1 [ffff88213fa1bd08] schedule at ffffffff8163e879
#2 [ffff88213fa1bd18] schedule_timeout at ffffffff8163c329
RIP: 00007f8d31776c30 RSP: 00007fff6f7c0a88 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff81649909 RCX: ffffffffffffffff
RDX: 0000000000000001 RSI: 00007fff6f7c0e70 RDI: 0000000000000000
RBP: 00007fff6f7c0e80 R8: 00000000022a4700 R9: 00000000022a4770
R10: 00007fff6f7c0850 R11: 0000000000000246 R12: 00007fff6f7c16c0
R13: 0000000000000012 R14: 00000000006083a0 R15: 00007fff6f7c11d0
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
走查相关的代码,发现是10910占着锁不放。
反汇编相关的代码,需要将tty找出来,
crash> dis tty_ldisc_ref_wait
0xffffffff813b8380 <tty_ldisc_ref_wait>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff813b8385 <tty_ldisc_ref_wait+5>: push %rbp
0xffffffff813b8386 <tty_ldisc_ref_wait+6>: movabs $0x7fffffffffffffff,%rsi
0xffffffff813b8390 <tty_ldisc_ref_wait+16>: mov %rsp,%rbp
0xffffffff813b8393 <tty_ldisc_ref_wait+19>: push %rbx
0xffffffff813b8394 <tty_ldisc_ref_wait+20>: mov %rdi,%rbx
0xffffffff813b8397 <tty_ldisc_ref_wait+23>: lea 0x270(%rdi),%rdi------------------------rdi+0x270保存了tty指针,这步比较耗时,因为通过寄存器或者堆栈找变量需要一定的汇编基础和经验。
0xffffffff813b839e <tty_ldisc_ref_wait+30>: callq 0xffffffff816402b0 <ldsem_down_read>
通过查看20618的堆栈,找对关联tty的file指针:
crash> files 20618 | grep console
3 ffff880217011700 ffff883fd100a900 ffff883fd0b32038 CHR /dev/console
5 ffff8801acf06b00 ffff883fd100a900 ffff883fd0b32038 CHR /dev/console
crash> struct file.private_data ffff880217011700
private_data = 0xffff882146f23180
crash> struct tty_file_private.tty 0xffff882146f23180
tty = 0xffff8803ba0da000
crash> struct tty_struct.disc_data 0xffff8803ba0da000
disc_data = 0xffff8803ba0dd800
crash> struct n_tty_data.icanon 0xffff8803ba0dd800
icanon = 1 '\001'
发现属性被设置为了1,而根据PID: 10910 TASK: ffff883e8a486780 CPU: 46 COMMAND: "ConsoleOutTask"的用户态代码,走查发现,我们通过设置VTIME
属性来保证超时,
tcgetattr(s_pm,&term);
term.c_cc[VMIN] = 1;-------------最少读一个字符
term.c_cc[VTIME] = 2;-----------200ms超时
tcsetattr(s_pm,TCSANOW,&term);
根据tty的属性,这个代码要生效,前提是icanon 属性不能被设置为1,但我们的代码逻辑是:
/* 设置伪终端模式,非规范,无回显 */
tcgetattr(slavefd, &term);
term.c_lflag = term.c_lflag & (~(ICANON));
term.c_lflag = term.c_lflag & (~(ISIG));
term.c_cc[VMIN] = 1;
term.c_cc[VTIME] = 2;
第一次设置完ICANON属性之后,就不管了,那么中途如果有人修改了这个属性,则会导致read阻塞,也就是ConsoleOutTask阻塞。
那么属性被修改,有两种情况,一种是被异常修改,一种是正常,我们通过走查内核代码,决定在n_tty_set_termios函数中打点:
probe kernel.function("n_tty_set_termios").return
{
if (kernel_string($tty->name) =~"tty*")---------------tty1到ttyn都能记录
printf ("pid=%d,ICANON=%d,ttyname=%s,cmd=%s\r\n", pid(),($tty->termios->c_lflag)&0x0000002,kernel_string($tty->name), execname())
}
这样只要修改tty属性,我们都能记录下来。
然后是漫长的复现过程,好在经过几天的加压,复现出来了。
pid=1,ICANON=2,ttyname=tty2,cmd=systemd
pid=1,ICANON=2,ttyname=tty2,cmd=systemd
pid=1,ICANON=2,ttyname=tty2,cmd=systemd
pid=1,ICANON=2,ttyname=tty2,cmd=systemd
pid=2019,ICANON=2,ttyname=tty2,cmd=systemd-logind
pid=2019,ICANON=2,ttyname=tty4,cmd=systemd-logind
pid=2019,ICANON=2,ttyname=tty5,cmd=systemd-logind
pid=11250,ICANON=0,ttyname=tty3,cmd=bash
pid=1,ICANON=2,ttyname=tty3,cmd=systemd
pid=1,ICANON=2,ttyname=tty3,cmd=systemd
pid=32642,ICANON=2,ttyname=tty2,cmd=(agetty)
pid=32642,ICANON=2,ttyname=tty2,cmd=(agetty)
pid=32642,ICANON=2,ttyname=tty2,cmd=(agetty)
pid=32642,ICANON=2,ttyname=tty2,cmd=(agetty)
确实会有进程去修改ICANON属性为1。
我们操作tty使用如下方式:
s_ConsoleOutFd = open("/dev/console",O_RDWR,0);
由于没有设置非阻塞模式,导致了这个问题。