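[skill][gdb] Debugging multithreaded programs with gdb
Quick start (in Chinese):
http://coolshell.cn/articles/3643.html (the part about multithreading is not quite accurate)
Advanced:
Multi-process concepts:
What are inferiors?
http://moss.cs.iit.edu/cs351/gdb-inferiors.html
How to debug multiple threads:
There are two modes: all-stop and non-stop.
In all-stop mode, when any thread hits a breakpoint, all threads stop running.
The command set non-stop on switches to non-stop mode, in which the other threads are not affected when one thread stops.
Example: a breakpoint is hit in non-stop mode, and the other threads keep running normally.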
(gdb) info thread
Id Target Id Frame
5 Thread 0x7fffb45fc700 (LWP 10101) "lcore-slave-7" (running)
* 4 Thread 0x7fffb4dfd700 (LWP 10100) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
3 Thread 0x7fffb55fe700 (LWP 10099) "lcore-slave-3" (running)
2 Thread 0x7fffb5dff700 (LWP 10098) "eal-intr-thread" (running)
1 Thread 0x7ffff7fef8c0 (LWP 10097) "l3fwd" (running)
(gdb)
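A minimal sketch of how such a non-stop session might be driven. According to the gdb manual, the non-stop preference is only consulted when gdb starts or connects to the program, so the settings go first. The thread number 4 is only illustrative.

```gdb
# Assumption: these two settings are entered before `run` or `attach`.
set pagination off
set non-stop on

# After one thread hits a breakpoint, the others are listed as (running).
info threads

# Select the stopped thread (4 is illustrative) and look at its stack.
thread 4
bt

# In non-stop mode a plain `continue` resumes only the current thread.
# `interrupt -a` stops every thread, `continue -a` resumes every thread.
interrupt -a
continue -a
```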
Example: in all-stop mode with scheduler-locking off, when one thread hits a breakpoint, all threads stop.
Breakpoint 6, lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177 nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
Id Target Id Frame
* 5 Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
4 Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
3 Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
2 Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
1 Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
(gdb) show non-stop
Controlling the inferior in non-stop mode is off.
(gdb) show scheduler-locking
Mode for locking scheduler during execution is "off".
(gdb)
(gdb) c
Continuing.
[Switching to Thread 0x7fffb4dfd700 (LWP 10107)]
Breakpoint 6, lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177 nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
Id Target Id Frame
5 Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
* 4 Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
3 Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
2 Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
1 Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
(gdb) c
Continuing.
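With scheduler-locking set to on, single-stepping the current thread does not let any other thread run, so their current execution positions do not change.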
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fef8c0 (LWP 10104))]
#0 lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177 nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
Id Target Id Frame
5 Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:179
4 Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:156
3 Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
2 Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
(gdb) n
179 if (nb_rx == 0)
(gdb) n
180 continue;
(gdb) n
174 for (i = 0; i < qconf->n_rx_queue; ++i) {
(gdb) info thread
Id Target Id Frame
5 Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:179
4 Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:156
3 Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
2 Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:174
(gdb)
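After scheduler-locking is switched to off, a single next in the current thread lets all the other threads run as well, and gdb may end up stopped in a different thread.
If every thread keeps hitting breakpoints, single-stepping becomes practically impossible in this configuration. Continuing from the session above: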
(gdb) set scheduler-locking off
(gdb) info thread
Id Target Id Frame
5 Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:179
4 Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:156
3 Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
2 Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:174
(gdb) n
[Switching to Thread 0x7fffb4dfd700 (LWP 10107)]
Breakpoint 6, lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177 nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
Id Target Id Frame
5 Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
* 4 Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
3 Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
2 Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
1 Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:174
(gdb)
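There is also a step mode for scheduler-locking. It means that while you debug a thread with the "step" command the other threads do not run, but when you use other commands (for example "next") the other threads may run.
There are probably some special cases, but that is the basic meaning. See: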
https://sourceware.org/gdb/current/onlinedocs/gdb/Thread-Stops.html#Thread-Stops
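The scheduler-locking settings discussed above can be tried out roughly as follows. This is a sketch rather than a recorded session; the thread number 1 is illustrative, and gdb must be in all-stop mode (non-stop off).

```gdb
# Focus on one thread (1 is illustrative).
thread 1

# on: only the current thread runs, for stepping and for continue alike.
set scheduler-locking on
next
next

# step: other threads are held back only while single-stepping.
set scheduler-locking step
step

# off: every thread may run whenever the program is resumed.
set scheduler-locking off
show scheduler-locking
continue
```

One thing to watch with on: if the locked thread waits for a lock or a message owned by another thread, it cannot make progress until scheduler-locking is relaxed again.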
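Summary:
0. Before set non-stop on, you need to run set pagination off.
1. Debug the whole process globally: gdb stops wherever a breakpoint is hit, and when it stops, the whole process stops executing.
In this case, use the default settings: non-stop = off, scheduler-locking = off
2. Focus on debugging one thread while the other threads keep running normally.
Use non-stop = on
3. Focus on debugging one thread while the other threads stay stopped and do not disturb the current thread.
Use non-stop = off, scheduler-locking = on
4. I have not yet run into a scenario where I needed this one.
Use non-stop = off, scheduler-locking = step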
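A real multithreaded debugging example:
1. The program is started normally (gdb is not involved at this point).
[root@dpdk build]# ./l3fwd -l2,3,6,7 -- -p3 --config "(0,0,2),(1,0,6)" --parse-ptype
EAL: Detected 8 lcore(s)
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
...
...
2. Start gdb, apply the settings, then attach in the background.
TODO: I do not yet know how to put an attached process into non-stop mode. This process also does not support attach &.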
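One possible approach to the TODO, untested against this l3fwd setup: since the gdb manual says the non-stop preference is only consulted when gdb starts or connects to the program, the sketch below sets it before attaching. PID is a placeholder for the process id of the running l3fwd. On older gdb releases, background commands such as attach & may additionally require set target-async on; that is an assumption about the gdb version used here.

```gdb
# Entered in a gdb that was started without a program.
set pagination off
set non-stop on

# PID is a placeholder, not a literal argument.
attach PID

# Check which threads gdb reports as stopped after attaching.
info threads

# Resume all threads, then set breakpoints and inspect threads individually.
continue -a
```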