[skill][gdb] Debugging multithreaded programs with gdb

 

Quick start (in Chinese):

http://coolshell.cn/articles/3643.html (its part about multithreading is not entirely accurate)

 

Going further:

Multi-process concepts:

What are inferiors?

  http://moss.cs.iit.edu/cs351/gdb-inferiors.html
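
In short, an inferior is gdb's name for one program (usually one process) being debugged, and a single gdb session can hold several of them. A minimal sketch of the related commands, to be typed at the (gdb) prompt; the inferior number 2 is only an example and should be taken from the info inferiors listing:

```gdb
# list the inferiors that gdb currently tracks
info inferiors
# keep both parent and child under gdb's control after a fork
set detach-on-fork off
# make another inferior the current one (2 is an example id)
inferior 2
# list the threads; with several inferiors, recent gdb versions
# qualify thread ids with the inferior number
info threads
```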

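How to debug multiple threads:

  There are two modes: all-stop and non-stop.

  In all-stop mode, when any thread hits a breakpoint, all threads stop running.

  The set non-stop on command switches to non-stop mode, in which one thread stopping does not affect the other threads.

Example: in non-stop mode with a breakpoint set, one thread is stopped at the breakpoint while the other threads keep running.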

(gdb) info thread
  Id   Target Id         Frame 
  5    Thread 0x7fffb45fc700 (LWP 10101) "lcore-slave-7" (running)
* 4    Thread 0x7fffb4dfd700 (LWP 10100) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  3    Thread 0x7fffb55fe700 (LWP 10099) "lcore-slave-3" (running)
  2    Thread 0x7fffb5dff700 (LWP 10098) "eal-intr-thread" (running)
  1    Thread 0x7ffff7fef8c0 (LWP 10097) "l3fwd" (running)
(gdb) 
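
For reference, a sketch of the commands that could lead to a session like the one above. It assumes gdb was started on the l3fwd binary and that the program has not been started yet, since the gdb manual notes that the non-stop setting is only consulted when gdb starts or attaches to the program. The breakpoint location is the one that appears in the transcripts; the thread number 3 is only an example.

```gdb
# the pager can get in the way of non-stop sessions
set pagination off
set non-stop on
break l3fwd_lpm.c:177
# start the program (pass the program's own arguments after run)
run
# a thread that hit the breakpoint is stopped; the rest show (running)
info threads
# stop one more thread by hand: select it, then interrupt it
thread 3
interrupt
# resume all stopped threads
continue -a
```

In non-stop mode, a plain continue resumes only the current thread, which is why the sketch uses continue -a at the end.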

Example: in all-stop mode with scheduler-locking off, when one thread stops at a breakpoint, all threads stop.

Breakpoint 6, lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177                             nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
  Id   Target Id         Frame 
* 5    Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  4    Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  3    Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  2    Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
  1    Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
(gdb) show non-stop
Controlling the inferior in non-stop mode is off.
(gdb) show scheduler-locking
Mode for locking scheduler during execution is "off".
(gdb) 
(gdb) c
Continuing.
[Switching to Thread 0x7fffb4dfd700 (LWP 10107)]

Breakpoint 6, lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177                             nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
  Id   Target Id         Frame 
  5    Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
* 4    Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  3    Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  2    Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
  1    Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
(gdb) c
Continuing.

 

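With scheduler-locking set to on, single-stepping a thread runs only that thread. The other threads do not run, so their current execution positions do not change.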

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fef8c0 (LWP 10104))]
#0  lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177                             nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
  Id   Target Id         Frame 
  5    Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:179
  4    Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:156
  3    Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  2    Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
(gdb) n
179                             if (nb_rx == 0)
(gdb) n
180                                     continue;
(gdb) n
174                     for (i = 0; i < qconf->n_rx_queue; ++i) {
(gdb) info thread
  Id   Target Id         Frame 
  5    Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:179
  4    Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:156
  3    Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  2    Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:174
(gdb) 
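
The transcript above does not show how the lock was switched on. A sketch of the commands involved, assuming all-stop mode with the process already stopped at a breakpoint; thread 1 is the thread selected in the transcript:

```gdb
# pick the thread to focus on
thread 1
# from now on only the current thread runs when execution resumes
set scheduler-locking on
next
next
# the other threads should report the same frames as before
info threads
# switch the lock off again before resuming normally; with it on,
# continue would resume only the current thread
set scheduler-locking off
continue
```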

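After scheduler-locking is switched back to off, a next in the current thread lets all the other threads run as well, and gdb may stop in (and switch to) a different thread that hits a breakpoint. Continuing from the session above:

If every thread keeps hitting breakpoints, single-stepping becomes practically impossible with this setting.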

(gdb) set scheduler-locking off
(gdb) info thread
  Id   Target Id         Frame 
  5    Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:179
  4    Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:156
  3    Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  2    Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:174
(gdb) n
[Switching to Thread 0x7fffb4dfd700 (LWP 10107)]

Breakpoint 6, lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
177                             nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
(gdb) info thread
  Id   Target Id         Frame 
  5    Thread 0x7fffb45fc700 (LWP 10108) "lcore-slave-7" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
* 4    Thread 0x7fffb4dfd700 (LWP 10107) "lcore-slave-6" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  3    Thread 0x7fffb55fe700 (LWP 10106) "lcore-slave-3" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:177
  2    Thread 0x7fffb5dff700 (LWP 10105) "eal-intr-thread" 0x00007ffff71e62c3 in epoll_wait () from /lib64/libc.so.6
  1    Thread 0x7ffff7fef8c0 (LWP 10104) "l3fwd" lpm_main_loop (dummy=0x0) at /root/src/sdk/@dpdk/dpdk-stable-16.07.1/examples/l3fwd/l3fwd_lpm.c:174
(gdb) 
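
One way to keep control in a single thread without locking the scheduler is a thread-specific breakpoint, which gdb supports through the thread qualifier of the break command. A sketch, assuming breakpoint 6 from the transcripts is the one that every lcore thread keeps hitting and that thread 1 is the thread of interest:

```gdb
# remove the breakpoint that stops every thread
delete 6
# same location, but only thread 1 stops here
break l3fwd_lpm.c:177 thread 1
# the other threads still run during next, but no longer stop at line 177
next
```

The thread number is gdb's own id from the first column of info thread, not the LWP number.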

 

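There is also a step mode. It means that while you single-step a thread with "step", the other threads do not run, but with other commands (for example "next" over a function call, or "continue") the other threads may run.

There are probably some special cases, but that is the gist. See: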

  https://sourceware.org/gdb/current/onlinedocs/gdb/Thread-Stops.html#Thread-Stops
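
A sketch of how the step mode might be tried, assuming all-stop mode and a process that is already stopped at a breakpoint:

```gdb
set scheduler-locking step
# confirm the current mode
show scheduler-locking
# single-stepping: only the current thread runs
step
# resuming normally: every thread is free to run again
continue
```

According to the gdb manual, in this mode gdb does not switch away from the thread being debugged unless another thread hits a breakpoint during its timeslice.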

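Summary:

  0. Run set pagination off before set non-stop on.

  1. Debug the process as a whole: execution stops wherever a breakpoint is hit, and when it stops, the whole process stops.

    Use the default settings: non-stop = off  scheduler-locking = off

  2. Focus on one thread while the other threads keep running normally.

    Use non-stop = on

  3. Focus on one thread while the other threads stay stopped and do not interfere with the current thread.

    Use non-stop = off  scheduler-locking = on

  4. I have not yet run into a case where I needed this one.

    Use non-stop = off  scheduler-locking = step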

 

A real multithreaded debugging example:

1. The program is started normally (gdb is not involved yet).

[root@dpdk build]# ./l3fwd -l2,3,6,7 -- -p3 --config "(0,0,2),(1,0,6)" --parse-ptype
EAL: Detected 8 lcore(s)
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
... ...

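2. Start gdb, apply the settings, then attach in the background.

  TODO: I do not yet know how to get non-stop mode for a process that gdb attaches to. This process did not accept attach & either.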
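
A possible starting point for this TODO, not verified against the l3fwd setup above. The gdb manual states that the non-stop setting is only consulted when gdb starts or connects to the program, so the sketch applies the settings before attaching. <PID> is a placeholder for the process id of the running l3fwd. The commented-out set target-async line reflects what gdb manuals of the 7.x era listed as an extra step for non-stop mode; whether it is needed, and whether attach & works, depends on the gdb version and the target.

```gdb
set pagination off
set non-stop on
# older gdb 7.x manuals also listed this line; newer releases should not need it
# set target-async on
# replace <PID> with the real process id, for example from pidof l3fwd
attach <PID>
# see which threads are stopped and which are running
info threads
# resume every thread in the background and keep the prompt
continue -a &
```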

 

posted on 2016-12-28 19:32  toong