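I recently ran into a program that hung, and gdb made it easy to track down. Here is the process for reference.
When a program hangs and never exits, you may not know where it is stuck. If the program is very simple, sprinkling in printf calls may locate the problem quickly. For large programs, though, especially framework-style code, printf debugging tends to fall short.
The real program is complex, so here is a minimal version, a multithreaded program: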
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Thread 1: loops forever and never returns, so joining it blocks. */
void *pthread_run1(void *arg)
{
    (void)arg;
    printf("=== thread 1\n");
    while (1)
    {
        sleep(1);
    }
    return NULL;
}

/* Thread 2: returns after a single one-second sleep. */
void *pthread_run2(void *arg)
{
    (void)arg;
    printf("=== thread 2\n");
    while (1)
    {
        sleep(1);
        return NULL;
    }
    return NULL;
}

int main(void)
{
    pthread_t tid1;
    pthread_t tid2;
    pthread_create(&tid1, NULL, pthread_run1, NULL);
    pthread_create(&tid2, NULL, pthread_run2, NULL);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}
Compile it (gcc hello.c -g -pthread) and run:
xxx:~/code/multithread$ ./a.out
=== thread 1
=== thread 2
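The program hangs and never exits. In this example Ctrl-C still terminates it, whereas my real program was truly stuck at this point. That is beside the point, though; the question is how to find out where the program is hung (or blocked). For an example this simple you could spot it by reviewing the code or by adding a few printf calls. But as noted above, this is a minimal version; in the real program, code review and printf rarely reveal the problem. This is where gdb comes in.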
First, find the process ID (pid) of the running program with ps: ps aux | grep a.out, which gives a pid of 1801781.
xxx 1801781 0.0 0.0 84576 476 pts/1 Sl+ 21:24 0:00 ./a.out
Then start gdb and attach to that pid:
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 1801781
Attaching to process 1801781
[New LWP 1801782]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__pthread_clockjoin_ex (threadid=140508208416512, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.
(gdb) up
#1 0x0000564d11e93274 in main () at hello.c:38
38 pthread_join(tid1, NULL);
(gdb)
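You can see the program is stuck at line 38 of the source. It hangs because pthread_join is waiting on tid1, whose thread function never returns.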
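Once gdb has shown that the join is waiting on a thread that never returns, one common remedy is to give the thread a way to exit. The following is only a minimal sketch under that assumption, not the fix used in the real program: the names stop_requested, worker, and run_and_join are hypothetical and do not appear in the original code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stop flag; not part of the original program. */
static atomic_int stop_requested = 0;

/* Worker that polls the flag instead of looping unconditionally. */
static void *worker(void *arg)
{
    (void)arg;
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    while (!atomic_load(&stop_requested)) {
        nanosleep(&pause, NULL);
    }
    return NULL;  /* returning is what lets pthread_join complete */
}

/* Starts the worker, requests a stop, then joins it.
   Returns 0 if the thread was created and joined successfully. */
static int run_and_join(void)
{
    pthread_t tid;
    struct timespec warmup = { 0, 50 * 1000 * 1000 };  /* 50 ms */

    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        return -1;
    }
    nanosleep(&warmup, NULL);          /* let the worker run briefly */
    atomic_store(&stop_requested, 1);  /* ask the worker to exit */
    return pthread_join(tid, NULL);    /* no longer blocks forever */
}
```

Calling run_and_join() from main and building with -pthread should return promptly, because the worker leaves its loop once the flag is set.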
Notes:
- If attach fails for lack of permission after starting gdb, with output like the following, run gdb with sudo (sudo gdb).
(gdb) attach 1801781
Attaching to process 1801781
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
(gdb)
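- If your program was not compiled with -g, you will only see the lowest-level code location; without debug information, the up command cannot show the source either. Build a debug version for troubleshooting so you get much richer information.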
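For the permission error in the first note, gdb's message points at /proc/sys/kernel/yama/ptrace_scope. As a rough sketch, that setting could be inspected from C as shown below. The helper names read_ptrace_scope and describe_ptrace_scope are made up for illustration, and the descriptions are my paraphrase of the Yama settings rather than text taken from gdb.

```c
#include <stdio.h>

/* Reads the Yama ptrace_scope value from the path named in gdb's error
   message. Returns the value, or -1 if the file is missing or
   unreadable (for example, on a kernel built without Yama). */
static int read_ptrace_scope(void)
{
    FILE *fp = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    int scope = -1;

    if (fp == NULL) {
        return -1;
    }
    if (fscanf(fp, "%d", &scope) != 1) {
        scope = -1;
    }
    fclose(fp);
    return scope;
}

/* Maps a ptrace_scope value to a short, paraphrased description. */
static const char *describe_ptrace_scope(int scope)
{
    switch (scope) {
    case 0:  return "classic: processes with the same uid may attach";
    case 1:  return "restricted: only ancestors or declared tracers may attach";
    case 2:  return "admin-only: attaching requires CAP_SYS_PTRACE";
    case 3:  return "no attach: ptrace attach is disabled";
    default: return "unknown, or Yama is not present";
    }
}
```

A value of 1 would explain the failure above even when the uid matches: gdb started from another shell is not an ancestor of a.out, which is why running gdb as root works.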
I encourage everyone to make gdb part of their debugging routine; it has many features worth exploring!