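As is well known, when a Linux process is terminated by certain signals, the kernel saves the process context to a core dump file to help programmers analyze and debug the failure. Core dump generation is enabled with the ulimit command shown below. The output path is defined in /proc/sys/kernel/core_pattern, and the file name format specifiers are documented in Documentation/sysctl/kernel.txt in the kernel source tree. Here I use the naming rule %e_%P_%t_%s.core, where %e is the executable file name (possibly truncated), %P is the PID (the global one, in the initial PID namespace), %t is the UNIX timestamp of the dump, and %s is the number of the signal that triggered the dump.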
ulimit -c unlimited
root@# cat /proc/sys/kernel/core_pattern
/mnt/%e_%P_%t_%s.core
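When a program cannot rely on the shell's ulimit setting, the same limit can be raised from inside the process. The following is only a minimal sketch, assuming Linux with glibc; enable_core_dumps is a hypothetical helper name and is not part of the test program used later in this post.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft RLIMIT_CORE limit to the hard limit.
// This matches `ulimit -c unlimited` only when the hard limit is unlimited;
// an unprivileged process cannot go beyond its hard limit.
bool enable_core_dumps()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;
    }
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling enable_core_dumps() early in main() would be one way to use it; whether a dump is actually written still depends on core_pattern and on the permissions of the target directory.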
core_pattern:

core_pattern is used to specify a core dumpfile pattern name.
. max length 128 characters; default value is "core"
. core_pattern is used as a pattern template for the output filename;
  certain string patterns (beginning with '%') are substituted with
  their actual values.
. backward compatibility with core_uses_pid:
        If core_pattern does not include "%p" (default does not)
        and core_uses_pid is set, then .PID will be appended to
        the filename.
. corename format specifiers:
        %<NUL>  '%' is dropped
        %%      output one '%'
        %p      pid
        %P      global pid (init PID namespace)
        %i      tid
        %I      global tid (init PID namespace)
        %u      uid (in initial user namespace)
        %g      gid (in initial user namespace)
        %d      dump mode, matches PR_SET_DUMPABLE and
                /proc/sys/fs/suid_dumpable
        %s      signal number
        %t      UNIX time of dump
        %h      hostname
        %e      executable filename (may be shortened)
        %E      executable path
        %<OTHER> both are dropped
. If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run.  The core dump will be
  written to the standard input of that program instead of to a file.
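To make the substitution rules above concrete, here is a small user-space sketch of how a pattern such as /mnt/%e_%P_%t_%s.core could be expanded. It is not the kernel's implementation: DumpInfo and expand_core_pattern are hypothetical names, only the specifiers used in this post are modelled, and %p and %P are treated as the same value on the assumption that no PID namespace is involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical container for the values the kernel would supply at dump time.
struct DumpInfo {
    std::string comm;   // value substituted for %e
    long pid;           // value substituted for %p and %P (assumes no PID namespace)
    long timestamp;     // value substituted for %t
    int signal;         // value substituted for %s
};

// Expand the subset of specifiers used in this post. Specifiers that are not
// modelled here are dropped together with their '%', like %<OTHER> above.
std::string expand_core_pattern(const std::string &pattern, const DumpInfo &info)
{
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '%') {
            out += pattern[i];
            continue;
        }
        if (++i == pattern.size()) {
            break;                       // a trailing '%' is dropped
        }
        switch (pattern[i]) {
        case '%': out += '%'; break;
        case 'e': out += info.comm; break;
        case 'p':
        case 'P': out += std::to_string(info.pid); break;
        case 't': out += std::to_string(info.timestamp); break;
        case 's': out += std::to_string(info.signal); break;
        default:  break;                 // not modelled in this sketch
        }
    }
    return out;
}
```

The key point for the rest of this post is the %e case: whatever string is stored as the name of the dumping task ends up in the file name.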
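However, I once ran into the following situation. With an executable named core, the %e in the core dump file name should by all rights also be core. Instead, %e was replaced with a strange name, which puzzled me for a long time. I later found by chance that this might be related to the thread name that had been set, so I wrote the test code below.
The program takes a number in [0-3] that selects which of its four threads triggers the core dump. Each thread first builds a unique name from its thread ID and sets it as its own thread name; the selected thread then deliberately triggers a core dump.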
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/prctl.h>
#include <vector>

void *thread(void *arg)
{
    // Set the thread name. The kernel keeps at most 16 bytes including
    // the terminating NUL, so a longer name is silently truncated.
    char tname[256];
    snprintf(tname, sizeof(tname), "thread_%#x", (uint32_t)pthread_self());
    prctl(PR_SET_NAME, tname);
    printf("thread: '%s' run..\n", tname);

    // arg carries a bool flag: true means this thread must crash
    bool generalcore = (bool)arg;
    if(generalcore){
        printf("thread: '%s' general core\n", tname);
        // arg is (void *)true, an invalid address: this write raises SIGSEGV
        *(char *)(arg) = 0x00;
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    if(argc != 2) return -1;

    /*
     * The argument is the index of the thread that should generate the
     * core dump; the crash is triggered inside that thread.
     */
    int coredump_thread = atoi(argv[1]);
    if(coredump_thread < 0 || coredump_thread > 3) return -1;

    std::vector<pthread_t> tids;
    for(int32_t i = 0; i < 4; i++){
        pthread_t newtid;
        bool generalcore = (coredump_thread == i)? true: false;
        pthread_create(&newtid, NULL, thread, (void *)generalcore);
        tids.push_back(newtid);
    }

    // Wait for all threads (not reached once the selected thread crashes)
    for(uint32_t i = 0; i < tids.size(); i++){
        pthread_t tid = tids[i];
        pthread_join(tid, NULL);
    }
    return 0;
}
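The run is shown below. We selected thread 2 to generate the core dump, and the thread name it set is "thread_0xd2cc4700". The core dump file in the output directory is named after the name we set for thread 2, although a careful reader may notice a small difference: the file name starts with "thread_0xd2cc47".
This was hinted at earlier: %e may be truncated. The kernel stores a thread name in at most 16 bytes including the terminating NUL, so only the first 15 characters of the 17-character name survive.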
root@# ./core 2
thread: 'thread_0xd3cc6700' run..
thread: 'thread_0xd34c5700' run..
thread: 'thread_0xd2cc4700' run..
thread: 'thread_0xd2cc4700' general core
thread: 'thread_0xd24c3700' run..
Segmentation fault (core dumped)
root@# ll /mnt/*.core
-rw------- 1 root root 34172928 11月 22 00:55 /mnt/thread_0xd2cc47_23091_1542819331_11.core
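The truncation seen in the file name can be checked without crashing anything, by reading the thread name back right after setting it. This is a minimal sketch, assuming Linux with glibc; set_and_read_thread_name is a hypothetical helper and not part of the test program above.

```cpp
#include <string>
#include <sys/prctl.h>

// Hypothetical helper: set the calling thread's name, then read back what
// the kernel actually stored. PR_GET_NAME needs a buffer of up to 16 bytes
// and returns a NUL-terminated string.
std::string set_and_read_thread_name(const char *requested)
{
    prctl(PR_SET_NAME, requested);
    char stored[16] = {0};
    prctl(PR_GET_NAME, stored);
    return std::string(stored);
}
```

Passing "thread_0xd2cc4700" to this helper should yield the same 15-character prefix that appears in the core file name above.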
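So how can we confirm that this core was produced by our program? A simple way is to open the core file in gdb, which prints a line stating which command generated the core dump.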
(gdb) core-file /mnt/thread_0xd2cc47_23091_1542819331_11.core
[New LWP 23094]
[New LWP 23091]
[New LWP 23095]
Core was generated by `./core 2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000400ca2 in ?? ()
(gdb)
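Summary:
1) By default, the %e specifier in the core dump file name expands to the executable file name.
2) Once the program sets thread names, %e expands to the name of the thread that caused the process to exit.
3) Loading the core file in gdb with the core-file command shows which command generated it.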
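Point 2 relies on a name set with prctl(PR_SET_NAME) belonging to the calling thread only. The sketch below is one way to observe that, assuming Linux with glibc and pthreads; current_thread_name, rename_self, NameCheck and check_names_are_per_thread are all hypothetical names introduced for illustration.

```cpp
#include <pthread.h>
#include <string>
#include <sys/prctl.h>

// Read the name of the calling thread (at most 15 characters plus NUL).
static std::string current_thread_name()
{
    char name[16] = {0};
    prctl(PR_GET_NAME, name);
    return std::string(name);
}

// Thread body: rename only this thread and report the resulting name.
static void *rename_self(void *out)
{
    prctl(PR_SET_NAME, "worker");
    *static_cast<std::string *>(out) = current_thread_name();
    return NULL;
}

struct NameCheck {
    std::string main_before;  // main thread name before the worker runs
    std::string worker;       // name the worker sees after renaming itself
    std::string main_after;   // main thread name after the worker has finished
};

// Run one worker that renames itself and record the names seen by each thread.
NameCheck check_names_are_per_thread()
{
    NameCheck result;
    result.main_before = current_thread_name();
    pthread_t tid;
    if (pthread_create(&tid, NULL, rename_self, &result.worker) == 0) {
        pthread_join(tid, NULL);
    }
    result.main_after = current_thread_name();
    return result;
}
```

In this sketch the main thread keeps its original name while the worker reports "worker", which is consistent with %e following whichever thread triggers the dump.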