Generating coredump files on Linux
I. The coredump file
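The plain name for this kind of file is a process dump (in Chinese, the rather fancy-sounding 转储 is the translation of "dump"). The word turns up all over computing, so I suggest using it often; it makes you sound professional. On Windows, the MiniDumpWriteDump API can directly generate a dump file of a running process. Its name follows the usual Windows API naming style: long and exact, as if afraid you might not work out what the function does, with a slightly pedantic air. The API left a deep impression on me because my onboarding mentor was the first to use it in our project to generate dump files. It mattered a great deal for the stability of later releases, and it also showed me how important maintainers are. One interface can spare you much of the darkness and guesswork that intermittent problems bring, and lets you enjoy debugging. Its advantage is that a process can call it to dump its own current memory image, rather like the currently fashionable "selfie".
But none of that is the point. The question is: does Linux have this capability, and if so, how is it done?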
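II. When the Linux kernel produces a coredump file
The kernel does the work of producing this file. Its logic is simple: the kernel detects that a user-space process has done something irrecoverable, and user space either did not handle the error or was unable to handle it. At that point the kernel steps in and holds a brief last rite for the dying process: it records the state the process was in when it died, together with the reason for its exit. That record is the coredump file, and the procedure is called a coredump. Some affectionate programmers have given it a very Chinese-flavored translation, 吐核 ("spitting out the core").
The assumption is reasonable and there is nothing wrong with it, but compared with the Windows API it has one problem: the user cannot control when the coredump happens. Say a process is running in perfect health and wants to take a coredump of itself. (If you ask when anyone would want that, I have not worked it out either. Perhaps to generate a checkup file periodically as a keepsake?) Can that be done on Linux?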
In the kernel, the code that generates the dump file is in the get_signal_to_deliver function in linux-2.6.37.1\kernel\signal.c:
    if (sig_kernel_coredump(signr)) {
        if (print_fatal_signals)
            print_fatal_signal(regs, info->si_signo);
        /*
         * If it was able to dump core, this kills all
         * other threads in the group and synchronizes with
         * their demise. If we lost the race with another
         * thread getting here, it set group_exit_code
         * first and our do_group_exit call below will use
         * that value and ignore the one we pass it.
         */
        /* Author's note: this is the only call to the dump routine in
         * the whole kernel, so it is the kernel's sole entry point for
         * generating a coredump. */
        do_coredump(info->si_signo, info->si_signo, regs);
    }
    /*
     * Death signals, no core dump.
     */
    /* Author's note: thread-group exit and coredump are joined at the
     * hip, like a melon and its vine. Call it an unfair contract term:
     * to spit out a core, the process must pay with its life. */
    do_group_exit(info->si_signo);
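To make the sig_kernel_coredump() test above more concrete, here is a small user-space sketch of which signals have "terminate and dump core" as their default action. The function name dumps_core_by_default is hypothetical, and the list is taken from the default actions documented in signal(7), not copied from the kernel. The kernel's own macro is a bit-mask test, not a switch.

```c
#include <signal.h>

/* Hypothetical helper, not a kernel API: returns 1 if the default
 * action of signal `sig` is to terminate the process and dump core,
 * and 0 otherwise. The list follows signal(7) on Linux. */
int dumps_core_by_default(int sig)
{
    switch (sig) {
    case SIGQUIT:
    case SIGILL:
    case SIGTRAP:
    case SIGABRT:   /* SIGIOT is an alias of SIGABRT on Linux */
    case SIGFPE:
    case SIGSEGV:
    case SIGBUS:
    case SIGSYS:
    case SIGXCPU:
    case SIGXFSZ:
        return 1;
    default:
        return 0;
    }
}
```

For example, dumps_core_by_default(SIGSEGV) returns 1, while dumps_core_by_default(SIGTERM) returns 0, because SIGTERM only terminates the process.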
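In short, if you rely on the kernel's coredump facility, the thread group disappears as soon as the dump is done, so it is not a sustainable solution. It also shows the only way a user-space process can have the kernel produce its own coredump: reset the handler of some core-dumping signal to the default action, then send itself that signal, which amounts to a heroic suicide.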
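The "suicide" approach can be sketched roughly as follows. So that the result can be observed, the sketch runs it in a forked child and lets the parent report which signal killed the child. The fork is my addition for demonstration purposes, and since fork copies only the calling thread, the child's core is not a faithful image of a multithreaded parent. The helper name dump_core_in_child is hypothetical. Whether a core file actually appears still depends on RLIMIT_CORE and the system's core_pattern setting.

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that resets SIGABRT to its default
 * action and sends the signal to itself. Returns the number of the
 * signal that terminated the child, or -1 on failure. */
int dump_core_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: raise the soft core-size limit as far as allowed. */
        struct rlimit rl;
        if (getrlimit(RLIMIT_CORE, &rl) == 0) {
            rl.rlim_cur = rl.rlim_max;
            setrlimit(RLIMIT_CORE, &rl);
        }
        signal(SIGABRT, SIG_DFL);  /* drop any inherited handler */
        raise(SIGABRT);            /* default action: dump core and die */
        _exit(1);                  /* reached only if the signal was not fatal */
    }

    /* Parent: wait for the child and report how it died. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A caller would expect dump_core_in_child() to return SIGABRT. The parent survives, but only because a separate process paid the price.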
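III. How gdb generates a coredump file
Some readers may know gdb fairly well. gdb has a sustainable core-spitting command, gcore, which can produce a coredump file anytime and anywhere. It spits out the core gracefully, with no blood and no tragedy. But gdb is a user-space program, so it too cannot go over the kernel's head and call the kernel's coredump routine. How does gdb do it, then?
I looked at the gdb source and found that gdb really is a model worker, perhaps even a little guilty of overreaching, like the line "有事儿您说话" ("just say the word") from Guo Donglin's comedy sketch. You want a core, I cannot make the kernel spit one out for you, so I will build a knockoff myself. gdb tirelessly collects the state of the debugged process from every corner, then, like a child playing grown-up, assembles the pieces into a coredump file and returns happily, though drenched in sweat. Judging from the implementation, at least, the process looks quite laborious to me.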
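I will not analyze the gdb code here. It is nothing but tedious, because gdb has to support so many combinations of operating system and processor. The thing to watch is multithreading. The most essential private data of a thread are its stack and its register set. The stack is logically private to the thread, but it still lives in the address space of the whole process, so dumping the entire address space to the file captures every stack. The register set of each thread is another matter: the debugger has to use its special standing as a debugger and fetch all registers of the traced thread with ptrace's PTRACE_GETREGSET request. Enumerating the other threads is a piece of cake for a debugger.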
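As a rough illustration of the register-fetching step, the sketch below reads the general-purpose registers of a stopped child with PTRACE_GETREGSET and the NT_PRSTATUS register set. It is not gdb's code. The helper name read_child_gregs is hypothetical, and the child here is a throwaway process that stops itself, whereas a debugger would attach to each thread of an existing process. It assumes Linux with a glibc that provides struct user_regs_struct.

```c
#include <elf.h>         /* NT_PRSTATUS */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>     /* struct iovec */
#include <sys/user.h>    /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that asks to be traced and stops
 * itself, reads the child's general-purpose registers into *regs, then
 * kills the child. Returns the number of bytes the kernel filled in,
 * or -1 on failure. */
long read_child_gregs(struct user_regs_struct *regs)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);           /* stop so the parent can inspect us */
        _exit(0);
    }

    int status = 0;
    long result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status)) {
        /* The kernel copies the register set into the iovec buffer
         * and updates iov_len to the size it actually wrote. */
        struct iovec iov = { .iov_base = regs, .iov_len = sizeof(*regs) };
        if (ptrace(PTRACE_GETREGSET, pid,
                   (void *)(uintptr_t)NT_PRSTATUS, &iov) == 0)
            result = (long)iov.iov_len;
    }

    kill(pid, SIGKILL);           /* the child has served its purpose */
    waitpid(pid, &status, 0);
    return result;
}
```

The bytes returned this way are what ends up in the pr_reg field of a per-thread elf_prstatus record.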
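The remaining question is how to keep the coredump file consistent with the layout the kernel generates. This is done with structures defined in a header file that ships with the system. From the preprocessor output, the file is /usr/include/sys/procfs.h in the system include directory. Note that it is not in the gdb source tree; as its own header comment states, it is installed as part of the GNU C Library. Some readers will ask how I dug such an obscure definition out of such an out-of-the-way place. Well, I looked at the preprocessed output of the gdb build. Since this article is rather skinny, I quote the whole file here to fatten it up: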
[tsecer@Harry searchorder]$ cat /usr/include/sys/procfs.h
/* Copyright (C) 1996, 1997, 1999, 2000 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */
#ifndef _SYS_PROCFS_H
#define _SYS_PROCFS_H 1
/* This is somewhat modelled after the file of the same name on SVR4
systems. It provides a definition of the core file format for ELF
used on Linux. It doesn't have anything to do with the /proc file
system, even though Linux has one.
Anyway, the whole purpose of this file is for GDB and GDB only.
Don't read too much into it. Don't use it for anything other than
GDB unless you know what you are doing. */
#include <features.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/user.h>
__BEGIN_DECLS
/* Type for a general-purpose register. */
typedef unsigned long elf_greg_t;
/* And the whole bunch of them. We could have used `struct
user_regs_struct' directly in the typedef, but tradition says that
the register set is an array, which does have some peculiar
semantics, so leave it that way. */
#define ELF_NGREG (sizeof (struct user_regs_struct) / sizeof(elf_greg_t))
typedef elf_greg_t elf_gregset_t[ELF_NGREG];
/* Register set for the floating-point registers. */
typedef struct user_fpregs_struct elf_fpregset_t;
/* Register set for the extended floating-point registers. Includes
the Pentium III SSE registers in addition to the classic
floating-point stuff. */
typedef struct user_fpxregs_struct elf_fpxregset_t;
/* Signal info. */
struct elf_siginfo
{
int si_signo; /* Signal number. */
int si_code; /* Extra code. */
int si_errno; /* Errno. */
};
/* Definitions to generate Intel SVR4-like core files. These mostly
have the same names as the SVR4 types with "elf_" tacked on the
front to prevent clashes with Linux definitions, and the typedef
forms have been avoided. This is mostly like the SVR4 structure,
but more Linuxy, with things that Linux does not support and which
GDB doesn't really use excluded. */
struct elf_prstatus /* Author's note: the coredump file holds one of these per thread. */
{
struct elf_siginfo pr_info; /* Info associated with signal. */
short int pr_cursig; /* Current signal. */
unsigned long int pr_sigpend; /* Set of pending signals. */
unsigned long int pr_sighold; /* Set of held signals. */
__pid_t pr_pid;
__pid_t pr_ppid;
__pid_t pr_pgrp;
__pid_t pr_sid;
struct timeval pr_utime; /* User time. */
struct timeval pr_stime; /* System time. */
struct timeval pr_cutime; /* Cumulative user time. */
struct timeval pr_cstime; /* Cumulative system time. */
elf_gregset_t pr_reg; /* GP registers. */
int pr_fpvalid; /* True if math copro being used. */
};
#define ELF_PRARGSZ (80) /* Number of chars for args. */
struct elf_prpsinfo /* Author's note: the coredump file holds one of these for the whole thread group. */
{
char pr_state; /* Numeric process state. */
char pr_sname; /* Char for pr_state. */
char pr_zomb; /* Zombie. */
char pr_nice; /* Nice val. */
unsigned long int pr_flag; /* Flags. */
unsigned short int pr_uid;
unsigned short int pr_gid;
int pr_pid, pr_ppid, pr_pgrp, pr_sid;
/* Lots missing */
char pr_fname[16]; /* Filename of executable. */
char pr_psargs[ELF_PRARGSZ]; /* Initial part of arg list. */
};
/* The rest of this file provides the types for emulation of the
Solaris <proc_service.h> interfaces that should be implemented by
users of libthread_db. */
/* Addresses. */
typedef void *psaddr_t;
/* Register sets. Linux has different names. */
typedef elf_gregset_t prgregset_t;
typedef elf_fpregset_t prfpregset_t;
/* We don't have any differences between processes and threads,
therefore have only one PID type. */
typedef __pid_t lwpid_t;
/* Process status and info. In the end we do provide typedefs for them. */
typedef struct elf_prstatus prstatus_t;
typedef struct elf_prpsinfo prpsinfo_t;
__END_DECLS
#endif /* sys/procfs.h */
[tsecer@Harry searchorder]$
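To show where these structures end up, here is a minimal sketch of how one elf_prstatus record is wrapped into an ELF note, which is the form a core file uses inside its PT_NOTE segment: a note header, the owner name "CORE", then the structure itself, each padded to 4 bytes. The helper names note_align and write_prstatus_note are hypothetical. The sketch uses Elf64_Nhdr, whose layout is the same as Elf32_Nhdr, and it does not produce a complete core file.

```c
#include <elf.h>         /* Elf64_Nhdr, NT_PRSTATUS */
#include <stddef.h>
#include <string.h>
#include <sys/procfs.h>  /* struct elf_prstatus */

/* Round n up to the 4-byte alignment used inside ELF note segments. */
static size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Hypothetical helper: serialises one NT_PRSTATUS note into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t write_prstatus_note(unsigned char *buf, size_t cap,
                           const struct elf_prstatus *st)
{
    static const char name[] = "CORE";  /* owner name, NUL included */
    Elf64_Nhdr hdr;
    size_t name_off = sizeof(hdr);
    size_t desc_off = name_off + note_align(sizeof(name));
    size_t total = desc_off + note_align(sizeof(*st));

    if (cap < total)
        return 0;

    hdr.n_namesz = sizeof(name);        /* 5: "CORE" plus the NUL */
    hdr.n_descsz = sizeof(*st);
    hdr.n_type = NT_PRSTATUS;

    memset(buf, 0, total);              /* zero the padding bytes */
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + name_off, name, sizeof(name));
    memcpy(buf + desc_off, st, sizeof(*st));
    return total;
}
```

A tool in gcore's position would emit one such note per thread, plus a single NT_PRPSINFO note carrying elf_prpsinfo for the thread group.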
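IV. Conclusion
At least up to Linux 2.6.37 (I have not read later versions, but I doubt they differ), user space has no simple interface that takes a coredump and lets the process walk away unharmed. The only alternative is to build a file in coredump format by hand, as gdb does.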