A small experiment: killing the bgwriter

What happens if I simply kill the bgwriter process?

[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  3101  2929  0 11:09 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3103  3101  0 11:09 ?        00:00:00 postgres: checkpointer process     
postgres  3104  3101  0 11:09 ?        00:00:00 postgres: writer process           
postgres  3105  3101  0 11:09 ?        00:00:00 postgres: wal writer process       
postgres  3106  3101  0 11:09 ?        00:00:00 postgres: autovacuum launcher process   
postgres  3107  3101  0 11:09 ?        00:00:00 postgres: stats collector process   
root      3109  2977  0 11:10 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]# kill 3104
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  3101  2929  0 11:09 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3103  3101  0 11:09 ?        00:00:00 postgres: checkpointer process     
postgres  3105  3101  0 11:09 ?        00:00:00 postgres: wal writer process       
postgres  3106  3101  0 11:09 ?        00:00:00 postgres: autovacuum launcher process   
postgres  3107  3101  0 11:09 ?        00:00:00 postgres: stats collector process   
postgres  3110  3101  0 11:10 ?        00:00:00 postgres: writer process           
root      3112  2977  0 11:10 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]# kill 3110
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  3101  2929  0 11:09 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3103  3101  0 11:09 ?        00:00:00 postgres: checkpointer process     
postgres  3105  3101  0 11:09 ?        00:00:00 postgres: wal writer process       
postgres  3106  3101  0 11:09 ?        00:00:00 postgres: autovacuum launcher process   
postgres  3107  3101  0 11:09 ?        00:00:00 postgres: stats collector process   
postgres  3114  3101  0 11:10 ?        00:00:00 postgres: writer process           
root      3116  2977  0 11:10 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]# 

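I killed the bgwriter process several times, and each time a new one was spawned (first PID 3110, then PID 3114), while the postmaster and the other child processes kept their PIDs.

So why does this happen?

It comes down to the monitoring done in postmaster.c. Let's look at the code. For simplicity, treat postmaster and postgres as the same thing.

After the postmaster has forked its child processes, it keeps watching them: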

/*                                    
 * Reaper -- signal handler to cleanup after a child process dies.                                    
 */                                    
static void                                    
reaper(SIGNAL_ARGS)                                    
{                                    
    int            save_errno = errno;                    
    int            pid;        /* process id of dead child process */            
    int            exitstatus;        /* its exit status */            
                                    
    /* These macros hide platform variations in getting child status */                                
#ifdef HAVE_WAITPID                                    
    int            status;            /* child exit status */        
                                    
#define LOOPTEST()            ((pid = waitpid(-1, &status, WNOHANG)) > 0)                        
#define LOOPHEADER()            (exitstatus = status)                        
#else                            /* !HAVE_WAITPID */        
#ifndef WIN32                                    
    union wait    status;            /* child exit status */                
                                    
#define LOOPTEST()        ((pid = wait3(&status, WNOHANG, NULL)) > 0)                            
#define LOOPHEADER()    (exitstatus = status.w_status)                                
#else                            /* WIN32 */        
#define LOOPTEST()        ((pid = win32_waitpid(&exitstatus)) > 0)                            
#define LOOPHEADER()                                    
#endif   /* WIN32 */                                    
#endif   /* HAVE_WAITPID */                                    
                                    
    PG_SETMASK(&BlockSig);                                
                                    
    ereport(DEBUG4,                                
            (errmsg_internal("reaping dead processes")));                        
                                    
    while (LOOPTEST())                                
    {                                
        LOOPHEADER();                            
                                    
        ……                            
                                    
        /*                            
         * Was it the bgwriter?  Normal exit can be ignored; we'll start a new                            
         * one at the next iteration of the postmaster's main loop, if                            
         * necessary.  Any other exit condition is treated as a crash.                            
         */                            
        if (pid == BgWriterPID)                            
        {                            
            BgWriterPID = 0;                        
            if (!EXIT_STATUS_0(exitstatus))                        
                HandleChildCrash(pid, exitstatus,                    
                         _("background writer process"));            
            continue;                        
        }                            
                                    
        ……                            
    }                                
                                    
……                                    
}                                    

[Author: 技术者高健 @ 博客园  mail: luckyjackgao@gmail.com ]

Since I am on Linux, HAVE_WAITPID is defined in pg_config.h:

[root@localhost postgresql-9.2.0]# find ./ -name "*.h"|xargs grep "HAVE_WAITPID"

./src/include/pg_config.h:#define HAVE_WAITPID 1
[root@localhost postgresql-9.2.0]#
So, with the macros expanded (EXIT_STATUS_0(st) is simply ((st) == 0) in postmaster.c), the loop can be read as:

while (((pid = waitpid(-1, &status, WNOHANG)) > 0))                                
{                                
    exitstatus = status;                            
                                
    ……                            
                                
    /*                            
     * Was it the bgwriter?  Normal exit can be ignored; we'll start a new                            
     * one at the next iteration of the postmaster's main loop, if                            
     * necessary.  Any other exit condition is treated as a crash.                            
     */                            
    if (pid == BgWriterPID)                            
    {                            
        BgWriterPID = 0;                        
        if (!(exitstatus==0))                        
            HandleChildCrash(pid, exitstatus,                    
                     _("background writer process"));            
        continue;                        
    }                            
                                
    ……                            
}                                

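waitpid is used to detect that a child process has terminated.

Its arguments:

pid=-1 means: wait for any child process, which is equivalent to wait().

WNOHANG means: if no matching child has terminated yet, waitpid() returns 0 immediately instead of blocking. If one has terminated, it returns that child's process ID.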
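To make the three possible results of the non-blocking call concrete, here is a minimal sketch for a POSIX system. It is not PostgreSQL code: spawn_sleeper, poll_any_child and describe_status are hypothetical helper names I made up for illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just blocks until a signal arrives. */
pid_t
spawn_sleeper(void)
{
    pid_t       pid = fork();

    if (pid == 0)
    {
        for (;;)
            pause();            /* child: sleep until it is killed */
    }
    return pid;                 /* parent: child's pid, or -1 if fork failed */
}

/*
 * Hypothetical helper: one non-blocking poll, the same call LOOPTEST() makes.
 * Returns >0 (pid of a child that was just reaped), 0 (children exist but
 * none has terminated yet) or -1 (error, e.g. there are no children left).
 */
pid_t
poll_any_child(int *status)
{
    return waitpid(-1, status, WNOHANG);
}

/* Hypothetical helper: decode the status word filled in by waitpid(). */
void
describe_status(pid_t pid, int status)
{
    if (WIFEXITED(status))
        printf("child %d exited with code %d\n",
               (int) pid, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("child %d was killed by signal %d\n",
               (int) pid, WTERMSIG(status));
}
```

Calling poll_any_child right after spawn_sleeper returns 0, because the child is still alive. After kill(pid, SIGTERM) has taken effect it returns the child's pid, and once every child has been reaped it returns -1. The `> 0` test in the reaper loop therefore stops on both 0 and -1.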

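Note that HandleChildCrash is not what brings the bgwriter back in this experiment. A plain kill sends SIGTERM, and the ps output shows that only the writer process got a new PID while the checkpointer, wal writer and the others kept theirs, so the exit was evidently treated as a normal one (exit status 0). In that case the reaper only resets BgWriterPID to 0 and, as the comment in the code says, the postmaster's main loop starts a new bgwriter on its next iteration. HandleChildCrash is called only for a non-zero exit status, which the postmaster treats as a crash rather than as a normal exit.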


The end.

posted @ 2012-10-31 11:16  健哥的数据花园