DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)
Installing a Greenplum cluster failed with the following error:
20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Checking configuration parameters, please wait...
20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Reading Greenplum configuration file init_config
20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Locale has not been set in init_config, will set to default value
20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Locale set to en_US.utf8
20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-No DATABASE_NAME set, will exit following template1 updates
20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-MASTER_MAX_CONNECT not set, will set to default value 250
20160315:13:49:17:025696 gpinitsystem:h95:jason-[INFO]:-Checking configuration parameters, Completed
20160315:13:49:17:025696 gpinitsystem:h95:jason-[INFO]:-Commencing multi-home checks, please wait...
..
20160315:13:49:17:025696 gpinitsystem:h95:jason-[INFO]:-Configuring build for standard array
20160315:13:49:17:025696 gpinitsystem:h95:jason-[INFO]:-Commencing multi-home checks, Completed
20160315:13:49:17:025696 gpinitsystem:h95:jason-[INFO]:-Building primary segment instance array, please wait...
..................
20160315:13:49:24:025696 gpinitsystem:h95:jason-[INFO]:-Checking Master host
20160315:13:49:24:025696 gpinitsystem:h95:jason-[INFO]:-Checking new segment hosts, please wait...
..................
20160315:13:49:39:025696 gpinitsystem:h95:jason-[INFO]:-Checking new segment hosts, Completed
20160315:13:49:39:025696 gpinitsystem:h95:jason-[INFO]:-Building the Master instance database, please wait...
20160315:13:49:49:025696 gpinitsystem:h95:jason-[INFO]:-Starting the Master in admin mode
20160315:13:51:35:025696 gpinitsystem:h95:jason-[INFO]:-Commencing parallel build of primary segment instances
20160315:13:51:35:025696 gpinitsystem:h95:jason-[INFO]:-Spawning parallel processes batch [1], please wait...
..................
20160315:13:51:36:025696 gpinitsystem:h95:jason-[INFO]:-Waiting for parallel processes batch [1], please wait...
..................................................
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:------------------------------------------------
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:-Parallel process exit status
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:------------------------------------------------
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:-Total processes marked as completed = 18
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:-Total processes marked as killed = 0
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:-Total processes marked as failed = 0
20160315:13:52:26:025696 gpinitsystem:h95:jason-[INFO]:------------------------------------------------
20160315:13:52:27:025696 gpinitsystem:h95:jason-[INFO]:-Deleting distributed backout files
20160315:13:52:27:025696 gpinitsystem:h95:jason-[INFO]:-Removing back out file
20160315:13:52:27:025696 gpinitsystem:h95:jason-[INFO]:-No errors generated from parallel processes
20160315:13:52:27:025696 gpinitsystem:h95:jason-[INFO]:-Restarting the Greenplum instance in production mode
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Starting gpstop with args: -a -i -m -d /home/jason/gpdata/gpseg-1
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Gathering information and validating the environment...
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Obtaining Greenplum Master catalog information
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Obtaining Segment details from master...
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Greenplum Version: 'greenplum (Greenplum Database) 4.3.99.00 build dev'
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-There are 0 connections to the database
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Commencing Master instance shutdown with mode='immediate'
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Master host=h95
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Commencing Master instance shutdown with mode=immediate
20160315:13:52:27:011100 gpstop:h95:jason-[INFO]:-Master segment instance directory=/home/jason/gpdata/gpseg-1
20160315:13:52:28:011100 gpstop:h95:jason-[INFO]:-Attempting forceful termination of any leftover master process
20160315:13:52:28:011100 gpstop:h95:jason-[INFO]:-Terminating processes for segment /home/jason/gpdata/gpseg-1
20160315:13:52:28:011100 gpstop:h95:jason-[ERROR]:-Failed to kill processes for segment /home/jason/gpdata/gpseg-1: ([Errno 3] No such process)
20160315:13:52:28:011187 gpstart:h95:jason-[INFO]:-Starting gpstart with args: -a -d /home/jason/gpdata/gpseg-1
20160315:13:52:28:011187 gpstart:h95:jason-[INFO]:-Gathering information and validating the environment...
20160315:13:52:28:011187 gpstart:h95:jason-[INFO]:-Greenplum Binary Version: 'greenplum (Greenplum Database) 4.3.99.00 build dev'
20160315:13:52:28:011187 gpstart:h95:jason-[INFO]:-Greenplum Catalog Version: '201310150'
20160315:13:52:28:011187 gpstart:h95:jason-[INFO]:-Starting Master instance in admin mode
20160315:13:52:29:011187 gpstart:h95:jason-[INFO]:-Obtaining Greenplum Master catalog information
20160315:13:52:29:011187 gpstart:h95:jason-[INFO]:-Obtaining Segment details from master...
20160315:13:52:30:011187 gpstart:h95:jason-[INFO]:-Setting new master era
20160315:13:52:30:011187 gpstart:h95:jason-[INFO]:-Master Started...
20160315:13:52:30:011187 gpstart:h95:jason-[INFO]:-Shutting down master
20160315:13:52:31:011187 gpstart:h95:jason-[INFO]:-Commencing parallel segment instance startup, please wait...
.......
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-Process results...
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-----------------------------------------------------
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:- Successful segment starts = 18
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:- Failed segment starts = 0
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-----------------------------------------------------
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-Successfully started 18 of 18 segment instances
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-----------------------------------------------------
20160315:13:52:38:011187 gpstart:h95:jason-[INFO]:-Starting Master instance h95 directory /home/jason/gpdata/gpseg-1
20160315:13:52:39:011187 gpstart:h95:jason-[INFO]:-Command sys_ctl reports Master h95 instance active
20160315:13:54:33:011187 gpstart:h95:jason-[WARNING]:-FATAL: DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)
20160315:13:54:33:011187 gpstart:h95:jason-[INFO]:-No standby master configured. skipping...
20160315:13:54:33:011187 gpstart:h95:jason-[INFO]:-Check status of database with gpstate utility
20160315:13:54:37:025696 gpinitsystem:h95:jason-[INFO]:-Completed restart of Greenplum instance in production mode
20160315:13:54:37:025696 gpinitsystem:h95:jason-[INFO]:-Loading gp_toolkit...
psql: FATAL: DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)
20160315:13:56:26:gpinitsystem:h95:jason-[FATAL]:-Failed to retrieve rolname. Script Exiting!
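My cluster configuration: two machines, each with 32 GB of memory and a 16 GB swap partition, running 9 segment instances per machine. After the installation finished, the log reported that all 18 segments had started, but connecting with psql failed with an error!
The main error message: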
DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)
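On the official site I found that many people have run into this kind of problem, and the error has many possible causes, so today I am summarizing them here: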
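Q&A1: The system environment variables are not set correctly. They need to be set according to the Greenplum version you installed; follow the install guide for the corresponding version on the official site!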
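A quick way to spot an unset variable before rerunning gpinitsystem is a small check like the sketch below. The variable names GPHOME and MASTER_DATA_DIRECTORY are an assumption based on typical Greenplum setups, not something this post's log confirms; adjust the list to whatever the install guide for your version requires.

```python
import os

# Assumed variable names; confirm them against the install guide for your version.
REQUIRED_VARS = ("GPHOME", "MASTER_DATA_DIRECTORY")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it as the same user that runs gpinitsystem, on every host, since each host's shell profile may differ.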
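Q&A2: shared_buffers is set too large. For how to allocate shared_buffers based on your memory and the number of segments, look it up on the official site. As a rule of thumb, take total memory, subtract 2 GB for other usage and statement_mem * the number of segments, then divide what is left by the number of segments. This case usually occurs when shared_buffers was set explicitly during installation; the default is generally 125MB.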
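The rule of thumb above can be worked through as a small calculation. This is only a sketch of that rule, not official sizing guidance: the function name is made up for illustration, and using 125 MB for statement_mem is an assumption, so substitute your own value.

```python
def shared_buffers_per_segment_mb(total_mem_mb, segments,
                                  statement_mem_mb=125, other_mb=2048):
    """Upper bound on shared_buffers per segment, following the rule of thumb.

    Reserve `other_mb` for everything else plus statement_mem for each
    segment, then split the remaining memory evenly across the segments.
    """
    reserved = other_mb + statement_mem_mb * segments
    remaining = total_mem_mb - reserved
    if remaining <= 0:
        raise ValueError("not enough memory for this many segments")
    return remaining // segments


if __name__ == "__main__":
    # One machine from this post: 32 GB of memory, 9 segments.
    print(shared_buffers_per_segment_mb(32 * 1024, 9))  # 3288
```

For the machines in this post the rule gives a ceiling of roughly 3.2 GB per segment, so a setting above that on a 9-segment host is a likely suspect.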
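Q&A3: Check whether the firewall is turned off. This is the easiest cause to overlook and also the most common one. Some people forget to turn it off again after rebooting a machine; that is what happened to me, heh. You can configure the firewall setting so that it persists across reboots!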
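One way to test for a firewall problem is to try opening a TCP connection from the master host to each segment port. The sketch below does that with Python's standard socket module. The second host name and the port number are hypothetical placeholders; use the hosts and ports from your own init_config. A failed connection can also mean the segment is simply not running, so treat the result as a hint rather than proof.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical targets: replace with your segment hosts and ports.
    targets = [("h95", 40000), ("segment-host-2", 40000)]
    for host, port in targets:
        state = "reachable" if port_reachable(host, port) else "blocked or down"
        print(f"{host}:{port} {state}")
```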
...There are other causes too, and additions are welcome! Thanks. Sharing is a beautiful thing, and I hope this helps you!