
Greenplum cluster state recovery and resynchronization

Check the cluster status

gpstate -m

Alternatively, query the segment configuration directly with SQL from inside the Greenplum cluster:

select * from gp_segment_configuration where status <>'u';

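Checking the status of the nodes and segment instances showed that one machine had gone down, so that node and its instances had to be recovered. To be safe, the recovery was done at night when nobody was using the database, with the database stopped first.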
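To see at a glance which hosts are affected, the SQL query above can be grouped by host. The following is a minimal sketch, not part of the original procedure. It assumes `psql` is on the `gpadmin` user's PATH and that a database named `postgres` is reachable (an assumption; set `PGDATABASE` to any database you can connect to). It relies on the `hostname` and `status` columns of `gp_segment_configuration`. The function name `down_by_host` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count segment instances whose status is not 'u' (up), grouped by host.
# The database name defaults to "postgres" (an assumption); override with PGDATABASE.
down_by_host() {
  # -A: unaligned output, -t: rows only, -F ' ': separate columns with a space
  psql -d "${PGDATABASE:-postgres}" -A -t -F ' ' -c \
    "select hostname, count(*)
       from gp_segment_configuration
      where status <> 'u'
      group by hostname
      order by hostname;"
}
```

Run `down_by_host` as `gpadmin` on the master. Each output line is a host name followed by the number of its instances that are not up; no output means every instance is up.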

Start the cluster

su - gpadmin
gpstart

Generate the recovery configuration file

gprecoverseg -o ./reseg

Run the recovery

gprecoverseg -i ./reseg

Once every segment reports Synchronized, switch the segments back to their original roles

gprecoverseg -r

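While the cluster is resynchronizing, you can follow the progress with gpstate -m. Once the status of the instances that were down changes from Resynchronizing to Synchronized, the synchronization is complete.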
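The wait described above can be scripted as a polling loop. This is a minimal sketch under stated assumptions, not a tested operational tool. It assumes the output of `gpstate -m` marks mirrors that are still catching up or down with the words `Resynchronizing` or `Failed`, and finished mirrors with `Synchronized`, as in the log excerpts in this post. The function name `wait_for_sync` and the variables `STATUS_CMD`, `POLL_SECONDS` and `MAX_POLLS` are hypothetical names introduced here, not Greenplum options.

```shell
#!/usr/bin/env bash
# Sketch: poll the mirror status until no mirror is Resynchronizing or Failed.
# STATUS_CMD, POLL_SECONDS and MAX_POLLS are illustrative knobs, not gpstate options.
wait_for_sync() {
  local status_cmd="${STATUS_CMD:-gpstate -m}"
  local poll_seconds="${POLL_SECONDS:-30}"
  local max_polls="${MAX_POLLS:-120}"
  local i=1
  local output
  while [ "$i" -le "$max_polls" ]; do
    # $status_cmd is left unquoted so "gpstate -m" splits into command and argument.
    output="$($status_cmd 2>&1 || true)"
    if printf '%s\n' "$output" | grep -Eq 'Resynchronizing|Failed'; then
      echo "poll $i: mirrors still resynchronizing or failed"
    elif printf '%s\n' "$output" | grep -q 'Synchronized'; then
      echo "all mirrors report Synchronized"
      return 0
    else
      # Empty or unexpected output is never treated as success.
      echo "poll $i: no mirror status found in output" >&2
    fi
    i=$((i + 1))
    sleep "$poll_seconds"
  done
  echo "gave up after $max_polls polls" >&2
  return 1
}
```

After `gprecoverseg -i ./reseg` has returned, running `wait_for_sync && gprecoverseg -r` starts the role switch only once the loop reports success.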

Key recovery log entries on the master node

20190522:19:24:08:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./reseg
20190522:19:24:08:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.29.0 build 1'
20190522:19:24:09:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.29.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug 22 2018 23:17:57'
20190522:19:24:12:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-Configuration file output to ./reseg successfully.


20190522:19:24:57:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Starting gprecoverseg with args: -i ./reseg
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Greenplum instance recovery parameters
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Recovery 1 of 16
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:----------------------------------------------------------
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Synchronization mode                        = Incremental
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Failed instance host                        = gp-seg4
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Failed instance address                     = gp-seg4
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Failed instance directory                   = /data/mirror/gpseg16
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Failed instance port                        = 41000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Failed instance replication port            = 43000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Recovery Source instance host               = gp-seg3
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Recovery Source instance address            = gp-seg3
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Recovery Source instance directory          = /data/primary/gpseg16
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Recovery Source instance port               = 40000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Recovery Source instance replication port   = 42000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-   Recovery Target                             = in-place

20190522:19:25:20:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Updating configuration with new mirrors
20190522:19:25:21:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Updating mirrors
20190522:19:25:27:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Starting mirrors

Key cluster status log entries on the master node

20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-# of up segments, from configuration table     = 48
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-# of down segments, from configuration table   = 16
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-   Down Segment   Datadir                  Port
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-   gp-seg4        /data/mirror/gpseg16     41000
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-   gp-seg4        /data/mirror/gpseg17     41001

20190522:19:33:31:006939 gpstate:gp-master:gpadmin-[WARNING]:-gp-seg1   /data/mirror/gpseg24    41000   Failed                   <<<<<<<<
20190522:19:34:16:007035 gpstate:gp-master:gpadmin-[INFO]:-   gp-seg1   /data/mirror/gpseg24    41000   Passive   Resynchronizing
20190522:19:35:17:007202 gpstate:gp-master:gpadmin-[INFO]:-   gp-seg1   /data/mirror/gpseg24    41000   Passive   Synchronized

Key recovery log entries on the node that went down

20190522:19:25:27:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Starting gpsegstart.py with args: -C en_US.utf8:en_US.utf8:en_US.utf8 -M quiescent -V postgres (Greenplum Database) 4.3.29.0 build 1 -n 32 --era

20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Validating directories...
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Validating directory: /data/mirror/gpseg19
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[WARNING]:-postmaster.pid file exists, checking if recovery startup required
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-No db instance process, entering recovery startup mode
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Clearing db instance pid file

20190522:19:25:30:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Postmaster /data/mirror/gpseg19 is running (pid 25822)
20190522:19:25:31:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-
COMMAND RESULTS
STATUS--DIR:/data/mirror/gpseg19--STARTED:True--REASONCODE:0--REASON:Start Succeeded

20190522:16:27:56:024030 gpgetstatususingtransition.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[WARNING]:-Error getting data from segment /data/mirror/gpseg16; it is not running
20190522:16:27:56:024030 gpgetstatususingtransition.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[WARNING]:-Error getting data from segment /data/mirror/gpseg16; it is not running

Posted on 2019-05-26 15:38 by OneLi算法分享社区
