Greenplum Cluster State Recovery and Resynchronization
Check the cluster status
gpstate -m
Alternatively, query the segment configuration directly with SQL inside the GP cluster:
select * from gp_segment_configuration where status <>'u';
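Checking the status of the cluster nodes and instances showed that one machine had gone down, so the state of its node and instances had to be restored. To be safe, the database was taken offline and the recovery was done at night, when nobody was using it.
Start the cluster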
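To see at a glance which hosts are affected, the rows from gp_segment_configuration can be grouped by host. The following is only a minimal sketch and not part of the original procedure: the function name summarize_down_segments is hypothetical, and it assumes the rows arrive on stdin as hostname|port|status lines, which is what psql -At prints for a query selecting those three columns.

```shell
# Hypothetical helper: count down segment instances per host.
# Input (stdin): one line per instance, formatted as hostname|port|status.
summarize_down_segments() {
    # Keep only rows whose status is 'd' (down) and tally them by hostname.
    awk -F'|' '$3 == "d" { down[$1]++ }
               END { for (h in down) print h, down[h] }' | sort
}

# Usage (assumption: run as gpadmin on the master; the database name is a placeholder):
# psql -At -d postgres -c "select hostname, port, status from gp_segment_configuration" \
#     | summarize_down_segments
```

Each output line is a hostname followed by the number of its instances marked down.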
su - gpadmin
gpstart
Generate the recovery configuration file
gprecoverseg -o ./reseg
Run the recovery
gprecoverseg -i ./reseg
Once every instance is Synchronized, rebalance so that the instances return to their preferred roles
gprecoverseg -r
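gprecoverseg -r only has work to do when some instances are running in a role other than their preferred one, for example a mirror that took over as acting primary. A rough pre-check could look like the sketch below. The name count_unbalanced is hypothetical, and the input format content|role|preferred_role is an assumption that matches what psql -At would print for those columns of gp_segment_configuration.

```shell
# Hypothetical helper: count instances whose current role differs from
# their preferred role.
# Input (stdin): one line per instance, formatted as content|role|preferred_role.
count_unbalanced() {
    # n + 0 forces a numeric 0 when no row matched.
    awk -F'|' '$2 != $3 { n++ } END { print n + 0 }'
}

# Usage (assumption: run as gpadmin on the master; the database name is a placeholder):
# psql -At -d postgres -c "select content, role, preferred_role from gp_segment_configuration" \
#     | count_unbalanced
```

A result of 0 suggests the roles are already balanced.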
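While the cluster is resynchronizing, recovery progress can be checked with gpstate -m. Once the instances that were down change from Resynchronizing to Synchronized, synchronization is complete.
Master node: key recovery log entries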
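Checking gpstate -m by hand can be replaced with a small polling loop. This is a sketch under stated assumptions, not a tested production script: the function names mirrors_synchronized and wait_for_sync are hypothetical, and the status words it looks for (Failed, Resynchronizing, Synchronized) are taken from the gpstate log excerpts in this post, so other Greenplum versions may print different text.

```shell
# Hypothetical helper: read gpstate -m output on stdin and succeed only if
# at least one mirror reports Synchronized and none reports Failed or
# Resynchronizing.
mirrors_synchronized() {
    _gp_out="$(cat)"
    # Any mirror still failed or resynchronizing means recovery is not finished.
    if printf '%s\n' "$_gp_out" | grep -Eq 'Failed|Resynchronizing'; then
        return 1
    fi
    # Require at least one Synchronized line so that empty input is not a success.
    printf '%s\n' "$_gp_out" | grep -q 'Synchronized'
}

# Hypothetical helper: poll gpstate -m until all mirrors are synchronized.
# $1 is the polling interval in seconds (default 60).
wait_for_sync() {
    _gp_interval="${1:-60}"
    until gpstate -m | mirrors_synchronized; do
        sleep "$_gp_interval"
    done
}

# Usage (as gpadmin on the master):
# wait_for_sync 30 && echo "all mirrors synchronized"
```

The check is kept separate from the loop so it can be tried against saved gpstate output before relying on it.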
20190522:19:24:08:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./reseg
20190522:19:24:08:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.29.0 build 1'
20190522:19:24:09:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.29.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug 22 2018 23:17:57'
20190522:19:24:12:006085 gprecoverseg:gp-master:gpadmin-[INFO]:-Configuration file output to ./reseg successfully.
20190522:19:24:57:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Starting gprecoverseg with args: -i ./reseg
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Greenplum instance recovery parameters
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Recovery 1 of 16
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:----------------------------------------------------------
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Synchronization mode = Incremental
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Failed instance host = gp-seg4
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Failed instance address = gp-seg4
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Failed instance directory = /data/mirror/gpseg16
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Failed instance port = 41000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Failed instance replication port = 43000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Recovery Source instance host = gp-seg3
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Recovery Source instance address = gp-seg3
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Recovery Source instance directory = /data/primary/gpseg16
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Recovery Source instance port = 40000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Recovery Source instance replication port = 42000
20190522:19:25:02:031070 gprecoverseg:gp-master:gpadmin-[INFO]:- Recovery Target = in-place
20190522:19:25:20:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Updating configuration with new mirrors
20190522:19:25:21:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Updating mirrors
20190522:19:25:27:031070 gprecoverseg:gp-master:gpadmin-[INFO]:-Starting mirrors
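With 16 instances to recover, the per-instance parameter blocks in this log get long. The sketch below pulls out just the host and data directory of each failed instance. The function name list_recovery_targets is hypothetical, and it assumes the log lines keep the "Failed instance host = ..." and "Failed instance directory = ..." wording shown above.

```shell
# Hypothetical helper: list "host directory" for every failed instance
# mentioned in a gprecoverseg log read from stdin.
list_recovery_targets() {
    awk '
        # Remember the most recent failed host (last whitespace-separated field).
        /Failed instance host/      { host = $NF }
        # Print it together with the matching data directory.
        /Failed instance directory/ { print host, $NF }
    '
}

# Usage (assumption: the utility log sits under ~/gpAdminLogs with a dated name):
# list_recovery_targets < ~/gpAdminLogs/gprecoverseg_20190522.log
```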
Master node: key cluster status log entries
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-# of up segments, from configuration table = 48
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:-# of down segments, from configuration table = 16
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:- Down Segment Datadir Port
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:- gp-seg4 /data/mirror/gpseg16 41000
20190522:16:29:20:062535 gpstate:gp-master:gpadmin-[INFO]:- gp-seg4 /data/mirror/gpseg17 41001
20190522:19:33:31:006939 gpstate:gp-master:gpadmin-[WARNING]:-gp-seg1 /data/mirror/gpseg24 41000 Failed <<<<<<<<
20190522:19:34:16:007035 gpstate:gp-master:gpadmin-[INFO]:- gp-seg1 /data/mirror/gpseg24 41000 Passive Resynchronizing
20190522:19:35:17:007202 gpstate:gp-master:gpadmin-[INFO]:- gp-seg1 /data/mirror/gpseg24 41000 Passive Synchronized
Failed node: key recovery log entries
20190522:19:25:27:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Starting gpsegstart.py with args: -C en_US.utf8:en_US.utf8:en_US.utf8 -M quiescent -V postgres (Greenplum Database) 4.3.29.0 build 1 -n 32 --era
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Validating directories...
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Validating directory: /data/mirror/gpseg19
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[WARNING]:-postmaster.pid file exists, checking if recovery startup required
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-No db instance process, entering recovery startup mode
20190522:19:25:28:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Clearing db instance pid file
20190522:19:25:30:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-Postmaster /data/mirror/gpseg19 is running (pid 25822)
20190522:19:25:31:025719 gpsegstart.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[INFO]:-
COMMAND RESULTS
STATUS--DIR:/data/mirror/gpseg19--STARTED:True--REASONCODE:0--REASON:Start Succeeded
20190522:16:27:56:024030 gpgetstatususingtransition.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[WARNING]:-Error getting data from segment /data/mirror/gpseg16; it is not running
20190522:16:27:56:024030 gpgetstatususingtransition.py_gp-seg4:gpadmin:gp-seg4:gpadmin-[WARNING]:-Error getting data from segment /data/mirror/gpseg16; it is not running
posted on 2019-05-26 15:38 OneLi算法分享社区