Stopping a Greenplum (GP) database

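The Greenplum cluster consists of 2 master hosts and 7 segment hosts.

The system was about to go live as a statistics database. It was being used to help verify the accuracy of the data in a Hadoop cluster, so detail-record data had been loaded into this Greenplum database.

Later analysis showed that some of the CREATE TABLE statements were at fault: they neither named a distribution key nor declared random distribution, so the first column was used as the distribution key by default, which caused data skew.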
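To avoid relying on the default, a table definition can state its distribution policy explicitly. The following is a minimal sketch, not the DDL used in this incident: the table name and columns are hypothetical, and the helpers only print SQL so that it can be reviewed before being piped to psql. The DISTRIBUTED BY and DISTRIBUTED RANDOMLY clauses and the gp_segment_id column are Greenplum features.

```shell
#!/usr/bin/env bash
# Hypothetical table layout (record_id, region_code, payload); adapt to the real schema.

# Print a CREATE TABLE statement with an explicit distribution policy.
# $1 = table name, $2 = distribution column, or the word "random"
make_ddl() {
    local table=$1 key=$2 policy
    if [ "$key" = "random" ]; then
        policy="DISTRIBUTED RANDOMLY"
    else
        policy="DISTRIBUTED BY ($key)"
    fi
    printf 'CREATE TABLE %s (record_id bigint, region_code int, payload text) %s;\n' "$table" "$policy"
}

# Print a query that counts rows per segment; very uneven counts suggest skew.
# $1 = table name
skew_query() {
    printf 'SELECT gp_segment_id, count(*) FROM %s GROUP BY 1 ORDER BY 2 DESC;\n' "$1"
}
```

For example, `make_ddl detail_records record_id` prints the statement for a hash-distributed table, and `skew_query detail_records` prints a per-segment row count query for an existing table.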

The database had become extremely slow, to the point of being almost unusable, so I checked the state of Greenplum.

1. Check the state of the Greenplum database

gpadmin@mdw:~> gpstate
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args:
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.3.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct 10 2014 14:31:50'
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
......
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-Greenplum instance status summary
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Master instance                                           = Active
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Master standby                                            = smdw
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Standby master state                                      = Standby host passive
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total segment instance count from metadata                = 56
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Primary Segment Status
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total primary segments                                    = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total primary segment valid (at master)                   = 24
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[WARNING]:-Total primary segment failures (at master)                = 4                      <<<<<<<<
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[WARNING]:-Total number postmaster processes missing                 = 4                      <<<<<<<<
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number postmaster processes found                   = 24
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Mirror Segment Status
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total mirror segments                                     = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 21
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[WARNING]:-Total mirror segment failures (at master)                 = 7                      <<<<<<<<
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[WARNING]:-Total number postmaster processes missing                 = 4                      <<<<<<<<
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number postmaster processes found                   = 24
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[WARNING]:-Total number mirror segments acting as primary segments   = 4                      <<<<<<<<
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 24
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
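The lines that matter in this summary are the ones tagged [WARNING]. As a small sketch (the function name is my own, and it assumes the log prefix format shown above), the warnings can be pulled out of the gpstate output like this:

```shell
#!/usr/bin/env bash
# Read gpstate output on stdin and print only the WARNING entries,
# with the timestamp/host prefix and the trailing "<<<<<<<<" marker removed
# and runs of spaces squeezed to one.
gpstate_warnings() {
    grep -F -- '-[WARNING]:-' \
        | sed -e 's/^.*-\[WARNING]:-//' -e 's/[[:space:]]*<<<<<<<<[[:space:]]*$//' \
        | tr -s ' '
}
```

Usage would be `gpstate | gpstate_warnings`. The gpstate utility also has a `-e` option that lists segment pairs with potential issues, which may be a more direct way to get the same information.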

2. Check the state of the database servers

gpadmin@mdw:/> gpssh -h sdw1 -h sdw2 -h sdw3 -h sdw4 -h sdw5 -h sdw6 -h sdw7 "df -hT"
[sdw4] df: "/root/.gvfs": Permission denied
[sdw4] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw4] /dev/sda2      ext3       99G  5.7G   88G    7% /
[sdw4] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw4] tmpfs          tmpfs      95G  100K   95G    1% /dev/shm
[sdw4] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw4] /dev/sda5      ext3      197G  188M  187G    1% /home
[sdw4] /dev/sdb       xfs       4.6T  3.8T  865G   82% /data
[sdw5] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw5] /dev/sda2      ext3       60G  5.5G   51G   10% /
[sdw5] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw5] tmpfs          tmpfs      32G   88K   32G    1% /dev/shm
[sdw5] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw5] /dev/sda5      ext3      785G  197M  745G    1% /home
[sdw5] /dev/sdb       xfs       4.6T  2.4T  2.2T   53% /data
[sdw6] df: "/root/.gvfs": Permission denied
[sdw6] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw6] /dev/sda2      ext3       99G  910M   93G    1% /
[sdw6] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw6] tmpfs          tmpfs      47G  100K   47G    1% /dev/shm
[sdw6] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw6] /dev/sda5      ext3      197G  188M  187G    1% /home
[sdw6] /dev/sda3      ext3       63G  5.0G   55G    9% /usr
[sdw6] /dev/sdb       xfs       4.6T  4.5T   93G   99% /data
[sdw6] /dev/sr0       iso9660   3.1G  3.1G     0  100% /media/SLES-11-SP2-DVD-x86_6407551
[sdw7] df: "/root/.gvfs": Permission denied
[sdw7] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw7] /dev/sda2      ext3       60G  440M   56G    1% /
[sdw7] devtmpfs       devtmpfs   32G  244K   32G    1% /dev
[sdw7] tmpfs          tmpfs      95G  112K   95G    1% /dev/shm
[sdw7] /dev/sda1      ext3      9.9G  180M  9.2G    2% /boot
[sdw7] /dev/sda5      ext3      197G  188M  187G    1% /home
[sdw7] /dev/sda6      ext3      9.9G  264M  9.1G    3% /opt
[sdw7] /dev/sda8      ext3      9.9G  151M  9.2G    2% /srv
[sdw7] /dev/sda7      ext3      9.9G  162M  9.2G    2% /tmp
[sdw7] /dev/sda9      ext3       40G  4.8G   33G   13% /usr
[sdw7] /dev/sda10     ext3      9.9G  358M  9.0G    4% /var
[sdw7] /dev/sdb       xfs       4.6T  3.7T  943G   80% /data
[sdw1] df: "/root/.gvfs": Permission denied
[sdw1] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw1] /dev/sda1      ext3       99G  5.7G   88G    7% /
[sdw1] devtmpfs       devtmpfs   32G  444K   32G    1% /dev
[sdw1] tmpfs          tmpfs      95G  100K   95G    1% /dev/shm
[sdw1] /dev/sda2      ext3      7.9G  216M  7.3G    3% /boot
[sdw1] /dev/sda3      ext3      197G  188M  187G    1% /home
[sdw1] /dev/sdb       xfs       4.6T  3.6T  1.1T   78% /data
[sdw2] df: "/root/.gvfs": Permission denied
[sdw2] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw2] /dev/sda1      ext3       99G  5.7G   88G    7% /
[sdw2] devtmpfs       devtmpfs   32G  444K   32G    1% /dev
[sdw2] tmpfs          tmpfs      95G  100K   95G    1% /dev/shm
[sdw2] /dev/sda2      ext3      7.9G  216M  7.3G    3% /boot
[sdw2] /dev/sda3      ext3      197G  188M  187G    1% /home
[sdw2] /dev/sdb       xfs       4.6T  4.6T  2.0G  100% /data
[sdw3] df: "/root/.gvfs": Permission denied
[sdw3] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw3] /dev/sda2      ext3       60G  5.5G   51G   10% /
[sdw3] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw3] tmpfs          tmpfs      32G  100K   32G    1% /dev/shm
[sdw3] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw3] /dev/sda5      ext3      785G  197M  745G    1% /home
[sdw3] /dev/sdb       xfs       4.6T  3.8T  856G   82% /data

The /data directory on segment hosts sdw2 and sdw6 was effectively full: usage had reached 100% on sdw2 and 99% on sdw6.
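Scanning this much df output by eye is error-prone. The sketch below (the function name and the 90% default threshold are my own choices) reads the gpssh output and prints only the filesystems at or above a given usage. It assumes each data line has the shape `[host] device type size used avail use% mountpoint` with no spaces in the mount point.

```shell
#!/usr/bin/env bash
# Read "gpssh ... 'df -hT'" output on stdin and print "host mountpoint use%"
# for every filesystem whose usage is at or above the threshold.
# $1 = threshold in percent (default 90)
full_filesystems() {
    local threshold=${1:-90}
    # Strip the "[host]" brackets (gpssh pads short names, e.g. "[ mdw]").
    sed -e 's/^\[ *//' -e 's/]//' | awk -v limit="$threshold" '
        NF >= 7 && $(NF-1) ~ /^[0-9]+%$/ {
            pct = $(NF-1)
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $1, $NF, pct "%"
        }'
}
```

For example, `gpssh -h sdw1 -h sdw2 "df -hT" | full_filesystems 95`. Read-only media that are always 100% full, such as the mounted installation DVD on sdw6, will also be listed.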

3. Drop the detail tables in the database

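I connected to the database with psql and ran drop table <table name>. The statement hung for a long time with no response. Checking I/O usage on the servers with gpssh -f showed no read or write activity on the segment servers, which indicates that the database could no longer dispatch commands to the segments.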

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

 

4. Log in to sdw2 and sdw6 in turn

#cd /data/primary/gpseg4/gp_log

#ls -ltrh

The directory contains log files named gpdb-<date>.csv. Delete them directly:

#rm -rf *.csv

I ran the same deletion in the gp_log directory of every gpseg. Afterwards, very little space had been freed.
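Since the logs were not what filled the disk, it helps to see which entries under a segment directory actually hold the space before deleting anything else. A minimal sketch follows; the function name is hypothetical, and the path in the usage note is taken from the step above.

```shell
#!/usr/bin/env bash
# Print the N largest entries (size in KB, then path) directly under a
# directory, largest first.
# $1 = directory, $2 = number of entries to show (default 5)
largest_entries() {
    local dir=$1 count=${2:-5}
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_entries /data/primary/gpseg4` on sdw2 would show whether the table data or some other directory dominates the usage.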

=> df -hT
[sdw4] df: "/root/.gvfs": Permission denied
[sdw4] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw4] /dev/sda2      ext3       99G  5.7G   88G    7% /
[sdw4] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw4] tmpfs          tmpfs      95G  100K   95G    1% /dev/shm
[sdw4] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw4] /dev/sda5      ext3      197G  188M  187G    1% /home
[sdw4] /dev/sdb       xfs       4.6T  3.7T  868G   82% /data
[sdw5] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw5] /dev/sda2      ext3       60G  5.5G   51G   10% /
[sdw5] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw5] tmpfs          tmpfs      32G   88K   32G    1% /dev/shm
[sdw5] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw5] /dev/sda5      ext3      785G  197M  745G    1% /home
[sdw5] /dev/sdb       xfs       4.6T  2.4T  2.2T   53% /data
[sdw6] df: "/root/.gvfs": Permission denied
[sdw6] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw6] /dev/sda2      ext3       99G  911M   93G    1% /
[sdw6] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw6] tmpfs          tmpfs      47G  100K   47G    1% /dev/shm
[sdw6] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw6] /dev/sda5      ext3      197G  188M  187G    1% /home
[sdw6] /dev/sda3      ext3       63G  5.0G   55G    9% /usr
[sdw6] /dev/sdb       xfs       4.6T  4.5T   96G   98% /data
[sdw6] /dev/sr0       iso9660   3.1G  3.1G     0  100% /media/SLES-11-SP2-DVD-x86_6407551
[sdw7] df: "/root/.gvfs": Permission denied
[sdw7] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw7] /dev/sda2      ext3       60G  440M   56G    1% /
[sdw7] devtmpfs       devtmpfs   32G  244K   32G    1% /dev
[sdw7] tmpfs          tmpfs      95G  112K   95G    1% /dev/shm
[sdw7] /dev/sda1      ext3      9.9G  180M  9.2G    2% /boot
[sdw7] /dev/sda5      ext3      197G  188M  187G    1% /home
[sdw7] /dev/sda6      ext3      9.9G  264M  9.1G    3% /opt
[sdw7] /dev/sda8      ext3      9.9G  151M  9.2G    2% /srv
[sdw7] /dev/sda7      ext3      9.9G  162M  9.2G    2% /tmp
[sdw7] /dev/sda9      ext3       40G  4.8G   33G   13% /usr
[sdw7] /dev/sda10     ext3      9.9G  359M  9.0G    4% /var
[sdw7] /dev/sdb       xfs       4.6T  3.7T  945G   80% /data
[sdw1] df: "/root/.gvfs": Permission denied
[sdw1] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw1] /dev/sda1      ext3       99G  5.7G   88G    7% /
[sdw1] devtmpfs       devtmpfs   32G  444K   32G    1% /dev
[sdw1] tmpfs          tmpfs      95G  100K   95G    1% /dev/shm
[sdw1] /dev/sda2      ext3      7.9G  216M  7.3G    3% /boot
[sdw1] /dev/sda3      ext3      197G  188M  187G    1% /home
[sdw1] /dev/sdb       xfs       4.6T  3.6T  1.1T   78% /data
[sdw2] df: "/root/.gvfs": Permission denied
[sdw2] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw2] /dev/sda1      ext3       99G  5.7G   88G    7% /
[sdw2] devtmpfs       devtmpfs   32G  444K   32G    1% /dev
[sdw2] tmpfs          tmpfs      95G  100K   95G    1% /dev/shm
[sdw2] /dev/sda2      ext3      7.9G  216M  7.3G    3% /boot
[sdw2] /dev/sda3      ext3      197G  188M  187G    1% /home
[sdw2] /dev/sdb       xfs       4.6T  4.6T   21G  100% /data
[sdw3] df: "/root/.gvfs": Permission denied
[sdw3] Filesystem     Type      Size  Used Avail Use% Mounted on
[sdw3] /dev/sda2      ext3       60G  5.5G   51G   10% /
[sdw3] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[sdw3] tmpfs          tmpfs      32G  100K   32G    1% /dev/shm
[sdw3] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[sdw3] /dev/sda5      ext3      785G  198M  745G    1% /home
[sdw3] /dev/sdb       xfs       4.6T  3.8T  859G   82% /data
[smdw] df: "/root/.gvfs": Permission denied
[smdw] Filesystem     Type      Size  Used Avail Use% Mounted on
[smdw] /dev/sda2      ext3       60G  5.8G   51G   11% /
[smdw] devtmpfs       devtmpfs   32G  448K   32G    1% /dev
[smdw] tmpfs          tmpfs      32G  100K   32G    1% /dev/shm
[smdw] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[smdw] /dev/sda5      ext3       40G   23G   16G   60% /home
[smdw] /dev/sda6      xfs       757G  3.3G  754G    1% /data
[smdw] /dev/sr0       iso9660   3.1G  3.1G     0  100% /media/SLES-11-SP2-DVD-x86_6407551
[ mdw] df: "/root/.gvfs": Permission denied
[ mdw] Filesystem     Type      Size  Used Avail Use% Mounted on
[ mdw] /dev/sda2      ext3       60G  6.4G   50G   12% /
[ mdw] devtmpfs       devtmpfs   32G  456K   32G    1% /dev
[ mdw] tmpfs          tmpfs      32G  100K   32G    1% /dev/shm
[ mdw] /dev/sda1      ext3      9.9G  220M  9.2G    3% /boot
[ mdw] /dev/sda5      ext3       40G  281M   38G    1% /home
[ mdw] /dev/sda6      xfs       757G  6.4G  751G    1% /data

5. Stop the database

Run gpstop.

It reports: -gpstop failed(Reason='FATAL: the database system is shutting down') exiting ...

gpstop could not stop the database.

 

6. Check the processes

 gpssh -h sdw1 -h sdw2 -h sdw3 -h sdw4 -h sdw5 -h sdw6 -h sdw7 -h mdw -h smdw "ps -ef |grep postgres"

The database processes on the segments had not stopped. Below are the processes on one segment host:

[sdw4] gpadmin  15168 31843  0 Feb26 ?        00:00:14 postgres: port 40000, cems cems 198.168.11.11(52166) con6959 seg12 idle in transaction
[sdw4] gpadmin  15170 31838  0 Feb26 ?        00:00:15 postgres: port 40001, cems cems 198.168.11.11(38065) con6959 seg13 idle in transaction
[sdw4] gpadmin  15172 31841  0 Feb26 ?        00:00:16 postgres: port 40002, cems cems 198.168.11.11(3175) con6959 seg14 idle in transaction
[sdw4] gpadmin  15174 31837  0 Feb26 ?        00:00:16 postgres: port 40003, cems cems 198.168.11.11(31152) con6959 seg15 idle in transaction
[sdw4] gpadmin  15176 31843  0 Feb26 ?        00:00:04 postgres: port 40000, cems cems 198.168.11.11(52194) con6959 seg12 idle
[sdw4] gpadmin  15178 31838  0 Feb26 ?        00:00:04 postgres: port 40001, cems cems 198.168.11.11(38093) con6959 seg13 idle
[sdw4] gpadmin  15180 31841  0 Feb26 ?        00:00:04 postgres: port 40002, cems cems 198.168.11.11(3203) con6959 seg14 idle
[sdw4] gpadmin  15182 31837  0 Feb26 ?        00:00:04 postgres: port 40003, cems cems 198.168.11.11(31180) con6959 seg15 idle
[sdw4] gpadmin  15204 31843  0 Feb26 ?        00:00:16 postgres: port 40000, cems cems 198.168.11.11(52298) con6949 seg12 idle in transaction
[sdw4] gpadmin  15206 31838  0 Feb26 ?        00:00:15 postgres: port 40001, cems cems 198.168.11.11(38197) con6949 seg13 idle in transaction
[sdw4] gpadmin  15208 31841  0 Feb26 ?        00:00:16 postgres: port 40002, cems cems 198.168.11.11(3307) con6949 seg14 idle in transaction
[sdw4] gpadmin  15210 31837  0 Feb26 ?        00:00:15 postgres: port 40003, cems cems 198.168.11.11(31284) con6949 seg15 idle in transaction
[sdw4] gpadmin  31836     1  0 Feb04 ?        00:00:04 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg27 -p 50003 -b 57 -z 28 --silent-mode=true -i -M quiescent -C 27
[sdw4] gpadmin  31837     1  0 Feb04 ?        00:10:13 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg15 -p 40003 -b 17 -z 28 --silent-mode=true -i -M quiescent -C 15
[sdw4] gpadmin  31838     1  0 Feb04 ?        00:10:11 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg13 -p 40001 -b 15 -z 28 --silent-mode=true -i -M quiescent -C 13
[sdw4] gpadmin  31839     1  0 Feb04 ?        00:00:05 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg5 -p 50001 -b 35 -z 28 --silent-mode=true -i -M quiescent -C 5
[sdw4] gpadmin  31840     1  0 Feb04 ?        00:00:03 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg2 -p 50002 -b 32 -z 28 --silent-mode=true -i -M quiescent -C 2
[sdw4] gpadmin  31841     1  0 Feb04 ?        00:10:20 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg14 -p 40002 -b 16 -z 28 --silent-mode=true -i -M quiescent -C 14
[sdw4] gpadmin  31842     1  0 Feb04 ?        00:00:04 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg8 -p 50000 -b 38 -z 28 --silent-mode=true -i -M quiescent -C 8
[sdw4] gpadmin  31843     1  0 Feb04 ?        00:10:26 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg12 -p 40000 -b 14 -z 28 --silent-mode=true -i -M quiescent -C 12
[sdw4] gpadmin  31844 31841  0 Feb04 ?        00:01:15 postgres: port 40002, logger process
[sdw4] gpadmin  31845 31838  0 Feb04 ?        00:01:21 postgres: port 40001, logger process
[sdw4] gpadmin  31846 31839  0 Feb04 ?        00:01:07 postgres: port 50001, logger process
[sdw4] gpadmin  31847 31836  0 Feb04 ?        00:01:13 postgres: port 50003, logger process
[sdw4] gpadmin  31848 31843  0 Feb04 ?        00:01:19 postgres: port 40000, logger process
[sdw4] gpadmin  31849 31842  0 Feb04 ?        00:01:07 postgres: port 50000, logger process
[sdw4] gpadmin  31850 31837  0 Feb04 ?        00:01:25 postgres: port 40003, logger process
[sdw4] gpadmin  31851 31840  0 Feb04 ?        00:01:04 postgres: port 50002, logger process
[sdw4] gpadmin  31868 31836  0 Feb04 ?        00:08:55 postgres: port 50003, mirror process
[sdw4] gpadmin  31871 31837  0 Feb04 ?        00:09:13 postgres: port 40003, primary process
[sdw4] gpadmin  31873 31841  0 Feb04 ?        00:08:56 postgres: port 40002, primary process
[sdw4] gpadmin  31874 31868  0 Feb04 ?        01:10:10 postgres: port 50003, mirror receiver process
[sdw4] gpadmin  31876 31868  0 Feb04 ?        00:34:17 postgres: port 50003, mirror consumer process
[sdw4] gpadmin  31877 31868  0 Feb04 ?        00:12:24 postgres: port 50003, mirror consumer writer process
[sdw4] gpadmin  31878 31868  0 Feb04 ?        00:36:29 postgres: port 50003, mirror consumer append only process
[sdw4] gpadmin  31879 31838  0 Feb04 ?        00:08:55 postgres: port 40001, primary process
[sdw4] gpadmin  31881 31868  0 Feb04 ?        00:07:44 postgres: port 50003, mirror sender ack process
[sdw4] gpadmin  31882 31868  0 Feb04 ?        00:00:03 postgres: port 50003, mirror verification process
[sdw4] gpadmin  31883 31871  0 Feb04 ?        00:07:12 postgres: port 40003, primary receiver ack process
[sdw4] gpadmin  31885 31871  0 Feb04 ?        01:15:22 postgres: port 40003, primary sender process
[sdw4] gpadmin  31886 31871  0 Feb04 ?        00:07:00 postgres: port 40003, primary consumer ack process
[sdw4] gpadmin  31887 31871  0 Feb04 ?        00:15:41 postgres: port 40003, primary recovery process
[sdw4] gpadmin  31888 31842  0 Feb04 ?        00:09:02 postgres: port 50000, mirror process
[sdw4] gpadmin  31889 31871  0 Feb04 ?        00:01:08 postgres: port 40003, primary verification process
[sdw4] gpadmin  31892 31873  0 Feb04 ?        00:07:19 postgres: port 40002, primary receiver ack process
[sdw4] gpadmin  31893 31873  0 Feb04 ?        01:10:48 postgres: port 40002, primary sender process
[sdw4] gpadmin  31894 31873  0 Feb04 ?        00:06:47 postgres: port 40002, primary consumer ack process
[sdw4] gpadmin  31895 31873  0 Feb04 ?        00:15:34 postgres: port 40002, primary recovery process
[sdw4] gpadmin  31896 31873  0 Feb04 ?        00:01:13 postgres: port 40002, primary verification process
[sdw4] gpadmin  31898 31839  0 Feb04 ?        00:09:08 postgres: port 50001, mirror process
[sdw4] gpadmin  31900 31840  0 Feb04 ?        00:09:03 postgres: port 50002, mirror process
[sdw4] gpadmin  31901 31879  0 Feb04 ?        00:07:05 postgres: port 40001, primary receiver ack process
[sdw4] gpadmin  31902 31879  0 Feb04 ?        01:07:19 postgres: port 40001, primary sender process
[sdw4] gpadmin  31903 31879  0 Feb04 ?        00:09:53 postgres: port 40001, primary consumer ack process
[sdw4] gpadmin  31904 31879  0 Feb04 ?        00:15:36 postgres: port 40001, primary recovery process
[sdw4] gpadmin  31905 31879  0 Feb04 ?        00:01:13 postgres: port 40001, primary verification process
[sdw4] gpadmin  31911 31888  0 Feb04 ?        01:11:45 postgres: port 50000, mirror receiver process
[sdw4] gpadmin  31912 31888  0 Feb04 ?        00:34:47 postgres: port 50000, mirror consumer process
[sdw4] gpadmin  31913 31898  0 Feb04 ?        01:19:23 postgres: port 50001, mirror receiver process
[sdw4] gpadmin  31914 31888  0 Feb04 ?        00:13:06 postgres: port 50000, mirror consumer writer process
[sdw4] gpadmin  31915 31898  0 Feb04 ?        00:38:56 postgres: port 50001, mirror consumer process
[sdw4] gpadmin  31916 31888  0 Feb04 ?        00:36:17 postgres: port 50000, mirror consumer append only process
[sdw4] gpadmin  31917 31898  0 Feb04 ?        00:19:13 postgres: port 50001, mirror consumer writer process
[sdw4] gpadmin  31919 31898  0 Feb04 ?        00:36:00 postgres: port 50001, mirror consumer append only process
[sdw4] gpadmin  31920 31888  0 Feb04 ?        00:07:49 postgres: port 50000, mirror sender ack process
[sdw4] gpadmin  31922 31888  0 Feb04 ?        00:00:03 postgres: port 50000, mirror verification process
[sdw4] gpadmin  31923 31898  0 Feb04 ?        00:08:11 postgres: port 50001, mirror sender ack process
[sdw4] gpadmin  31924 31898  0 Feb04 ?        00:00:03 postgres: port 50001, mirror verification process
[sdw4] gpadmin  31925 31900  0 Feb04 ?        01:05:21 postgres: port 50002, mirror receiver process
[sdw4] gpadmin  31926 31900  0 Feb04 ?        00:34:34 postgres: port 50002, mirror consumer process
[sdw4] gpadmin  31927 31900  0 Feb04 ?        00:12:25 postgres: port 50002, mirror consumer writer process
[sdw4] gpadmin  31928 31900  0 Feb04 ?        00:36:15 postgres: port 50002, mirror consumer append only process
[sdw4] gpadmin  31930 31900  0 Feb04 ?        00:07:48 postgres: port 50002, mirror sender ack process
[sdw4] gpadmin  31931 31900  0 Feb04 ?        00:00:03 postgres: port 50002, mirror verification process
[sdw4] gpadmin  31937 31843  0 Feb04 ?        00:00:40 postgres: port 40000, stats collector process
[sdw4] gpadmin  31938 31843  0 Feb04 ?        00:07:18 postgres: port 40000, writer process
[sdw4] gpadmin  31939 31843  0 Feb04 ?        00:02:02 postgres: port 40000, checkpoint process
[sdw4] gpadmin  31940 31843  0 Feb04 ?        00:01:52 postgres: port 40000, sweeper process
[sdw4] gpadmin  31944 31837  0 Feb04 ?        00:00:40 postgres: port 40003, stats collector process
[sdw4] gpadmin  31945 31837  0 Feb04 ?        00:07:55 postgres: port 40003, writer process
[sdw4] gpadmin  31946 31837  0 Feb04 ?        00:03:28 postgres: port 40003, checkpoint process
[sdw4] gpadmin  31947 31837  0 Feb04 ?        00:01:50 postgres: port 40003, sweeper process
[sdw4] gpadmin  31948 31841  0 Feb04 ?        00:00:36 postgres: port 40002, stats collector process
[sdw4] gpadmin  31949 31841  0 Feb04 ?        00:08:00 postgres: port 40002, writer process
[sdw4] gpadmin  31950 31841  0 Feb04 ?        00:02:49 postgres: port 40002, checkpoint process
[sdw4] gpadmin  31951 31841  0 Feb04 ?        00:01:52 postgres: port 40002, sweeper process
[sdw4] gpadmin  31952 31838  0 Feb04 ?        00:00:36 postgres: port 40001, stats collector process
[sdw4] gpadmin  31953 31838  0 Feb04 ?        00:07:39 postgres: port 40001, writer process
[sdw4] gpadmin  31954 31838  0 Feb04 ?        00:02:36 postgres: port 40001, checkpoint process
[sdw4] gpadmin  31955 31838  0 Feb04 ?        00:01:47 postgres: port 40001, sweeper process
[sdw4] gpadmin  37706 37670  0 17:52 pts/0    00:00:00 grep postgres
[sdw4] gpadmin  42312 31843  0 Feb12 ?        00:02:50 postgres: port 40000, primary process
[sdw4] gpadmin  42313 42312  0 Feb12 ?        00:09:58 postgres: port 40000, primary recovery process

 

7. Run on the master

#gpstop -af

It reports: gpstop failed (Reason='FATAL: the database system is shutting down') exiting ...

The database still could not be stopped this way.

gpadmin@mdw:~> gpstop -M fast
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Starting gpstop with args: -M fast
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[CRITICAL]:-gpstop failed. (Reason='FATAL:  the database system is shutting down
') exiting...

This did not work either.

 

8. Kill the processes directly

gpssh  -h sdw1 -h sdw2 -h sdw3 -h sdw4 -h sdw5 -h sdw6 -h sdw7 -h mdw -h smdw "for i in `ps -ef |grep postgres |awk {'print $2'}`;do kill $i; done"

Re-running the check above showed that the processes on the segments had all stopped, but some processes on the master were still running.
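One detail of the command above is worth checking: because the whole loop is wrapped in double quotes, the backtick substitution is evaluated by the shell on mdw before gpssh runs, so the PID list comes from the master rather than from each host. A minimal sketch of the difference, using `sh -c` as a stand-in for the remote shell (the variable names are made up for illustration, and `remote_cmd` is only printed, never executed):

```shell
#!/bin/sh
# Stand-in for "local host" vs "remote host": the child shell started by
# `sh -c` plays the role of the shell that runs on each remote host.
marker=local

# Double quotes: $marker is expanded by the calling shell first.
expanded_early=$(marker=remote sh -c "echo $marker")

# Single quotes: the text is passed through untouched and the child expands it.
expanded_late=$(marker=remote sh -c 'echo $marker')

echo "double quotes -> $expanded_early"
echo "single quotes -> $expanded_late"

# Hypothetical remote command quoted the same way, so that the process lookup
# would happen on each host. "[p]ostgres" keeps grep from matching itself.
remote_cmd='for pid in $(ps -ef | grep "[p]ostgres" | awk "{print \$2}"); do kill "$pid"; done'
echo "$remote_cmd"
```

Under this assumption the string would be handed to gpssh as a single argument, for example `gpssh -h sdw1 -h sdw2 "$remote_cmd"`.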

 

9. Stop the master processes

mdw -h smdw ps -ef |grep postgres
[ mdw] gpadmin  15765 15724  0 17:52 pts/11   00:00:00 grep postgres
[ mdw] gpadmin  38876 41830  0 Feb26 ?        00:05:11 postgres: port  5432, cems cems 198.168.11.12(51943) con6959 198.168.11.12(51943) cmd24705 BIND
[ mdw] gpadmin  41830     1  0 Feb04 ?        00:00:07 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/master/gpseg-1 -p 5432 -b 1 -z 28 --silent-mode=true -i -M master -C -1 -x 58 -E
[ mdw] gpadmin  41838 41830  0 Feb04 ?        00:10:18 postgres: port  5432, master logger process
[ mdw] gpadmin  41844 41830  0 Feb04 ?        00:00:18 postgres: port  5432, stats collector process
[ mdw] gpadmin  41845 41830  0 Feb04 ?        00:05:02 postgres: port  5432, writer process
[ mdw] gpadmin  41846 41830  0 Feb04 ?        00:00:55 postgres: port  5432, checkpoint process
[ mdw] gpadmin  41847 41830  0 Feb04 ?        00:00:13 postgres: port  5432, seqserver process
[ mdw] gpadmin  41848 41830  0 Feb04 ?        00:04:33 postgres: port  5432, ftsprobe process
[ mdw] gpadmin  41851 41830  0 Feb04 ?        00:00:49 postgres: port  5432, sweeper process

Running gkill postgres had no effect.

Running kill 41830 directly had no effect either.
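Rather than picking the parent PID out of the ps listing by eye, the postmaster PID can also be read from the first line of postmaster.pid in the data directory. A small sketch (the function name `postmaster_pid` is an illustration, not a Greenplum utility):

```shell
#!/bin/sh
# Print the postmaster PID recorded in a data directory.
# The first line of postmaster.pid holds the PID of the postmaster process.
postmaster_pid() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"
    if [ ! -r "$pidfile" ]; then
        echo "no readable $pidfile (is the server running?)" >&2
        return 1
    fi
    head -n 1 "$pidfile"
}

# Example (data directory taken from the ps output above):
#   postmaster_pid /data/master/gpseg-1
```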

 

Run pg_ctl stop -D /data/master/gpseg-1:

>pg_ctl stop -D /data/master/gpseg-1

waiting for server to shut down .........................................failed

pg_ctl:server does not shut down

Presumably the newer version protects these processes, so they can no longer be killed directly.
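Another possible explanation, offered here as an assumption rather than a confirmed diagnosis: a plain `kill` sends SIGTERM, which the postmaster treats as a smart shutdown request, and a smart shutdown waits for connected clients (the ps listing above still shows client session con6959 in BIND). The gpsegstop.py log lines later in this post label SIGTERM, SIGINT and SIGQUIT as smart, fast and immediate shutdown respectively. A sketch of that mapping as a helper (the function name `signal_for_mode` is made up):

```shell
#!/bin/sh
# Map a pg_ctl shutdown mode to the signal the postmaster expects for it.
signal_for_mode() {
    case "$1" in
        smart)     echo TERM ;;   # wait for clients to disconnect
        fast)      echo INT  ;;   # disconnect clients, then shut down properly
        immediate) echo QUIT ;;   # quit without a complete shutdown; recovery on restart
        *)         echo "unknown shutdown mode: $1" >&2; return 1 ;;
    esac
}

# Example: signal a postmaster by mode instead of a bare `kill`.
#   pg_ctl kill "$(signal_for_mode fast)" 41830
```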

 

9. Stop the database with pg_ctl

gpadmin@mdw:~> pg_ctl --help
pg_ctl is a utility to start, stop, restart, reload configuration files,
report the status of a PostgreSQL server, or signal a PostgreSQL process.

Usage:
  pg_ctl start   [-w] [-t SECS] [-D DATADIR] [-s] [-l FILENAME] [-o "OPTIONS"]
  pg_ctl stop    [-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
  pg_ctl restart [-w] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
                 [-o "OPTIONS"]
  pg_ctl reload  [-D DATADIR] [-s]
  pg_ctl status  [-D DATADIR]
  pg_ctl kill    SIGNALNAME PID

Common options:
  -D, --pgdata DATADIR   location of the database storage area
  -s, --silent           only print errors, no informational messages
  -t SECS                seconds to wait when using -w option
  -w                     wait until operation completes
  -W                     do not wait until operation completes
  --help                 show this help, then exit
  --version              output version information, then exit
  --gp-version           output Greenplum version information, then exit
(The default is to wait for shutdown, but not for start or restart.)

If the -D option is omitted, the environment variable PGDATA is used.

Options for start or restart:
  -l, --log FILENAME     write (or append) server log to FILENAME
  -o OPTIONS             command line options to pass to postgres
                         (PostgreSQL server executable)
  -p PATH-TO-POSTGRES    normally not necessary
  -c, --core-files       allow postgres to produce core files

Options for stop or restart:
  -m SHUTDOWN-MODE   can be "smart", "fast", or "immediate"

Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   quit without complete shutdown; will lead to recovery on restart

Allowed signal names for kill:
  HUP INT QUIT ABRT TERM USR1 USR2

Report bugs to <pgsql-bugs@postgresql.org>.
gpadmin@mdw:~> pg_ctl stop -m immedidate
pg_ctl: unrecognized shutdown mode "immedidate"
Try "pg_ctl --help" for more information.
gpadmin@mdw:~> pg_ctl stop -D /data/master/gpseg-1 -m immedidate
pg_ctl: unrecognized shutdown mode "immedidate"
Try "pg_ctl --help" for more information.
gpadmin@mdw:~> pg_ctl stop -D /data/master/gpseg-1  -m immediate
waiting for server to shut down.... done
server stopped
gpadmin@mdw:~> ps -ef |grep post
root      4824     1  0 Feb03 ?        00:00:17 /usr/lib/postfix/master
postfix   4857  4824  0 Feb03 ?        00:00:01 qmgr -l -t fifo -u
gpadmin  21828 21731  0 18:47 pts/1    00:00:00 grep post
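The misspelled mode in this session cost two attempts before the immediate shutdown went through. A minimal sketch of a wrapper that checks the mode names before calling pg_ctl and then tries them in the order given, stopping at the first one that succeeds; `valid_mode` and `stop_postmaster` are hypothetical names, and the 60-second timeout is an arbitrary choice:

```shell
#!/bin/sh
# Accept only the three shutdown modes that pg_ctl documents.
valid_mode() {
    case "$1" in
        smart|fast|immediate) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: stop_postmaster DATADIR MODE [MODE...]
stop_postmaster() {
    datadir=$1
    shift
    # Reject typos such as "immedidate" before touching the server.
    for mode in "$@"; do
        if ! valid_mode "$mode"; then
            echo "unrecognized shutdown mode: $mode" >&2
            return 2
        fi
    done
    # Try each mode in turn; stop at the first one that works.
    for mode in "$@"; do
        echo "pg_ctl stop -m $mode on $datadir"
        if pg_ctl stop -D "$datadir" -m "$mode" -w -t 60; then
            return 0
        fi
    done
    return 1
}

# Example:
#   stop_postmaster /data/master/gpseg-1 fast immediate
```

Trying fast before immediate keeps the forced path as a last resort, since an immediate shutdown leads to recovery on the next start.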

 

10. Start the database

gpadmin@mdw:~> gpstart -m
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args: -m
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-Master-only start requested in a configuration with a standby master.
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-This is advisable only under the direct supervision of Greenplum support.
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-This mode of operation is not supported in a production environment and
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-may lead to a split-brain condition and possible unrecoverable data loss.
20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************

Continue with master-only startup Yy|Nn (default=N):
> y
20150227:18:47:13:021829 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20150227:18:47:15:021829 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150227:18:47:15:021829 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:18:47:16:021829 gpstart:mdw:gpadmin-[INFO]:-Setting new master era
20150227:18:47:16:021829 gpstart:mdw:gpadmin-[INFO]:-Master Started...
gpadmin@mdw:~> gpstop -af
20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Starting gpstop with args: -af
20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-There are 0 connections to the database
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='fast'
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Master host=mdw
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Detected 0 connections to database
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Using standard WAIT mode of 600 seconds
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=fast
20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Master segment instance directory=/data/master/gpseg-1
20150227:18:47:30:021878 gpstop:mdw:gpadmin-[INFO]:-Stopping master standby host smdw mode=fast
20150227:18:47:30:021878 gpstop:mdw:gpadmin-[WARNING]:-Error occured while stopping the standby master: ExecutionError: 'non-zero rc: 1' occured.
Details: 'ssh -o 'StrictHostKeyChecking no' smdw ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/pg_ctl   -D /data/master/gpseg-1 -m fast -w -t 600 stop"'  cmd had rc=1 completed=True halted=False     stdout=''     stderr='pg_ctl: PID file "/data/master/gpseg-1/postmaster.pid" does not exist   Is server running?   '   20150227:18:47:30:021878 gpstop:mdw:gpadmin-[INFO]:-Successfully shutdown standby process on smdw   20150227:18:47:30:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing parallel primary segment instance shutdown, please wait...   ...........................................................................................................................................................................................................................................................................................................................................   ...............................................................................................................................................................................   .......................................................................................................    20150227:18:57:40:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing parallel mirror segment instance shutdown, please wait...   ..   
20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-------------------------------------------------   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-Failed Segment Stop Information    20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-------------------------------------------------   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:14  FAILED  host:'sdw4' datadir:'/data/primary/gpseg12' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:15  FAILED  host:'sdw4' datadir:'/data/primary/gpseg13' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:16  FAILED  host:'sdw4' datadir:'/data/primary/gpseg14' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:17  FAILED  host:'sdw4' datadir:'/data/primary/gpseg15' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:46  FAILED  host:'sdw6' datadir:'/data/mirror/gpseg16' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:22  FAILED  host:'sdw6' datadir:'/data/primary/gpseg20' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:23  FAILED  host:'sdw6' datadir:'/data/primary/gpseg21' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:24  FAILED  host:'sdw6' datadir:'/data/primary/gpseg22' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:25  FAILED  host:'sdw6' datadir:'/data/primary/gpseg23' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:47  FAILED  host:'sdw7' datadir:'/data/mirror/gpseg17' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:26  FAILED  host:'sdw7' datadir:'/data/primary/gpseg24' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:27  FAILED  host:'sdw7' datadir:'/data/primary/gpseg25' 
with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:28  FAILED  host:'sdw7' datadir:'/data/primary/gpseg26' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:29  FAILED  host:'sdw7' datadir:'/data/primary/gpseg27' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:10  FAILED  host:'sdw3' datadir:'/data/primary/gpseg8' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:11  FAILED  host:'sdw3' datadir:'/data/primary/gpseg9' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:12  FAILED  host:'sdw3' datadir:'/data/primary/gpseg10' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:13  FAILED  host:'sdw3' datadir:'/data/primary/gpseg11' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:2  FAILED  host:'sdw1' datadir:'/data/primary/gpseg0' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:3  FAILED  host:'sdw1' datadir:'/data/primary/gpseg1' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:4  FAILED  host:'sdw1' datadir:'/data/primary/gpseg2' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:5  FAILED  host:'sdw1' datadir:'/data/primary/gpseg3' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:48  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg18' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:6  FAILED  host:'sdw2' datadir:'/data/primary/gpseg4' with reason:'cmd had rc=255 completed=True halted=False     stdout='20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Starting gpsegstop.py with args: -D /data/primary/gpseg4:40000 -D /data/primary/gpseg5:40001 -D /data/primary/gpseg6:40002 -D 
/data/primary/gpseg7:40003 -D /data/mirror/gpseg19:50003   -m fast -t 600 -V postgres (Greenplum Database) 4.3.3.1 build 1   20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Issuing shutdown commands to local segments...   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGSCONT to 37112   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGTERM to 37112 (smart shutdown)   20150227:18:57:36:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGINT to 37112 (fast shutdown)   20150227:18:57:39:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGQUIT to 37112 (immediate shutdown)   20150227:18:57:40:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-   COMMAND RESULTS   STATUS--DIR:/data/primary/gpseg4--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg4/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg5--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg5/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg6--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: waiting for server to shut down...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................   failed    stderr: pg_ctl: server does not shut down

STATUS--DIR:/data/primary/gpseg7--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg7/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/mirror/gpseg19--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/mirror/gpseg19/postmaster.pid" does not exist   Is server running?

'     stderr='''   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:7  FAILED  host:'sdw2' datadir:'/data/primary/gpseg5' with reason:'cmd had rc=255 completed=True halted=False     stdout='20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Starting gpsegstop.py with args: -D /data/primary/gpseg4:40000 -D /data/primary/gpseg5:40001 -D /data/primary/gpseg6:40002 -D /data/primary/gpseg7:40003 -D /data/mirror/gpseg19:50003   -m fast -t 600 -V postgres (Greenplum Database) 4.3.3.1 build 1   20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Issuing shutdown commands to local segments...   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGSCONT to 37112   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGTERM to 37112 (smart shutdown)   20150227:18:57:36:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGINT to 37112 (fast shutdown)   20150227:18:57:39:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGQUIT to 37112 (immediate shutdown)   20150227:18:57:40:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-   COMMAND RESULTS   STATUS--DIR:/data/primary/gpseg4--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg4/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg5--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg5/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg6--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: waiting for server to shut down...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................   failed    stderr: pg_ctl: server does not shut down

STATUS--DIR:/data/primary/gpseg7--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg7/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/mirror/gpseg19--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/mirror/gpseg19/postmaster.pid" does not exist   Is server running?

'     stderr='''   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:8  FAILED  host:'sdw2' datadir:'/data/primary/gpseg6' with reason:'cmd had rc=255 completed=True halted=False     stdout='20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Starting gpsegstop.py with args: -D /data/primary/gpseg4:40000 -D /data/primary/gpseg5:40001 -D /data/primary/gpseg6:40002 -D /data/primary/gpseg7:40003 -D /data/mirror/gpseg19:50003   -m fast -t 600 -V postgres (Greenplum Database) 4.3.3.1 build 1   20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Issuing shutdown commands to local segments...   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGSCONT to 37112   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGTERM to 37112 (smart shutdown)   20150227:18:57:36:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGINT to 37112 (fast shutdown)   20150227:18:57:39:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGQUIT to 37112 (immediate shutdown)   20150227:18:57:40:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-   COMMAND RESULTS   STATUS--DIR:/data/primary/gpseg4--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg4/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg5--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg5/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg6--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: waiting for server to shut down...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................   failed    stderr: pg_ctl: server does not shut down

STATUS--DIR:/data/primary/gpseg7--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg7/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/mirror/gpseg19--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/mirror/gpseg19/postmaster.pid" does not exist   Is server running?

'     stderr='''   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:9  FAILED  host:'sdw2' datadir:'/data/primary/gpseg7' with reason:'cmd had rc=255 completed=True halted=False     stdout='20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Starting gpsegstop.py with args: -D /data/primary/gpseg4:40000 -D /data/primary/gpseg5:40001 -D /data/primary/gpseg6:40002 -D /data/primary/gpseg7:40003 -D /data/mirror/gpseg19:50003   -m fast -t 600 -V postgres (Greenplum Database) 4.3.3.1 build 1   20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Issuing shutdown commands to local segments...   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGSCONT to 37112   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGTERM to 37112 (smart shutdown)   20150227:18:57:36:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGINT to 37112 (fast shutdown)   20150227:18:57:39:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGQUIT to 37112 (immediate shutdown)   20150227:18:57:40:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-   COMMAND RESULTS   STATUS--DIR:/data/primary/gpseg4--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg4/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg5--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg5/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg6--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: waiting for server to shut down...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................   failed    stderr: pg_ctl: server does not shut down

STATUS--DIR:/data/primary/gpseg7--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg7/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/mirror/gpseg19--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/mirror/gpseg19/postmaster.pid" does not exist   Is server running?

'     stderr='''   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:49  FAILED  host:'sdw2' datadir:'/data/mirror/gpseg19' with reason:'cmd had rc=255 completed=True halted=False     stdout='20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Starting gpsegstop.py with args: -D /data/primary/gpseg4:40000 -D /data/primary/gpseg5:40001 -D /data/primary/gpseg6:40002 -D /data/primary/gpseg7:40003 -D /data/mirror/gpseg19:50003   -m fast -t 600 -V postgres (Greenplum Database) 4.3.3.1 build 1   20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Issuing shutdown commands to local segments...   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGSCONT to 37112   20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGTERM to 37112 (smart shutdown)   20150227:18:57:36:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGINT to 37112 (fast shutdown)   20150227:18:57:39:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGQUIT to 37112 (immediate shutdown)   20150227:18:57:40:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-   COMMAND RESULTS   STATUS--DIR:/data/primary/gpseg4--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg4/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg5--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg5/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/primary/gpseg6--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: waiting for server to shut down...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................   failed    stderr: pg_ctl: server does not shut down

STATUS--DIR:/data/primary/gpseg7--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/primary/gpseg7/postmaster.pid" does not exist   Is server running?

STATUS--DIR:/data/mirror/gpseg19--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout:  stderr: pg_ctl: PID file "/data/mirror/gpseg19/postmaster.pid" does not exist   Is server running?

'     stderr='''   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:45  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg15' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:51  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg21' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:54  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg24' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:41  FAILED  host:'sdw7' datadir:'/data/mirror/gpseg11' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:44  FAILED  host:'sdw7' datadir:'/data/mirror/gpseg14' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:50  FAILED  host:'sdw7' datadir:'/data/mirror/gpseg20' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:37  FAILED  host:'sdw6' datadir:'/data/mirror/gpseg7' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:40  FAILED  host:'sdw6' datadir:'/data/mirror/gpseg10' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:43  FAILED  host:'sdw6' datadir:'/data/mirror/gpseg13' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:30  FAILED  host:'sdw2' datadir:'/data/mirror/gpseg0' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:52  FAILED  host:'sdw2' datadir:'/data/mirror/gpseg22' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:55  FAILED  host:'sdw2' datadir:'/data/mirror/gpseg25' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:32  FAILED  host:'sdw4' datadir:'/data/mirror/gpseg2' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:35  FAILED  host:'sdw4' datadir:'/data/mirror/gpseg5' 
with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:38  FAILED  host:'sdw4' datadir:'/data/mirror/gpseg8' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:57  FAILED  host:'sdw4' datadir:'/data/mirror/gpseg27' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:31  FAILED  host:'sdw3' datadir:'/data/mirror/gpseg1' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:34  FAILED  host:'sdw3' datadir:'/data/mirror/gpseg4' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:53  FAILED  host:'sdw3' datadir:'/data/mirror/gpseg23' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:56  FAILED  host:'sdw3' datadir:'/data/mirror/gpseg26' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:33  FAILED  host:'sdw5' datadir:'/data/mirror/gpseg3' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:36  FAILED  host:'sdw5' datadir:'/data/mirror/gpseg6' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:39  FAILED  host:'sdw5' datadir:'/data/mirror/gpseg9' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:42  FAILED  host:'sdw5' datadir:'/data/mirror/gpseg12' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:18  FAILED  host:'sdw5' datadir:'/data/primary/gpseg16' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:19  FAILED  host:'sdw5' datadir:'/data/primary/gpseg17' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:20  FAILED  host:'sdw5' datadir:'/data/primary/gpseg18' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:21  FAILED  host:'sdw5' 
datadir:'/data/primary/gpseg19' with reason:'Shutdown failed'   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-----------------------------------------------------   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-   Segments stopped successfully                              = 0   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Segments with errors during stop                           = 45   <<<<<<<<   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-     20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Segments that are currently marked down in configuration   = 11   <<<<<<<<   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-            (stop was still attempted on these segments)   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-----------------------------------------------------   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-Successfully shutdown 0 of 56 segment instances <<<<<<<<   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-------------------------------------------------   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Segment instance shutdown failures reported   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Failed to shutdown 45 of 56 segment instances <<<<<   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-A total of 45 errors were encountered   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Review logfile /home/gpadmin/gpAdminLogs/gpstop_20150227.log   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-For more details on segment shutdown failure(s)   20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-------------------------------------------------   gpadmin@mdw:~>   gpadmin@mdw:~>   gpadmin@mdw:~> gpstart -a   20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args: -a   20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...   
20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20150227:18:58:43:022377 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150227:18:58:43:022377 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:18:58:44:022377 gpstart:mdw:gpadmin-[INFO]:-Setting new master era
20150227:18:58:44:022377 gpstart:mdw:gpadmin-[INFO]:-Master Started...
20150227:18:58:44:022377 gpstart:mdw:gpadmin-[INFO]:-Shutting down master
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/mirror/gpseg0 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg3 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg6 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg9 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg12 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg16 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg17 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg18 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg19 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/mirror/gpseg22 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/mirror/gpseg25 <<<<<
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
............................................................
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-Process results...
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:9  FAILED  host:'sdw2' datadir:'/data/primary/gpseg7' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:7  FAILED  host:'sdw2' datadir:'/data/primary/gpseg5' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:6  FAILED  host:'sdw2' datadir:'/data/primary/gpseg4' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:54  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg24' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:51  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg21' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:45  FAILED  host:'sdw1' datadir:'/data/mirror/gpseg15' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:4  FAILED  host:'sdw1' datadir:'/data/primary/gpseg2' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:3  FAILED  host:'sdw1' datadir:'/data/primary/gpseg1' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:23  FAILED  host:'sdw6' datadir:'/data/primary/gpseg21' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:26  FAILED  host:'sdw7' datadir:'/data/primary/gpseg24' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:17  FAILED  host:'sdw4' datadir:'/data/primary/gpseg15' with reason:'Failure in segment mirroring; check segment logfile'
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------

20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-   Successful segment starts                                                     = 34
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Failed segment starts, from mirroring connection between primary and mirror   = 11   <<<<<<<<
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-   Other failed segment starts                                                   = 0
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)            = 11   <<<<<<<<
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-Successfully started 34 of 45 segment instances, skipped 11 other segments <<<<<<<<
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Segment instance startup failures reported
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Failed start 11 of 45 segment instances <<<<<<<<
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20150227.log
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-For more details on segment startup failure(s)
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Run  gpstate -s  to review current segment instance status
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-There are 11 segment(s) marked down in the database
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance mdw directory /data/master/gpseg-1
20150227:19:08:51:022377 gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active
20150227:19:09:05:022377 gpstart:mdw:gpadmin-[INFO]:-Starting standby master
20150227:19:09:05:022377 gpstart:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw  in directory: /data/master/gpseg-1
20150227:19:09:08:022377 gpstart:mdw:gpadmin-[WARNING]:-Number of segments which failed to start:  11
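With this many FAILED lines it is easy to lose track of which hosts are affected. The following is a minimal sketch, not a Greenplum utility: failed_per_host is a hypothetical helper name, and it only assumes the "FAILED  host:'...'" wording seen in the gpstop and gpstart output above. It reads log text on stdin and prints one "host count" pair per host.

```shell
# Tally FAILED segment instances per host from gpstop/gpstart output on stdin.
# grep -o extracts every match separately, so several log entries on one
# physical line are still counted individually.
failed_per_host() {
    grep -o "FAILED  *host:'[^']*'" \
        | sed "s/.*host:'\([^']*\)'.*/\1/" \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}
```

For example, `failed_per_host < /home/gpadmin/gpAdminLogs/gpstart_20150227.log` would summarise the whole day's log. Fed only the 11 "Process results" lines of the gpstart run above, it should list sdw1 with 5 failures, sdw2 with 3, and sdw4, sdw6 and sdw7 with one each.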

12. Connect to the database and drop the detail tables that are no longer needed

gpadmin@mdw:~> psql -d template1
psql (8.2.15)
Type "help" for help.

template1=# \dn
       List of schemas
        Name        |  Owner
--------------------+---------
 gp_toolkit         | gpadmin
 information_schema | gpadmin
 pg_aoseg           | gpadmin
 pg_bitmapindex     | gpadmin
 pg_catalog         | gpadmin
 pg_toast           | gpadmin
 public             | gpadmin
(7 rows)

template1=# \l
                List of databases
   Name    |  Owner  | Encoding |  Access privileges
-----------+---------+----------+---------------------
 cems      | gpadmin | UTF8     |
 dw        | gpadmin | UTF8     |
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     | =c/gpadmin
                                 : gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | =c/gpadmin
                                 : gpadmin=CTc/gpadmin
(5 rows)

 

13. Recover with gprecoverseg

gprecoverseg
Continue with segment recovery procedure Yy|Nn (default=N):
> Y
20150227:22:43:00:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15 segment(s) to recover
20150227:22:43:00:035973 gprecoverseg:mdw:gpadmin-[INFO]:-Ensuring 15 failed segment(s) are stopped
20150227:22:43:01:035973 gprecoverseg:mdw:gpadmin-[INFO]:-38594: /data/mirror/gpseg0
20150227:22:43:03:035973 gprecoverseg:mdw:gpadmin-[INFO]:-41988: /data/mirror/gpseg1
20150227:22:43:04:035973 gprecoverseg:mdw:gpadmin-[INFO]:-1525: /data/mirror/gpseg2
20150227:22:43:05:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15788: /data/mirror/gpseg3
20150227:22:43:06:035973 gprecoverseg:mdw:gpadmin-[INFO]:-41987: /data/mirror/gpseg4
20150227:22:43:08:035973 gprecoverseg:mdw:gpadmin-[INFO]:-1524: /data/mirror/gpseg5
20150227:22:43:09:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15787: /data/mirror/gpseg6
20150227:22:43:10:035973 gprecoverseg:mdw:gpadmin-[INFO]:-30986: /data/mirror/gpseg7
20150227:22:43:11:035973 gprecoverseg:mdw:gpadmin-[INFO]:-2730: /data/mirror/gpseg15
20150227:22:43:12:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15786: /data/primary/gpseg18
20150227:22:43:14:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15785: /data/primary/gpseg19
20150227:22:43:15:035973 gprecoverseg:mdw:gpadmin-[INFO]:-2729: /data/mirror/gpseg21
20150227:22:43:16:035973 gprecoverseg:mdw:gpadmin-[INFO]:-38595: /data/mirror/gpseg22
20150227:22:43:17:035973 gprecoverseg:mdw:gpadmin-[INFO]:-2728: /data/mirror/gpseg24
20150227:22:43:19:035973 gprecoverseg:mdw:gpadmin-[INFO]:-38596: /data/mirror/gpseg25

20150227:22:43:23:035973 gprecoverseg:mdw:gpadmin-[INFO]:-Cleaning files from 15 segment(s)
............................................................
20150227:22:50:31:035973 gprecoverseg:mdw:gpadmin-[INFO]:-Building template directory
20150227:22:50:31:035973 gprecoverseg:mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'ssh -o 'StrictHostKeyChecking no' sdw1 ". /usr/local/greenplum-db/./greenplum_path.sh; /usr/bin/scp -o 'StrictHostKeyChecking no' -r /data/primary/gpseg0/postgresql.conf mdw:/data/master/gpbuildingsegment_02272015_35973/schema"'
rc=1, stdout='', stderr='ssh: Could not resolve hostname mdw: Name or service not known
lost connection

The recovery failed at the "Building template directory" step. The key part of the CRITICAL message is stderr='ssh: Could not resolve hostname mdw: Name or service not known': the scp that gprecoverseg ran on sdw1 to copy /data/primary/gpseg0/postgresql.conf back to mdw could not resolve the master's hostname.

Analysis pointed to a problem with the hosts file: ssh to a host by its hostname failed, while ssh to the same host by IP address worked fine.

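On inspection, the hosts file on the segment hosts was wrong: it was identical to the one used by the Hadoop cluster. The Hadoop cluster was installed later, and at that time I had asked a colleague to push the updated hosts file out with scp. That most likely overwrote the hosts file on the Greenplum hosts as well.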
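A check like the following would have caught the overwritten hosts file before running gprecoverseg. It is only a sketch: missing_hosts is a hypothetical helper, not something shipped with Greenplum, and it assumes the usual hosts-file layout of an address followed by one or more names, with "#" starting a comment.

```shell
# Print every required hostname that has no entry in the given hosts file.
# Usage: missing_hosts HOSTS_FILE NAME...
missing_hosts() {
    local hosts_file="$1"
    shift
    local name
    for name in "$@"; do
        # Strip comments, skip the address column, require an exact name match.
        if ! awk -v n="$name" '
                { sub(/#.*/, "") }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit found ? 0 : 1 }
            ' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

Run on each segment host, for example as `missing_hosts /etc/hosts mdw smdw sdw1 sdw2 sdw3 sdw4 sdw5 sdw6 sdw7`, it prints nothing when every cluster name is present. In this incident it would presumably have printed at least mdw on sdw1, since that is the name ssh failed to resolve there.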

After correcting the hosts file, I ran gprecoverseg again and then checked the mirror status:

gpadmin@mdw:~> gpstate -m
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.3.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct 10 2014 14:31:50'
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--Type = Spread
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                Port    Status              Data Status
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg0    50000   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg1    50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg2    50002   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg3    50003   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg4    50000   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg5    50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg6    50002   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg7    50003   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg8    50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg9    50001   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg10   50002   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg11   50003   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg12   50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg13   50001   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg14   50002   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg15   50003   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg16   50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg17   50001   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg18   50002   Acting as Primary   Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg19   50003   Acting as Primary   Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg20   50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg21   50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg22   50002   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg23   50003   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg24   50000   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg25   50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg26   50002   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg27   50003   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) are acting as primaries
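Reading 28 mirror rows by eye is error-prone, so a small summary helps when polling gpstate -m during resynchronization. This is a rough sketch rather than anything Greenplum provides: mirror_summary is a hypothetical name, and it assumes only the status words that appear in the listing above ("Synchronized", "Resynchronizing", "Acting as Primary").

```shell
# Summarise `gpstate -m` output read on stdin.
# gsub(re, "&") replaces each match with itself, so the text is unchanged
# and the return value is simply the number of occurrences on that line.
mirror_summary() {
    awk '
        {
            resync += gsub(/Resynchronizing/, "&")
            synced += gsub(/Synchronized/, "&")
            acting += gsub(/Acting as Primary/, "&")
        }
        END {
            printf "synchronized=%d resynchronizing=%d acting_as_primary=%d\n",
                   synced, resync, acting
        }
    '
}
```

Used as `gpstate -m | mirror_summary`. On the listing above it should come to synchronized=13 resynchronizing=15 acting_as_primary=2, which lines up with the 15 segments gprecoverseg reported for recovery. Once every mirror reports Synchronized, the usual follow-up for mirrors still acting as primary is a rebalance with `gprecoverseg -r`; the original write-up does not show that step.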

The database was back to normal.

posted @ 2017-06-30 11:57  星火spark