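Stopping a Greenplum (GP) database
The Greenplum cluster consists of 2 master hosts (master plus standby) and 7 segment hosts.
The system was about to go live as a statistics database. It was being used to help verify the data of a Hadoop cluster, so detail (itemized) records had been loaded into this Greenplum database.
Later analysis showed that some of the CREATE TABLE statements were faulty: they neither named a distribution key nor declared random distribution, so the first column was used as the distribution key by default, and that caused data skew.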
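One rough way to confirm this kind of skew is to compare per-segment row counts of a suspect table. The sketch below is only an illustration: it assumes the counts come from a query such as `SELECT gp_segment_id, count(*) FROM <table> GROUP BY 1` run through `psql -At`, which prints `segment|rows` lines. The function name `skew_ratio` and the sample numbers are made up and are not from the incident.

```shell
#!/bin/sh
# skew_ratio: read "segment_id|row_count" lines on stdin and print the ratio
# of the largest segment to the average segment (1.00 means perfectly even).
skew_ratio() {
    awk -F'|' '
        NF == 2 { n++; sum += $2; if ($2 + 0 > max) max = $2 + 0 }
        END {
            if (n == 0 || sum == 0) { print "no data"; exit 1 }
            printf "%.2f\n", max / (sum / n)
        }'
}

# Example with made-up counts: one segment holds most of the rows.
printf '%s\n' '0|9700' '1|100' '2|100' '3|100' | skew_ratio
```

A ratio far above 1 suggests the distribution key is a poor choice. Declaring `DISTRIBUTED BY (<column>)` or `DISTRIBUTED RANDOMLY` in the CREATE TABLE statement avoids relying on the first-column default.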
The database became extremely slow, almost unusable, so the state of Greenplum was checked.
1. Check the status of the Greenplum database
gpadmin@mdw:~> gpstate
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args:
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.3.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct 10 2014 14:31:50'
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:15:20:13:007202 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
......
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-Greenplum instance status summary
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Master instance = Active
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Master standby = smdw
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Standby master state = Standby host passive
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total segment instance count from metadata = 56
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Primary Segment Status
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total primary segments = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total primary segment valid (at master) = 24
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[WARNING]:-Total primary segment failures (at master) = 4 <<<<<<<<
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid files found = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of /tmp lock files found = 28
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[WARNING]:-Total number postmaster processes missing = 4 <<<<<<<<
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:- Total number postmaster processes found = 24
20150227:15:20:19:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Mirror Segment Status
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total mirror segments = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total mirror segment valid (at master) = 21
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[WARNING]:-Total mirror segment failures (at master) = 7 <<<<<<<<
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid files found = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number of /tmp lock files found = 28
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[WARNING]:-Total number postmaster processes missing = 4 <<<<<<<<
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number postmaster processes found = 24
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[WARNING]:-Total number mirror segments acting as primary segments = 4 <<<<<<<<
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 24
20150227:15:20:20:007202 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
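The lines that matter in this summary are the ones logged at WARNING level. As a small sketch (the helper name `gpstate_warnings` is hypothetical, and it assumes gpstate writes its log lines to standard output in the format shown above), they can be filtered out like this:

```shell
#!/bin/sh
# gpstate_warnings: read gpstate log lines on stdin and print only the
# "metric = value" part of the lines logged at WARNING level.
gpstate_warnings() {
    awk '/\[WARNING\]:-/ {
        line = $0
        sub(/^.*\[WARNING\]:-/, "", line)   # drop the timestamp/host prefix
        sub(/[ <]+$/, "", line)             # drop the trailing "<<<<<<<<" marker
        print line
    }'
}
```

A possible use is `gpstate | gpstate_warnings`, which for the run above would leave only the failure counts for primaries, mirrors and postmaster processes.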
2. Check the database servers
gpadmin@mdw:/> gpssh -h sdw1 -h sdw2 -h sdw3 -h sdw4 -h sdw5 -h sdw6 -h sdw7 "df -hT"
[sdw4] df: "/root/.gvfs": 权限不够
[sdw4] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw4] /dev/sda2 ext3 99G 5.7G 88G 7% /
[sdw4] devtmpfs devtmpfs 32G 448K 32G 1% /dev
[sdw4] tmpfs tmpfs 95G 100K 95G 1% /dev/shm
[sdw4] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot
[sdw4] /dev/sda5 ext3 197G 188M 187G 1% /home
[sdw4] /dev/sdb xfs 4.6T 3.8T 865G 82% /data
[sdw5] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw5] /dev/sda2 ext3 60G 5.5G 51G 10% /
[sdw5] devtmpfs devtmpfs 32G 448K 32G 1% /dev
[sdw5] tmpfs tmpfs 32G 88K 32G 1% /dev/shm
[sdw5] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot
[sdw5] /dev/sda5 ext3 785G 197M 745G 1% /home
[sdw5] /dev/sdb xfs 4.6T 2.4T 2.2T 53% /data
[sdw6] df: "/root/.gvfs": 权限不够
[sdw6] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw6] /dev/sda2 ext3 99G 910M 93G 1% /
[sdw6] devtmpfs devtmpfs 32G 448K 32G 1% /dev
[sdw6] tmpfs tmpfs 47G 100K 47G 1% /dev/shm
[sdw6] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot
[sdw6] /dev/sda5 ext3 197G 188M 187G 1% /home
[sdw6] /dev/sda3 ext3 63G 5.0G 55G 9% /usr
[sdw6] /dev/sdb xfs 4.6T 4.5T 93G 99% /data
[sdw6] /dev/sr0 iso9660 3.1G 3.1G 0 100% /media/SLES-11-SP2-DVD-x86_6407551
[sdw7] df: "/root/.gvfs": 权限不够
[sdw7] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw7] /dev/sda2 ext3 60G 440M 56G 1% /
[sdw7] devtmpfs devtmpfs 32G 244K 32G 1% /dev
[sdw7] tmpfs tmpfs 95G 112K 95G 1% /dev/shm
[sdw7] /dev/sda1 ext3 9.9G 180M 9.2G 2% /boot
[sdw7] /dev/sda5 ext3 197G 188M 187G 1% /home
[sdw7] /dev/sda6 ext3 9.9G 264M 9.1G 3% /opt
[sdw7] /dev/sda8 ext3 9.9G 151M 9.2G 2% /srv
[sdw7] /dev/sda7 ext3 9.9G 162M 9.2G 2% /tmp
[sdw7] /dev/sda9 ext3 40G 4.8G 33G 13% /usr
[sdw7] /dev/sda10 ext3 9.9G 358M 9.0G 4% /var
[sdw7] /dev/sdb xfs 4.6T 3.7T 943G 80% /data
[sdw1] df: "/root/.gvfs": 权限不够
[sdw1] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw1] /dev/sda1 ext3 99G 5.7G 88G 7% /
[sdw1] devtmpfs devtmpfs 32G 444K 32G 1% /dev
[sdw1] tmpfs tmpfs 95G 100K 95G 1% /dev/shm
[sdw1] /dev/sda2 ext3 7.9G 216M 7.3G 3% /boot
[sdw1] /dev/sda3 ext3 197G 188M 187G 1% /home
[sdw1] /dev/sdb xfs 4.6T 3.6T 1.1T 78% /data
[sdw2] df: "/root/.gvfs": 权限不够
[sdw2] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw2] /dev/sda1 ext3 99G 5.7G 88G 7% /
[sdw2] devtmpfs devtmpfs 32G 444K 32G 1% /dev
[sdw2] tmpfs tmpfs 95G 100K 95G 1% /dev/shm
[sdw2] /dev/sda2 ext3 7.9G 216M 7.3G 3% /boot
[sdw2] /dev/sda3 ext3 197G 188M 187G 1% /home
[sdw2] /dev/sdb xfs 4.6T 4.6T 2.0G 100% /data
[sdw3] df: "/root/.gvfs": 权限不够
[sdw3] 文件系统 类型 容量 已用 可用 已用% 挂载点
[sdw3] /dev/sda2 ext3 60G 5.5G 51G 10% /
[sdw3] devtmpfs devtmpfs 32G 448K 32G 1% /dev
[sdw3] tmpfs tmpfs 32G 100K 32G 1% /dev/shm
[sdw3] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot
[sdw3] /dev/sda5 ext3 785G 197M 745G 1% /home
[sdw3] /dev/sdb xfs 4.6T 3.8T 856G 82% /data
The /data data directory on segment hosts sdw2 and sdw6 was essentially full: 100% used on sdw2 and 99% on sdw6.
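Picking the full filesystems out of this much output by eye is error-prone. The following is a minimal sketch, assuming input in the gpssh-prefixed `df -hT` layout shown above (use percentage second to last, mount point last); the helper name `flag_full_mounts` and the threshold are illustrative choices.

```shell
#!/bin/sh
# flag_full_mounts THRESHOLD: read gpssh-prefixed `df -hT` lines on stdin and
# print "host mount use%" for every filesystem at or above THRESHOLD percent.
flag_full_mounts() {
    awk -v limit="$1" '
        # Data rows end with "<N>% <mountpoint>"; header and error rows do not.
        NF >= 3 && $(NF-1) ~ /^[0-9]+%$/ {
            pct = $(NF-1); sub(/%/, "", pct)
            host = $0
            sub(/\].*/, "", host)   # keep only the text before "]"
            sub(/^\[/, "", host)    # drop the leading "["
            gsub(/ /, "", host)     # "[ mdw]" is padded with a space
            if (pct + 0 >= limit) print host, $NF, pct "%"
        }'
}
```

For example, `gpssh -h sdw1 -h sdw2 "df -hT" | flag_full_mounts 95` would print one line per host and mount point that is at least 95% used. Read-only media such as a mounted DVD also show 100% and would need to be ignored.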
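3. Drop the detail tables in the database
Connected to the database with psql and ran drop table <table name>. After a long wait there was still no response. Checking I/O on the servers with gpssh -f showed no reads or writes at all on the segment hosts, which indicates that the database could no longer dispatch commands to the segments.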
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sr0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sr0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sr0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sr0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
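The "no I/O at all" reading can also be checked mechanically. This is a sketch only, assuming extended iostat rows in the 12-column layout shown above (device name first, %util last); the helper name `busy_devices` is hypothetical.

```shell
#!/bin/sh
# busy_devices: read extended iostat device rows on stdin and print
# "device %util" for every device whose %util (last column) is above zero.
# Header rows are skipped; empty output means every device was idle.
busy_devices() {
    awk '$1 != "Device:" && NF >= 12 && $NF + 0 > 0 { print $1, $NF }'
}
```

Fed the samples above, it prints nothing, which matches the observation that the segments were doing no disk work while the drop table statement hung.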
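4. Log in to sdw2 and sdw6 in turn
#cd /data/primary/gpseg4/gp_log
#ls -ltrh
The directory contains log files named gpdb-<date>.csv. Delete them directly:
#rm -rf *.csv
The same deletion was carried out in the gp_log directory of every gpseg, but very little space was freed.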
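A more cautious variant of this cleanup is to list only the older log files first and review them before removing anything. The sketch below assumes the gpdb-<date>.csv naming mentioned above; the helper name `old_segment_logs` and the 7-day cutoff in the usage note are illustrative.

```shell
#!/bin/sh
# old_segment_logs DIR DAYS: list gpdb-*.csv log files under DIR that were
# last modified more than DAYS days ago. Nothing is deleted.
old_segment_logs() {
    find "$1" -type f -name 'gpdb-*.csv' -mtime +"$2" -print
}
```

For instance, `old_segment_logs /data/primary/gpseg4/gp_log 7` prints the candidates for that segment. Since the logs turned out to hold little of the space here, running `du -sh` on the directories under /data is a quick way to see where the space actually went.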
=> df -hT [sdw4] df: "/root/.gvfs": 权限不够 [sdw4] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw4] /dev/sda2 ext3 99G 5.7G 88G 7% / [sdw4] devtmpfs devtmpfs 32G 448K 32G 1% /dev [sdw4] tmpfs tmpfs 95G 100K 95G 1% /dev/shm [sdw4] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot [sdw4] /dev/sda5 ext3 197G 188M 187G 1% /home [sdw4] /dev/sdb xfs 4.6T 3.7T 868G 82% /data [sdw5] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw5] /dev/sda2 ext3 60G 5.5G 51G 10% / [sdw5] devtmpfs devtmpfs 32G 448K 32G 1% /dev [sdw5] tmpfs tmpfs 32G 88K 32G 1% /dev/shm [sdw5] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot [sdw5] /dev/sda5 ext3 785G 197M 745G 1% /home [sdw5] /dev/sdb xfs 4.6T 2.4T 2.2T 53% /data [sdw6] df: "/root/.gvfs": 权限不够 [sdw6] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw6] /dev/sda2 ext3 99G 911M 93G 1% / [sdw6] devtmpfs devtmpfs 32G 448K 32G 1% /dev [sdw6] tmpfs tmpfs 47G 100K 47G 1% /dev/shm [sdw6] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot [sdw6] /dev/sda5 ext3 197G 188M 187G 1% /home [sdw6] /dev/sda3 ext3 63G 5.0G 55G 9% /usr [sdw6] /dev/sdb xfs 4.6T 4.5T 96G 98% /data [sdw6] /dev/sr0 iso9660 3.1G 3.1G 0 100% /media/SLES-11-SP2-DVD-x86_6407551 [sdw7] df: "/root/.gvfs": 权限不够 [sdw7] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw7] /dev/sda2 ext3 60G 440M 56G 1% / [sdw7] devtmpfs devtmpfs 32G 244K 32G 1% /dev [sdw7] tmpfs tmpfs 95G 112K 95G 1% /dev/shm [sdw7] /dev/sda1 ext3 9.9G 180M 9.2G 2% /boot [sdw7] /dev/sda5 ext3 197G 188M 187G 1% /home [sdw7] /dev/sda6 ext3 9.9G 264M 9.1G 3% /opt [sdw7] /dev/sda8 ext3 9.9G 151M 9.2G 2% /srv [sdw7] /dev/sda7 ext3 9.9G 162M 9.2G 2% /tmp [sdw7] /dev/sda9 ext3 40G 4.8G 33G 13% /usr [sdw7] /dev/sda10 ext3 9.9G 359M 9.0G 4% /var [sdw7] /dev/sdb xfs 4.6T 3.7T 945G 80% /data [sdw1] df: "/root/.gvfs": 权限不够 [sdw1] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw1] /dev/sda1 ext3 99G 5.7G 88G 7% / [sdw1] devtmpfs devtmpfs 32G 444K 32G 1% /dev [sdw1] tmpfs tmpfs 95G 100K 95G 1% /dev/shm [sdw1] /dev/sda2 ext3 7.9G 216M 7.3G 3% /boot [sdw1] /dev/sda3 ext3 197G 188M 187G 1% /home [sdw1] /dev/sdb xfs 4.6T 3.6T 1.1T 78% /data [sdw2] df: "/root/.gvfs": 
权限不够 [sdw2] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw2] /dev/sda1 ext3 99G 5.7G 88G 7% / [sdw2] devtmpfs devtmpfs 32G 444K 32G 1% /dev [sdw2] tmpfs tmpfs 95G 100K 95G 1% /dev/shm [sdw2] /dev/sda2 ext3 7.9G 216M 7.3G 3% /boot [sdw2] /dev/sda3 ext3 197G 188M 187G 1% /home [sdw2] /dev/sdb xfs 4.6T 4.6T 21G 100% /data [sdw3] df: "/root/.gvfs": 权限不够 [sdw3] 文件系统 类型 容量 已用 可用 已用% 挂载点 [sdw3] /dev/sda2 ext3 60G 5.5G 51G 10% / [sdw3] devtmpfs devtmpfs 32G 448K 32G 1% /dev [sdw3] tmpfs tmpfs 32G 100K 32G 1% /dev/shm [sdw3] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot [sdw3] /dev/sda5 ext3 785G 198M 745G 1% /home [sdw3] /dev/sdb xfs 4.6T 3.8T 859G 82% /data [smdw] df: "/root/.gvfs": 权限不够 [smdw] 文件系统 类型 容量 已用 可用 已用% 挂载点 [smdw] /dev/sda2 ext3 60G 5.8G 51G 11% / [smdw] devtmpfs devtmpfs 32G 448K 32G 1% /dev [smdw] tmpfs tmpfs 32G 100K 32G 1% /dev/shm [smdw] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot [smdw] /dev/sda5 ext3 40G 23G 16G 60% /home [smdw] /dev/sda6 xfs 757G 3.3G 754G 1% /data [smdw] /dev/sr0 iso9660 3.1G 3.1G 0 100% /media/SLES-11-SP2-DVD-x86_6407551 [ mdw] df: "/root/.gvfs": 权限不够 [ mdw] 文件系统 类型 容量 已用 可用 已用% 挂载点 [ mdw] /dev/sda2 ext3 60G 6.4G 50G 12% / [ mdw] devtmpfs devtmpfs 32G 456K 32G 1% /dev [ mdw] tmpfs tmpfs 32G 100K 32G 1% /dev/shm [ mdw] /dev/sda1 ext3 9.9G 220M 9.2G 3% /boot [ mdw] /dev/sda5 ext3 40G 281M 38G 1% /home [ mdw] /dev/sda6 xfs 757G 6.4G 751G 1% /data
5. Stop the database
Ran gpstop.
It reported: -gpstop failed(Reason='FATAL: the database system is shutting down') exiting ...
gpstop could not bring the database down.
6. Check the processes
gpssh -h sdw1 -h sdw2 -h sdw3 -h sdw4 -h sdw5 -h sdw6 -h sdw7 -h mdw -h smdw "ps -ef |grep postgres"
The database processes on the segment hosts had not stopped. Below are the processes on one segment host:
[sdw4] gpadmin 15168 31843 0 Feb26 ? 00:00:14 postgres: port 40000, cems cems 198.168.11.11(52166) con6959 seg12 idle in transaction [sdw4] gpadmin 15170 31838 0 Feb26 ? 00:00:15 postgres: port 40001, cems cems 198.168.11.11(38065) con6959 seg13 idle in transaction [sdw4] gpadmin 15172 31841 0 Feb26 ? 00:00:16 postgres: port 40002, cems cems 198.168.11.11(3175) con6959 seg14 idle in transaction [sdw4] gpadmin 15174 31837 0 Feb26 ? 00:00:16 postgres: port 40003, cems cems 198.168.11.11(31152) con6959 seg15 idle in transaction [sdw4] gpadmin 15176 31843 0 Feb26 ? 00:00:04 postgres: port 40000, cems cems 198.168.11.11(52194) con6959 seg12 idle [sdw4] gpadmin 15178 31838 0 Feb26 ? 00:00:04 postgres: port 40001, cems cems 198.168.11.11(38093) con6959 seg13 idle [sdw4] gpadmin 15180 31841 0 Feb26 ? 00:00:04 postgres: port 40002, cems cems 198.168.11.11(3203) con6959 seg14 idle [sdw4] gpadmin 15182 31837 0 Feb26 ? 00:00:04 postgres: port 40003, cems cems 198.168.11.11(31180) con6959 seg15 idle [sdw4] gpadmin 15204 31843 0 Feb26 ? 00:00:16 postgres: port 40000, cems cems 198.168.11.11(52298) con6949 seg12 idle in transaction [sdw4] gpadmin 15206 31838 0 Feb26 ? 00:00:15 postgres: port 40001, cems cems 198.168.11.11(38197) con6949 seg13 idle in transaction [sdw4] gpadmin 15208 31841 0 Feb26 ? 00:00:16 postgres: port 40002, cems cems 198.168.11.11(3307) con6949 seg14 idle in transaction [sdw4] gpadmin 15210 31837 0 Feb26 ? 00:00:15 postgres: port 40003, cems cems 198.168.11.11(31284) con6949 seg15 idle in transaction [sdw4] gpadmin 31836 1 0 Feb04 ? 00:00:04 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg27 -p 50003 -b 57 -z 28 --silent-mode=true -i -M quiescent -C 27 [sdw4] gpadmin 31837 1 0 Feb04 ? 00:10:13 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg15 -p 40003 -b 17 -z 28 --silent-mode=true -i -M quiescent -C 15 [sdw4] gpadmin 31838 1 0 Feb04 ? 
00:10:11 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg13 -p 40001 -b 15 -z 28 --silent-mode=true -i -M quiescent -C 13 [sdw4] gpadmin 31839 1 0 Feb04 ? 00:00:05 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg5 -p 50001 -b 35 -z 28 --silent-mode=true -i -M quiescent -C 5 [sdw4] gpadmin 31840 1 0 Feb04 ? 00:00:03 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg2 -p 50002 -b 32 -z 28 --silent-mode=true -i -M quiescent -C 2 [sdw4] gpadmin 31841 1 0 Feb04 ? 00:10:20 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg14 -p 40002 -b 16 -z 28 --silent-mode=true -i -M quiescent -C 14 [sdw4] gpadmin 31842 1 0 Feb04 ? 00:00:04 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/mirror/gpseg8 -p 50000 -b 38 -z 28 --silent-mode=true -i -M quiescent -C 8 [sdw4] gpadmin 31843 1 0 Feb04 ? 00:10:26 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/primary/gpseg12 -p 40000 -b 14 -z 28 --silent-mode=true -i -M quiescent -C 12 [sdw4] gpadmin 31844 31841 0 Feb04 ? 00:01:15 postgres: port 40002, logger process [sdw4] gpadmin 31845 31838 0 Feb04 ? 00:01:21 postgres: port 40001, logger process [sdw4] gpadmin 31846 31839 0 Feb04 ? 00:01:07 postgres: port 50001, logger process [sdw4] gpadmin 31847 31836 0 Feb04 ? 00:01:13 postgres: port 50003, logger process [sdw4] gpadmin 31848 31843 0 Feb04 ? 00:01:19 postgres: port 40000, logger process [sdw4] gpadmin 31849 31842 0 Feb04 ? 00:01:07 postgres: port 50000, logger process [sdw4] gpadmin 31850 31837 0 Feb04 ? 00:01:25 postgres: port 40003, logger process [sdw4] gpadmin 31851 31840 0 Feb04 ? 00:01:04 postgres: port 50002, logger process [sdw4] gpadmin 31868 31836 0 Feb04 ? 00:08:55 postgres: port 50003, mirror process [sdw4] gpadmin 31871 31837 0 Feb04 ? 00:09:13 postgres: port 40003, primary process [sdw4] gpadmin 31873 31841 0 Feb04 ? 00:08:56 postgres: port 40002, primary process [sdw4] gpadmin 31874 31868 0 Feb04 ? 
01:10:10 postgres: port 50003, mirror receiver process [sdw4] gpadmin 31876 31868 0 Feb04 ? 00:34:17 postgres: port 50003, mirror consumer process [sdw4] gpadmin 31877 31868 0 Feb04 ? 00:12:24 postgres: port 50003, mirror consumer writer process [sdw4] gpadmin 31878 31868 0 Feb04 ? 00:36:29 postgres: port 50003, mirror consumer append only process [sdw4] gpadmin 31879 31838 0 Feb04 ? 00:08:55 postgres: port 40001, primary process [sdw4] gpadmin 31881 31868 0 Feb04 ? 00:07:44 postgres: port 50003, mirror sender ack process [sdw4] gpadmin 31882 31868 0 Feb04 ? 00:00:03 postgres: port 50003, mirror verification process [sdw4] gpadmin 31883 31871 0 Feb04 ? 00:07:12 postgres: port 40003, primary receiver ack process [sdw4] gpadmin 31885 31871 0 Feb04 ? 01:15:22 postgres: port 40003, primary sender process [sdw4] gpadmin 31886 31871 0 Feb04 ? 00:07:00 postgres: port 40003, primary consumer ack process [sdw4] gpadmin 31887 31871 0 Feb04 ? 00:15:41 postgres: port 40003, primary recovery process [sdw4] gpadmin 31888 31842 0 Feb04 ? 00:09:02 postgres: port 50000, mirror process [sdw4] gpadmin 31889 31871 0 Feb04 ? 00:01:08 postgres: port 40003, primary verification process [sdw4] gpadmin 31892 31873 0 Feb04 ? 00:07:19 postgres: port 40002, primary receiver ack process [sdw4] gpadmin 31893 31873 0 Feb04 ? 01:10:48 postgres: port 40002, primary sender process [sdw4] gpadmin 31894 31873 0 Feb04 ? 00:06:47 postgres: port 40002, primary consumer ack process [sdw4] gpadmin 31895 31873 0 Feb04 ? 00:15:34 postgres: port 40002, primary recovery process [sdw4] gpadmin 31896 31873 0 Feb04 ? 00:01:13 postgres: port 40002, primary verification process [sdw4] gpadmin 31898 31839 0 Feb04 ? 00:09:08 postgres: port 50001, mirror process [sdw4] gpadmin 31900 31840 0 Feb04 ? 00:09:03 postgres: port 50002, mirror process [sdw4] gpadmin 31901 31879 0 Feb04 ? 00:07:05 postgres: port 40001, primary receiver ack process [sdw4] gpadmin 31902 31879 0 Feb04 ? 
01:07:19 postgres: port 40001, primary sender process [sdw4] gpadmin 31903 31879 0 Feb04 ? 00:09:53 postgres: port 40001, primary consumer ack process [sdw4] gpadmin 31904 31879 0 Feb04 ? 00:15:36 postgres: port 40001, primary recovery process [sdw4] gpadmin 31905 31879 0 Feb04 ? 00:01:13 postgres: port 40001, primary verification process [sdw4] gpadmin 31911 31888 0 Feb04 ? 01:11:45 postgres: port 50000, mirror receiver process [sdw4] gpadmin 31912 31888 0 Feb04 ? 00:34:47 postgres: port 50000, mirror consumer process [sdw4] gpadmin 31913 31898 0 Feb04 ? 01:19:23 postgres: port 50001, mirror receiver process [sdw4] gpadmin 31914 31888 0 Feb04 ? 00:13:06 postgres: port 50000, mirror consumer writer process [sdw4] gpadmin 31915 31898 0 Feb04 ? 00:38:56 postgres: port 50001, mirror consumer process [sdw4] gpadmin 31916 31888 0 Feb04 ? 00:36:17 postgres: port 50000, mirror consumer append only process [sdw4] gpadmin 31917 31898 0 Feb04 ? 00:19:13 postgres: port 50001, mirror consumer writer process [sdw4] gpadmin 31919 31898 0 Feb04 ? 00:36:00 postgres: port 50001, mirror consumer append only process [sdw4] gpadmin 31920 31888 0 Feb04 ? 00:07:49 postgres: port 50000, mirror sender ack process [sdw4] gpadmin 31922 31888 0 Feb04 ? 00:00:03 postgres: port 50000, mirror verification process [sdw4] gpadmin 31923 31898 0 Feb04 ? 00:08:11 postgres: port 50001, mirror sender ack process [sdw4] gpadmin 31924 31898 0 Feb04 ? 00:00:03 postgres: port 50001, mirror verification process [sdw4] gpadmin 31925 31900 0 Feb04 ? 01:05:21 postgres: port 50002, mirror receiver process [sdw4] gpadmin 31926 31900 0 Feb04 ? 00:34:34 postgres: port 50002, mirror consumer process [sdw4] gpadmin 31927 31900 0 Feb04 ? 00:12:25 postgres: port 50002, mirror consumer writer process [sdw4] gpadmin 31928 31900 0 Feb04 ? 00:36:15 postgres: port 50002, mirror consumer append only process [sdw4] gpadmin 31930 31900 0 Feb04 ? 
00:07:48 postgres: port 50002, mirror sender ack process [sdw4] gpadmin 31931 31900 0 Feb04 ? 00:00:03 postgres: port 50002, mirror verification process [sdw4] gpadmin 31937 31843 0 Feb04 ? 00:00:40 postgres: port 40000, stats collector process [sdw4] gpadmin 31938 31843 0 Feb04 ? 00:07:18 postgres: port 40000, writer process [sdw4] gpadmin 31939 31843 0 Feb04 ? 00:02:02 postgres: port 40000, checkpoint process [sdw4] gpadmin 31940 31843 0 Feb04 ? 00:01:52 postgres: port 40000, sweeper process [sdw4] gpadmin 31944 31837 0 Feb04 ? 00:00:40 postgres: port 40003, stats collector process [sdw4] gpadmin 31945 31837 0 Feb04 ? 00:07:55 postgres: port 40003, writer process [sdw4] gpadmin 31946 31837 0 Feb04 ? 00:03:28 postgres: port 40003, checkpoint process [sdw4] gpadmin 31947 31837 0 Feb04 ? 00:01:50 postgres: port 40003, sweeper process [sdw4] gpadmin 31948 31841 0 Feb04 ? 00:00:36 postgres: port 40002, stats collector process [sdw4] gpadmin 31949 31841 0 Feb04 ? 00:08:00 postgres: port 40002, writer process [sdw4] gpadmin 31950 31841 0 Feb04 ? 00:02:49 postgres: port 40002, checkpoint process [sdw4] gpadmin 31951 31841 0 Feb04 ? 00:01:52 postgres: port 40002, sweeper process [sdw4] gpadmin 31952 31838 0 Feb04 ? 00:00:36 postgres: port 40001, stats collector process [sdw4] gpadmin 31953 31838 0 Feb04 ? 00:07:39 postgres: port 40001, writer process [sdw4] gpadmin 31954 31838 0 Feb04 ? 00:02:36 postgres: port 40001, checkpoint process [sdw4] gpadmin 31955 31838 0 Feb04 ? 00:01:47 postgres: port 40001, sweeper process [sdw4] gpadmin 37706 37670 0 17:52 pts/0 00:00:00 grep postgres [sdw4] gpadmin 42312 31843 0 Feb12 ? 00:02:50 postgres: port 40000, primary process [sdw4] gpadmin 42313 42312 0 Feb12 ? 00:09:58 postgres: port 40000, primary recovery process
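In the listing above, backends belonging to sessions con6959 and con6949 are still "idle in transaction" on every primary. As a sketch (the helper name `idle_in_txn_by_session` is hypothetical, and it assumes `ps -ef` lines in the format shown, where the session appears as a `conNNNN` token), such backends can be counted per session:

```shell
#!/bin/sh
# idle_in_txn_by_session: read `ps -ef` lines on stdin and print, per session
# id (conNNNN), how many backends are in the "idle in transaction" state.
idle_in_txn_by_session() {
    awk '/idle in transaction/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^con[0-9]+$/) { count[$i]++; break }
    }
    END { for (s in count) print s, count[s] }' | sort
}
```

Sessions that show up here are still holding open transactions, which is a plausible reason for a shutdown that waits on clients to hang.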
7. On the master, run
#gpstop -af
It reported: -gpstop failed(Reason='FATAL: the database system is shutting down') exiting ...
The database still could not be stopped.
gpadmin@mdw:~> gpstop -M fast
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Starting gpstop with args: -M fast
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150227:18:42:01:021768 gpstop:mdw:gpadmin-[CRITICAL]:-gpstop failed. (Reason='FATAL: the database system is shutting down ') exiting...
That did not work either.
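8. Kill the processes directly
The command is single-quoted so that the process list and $i are expanded on each remote host rather than by the local shell on the master, and "[p]ostgres" keeps grep from matching itself:
gpssh -h sdw1 -h sdw2 -h sdw3 -h sdw4 -h sdw5 -h sdw6 -h sdw7 -h mdw -h smdw 'for i in $(ps -ef | grep "[p]ostgres" | awk "{print \$2}"); do kill "$i"; done'
Repeating the check from step 6 showed that all processes on the segment hosts had stopped, but some processes on the master had not.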
9. Stop the master processes
mdw -h smdw ps -ef |grep postgres
[ mdw] gpadmin 15765 15724 0 17:52 pts/11 00:00:00 grep postgres
[ mdw] gpadmin 38876 41830 0 Feb26 ? 00:05:11 postgres: port 5432, cems cems 198.168.11.12(51943) con6959 198.168.11.12(51943) cmd24705 BIND
[ mdw] gpadmin 41830 1 0 Feb04 ? 00:00:07 /usr/local/greenplum-db-4.3.3.1/bin/postgres -D /data/master/gpseg-1 -p 5432 -b 1 -z 28 --silent-mode=true -i -M master -C -1 -x 58 -E
[ mdw] gpadmin 41838 41830 0 Feb04 ? 00:10:18 postgres: port 5432, master logger process
[ mdw] gpadmin 41844 41830 0 Feb04 ? 00:00:18 postgres: port 5432, stats collector process
[ mdw] gpadmin 41845 41830 0 Feb04 ? 00:05:02 postgres: port 5432, writer process
[ mdw] gpadmin 41846 41830 0 Feb04 ? 00:00:55 postgres: port 5432, checkpoint process
[ mdw] gpadmin 41847 41830 0 Feb04 ? 00:00:13 postgres: port 5432, seqserver process
[ mdw] gpadmin 41848 41830 0 Feb04 ? 00:04:33 postgres: port 5432, ftsprobe process
[ mdw] gpadmin 41851 41830 0 Feb04 ? 00:00:49 postgres: port 5432, sweeper process
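Ran gkill postgres: no response.
Ran kill 41830 directly: no response either.
Ran pg_ctl stop -D /data/master/gpseg-1:
>pg_ctl stop -D /data/master/gpseg-1
waiting for server to shut down .........................................failed
pg_ctl:server does not shut down
The first guess was that the new version protects its processes so that they can no longer be killed directly. A more likely explanation is that a plain kill sends SIGTERM, which postgres treats as a smart shutdown request that waits for all clients to disconnect, and session con6959 was still connected to the master.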
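Because a shutdown that waits for clients can hang like this, one option is to escalate through the pg_ctl shutdown modes instead of retrying the same one. The sketch below is an illustration under assumptions: the function name `stop_escalating` and the 60-second default timeout are made up, and it uses only the `-D`, `-m`, `-w` and `-t` options of `pg_ctl stop`.

```shell
#!/bin/sh
# stop_escalating DATADIR [TIMEOUT]: try `pg_ctl stop` in smart, then fast,
# then immediate mode, waiting up to TIMEOUT seconds for each attempt.
# Prints the mode that worked; returns 1 if none of them stopped the server.
stop_escalating() {
    datadir=$1
    timeout=${2:-60}
    for mode in smart fast immediate; do
        if pg_ctl stop -D "$datadir" -m "$mode" -w -t "$timeout" >/dev/null 2>&1; then
            echo "stopped with mode=$mode"
            return 0
        fi
    done
    echo "could not stop $datadir" >&2
    return 1
}
```

A call such as `stop_escalating /data/master/gpseg-1 120` would try each mode in turn. Immediate mode skips the clean shutdown, so the server goes through recovery on its next start and should stay the last resort.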
10. Stop the database with pg_ctl
gpadmin@mdw:~> pg_ctl --help
pg_ctl is a utility to start, stop, restart, reload configuration files, report the status of a PostgreSQL server, or signal a PostgreSQL process.

Usage:
  pg_ctl start [-w] [-t SECS] [-D DATADIR] [-s] [-l FILENAME] [-o "OPTIONS"]
  pg_ctl stop [-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
  pg_ctl restart [-w] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE] [-o "OPTIONS"]
  pg_ctl reload [-D DATADIR] [-s]
  pg_ctl status [-D DATADIR]
  pg_ctl kill SIGNALNAME PID

Common options:
  -D, --pgdata DATADIR   location of the database storage area
  -s, --silent           only print errors, no informational messages
  -t SECS                seconds to wait when using -w option
  -w                     wait until operation completes
  -W                     do not wait until operation completes
  --help                 show this help, then exit
  --version              output version information, then exit
  --gp-version           output Greenplum version information, then exit
(The default is to wait for shutdown, but not for start or restart.)

If the -D option is omitted, the environment variable PGDATA is used.

Options for start or restart:
  -l, --log FILENAME     write (or append) server log to FILENAME
  -o OPTIONS             command line options to pass to postgres (PostgreSQL server executable)
  -p PATH-TO-POSTGRES    normally not necessary
  -c, --core-files       allow postgres to produce core files

Options for stop or restart:
  -m SHUTDOWN-MODE       can be "smart", "fast", or "immediate"

Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   quit without complete shutdown; will lead to recovery on restart

Allowed signal names for kill:
  HUP INT QUIT ABRT TERM USR1 USR2

Report bugs to <pgsql-bugs@postgresql.org>.
gpadmin@mdw:~> pg_ctl stop -m immedidate
pg_ctl: unrecognized shutdown mode "immedidate"
Try "pg_ctl --help" for more information.
gpadmin@mdw:~> pg_ctl stop -D /data/master/gpseg-1 -m immedidate
pg_ctl: unrecognized shutdown mode "immedidate"
Try "pg_ctl --help" for more information.
gpadmin@mdw:~> pg_ctl stop -D /data/master/gpseg-1 -m immediate
waiting for server to shut down.... done
server stopped
gpadmin@mdw:~> ps -ef |grep post
root 4824 1 0 Feb03 ? 00:00:17 /usr/lib/postfix/master
postfix 4857 4824 0 Feb03 ? 00:00:01 qmgr -l -t fifo -u
gpadmin 21828 21731 0 18:47 pts/1 00:00:00 grep post
11. Start the database
gpadmin@mdw:~> gpstart -m 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args: -m 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment... 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.3.1 build 1' 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150' 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-**************************************************************************** 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-Master-only start requested in a configuration with a standby master. 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-This is advisable only under the direct supervision of Greenplum support. 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-This mode of operation is not supported in a production environment and 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-may lead to a split-brain condition and possible unrecoverable data loss. 20150227:18:47:10:021829 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
Continue with master-only startup Yy|Nn (default=N): > y 20150227:18:47:13:021829 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode 20150227:18:47:15:021829 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20150227:18:47:15:021829 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master... 20150227:18:47:16:021829 gpstart:mdw:gpadmin-[INFO]:-Setting new master era 20150227:18:47:16:021829 gpstart:mdw:gpadmin-[INFO]:-Master Started... gpadmin@mdw:~> gpstop -af 20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Starting gpstop with args: -af 20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Gathering information and validating the environment... 20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20150227:18:47:28:021878 gpstop:mdw:gpadmin-[INFO]:-Obtaining Segment details from master... 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.3.1 build 1' 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-There are 0 connections to the database 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='fast' 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Master host=mdw 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Detected 0 connections to database 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Using standard WAIT mode of 600 seconds 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=fast 20150227:18:47:29:021878 gpstop:mdw:gpadmin-[INFO]:-Master segment instance directory=/data/master/gpseg-1 20150227:18:47:30:021878 gpstop:mdw:gpadmin-[INFO]:-Stopping master standby host smdw mode=fast 20150227:18:47:30:021878 gpstop:mdw:gpadmin-[WARNING]:-Error occured while stopping the standby master: ExecutionError: 'non-zero rc: 1' occured. Details: 'ssh -o 'StrictHostKeyChecking no' smdw ". 
/usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/pg_ctl -D /data/master/gpseg-1 -m fast -w -t 600 stop"' cmd had rc=1 completed=True halted=False stdout='' stderr='pg_ctl: PID file "/data/master/gpseg-1/postmaster.pid" does not exist Is server running? ' 20150227:18:47:30:021878 gpstop:mdw:gpadmin-[INFO]:-Successfully shutdown standby process on smdw 20150227:18:47:30:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing parallel primary segment instance shutdown, please wait... ........................................................................................................................................................................................................................................................................................................................................... ............................................................................................................................................................................... ....................................................................................................... 20150227:18:57:40:021878 gpstop:mdw:gpadmin-[INFO]:-Commencing parallel mirror segment instance shutdown, please wait... .. 
20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:------------------------------------------------- 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-Failed Segment Stop Information 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:------------------------------------------------- 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:14 FAILED host:'sdw4' datadir:'/data/primary/gpseg12' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:15 FAILED host:'sdw4' datadir:'/data/primary/gpseg13' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:16 FAILED host:'sdw4' datadir:'/data/primary/gpseg14' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:17 FAILED host:'sdw4' datadir:'/data/primary/gpseg15' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:46 FAILED host:'sdw6' datadir:'/data/mirror/gpseg16' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:22 FAILED host:'sdw6' datadir:'/data/primary/gpseg20' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:23 FAILED host:'sdw6' datadir:'/data/primary/gpseg21' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:24 FAILED host:'sdw6' datadir:'/data/primary/gpseg22' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:25 FAILED host:'sdw6' datadir:'/data/primary/gpseg23' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:47 FAILED host:'sdw7' datadir:'/data/mirror/gpseg17' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:26 FAILED host:'sdw7' datadir:'/data/primary/gpseg24' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:27 FAILED host:'sdw7' datadir:'/data/primary/gpseg25' with reason:'Shutdown failed' 
20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:28 FAILED host:'sdw7' datadir:'/data/primary/gpseg26' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:29 FAILED host:'sdw7' datadir:'/data/primary/gpseg27' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:10 FAILED host:'sdw3' datadir:'/data/primary/gpseg8' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:11 FAILED host:'sdw3' datadir:'/data/primary/gpseg9' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:12 FAILED host:'sdw3' datadir:'/data/primary/gpseg10' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:13 FAILED host:'sdw3' datadir:'/data/primary/gpseg11' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:2 FAILED host:'sdw1' datadir:'/data/primary/gpseg0' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:3 FAILED host:'sdw1' datadir:'/data/primary/gpseg1' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:4 FAILED host:'sdw1' datadir:'/data/primary/gpseg2' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:5 FAILED host:'sdw1' datadir:'/data/primary/gpseg3' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:48 FAILED host:'sdw1' datadir:'/data/mirror/gpseg18' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:6 FAILED host:'sdw2' datadir:'/data/primary/gpseg4' with reason:'cmd had rc=255 completed=True halted=False stdout='20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Starting gpsegstop.py with args: -D /data/primary/gpseg4:40000 -D /data/primary/gpseg5:40001 -D /data/primary/gpseg6:40002 -D /data/primary/gpseg7:40003 -D /data/mirror/gpseg19:50003 -m fast -t 600 -V postgres 
(Greenplum Database) 4.3.3.1 build 1 20150227:18:47:32:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Issuing shutdown commands to local segments... 20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGSCONT to 37112 20150227:18:57:33:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGTERM to 37112 (smart shutdown) 20150227:18:57:36:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGINT to 37112 (fast shutdown) 20150227:18:57:39:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:-Sending SIGQUIT to 37112 (immediate shutdown) 20150227:18:57:40:028986 gpsegstop.py_sdw2:gpadmin:sdw2:gpadmin-[INFO]:- COMMAND RESULTS STATUS--DIR:/data/primary/gpseg4--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: stderr: pg_ctl: PID file "/data/primary/gpseg4/postmaster.pid" does not exist Is server running?
STATUS--DIR:/data/primary/gpseg5--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: stderr: pg_ctl: PID file "/data/primary/gpseg5/postmaster.pid" does not exist Is server running?
STATUS--DIR:/data/primary/gpseg6--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: waiting for server to shut down........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... failed stderr: pg_ctl: server does not shut down
STATUS--DIR:/data/primary/gpseg7--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: stderr: pg_ctl: PID file "/data/primary/gpseg7/postmaster.pid" does not exist Is server running?
STATUS--DIR:/data/mirror/gpseg19--STOPPED:False--REASON:Shutdown failed: rc: 1 stdout: stderr: pg_ctl: PID file "/data/mirror/gpseg19/postmaster.pid" does not exist Is server running?
' stderr=''' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:7 FAILED host:'sdw2' datadir:'/data/primary/gpseg5' with reason:'cmd had rc=255 completed=True halted=False stdout='[gpsegstop.py output identical to that shown for DBID:6]
' stderr=''' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:8 FAILED host:'sdw2' datadir:'/data/primary/gpseg6' with reason:'cmd had rc=255 completed=True halted=False stdout='[gpsegstop.py output identical to that shown for DBID:6]
' stderr=''' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:9 FAILED host:'sdw2' datadir:'/data/primary/gpseg7' with reason:'cmd had rc=255 completed=True halted=False stdout='[gpsegstop.py output identical to that shown for DBID:6]
' stderr=''' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:49 FAILED host:'sdw2' datadir:'/data/mirror/gpseg19' with reason:'cmd had rc=255 completed=True halted=False stdout='[gpsegstop.py output identical to that shown for DBID:6]
' stderr=''' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:45 FAILED host:'sdw1' datadir:'/data/mirror/gpseg15' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:51 FAILED host:'sdw1' datadir:'/data/mirror/gpseg21' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:54 FAILED host:'sdw1' datadir:'/data/mirror/gpseg24' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:41 FAILED host:'sdw7' datadir:'/data/mirror/gpseg11' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:44 FAILED host:'sdw7' datadir:'/data/mirror/gpseg14' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:50 FAILED host:'sdw7' datadir:'/data/mirror/gpseg20' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:37 FAILED host:'sdw6' datadir:'/data/mirror/gpseg7' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:40 FAILED host:'sdw6' datadir:'/data/mirror/gpseg10' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:43 FAILED host:'sdw6' datadir:'/data/mirror/gpseg13' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:30 FAILED host:'sdw2' datadir:'/data/mirror/gpseg0' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:52 FAILED host:'sdw2' datadir:'/data/mirror/gpseg22' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:55 FAILED host:'sdw2' datadir:'/data/mirror/gpseg25' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:32 FAILED host:'sdw4' datadir:'/data/mirror/gpseg2' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:35 FAILED host:'sdw4' datadir:'/data/mirror/gpseg5' with reason:'Shutdown failed' 20150227:18:57:42:021878 
gpstop:mdw:gpadmin-[INFO]:-DBID:38 FAILED host:'sdw4' datadir:'/data/mirror/gpseg8' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:57 FAILED host:'sdw4' datadir:'/data/mirror/gpseg27' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:31 FAILED host:'sdw3' datadir:'/data/mirror/gpseg1' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:34 FAILED host:'sdw3' datadir:'/data/mirror/gpseg4' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:53 FAILED host:'sdw3' datadir:'/data/mirror/gpseg23' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:56 FAILED host:'sdw3' datadir:'/data/mirror/gpseg26' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:33 FAILED host:'sdw5' datadir:'/data/mirror/gpseg3' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:36 FAILED host:'sdw5' datadir:'/data/mirror/gpseg6' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:39 FAILED host:'sdw5' datadir:'/data/mirror/gpseg9' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:42 FAILED host:'sdw5' datadir:'/data/mirror/gpseg12' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:18 FAILED host:'sdw5' datadir:'/data/primary/gpseg16' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:19 FAILED host:'sdw5' datadir:'/data/primary/gpseg17' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:20 FAILED host:'sdw5' datadir:'/data/primary/gpseg18' with reason:'Shutdown failed' 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-DBID:21 FAILED host:'sdw5' datadir:'/data/primary/gpseg19' with reason:'Shutdown failed' 20150227:18:57:42:021878 
gpstop:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:- Segments stopped successfully = 0 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Segments with errors during stop = 45 <<<<<<<< 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:- 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Segments that are currently marked down in configuration = 11 <<<<<<<< 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:- (stop was still attempted on these segments) 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[INFO]:-Successfully shutdown 0 of 56 segment instances <<<<<<<< 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:------------------------------------------------- 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Segment instance shutdown failures reported 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Failed to shutdown 45 of 56 segment instances <<<<< 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-A total of 45 errors were encountered 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-Review logfile /home/gpadmin/gpAdminLogs/gpstop_20150227.log 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:-For more details on segment shutdown failure(s) 20150227:18:57:42:021878 gpstop:mdw:gpadmin-[WARNING]:------------------------------------------------- gpadmin@mdw:~> gpadmin@mdw:~> gpadmin@mdw:~> gpstart -a 20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args: -a 20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment... 
20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.3.1 build 1' 20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150' 20150227:18:58:42:022377 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode 20150227:18:58:43:022377 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20150227:18:58:43:022377 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master... 20150227:18:58:44:022377 gpstart:mdw:gpadmin-[INFO]:-Setting new master era 20150227:18:58:44:022377 gpstart:mdw:gpadmin-[INFO]:-Master Started... 20150227:18:58:44:022377 gpstart:mdw:gpadmin-[INFO]:-Shutting down master 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/mirror/gpseg0 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg3 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg6 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg9 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/mirror/gpseg12 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg16 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg17 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg18 <<<<< 
20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw5 directory /data/primary/gpseg19 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/mirror/gpseg22 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /data/mirror/gpseg25 <<<<< 20150227:18:58:46:022377 gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait... ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-Process results... 
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:9 FAILED host:'sdw2' datadir:'/data/primary/gpseg7' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:7 FAILED host:'sdw2' datadir:'/data/primary/gpseg5' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:6 FAILED host:'sdw2' datadir:'/data/primary/gpseg4' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:54 FAILED host:'sdw1' datadir:'/data/mirror/gpseg24' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:51 FAILED host:'sdw1' datadir:'/data/mirror/gpseg21' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:45 FAILED host:'sdw1' datadir:'/data/mirror/gpseg15' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:4 FAILED host:'sdw1' datadir:'/data/primary/gpseg2' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:3 FAILED host:'sdw1' datadir:'/data/primary/gpseg1' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:23 FAILED host:'sdw6' datadir:'/data/primary/gpseg21' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:26 FAILED host:'sdw7' datadir:'/data/primary/gpseg24' with reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-DBID:17 FAILED host:'sdw4' datadir:'/data/primary/gpseg15' with 
reason:'Failure in segment mirroring; check segment logfile' 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:- Successful segment starts = 34 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Failed segment starts, from mirroring connection between primary and mirror = 11 <<<<<<<< 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:- Other failed segment starts = 0 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 11 <<<<<<<< 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:- 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-Successfully started 34 of 45 segment instances, skipped 11 other segments <<<<<<<< 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Segment instance startup failures reported 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Failed start 11 of 45 segment instances <<<<<<<< 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20150227.log 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-For more details on segment startup failure(s) 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-Run gpstate -s to review current segment instance status 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:----------------------------------------------------- 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-**************************************************************************** 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-There are 11 segment(s) marked down in the database 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg 20150227:19:08:50:022377 
gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases. 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[WARNING]:-**************************************************************************** 20150227:19:08:50:022377 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance mdw directory /data/master/gpseg-1 20150227:19:08:51:022377 gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active 20150227:19:09:05:022377 gpstart:mdw:gpadmin-[INFO]:-Starting standby master 20150227:19:09:05:022377 gpstart:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw in directory: /data/master/gpseg-1 20150227:19:09:08:022377 gpstart:mdw:gpadmin-[WARNING]:-Number of segments which failed to start: 11
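With 56 segment instances, the FAILED entries in the gpstop and gpstart output are easier to read once they are grouped by host. The following is a minimal sketch, assuming the log file uses the same `DBID:<n> FAILED host:'<name>'` wording as the console output above. `failed_per_host` is a hypothetical helper name, not a Greenplum utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FAILED segment entries per host.
# Reads gpstop/gpstart output on stdin and prints "<host> <count>" lines.
failed_per_host() {
    # grep -o emits one match per FAILED entry, even when several
    # entries were pasted onto a single physical line.
    grep -o "DBID:[0-9]* FAILED host:'[^']*'" \
        | sed "s/.*host:'\([^']*\)'/\1/" \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

For example, `failed_per_host < /home/gpadmin/gpAdminLogs/gpstart_20150227.log` would list each host together with the number of its segments that failed to start, which shows quickly whether the failures cluster on a few hosts.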
12. Connect to the database and drop the detail tables that are no longer needed
gpadmin@mdw:~> psql -d template1
psql (8.2.15)
Type "help" for help.

template1=# \dn
       List of schemas
        Name        |  Owner
--------------------+---------
 gp_toolkit         | gpadmin
 information_schema | gpadmin
 pg_aoseg           | gpadmin
 pg_bitmapindex     | gpadmin
 pg_catalog         | gpadmin
 pg_toast           | gpadmin
 public             | gpadmin
(7 rows)

template1=# \l
                  List of databases
   Name    |  Owner  | Encoding |  Access privileges
-----------+---------+----------+---------------------
 cems      | gpadmin | UTF8     |
 dw        | gpadmin | UTF8     |
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     | =c/gpadmin
                                 : gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | =c/gpadmin
                                 : gpadmin=CTc/gpadmin
(5 rows)
13. Recover the failed segments with gprecoverseg
gprecoverseg Continue with segment recovery procedure Yy|Nn (default=N): > Y 20150227:22:43:00:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15 segment(s) to recover 20150227:22:43:00:035973 gprecoverseg:mdw:gpadmin-[INFO]:-Ensuring 15 failed segment(s) are stopped 20150227:22:43:01:035973 gprecoverseg:mdw:gpadmin-[INFO]:-38594: /data/mirror/gpseg0 20150227:22:43:03:035973 gprecoverseg:mdw:gpadmin-[INFO]:-41988: /data/mirror/gpseg1 20150227:22:43:04:035973 gprecoverseg:mdw:gpadmin-[INFO]:-1525: /data/mirror/gpseg2 20150227:22:43:05:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15788: /data/mirror/gpseg3 20150227:22:43:06:035973 gprecoverseg:mdw:gpadmin-[INFO]:-41987: /data/mirror/gpseg4 20150227:22:43:08:035973 gprecoverseg:mdw:gpadmin-[INFO]:-1524: /data/mirror/gpseg5 20150227:22:43:09:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15787: /data/mirror/gpseg6 20150227:22:43:10:035973 gprecoverseg:mdw:gpadmin-[INFO]:-30986: /data/mirror/gpseg7 20150227:22:43:11:035973 gprecoverseg:mdw:gpadmin-[INFO]:-2730: /data/mirror/gpseg15 20150227:22:43:12:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15786: /data/primary/gpseg18 20150227:22:43:14:035973 gprecoverseg:mdw:gpadmin-[INFO]:-15785: /data/primary/gpseg19 20150227:22:43:15:035973 gprecoverseg:mdw:gpadmin-[INFO]:-2729: /data/mirror/gpseg21 20150227:22:43:16:035973 gprecoverseg:mdw:gpadmin-[INFO]:-38595: /data/mirror/gpseg22 20150227:22:43:17:035973 gprecoverseg:mdw:gpadmin-[INFO]:-2728: /data/mirror/gpseg24 20150227:22:43:19:035973 gprecoverseg:mdw:gpadmin-[INFO]:-38596: /data/mirror/gpseg25 20150227:22:43:23:035973 gprecoverseg:mdw:gpadmin-[INFO]:-Cleaning files from 15 segment(s) 
........................................................................................................................................................................................................................................................................................................................................................................................................................................... 20150227:22:50:31:035973 gprecoverseg:mdw:gpadmin-[INFO]:-Building template directory 20150227:22:50:31:035973 gprecoverseg:mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1 Command was: 'ssh -o 'StrictHostKeyChecking no' sdw1 ". /usr/local/greenplum-db/./greenplum_path.sh; /usr/bin/scp -o 'StrictHostKeyChecking no' -r /data/primary/gpseg0/postgresql.conf mdw:/data/master/gpbuildingsegment_02272015_35973/schema"' rc=1, stdout='', stderr='ssh: Could not resolve hostname mdw: Name or service not known lost connection
The recovery failed: while building the template directory, the scp command run on sdw1 could not resolve the master's hostname (stderr: 'ssh: Could not resolve hostname mdw: Name or service not known').
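Analysis pointed to the hosts file: ssh by hostname failed, while ssh by IP address worked without problems.
On inspection, the hosts file on the segment hosts was wrong: it was identical to the hosts file of the hadoop cluster. The hadoop cluster was installed later, and at that time I had asked a colleague to distribute the modified hosts file with scp; that most likely overwrote the hosts file on the gp cluster as well.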
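The hostname check described above can be scripted so that it is easy to repeat on every host. The following is a minimal sketch, not part of the original procedure: the function name unresolved_hosts and the CLUSTER_HOSTS list are illustrative assumptions (the host names mdw, smdw and sdw1 to sdw7 are taken from the logs in this post), and it only tests name resolution, not ssh connectivity.

```python
import socket

# Assumed host list for illustration: the two masters and seven segment hosts
# that appear in the logs of this post.
CLUSTER_HOSTS = ["mdw", "smdw"] + [f"sdw{i}" for i in range(1, 8)]


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the hostnames that the local resolver cannot map to an address.

    `resolve` is injectable so the function can be exercised without a real
    resolver; by default it uses the system lookup, which consults the hosts
    file on typical Linux setups.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Because the failure here was sdw1 being unable to resolve mdw, the check is only meaningful when run on each host in turn, for example with print(unresolved_hosts(CLUSTER_HOSTS)); an empty list means every name resolved on that host.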
After correcting the hosts file, I ran gprecoverseg again and then checked the mirror status with gpstate -m:
gpadmin@mdw:~> gpstate -m
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.3.1 build 1'
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.3.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct 10 2014 14:31:50'
20150228:11:43:19:019954 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--Type = Spread
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                Port    Status              Data Status
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg0    50000   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg1    50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg2    50002   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg3    50003   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg4    50000   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg5    50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg6    50002   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg7    50003   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg8    50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg9    50001   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg10   50002   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg11   50003   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw5     /data/mirror/gpseg12   50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg13   50001   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg14   50002   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg15   50003   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw6     /data/mirror/gpseg16   50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg17   50001   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg18   50002   Acting as Primary   Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg19   50003   Acting as Primary   Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw7     /data/mirror/gpseg20   50000   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg21   50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg22   50002   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg23   50003   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data/mirror/gpseg24   50000   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data/mirror/gpseg25   50001   Passive             Resynchronizing
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data/mirror/gpseg26   50002   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:-   sdw4     /data/mirror/gpseg27   50003   Passive             Synchronized
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20150228:11:43:20:019954 gpstate:mdw:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) are acting as primaries
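With 28 mirrors, reading the gpstate -m listing by eye is error-prone. The sketch below shows one possible way to summarize it, assuming the output has been captured as text with one log entry per line. The names ROW, parse_mirror_rows and summarize are illustrative, and the pattern only recognizes the two Status values that appear in this post (Passive and Acting as Primary) and single-word Data Status values; other values would need the pattern extended.

```python
import re
from collections import Counter

# Matches one row of the mirror table printed by `gpstate -m`, for example:
# "...gpadmin-[INFO]:-   sdw2   /data/mirror/gpseg0   50000   Passive   Resynchronizing"
# The header row is skipped because its second column does not start with "/".
ROW = re.compile(
    r"\]:-\s+(?P<host>\S+)\s+(?P<datadir>/\S+)\s+(?P<port>\d+)\s+"
    r"(?P<status>Passive|Acting as Primary)\s+(?P<data_status>\S+)\s*$"
)


def parse_mirror_rows(text):
    """Return one dict per mirror row found in captured `gpstate -m` output."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


def summarize(rows):
    """Count rows per data status and list mirrors currently acting as primary."""
    return {
        "data_status": Counter(row["data_status"] for row in rows),
        "acting_as_primary": [
            row["datadir"] for row in rows if row["status"] == "Acting as Primary"
        ],
    }
```

Running summarize(parse_mirror_rows(output)) periodically gives a quick view of how many mirrors are still Resynchronizing and which mirrors have taken over the primary role.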
After that, the database returned to normal.