Greenplum 集群部署

最近开始接触Greenplum,线上也在使用了,感觉还不错,本次介绍一下集群的部署方法。那么Greenplum的架构如下:

                                                     (架构图来源网络)

简单来说GPDB是一个分布式数据库软件,其可以管理和处理分布在多个不同主机上的海量数据。对于GPDB来说,一个DB实例实际上是由多个独立的PostgreSQL实例组成的,它们分布在不同的物理主机上,协同工作,呈现给用户的是一个DB的效果。Master是GPDB系统的访问入口,其负责处理客户端的连接及SQL 命令、协调系统中的其他Instance(Segment)工作,Segment负责管理和处理用户数据。

 

环境准备:

操作系统:centos 6.3 64位

master 1台(架构图中的主节点),Standby 1台(架构图中的从节点),Segment 2台。共4台服务器。

     ip       主机名
10.69.30.251   mdw
10.36.25.152   smdw
10.36.25.250   sdw1
221.5.32.152   sdw2

1. 修改/etc/hosts文件,添加下面内容(4台服务器相同的配置)

10.69.30.251  mdw
10.36.25.152  smdw
10.36.25.250  sdw1
10.37.237.152  sdw2

2. 服务器关闭selinux,防火墙4台服务器相互开放,测试环境可以直接先关闭防火墙。(4台服务器)

3. 操作系统参数设置

vim /etc/sysctl.conf   (注:4台服务器)

kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.arp_filter = 1
net.core.netdev_max_backlog = 10000
vm.overcommit_memory = 2
kernel.msgmni = 2048
net.ipv4.ip_local_port_range = 1025 65535

vim /etc/security/limits.conf  (注:4台服务器)

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072

我的操作系统是centos 6.3,所以还有一个文件要修改:那就是/etc/security/limits.d/90-nproc.conf (注:4台服务器都要修改)

修改后的内容如下:

* soft nproc 131072
* hard nproc 131072
* soft nofile 1048576
* hard nofile 1048576

磁盘预读参数及 deadline算法修改 (注:4台服务器)

blockdev --setra 65536 /dev/sda
echo deadline > /sys/block/sda/queue/scheduler

 

软件下载地址:https://network.pivotal.io/products/pivotal-gpdb

在Master主机上安装GP二进制文件,也就是主机名是mdw的服务器。(注:在master上安装即可,后面通过批量的方法安装剩下的服务器)

我下载的文件是greenplum-db-4.3.7.3-build-2-RHEL5-x86_64.zip

文件解压之是一个可执行文件,greenplum-db-4.3.7.3-build-2-RHEL5-x86_64.bin,直接执行安装。

./greenplum-db-4.3.7.3-build-2-RHEL5-x86_64.bin 

然后有些说明,一直回车,出现如下图按照提示安装即可完成。

在master上添加用户

useradd gpadmin

确保所有的服务器安装了ed软件,否则后面初始化集群就会报错。

yum install ed -y

准备文件,all_hosts ,seg_hosts放到/home/gpadmin。(用于批量安装软件以及后续集群的初始化)

[gpadmin@mdw ~]$ cat seg_hosts 
sdw1
sdw2
[gpadmin@mdw ~]$ cat all_hosts 
mdw
smdw
sdw1
sdw2
[gpadmin@mdw ~]$ 

建立节点服务器间的信任(我服务器已经打通了信任的,没有打通的会要求输入密码)

[root@mdw gpadmin]# gpssh-exkeys -f all_hosts
[STEP 1 of 5] create local ID and authorize on local host
  ... /root/.ssh/id_rsa file exists ... key generation skipped

[STEP 2 of 5] keyscan all hosts and update known_hosts file

[STEP 3 of 5] authorize current user on remote hosts
  ... send to smdw
  ... send to sdw1
  ... send to sdw2

[STEP 4 of 5] determine common authentication file content

[STEP 5 of 5] copy authentication files to all remote hosts
  ... finished key exchange with smdw
  ... finished key exchange with sdw1
  ... finished key exchange with sdw2

[INFO] completed successfully

批量安装软件(GP)

cd /home/gpadmin/
source /usr/local/greenplum-db/greenplum_path.sh gpseginstall -f all_hosts -u gpadmin -p gpadmin

最后出现如下提示代表批量安装成功。

20160827:13:58:47:020986 gpseginstall:greenplum_master:root-[INFO]:-SUCCESS -- Requested commands completed

检查是否安装完成

su - gpadmin
source /usr/local/greenplum-db/greenplum_path.sh 
gpssh -f all_hosts -e ls -l $GPHOME

输出如下代表成功:

[ mdw] ls -l /usr/local/greenplum-db/.
[ mdw] total 276
[ mdw] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 bin
[ mdw] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 demo
[ mdw] drwxr-xr-x 5 gpadmin gpadmin   4096 Feb 25  2016 docs
[ mdw] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 etc
[ mdw] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 ext
[ mdw] -rw-r--r-- 1 gpadmin gpadmin  43025 Feb 25  2016 GPDB-LICENSE.txt
[ mdw] -rw-rw-r-- 1 gpadmin gpadmin    735 Aug 27 12:30 greenplum_path.sh
[ mdw] drwxr-xr-x 6 gpadmin gpadmin   4096 Feb 25  2016 include
[ mdw] drwxr-xr-x 9 gpadmin gpadmin   4096 Feb 25  2016 lib
[ mdw] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25  2016 LICENSE.thirdparty
[ mdw] drwxr-xr-x 2 gpadmin gpadmin   4096 Aug 27 15:16 sbin
[ mdw] drwxr-xr-x 4 gpadmin gpadmin   4096 Feb 25  2016 share
[sdw1] ls -l /usr/local/greenplum-db/.
[sdw1] total 276
[sdw1] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 bin
[sdw1] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 demo
[sdw1] drwxr-xr-x 5 gpadmin gpadmin   4096 Feb 25  2016 docs
[sdw1] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 etc
[sdw1] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 ext
[sdw1] -rw-r--r-- 1 gpadmin gpadmin  43025 Feb 25  2016 GPDB-LICENSE.txt
[sdw1] -rw-rw-r-- 1 gpadmin gpadmin    735 Aug 27 12:30 greenplum_path.sh
[sdw1] drwxr-xr-x 6 gpadmin gpadmin   4096 Feb 25  2016 include
[sdw1] drwxr-xr-x 9 gpadmin gpadmin   4096 Feb 25  2016 lib
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25  2016 LICENSE.thirdparty
[sdw1] drwxr-xr-x 2 gpadmin gpadmin   4096 Aug 27 15:16 sbin
[sdw1] drwxr-xr-x 4 gpadmin gpadmin   4096 Feb 25  2016 share
[smdw] ls -l /usr/local/greenplum-db/.
[smdw] total 276
[smdw] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 bin
[smdw] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 demo
[smdw] drwxr-xr-x 5 gpadmin gpadmin   4096 Feb 25  2016 docs
[smdw] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 etc
[smdw] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 ext
[smdw] -rw-r--r-- 1 gpadmin gpadmin  43025 Feb 25  2016 GPDB-LICENSE.txt
[smdw] -rw-rw-r-- 1 gpadmin gpadmin    735 Aug 27 12:30 greenplum_path.sh
[smdw] drwxr-xr-x 6 gpadmin gpadmin   4096 Feb 25  2016 include
[smdw] drwxr-xr-x 9 gpadmin gpadmin   4096 Feb 25  2016 lib
[smdw] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25  2016 LICENSE.thirdparty
[smdw] drwxr-xr-x 2 gpadmin gpadmin   4096 Aug 27 15:16 sbin
[smdw] drwxr-xr-x 4 gpadmin gpadmin   4096 Feb 25  2016 share
[sdw2] ls -l /usr/local/greenplum-db/.
[sdw2] total 276
[sdw2] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 bin
[sdw2] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 demo
[sdw2] drwxr-xr-x 5 gpadmin gpadmin   4096 Feb 25  2016 docs
[sdw2] drwxr-xr-x 2 gpadmin gpadmin   4096 Feb 25  2016 etc
[sdw2] drwxr-xr-x 3 gpadmin gpadmin   4096 Feb 25  2016 ext
[sdw2] -rw-r--r-- 1 gpadmin gpadmin  43025 Feb 25  2016 GPDB-LICENSE.txt
[sdw2] -rw-rw-r-- 1 gpadmin gpadmin    735 Aug 27 12:30 greenplum_path.sh
[sdw2] drwxr-xr-x 6 gpadmin gpadmin   4096 Feb 25  2016 include
[sdw2] drwxr-xr-x 9 gpadmin gpadmin   4096 Feb 25  2016 lib
[sdw2] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25  2016 LICENSE.thirdparty
[sdw2] drwxr-xr-x 2 gpadmin gpadmin   4096 Aug 27 15:16 sbin
[sdw2] drwxr-xr-x 4 gpadmin gpadmin   4096 Feb 25  2016 share
View Code

创建存储目录

master,standby 操作命令:

mkdir -p /data/master
chown -R gpadmin.gpadmin /data/master/

 

在master上用下面命令批量操作segment,建立目录,改权限

gpssh -f seg_hosts -e 'mkdir -p /data/primary'
gpssh -f seg_hosts -e 'mkdir -p /data/mirror'
gpssh -f seg_hosts -e 'chown -R gpadmin.gpadmin /data/primary'
gpssh -f seg_hosts -e 'chown -R gpadmin.gpadmin /data/mirror'

设置NTP同步

在每个Segment主机,以root登录编辑/etc/ntp.conf文件。设置第一个server参数指向Master主机,第二个server参数指向Standby主机。如下面:

server mdw prefer
server smdw

在Standby主机,以root登录编辑/etc/ntp.conf文件。设置第一个server参数指向Master主机,第二个参数指向数据中心的时间服务器。

server mdw prefer

在Master主机,使用NTP守护进程同步所有Segment主机的系统时钟。例如,使用gpssh来完成:

gpssh -f all_hosts -v -e 'ntpd'

输出如下代表成功:

[root@mdw gpadmin]# gpssh -f all_hosts -v -e 'ntpd'
[WARN] Reference default values as $MASTER_DATA_DIRECTORY/gpssh.conf could not be found
Using delaybeforesend 0.05 and prompt_validation_timeout 1.0

[Reset ...]
[INFO] login mdw
[INFO] login smdw
[INFO] login sdw1
[INFO] login sdw2
[ mdw] ntpd
[smdw] ntpd
[sdw1] ntpd
[sdw2] ntpd
[INFO] completed successfully

[Cleanup...]

要配置集群自动同步系统时钟,应开启ntpd服务,并设置为开机时自动运行,4台服务器都启动:

 /etc/init.d/ntpd start 

检查环境(我这里检查一下操作系统参数设置是否ok,网络,io,都可以检查的)

[root@mdw gpadmin]# gpcheck -f all_hosts -m mdw
20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-dedupe hostnames
20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-Detected platform: Generic Linux Cluster
20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-generate data on servers
20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-copy data files from servers
20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-delete remote tmp files
20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-Using gpcheck config file: /usr/local/greenplum-db/./etc/gpcheck.cnf
20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-GPCHECK_NORMAL
20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-gpcheck completing...

在master上进行下面的操作:

创建Greenplum数据库配置文件,以gpadmin用户登录

su - gpadmin
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config   /home/gpadmin/gpinitsystem_config
chmod 775 gpinitsystem_config

设置所有必须的参数

ARRAY_NAME="EMC Greenplum DW"
SEG_PREFIX=gpseg
PORT_BASE=40000
declare -a DATA_DIRECTORY=(/data/primary)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE

设置可选参数

MIRROR_PORT_BASE=50000
REPLICATION_PORT_BASE=41000
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror)

运行初始化工具初始化数据库

su - gpadmin
source /usr/local/greenplum-db/greenplum_path.sh 
gpinitsystem -c gpinitsystem_config -h seg_hosts

正常情况下初始化完成以后最后的日志没有任何报错,而且gp进程也是启动了的。

初始化日志:

20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------

启动进程:

[root@mdw gpadmin]# ps -ef | grep postgres
gpadmin  18082     1  0 16:23 ?        00:00:00 /usr/local/greenplum-db-4.3.7.3/bin/postgres -D /data/master/gpseg-1 -p 5432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
gpadmin  18083 18082  0 16:23 ?        00:00:00 postgres: port  5432, master logger process                                                                                         
gpadmin  18086 18082  0 16:23 ?        00:00:00 postgres: port  5432, stats collector process                                                                                       
gpadmin  18087 18082  0 16:23 ?        00:00:00 postgres: port  5432, writer process                                                                                                
gpadmin  18088 18082  0 16:23 ?        00:00:00 postgres: port  5432, checkpoint process                                                                                            
gpadmin  18089 18082  0 16:23 ?        00:00:00 postgres: port  5432, seqserver process                                                                                             
gpadmin  18090 18082  0 16:23 ?        00:00:00 postgres: port  5432, ftsprobe process                                                                                              
gpadmin  18091 18082  0 16:23 ?        00:00:00 postgres: port  5432, sweeper process                                                                                               
root     22716 20223  0 16:52 pts/0    00:00:00 grep postgres
[root@mdw gpadmin]# 


现在只有1个master,2个segment,没有standby,那么接下来把standby加入集群。

在master服务器上执行

gpinitstandby -s smdw

输出如下:

[gpadmin@mdw ~]$ gpinitstandby -s smdw
20160827:16:59:24:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking for filespace directory /data/master/gpseg-1 on smdw
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname               = mdw
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory         = /data/master/gpseg-1
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port                   = 5432
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname       = smdw
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /data/master/gpseg-1
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog         = On
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:- Filespace locations
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_system -> /data/master/gpseg-1
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent.
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog...
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully.
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file...
20160827:16:59:37:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating filespace flat files...
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Filespace flat file updated successfully.
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw  in directory: /data/master/gpseg-1
20160827:16:59:40:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on smdw
View Code

 

设置环境变量,master,standby都设置。

1.  以gpadmin用户登录:

su - gpadmin

 

2.  打开用户启动文件(如.bashrc)。比如:

vim ~/.bashrc

 

3.  添加新行用以加载greenplum_path.sh文件和设置MASTER_DATA_DIRECTORY环境变量。例如:

source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1

 

启动和停止数据库测试是否能正常启动和关闭,命令如下

gpstart
gpstop 

 

到此 greenplum 就部署完成了。下面进行一些简单的测试。

登录数据库:psql -d postgres

建表,插入,查询

postgres=# create table student ( no int primary key,student_name varchar(40),age int);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "student_pkey" for table "student"
CREATE TABLE
postgres=# insert into student values(1,'yayun',18);
INSERT 0 1
postgres=# select * from student;
 no | student_name | age 
----+--------------+-----
  1 | yayun        |  18
(1 row)

 

posted @ 2016-08-27 17:17  yayun  阅读(7658)  评论(0编辑  收藏  举报