postgresql+pgpool安装文档

postgressql集群安装

[TOC]

1 数据库编译安装

前言

使用pgpool搭建postgresql数据库高可用集群

机器：2~3台，和虚拟ip一台系统：centos6.x~7.x

防火墙：关闭状态

注意事项：本文使用的是gcc编译器版本为4.8.5，理论上，4.8.5到5.x版本都能正常使用，8.2版本就会文件编译失败

1.1 安装Linux环境依赖

以下依赖包是postgresql数据库安装的基础。

yum install gcc-c++
yum install flex
yum install readline-devel
yum install zlib-devel
yum install postgresql-devel

1.2 postgresql安装

1.2.1 上传postgresql源码

su - root              root用户登录

mkdir /opt/soft_bak    新建目录

cd /opt/soft_bak      切换目录

1.2.2 解压缩postgresql源码

wget http://ftp.postgresql.org/pub/source/v9.5.1/postgresql-9.5.1.tar.bz2

tar -jxvf postgresql-9.5.1.tar.bz2

cd postgresql-9.5.1

1.2.3 安装配置postgresql

./configure --prefix=/opt/pgsql951 设定postgresql数据库管理系统安装位置

gmake world   编译源码

gmake install-world 安装源码 成功查看/opt/pgsql951是否生成源码目录

1.2.4 添加用户

useradd pg95 添加安装启动管理数据库的普通用户

passwd pg95 设置密码bicon@123

1.2.5 配置环境变量

vi /home/pg95/.bash_profile

设置环境变量,添加内容如下:

export PGDATA=/home/pg95/pg_root
export LANG=en_US.utf8
export PGHOME=/opt/pgsql951
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGUSER=postgres
#export PGHOST=$PGDATA 
alias rm='rm -i'
alias ll='ls -lh'
export PGDATABASE=postgresql

source /home/pg95/.bash_profile ##非常重要

mkdir -p /home/pg95/pg_root
mkdir -p /home/postgres/pgdata/
chown -R pg95 /home/postgres/pgdata/  ##非常重要
chown -R pg95 /home/pg95/pg_root

chmod 755 /home/      设置目录权限,这里不允许设置777,否则Linux互信不好使

chmod 755 /home/pg95/ 设置目录权限,这里不允许设置777,否则Linux互信不好使

1.2.7 检测数据库管理系统

su - pg95

which psql 检查psql数据库管理系统是否安装正常

psql -V检查psql数据库管理系统是否安装正常

echo $PGDATA检查psql数据库管理系统是否安装正常

echo $PGHOME检查psql数据库管理系统是否安装正常

1.2.8 初始化数据库

initdb -D $PGDATA -E UTF8 --locale=C -Upostgres -W 初始化数据库

成功之后会让你输入超管密码

1.3 linux系统参数调整

1.3.1 修改系统变量

su - root
vi /etc/sysctl.conf

kernel.shmmni = 4096
kernel.sem = 50100 64128000 50100 1280
fs.file-max = 7672460
net.ipv4.ip_local_port_range = 9000 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576

1.3.2 设置区域大小

vi /etc/security/limits.conf

soft nofile 131072
hard nofile 131072
soft nproc 131072
hard nproc 131072
soft core unlimited
hard core unlimited
soft memlock 50000000
hard memlock 50000000

1.3.3 关闭防火墙

1.4 postgresql配置

su - pg95
cd $PGDATA
vi pg_hba.conf  添加内容如下:
host    all all  0.0.0.0/0  md5

vi postgresql.conf
#添加如下内容
listen_addresses = '*'
port = 5432
superuser_reserved_connections = 20
unix_socket_directories = '/home/postgres/pgdata'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
vacuum_cost_delay = 10
bgwriter_delay = 10ms
synchronous_commit = off
#checkpoint_segments = 8     9.5版本回报错
wal_writer_delay = 10ms
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename ='postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB

1.5 测试数据库是否安装成功

注意

postgresql 和pgpool都适应以下规则

-h是host -p 是端口

-d 指的是数据库 -U 指的是用户

su - pg95

pg_ctl start 启动数据库

pg_ctl -m fast stop 停止数据库

pg_ctl restart 重启数据库

Linux下执行:

ps -ef | grep postgres         数据库服务是否正常启动

psql -h 127.0.0.1 -d postgres -U postgres  登录数据库

以上步骤请在两个Linux系统中安装两套数据库系统,用来配置双机热备节点。

1.6 新创建postgresql用户

        1、create role odoo superuser;
        2、alter role odoo password 'bicon@123';
        3、alter role odoo login;

命令格式：
切换数据库：\c [database_name]
切换用户：\c - [user_name]

启动中遇到的深坑

postgres@pgdb-> psql
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.1921"?
--------------------- 
解决办法 https://blog.csdn.net/pg_hgdb/article/details/78657789

2 postgressql流复制配置基于pgpool-ii的集群配置

集群 ip

pgpool 10.1.19.8 主

pgpool 10.1.19.7 备

pgsql1 10.1.19.8 主

pgsql2 10.1.19.10 备

10.1.19.250 vip

1.1 主节点配置准备工作

cd $PGDATA
vi pg_hba.conf
#添加如下内容
host   replication     replica        0.0.0.0/0               md5

vi postgresql.conf
#添加如下内容
listen_addresses = '*'
port = 5432
max_connections = 500
#然后重启一下数据库就可以正常使用了，至此前期准备工作已经完成
pg_ctl start 启动数据库

1.2 主库配置：

1.2.1 修改/home/pg95/pg_root/postgresql.conf，在文件末尾增加以下属性：

vi postgresql.conf

接着在文件末尾添加如下

wal_level = hot_standby
checkpoint_segments = 16
checkpoint_timeout = 5min
archive_mode = on
max_wal_senders = 3
wal_keep_segments = 16

wal_level = hot_standby：这个是设置主为wal的主机

max_wal_senders = 3：这个设置了可以最多有几个流复制连接，差不多有几个从，就设置几个

wal_keep_segments = 16：设置流复制保留的最多的xlog数目

然后重启报错，将刚才的文件postgresql.conf中的checkpoint_segments这行注释掉

[pg95@peer0 pg_root]$ pg_ctl restart
waiting for server to shut down.... done
server stopped
server starting
[pg95@peer0 pg_root]$ LOG:  unrecognized configuration parameter "checkpoint_segments" in file "/home/pg95/pg_root/postgresql.conf" line 659
FATAL:  configuration file "/home/pg95/pg_root/postgresql.conf" contains errors

vi postgresql.conf

在checkpoint_segments那行前面加注释，保存再重启。

1.2.1 创建具有replication的权限用户

su - pg95
psql -h 127.0.0.1 -d postgres -U postgres  登录数据库
create role replica login replication encrypted password 'replica'; 创建用户

1.2.3 配置密码文件

cd /home/pg95
vi .pgpass
#主数据库IP:主数据库端口号:replication:replica:replica
10.1.19.8:5432:replica:replica
chmod 400 .pgpass
#重启数据库
pg_ctl restart

2.1 备库配置

2.1.1 停止备库的PostgreSQL服务：

pg_ctl stop

2.1.2 清空备库之前的数据文件：

rm -rf /home/pg95/pg_root

2.1.3 从主库上恢复数据，输入下面命令，密码是之前配置的密码：bicon@123

pg_basebackup -D $PGDATA -F p -X stream -v -P -h 10.1.19.8 -U odoo

2.1.4 修改postgresql.conf

#添加
cd $PGDATA
hot_standby = on

2.1.5 增加recovery.conf文件：

vi recovery.conf

standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=10.1.19.8 port=5432 user=odoo password=bicon@123'

赋予权限

chmod 777 recovery.conf

2.1.6 启动数据库

3.验证配置：

在主库的服务器上输入命令，会看到多出一个wal sender process的进程：＃ps -aux | grep postgres

在备库的服务器上输入命令，会看到多出一个wal receiver process的进程：＃ps -aux | grep postgres

测试：

在主机上创建表user，然后在备机上可以看到user表已经被复制过来了

在主机上插入数据，然后在备机上可以看到这条数据已经被复制过来了

在备机上插入数据失败，说明在主备模式下备机是只读的

停止备机，在主机上插入数据，然后再启用备机，可以看到备机也是有这条数据的

4.其他汇总

Postgresql支持在备库做只读查询，这样的备库叫做hot standby

在主机上检查异步流的复制情况

select pid,state,client_addr,sync_priority,sync_state from pg_stat_replication;
 pid  |   state   | client_addr | sync_priority | sync_state 
------+-----------+-------------+---------------+------------
 1414 | streaming | 10.1.19.10  |             0 | async
(1 row)


查看备库落后主库多少字节的wal日志
select pg_xlog_location_diff(pg_current_xlog_location(),replay_location)  from pg_stat_replication;

 pg_xlog_location_diff 
-----------------------
                     0
(1 row)

查看备库的状态 如何判断数据库处于备库的状态？如果数据库处于hot standby 状态，可以连接到数据库中执行pg_is_in_recovery() 函数。如果是在主库上，此函数返回的是  false  如果是在备库上，返回的是true

select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)

3 pgpool-II使用-安装与配置

注意：pgpool主节点应装在主数据库机器上

3.1 下载源码

cd /opt
wget http://www.pgpool.net/download.php?f=pgpool-II-3.3.12.tar.gz

3.2 配置ssh秘钥

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

scp authorized_keys pg95@10.1.19.10:/home/pg95/.ssh
scp authorized_keys pg95@10.1.19.8:/home/pg95/.ssh

验证下ssh配置是否成功

ssh pg95@10.1.19.10
ssh pg95@10.1.19.8
#如果无需登陆密码,则设置成功

3.3 安装

3.3.1 编译源码

 tar xvf pgpool-II-3.3.12.gz
 mkdir /opt/pgpool
 chown  -R pg95 /opt/*
 cd pgpool-II-3.3.12
 ./configure -prefix=/opt/pgpool -with-pgsql=path -with-pgsql=/opt/pgsql951
 make
 make install

切换到pg95用户下，如果编译等操作报权限不足的错误，切换到root用户使用 chown -R pg95 /opt/* 类似命令先赋予权限

3.3.2 安装pgpool_regclass 和pgpool-recovery

注意：3.3.2和3.3.3只需要在主pgpool执行，备pgpool在编译完成修改配置文件能启动就行

cd  pgpool-II-3.3.12/sql/pgpool_regclass
make && make install
psql -h /home/postgres/pgdata -f pgpool-regclass.sql template1
cd  pgpool-II-3.3.12/sql/pgpool-recovery
make && make install

psql -h /home/postgres/pgdata -f pgpool-recovery.sql template1

3.3.3 建立inert_lock表

cd /pgpool-II-3.3.12/sql/

psql -h /home/postgres/pgdata -f insert_lock.sql template1

3.4 配置pool_hba.conf

pool_hba.conf是对登录用户进行验证的，要和pg的pg_hba.conf保持一致，要么都是trust，要么都是md5验证方式，这里采用了md5验证方式如下设置：

# 重要
cd /opt/pgpool/etc
cp pool_hba.conf.sample pool_hba.conf
vim pool_hba.conf
#编辑内容如下
# "local" is for Unix domain socket connections only
local   all         all                            md5
# IPv4 local connections:
host    all         all         0.0.0.0/0          md5
host    all         all         0/0                md5

3.5 核心文件配置

#查看配置文件
cd /opt/pgpool/etc/
ll
-rw-r--r--. 1 postgres postgres 858 Dec 19 13:28 pcp.conf.sample
-rw-r--r--. 1 postgres postgres 30979 Dec 19 13:28 pgpool.conf.sample
-rw-r--r--. 1 postgres postgres 30669 Dec 19 13:28 pgpool.conf.sample-master-slave
-rw-r--r--. 1 postgres postgres 30651 Dec 19 13:28 pgpool.conf.sample-replication
-rw-r--r--. 1 postgres postgres 30690 Dec 19 13:28 pgpool.conf.sample-stream
-rw-r--r--. 1 postgres postgres 3200 Dec 19 13:28 pool_hba.conf.sample
-rw-rw-r--. 1 postgres postgres 43 Dec 19 13:53 pool_passwd

#复制部分配置文件模板
cp pgpool.conf.sample pgpool.conf
cp pcp.conf.sample pcp.conf
cp pool_hba.conf.sample pool_hba.conf
pg_pool默认只接受端口的本地连接。如果想从其他主机接受连接，请设置listen_address='*' ,相对应在pgpool.conf中配置修改如下：

修改pgpool文件（如果只是单个pgpool请参照样式一修改）

pg_pool默认只接受端口的本地连接。如果想从其他主机接受连接，请设置listen_address='*' ,相对应在pgpool.conf中配置修改如下：

样式一

#下方有所有配置
listen_address='*‘
port =9999


#Backend Connection Settings - 这一部分用来配置后台连接的数据库

backend_hostname0 = '10.1.19.8'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/pgdata'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '10.1.19.10'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/pgdata'
backend_flag1 = 'ALLOW_TO_FAILOVER'

#主备节点切换脚本配置
failover_command = '/opt/pgsql951/bin/failover_stream.sh %d %H /tmp/trigger_file'


#一般使用hba的方式进行登录认证。所以要在pgpool.conf中打开如下选项：
enable_pool_hba=on


# 以下两个参数用于调试, 能看到sql balance的情况.

log_statement = on

log_per_node_statement = on

#

sr_check_period = 5
sr_check_user = 'odoo'
sr_check_password = 'bicon@123'
----------------------------------------------------------------------------------------

#如果开启复制和负载均衡的示例 在pgpool.conf加入
replication_mode = off
load_balance_mode = on
--------------------------------------------------------------


#再修改流复制的主备模式下开启负载均衡

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

下边是双pgpool的配置文件,参考样式二

在配置之前需要明白下列东西

1. 虚拟ip(vip),高可用pgpool集群需要用vip进行漂移，准备好虚拟Ip

2. 查看本机网卡，配置后面的delegate_IP需要

[root@localhost ~]# ifconfig
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.19.8  netmask 255.255.255.0  broadcast 10.1.19.255
        inet6 fe80::250:56ff:fead:5cb7  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:ad:5c:b7  txqueuelen 1000  (Ethernet)
        RX packets 24958474  bytes 8954913478 (8.3 GiB)
        RX errors 0  dropped 18  overruns 0  frame 0
        TX packets 21012999  bytes 12801075219 (11.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.19.250  netmask 255.255.255.0  broadcast 10.1.19.255
        ether 00:50:56:ad:5c:b7  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 7381263  bytes 2281772792 (2.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7381263  bytes 2281772792 (2.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

分别对用pgppool.config，虚拟Ip和ens160

# - Virtual IP control Setting -

delegate_IP = '10.1.19.250'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig ens160:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
if_down_cmd = 'ifconfig ens160:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # ens160根据现场机器改掉

以下是主pgpool所有配置(样式二)

# CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = '10.1.19.8'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = '10.1.19.10'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/run/pgpool.pid'

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'odoo'
sr_check_password = 'bicon@123'
sr_check_database = 'as'

#------------------------------------------------------------------------------
# HEALTH CHECK 健康检查
#------------------------------------------------------------------------------

health_check_period = 10 # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'odoo'
                                   # Health check user
health_check_password = 'bicon@123' #数据库密码
                                   # Password for health check user
health_check_database = 'as'
#必须设置，否则primary数据库down了，pgpool不知道，不能及时切换。从库流复制还在连接数据，报连接失败。
#只有下次使用pgpool登录时，发现连接不上，然后报错，这时候，才知道挂了，pgpool进行切换。


#主备切换的命令行配置
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgsql951/bin/failover_stream.sh %H '

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = '10.1.19.8'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
# - Virtual IP control Setting -

delegate_IP = '10.1.19.250'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig ens160:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
if_down_cmd = 'ifconfig ens160:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = '10.1.19.10'
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'ens160'
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = '10.1.19.10' #对端
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for othet pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for othet watchdog 0
                                    # (change requires restart)

以下是备pgpool所有配置（样式二）

listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = '10.1.19.8'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = '10.1.19.10'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/pgpool.pid'

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'odoo'
sr_check_password = 'bicon@123'
sr_check_database = 'as'

#------------------------------------------------------------------------------
# HEALTH CHECK 健康检查
#------------------------------------------------------------------------------

health_check_period = 10 # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'odoo'
                                   # Health check user
health_check_password = 'bicon@123' #数据库密码
                                   # Password for health check user
health_check_database = 'as'
#必须设置，否则primary数据库down了，pgpool不知道，不能及时切换。从库流复制还在连接数据，报连接失败。
#只有下次使用pgpool登录时，发现连接不上，然后报错，这时候，才知道挂了，pgpool进行切换。


#主备切换的命令行配置
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgsql951/bin/failover_stream.sh %H '

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = '10.1.19.10'  #本端
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
# - Virtual IP control Setting -

delegate_IP = '10.1.19.250'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig ens160:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
if_down_cmd = 'ifconfig ens160:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = '10.1.19.8' #对端
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'ens160'
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)
                                    # ens160根据现场机器改掉
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = '10.1.19.8' #对端
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for othet pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for othet watchdog 0
                                    # (change requires restart)

配置文件里，故障处理配置的是failover_command = 'opt/pgsql951/bin/failover_stream.sh %H '，因此，需要在/opt/pgsql951/bin/目录中写个failover_stream.sh脚本：

vim /opt/pgsql951/bin/failover_stream.sh

#! /bin/sh 
# Failover command for streaming replication. 
# Arguments: $1: new master hostname. 

new_master=$1 
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA" 

# Prompte standby database. 
/usr/bin/ssh -T $new_master $trigger_command 

exit 0;

3.6 生成md5加密（非常重要，后边恢复节点到集群如果验证错误主要这边问题）

cd /opt/pgpool/bin 
#执行pg_md5 密码
[pg95@peer0 bin]$ pg_md5 bicon@123
0a8f125a3f41f36c0507203a63cde9ad
[pg95@peer0 bin]$ cd ../etc/
[pg95@peer0 etc]$ vim pcp.conf
#添加一行并保存 odoo:刚刚生成密码
odoo:0a8f125a3f41f36c0507203a63cde9ad
[pg95@peer0 etc]$ cd /opt/pgpool/bin/
#在pgpool中添加pg数据库的用户名和密码
pg_md5 -p -m -u odoo pool_passwd
然后输入密码

3.7 pgpool启动

如果想让pgpool在前台运行，可以加“-n”参数
pgpool -n
如果想让日志打印到一个文件。使用一下命令
pgpool -n> /tmp/pgpool.log 2>&1 &
pgpool -n -D > /var/log/pgpool/pgpool.log 2>&1 &
如果想打印调度信息，加上参数“-d”
pg     > /tmp/pgpool.log 2>&1 &
停止pgpool
pgpool stop
也可以加上参数
pgpool -m fast stop


启动pgpool
pgpool -f /opt/pgpool/etc/pgpool.conf

访问进入命令行
psql -h 127.0.0.1 -p 9999 -U odoo as
或者
psql -h 10.1.19.8 -p 9999 -U odoo as

3.8 登录集群中间件查看各节点状态

 show pool_nodes;
 node_id |  hostname  | port | status | lb_weight |  role   
---------+------------+------+--------+-----------+---------
 0       | 10.1.19.8  | 5432 | 2      | 0.500000  | primary
 1       | 10.1.19.10 | 5432 | 2      | 0.500000  | standby


 status列解释

0 - 该状态仅仅用于初始化，PCP从不显示它。

1 - 节点已启动，还没有连接。

2 - 节点已启动，连接被缓冲。

3 - 节点已关闭。

3.9 测试负载均衡

开启两个pgpool连接

[pg95@localhost bin]$ psql -h 10.1.19.8 -p 9999  -U odoo as
Password for user odoo: 
psql (9.5.1)
Type "help" for help.

as=# select now();
              now              
-------------------------------
 2018-05-17 10:16:36.224685+08
(1 row)

as=# select now();
              now              
-------------------------------
 2018-05-17 10:16:37.863407+08
(1 row)
--------------------------------------------------------------------------------------

[pg95@localhost opt]$ psql -h 10.1.19.8 -p 9999  -U odoo as
Password for user odoo: 
psql (9.5.1)
Type "help" for help.

as=# select now();
              now              
-------------------------------
 2019-03-26 17:17:51.208098+08
(1 row)

as=# select now();
              now              
-------------------------------
 2019-03-26 17:17:53.261424+08
(1 row)


看来负载均衡的是客户端连接，而不是sql命令

4 PGPool集群管理

1.目前架构

pgpool 10.1.19.8 主pgpool

pgpool 10.1.19.7 备pgool

pgsql1 10.1.19.8 主数据库

pgsql2 10.1.19.10 备数据库

10.1.19.250 vip

2. 启动集群

分别启动主节点备节点的 pg数据库

pg_ctl start

2.分别启动pgpool命令

 pgpool -n -d > /var/log/pgpool/pgpool.log 2>&1 &

3.注意快速终止pgpool命令：

pgpool -m fast stop

4.启动pgpool后，通过vip查看集群节点状态,也可以用原先ip:

[pg95@localhost pg_root]$  psql -h 10.1.19.250  -p 9999 -U odoo as
Password for user odoo: 
psql (9.5.1)
Type "help" for help.

as=#  show pool_nodes;
 node_id |  hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.1.19.8  | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | 10.1.19.10 | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

5 .Pgpool的HA

1. 模拟master端pgpool宕机

 pgpool -m fast stop

2. 稍等片刻，访问集群

psql -h 10.1.19.250  -p 9999 -U odoo as        #as代表数据库

show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0             | false  | 0
 1       | slave     | 5432 | up     | 0.500000  | standby | 0             |  true  | 0
(2 rows)
#访问成功，在master节点上的pgpool宕机后，由slave节点的pgpool接管vip和集群服务，并未中断应用访问。
#在master上重新启动pgpool后，定制slave上的pgpool服务，结果一样。

3. 模拟主节点数据库宕机

pg_ctl stop

#能看到日志清楚看到主机down机了，slave切换了。
#稍等片刻后，访问集群
psql -h 10.1.19.250  -p 9999 -U odoo as

show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.1.19.8   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
 1       | 10.1.19.10   | 5432 | up     | 0.500000  | primary | 0          | true              | 0

 #此时备数据库已经成为 主数据库了

4. 修复主数据库

 pg_ctl start

这个时候去show pool_nodes还是之前状态，集群已经将原先主数据踢掉了，需要重新加入

之前停掉的的数据库node是0 ，所以指定为0

注意：不同版本的pgpool执行命令不一样，下边我写两种如果第一种不行用第二种

pcp_attach_node -d 5 10.1.19.250 9898 odoo bicon@123 0

pcp_attach_node -d 5 -h 10.1.19.250 -p 9898 -U odoo -n 0

输入密码登陆后查看节点状态就恢复了

如果状态还是没恢复或者状态码不对，类似下边，有一个是1

as=# show pool_nodes;
 node_id |  hostname  | port | status | lb_weight |  role   
---------+------------+------+--------+-----------+---------
 0       | 10.1.19.8  | 5432 | 2      | 0.500000  | primary
 1       | 10.1.19.10 | 5432 | 1      | 0.500000  | standby
(2 rows)

#退出pgpool在进入就好了
as=# \q
[pg95@localhost oracle_fdw-1.5.0]$  psql -h 10.1.19.250 -p 9999  -U odoo as
Password for user odoo: 
psql (9.5.1)
Type "help" for help.

as=# show pool_nodes;
 node_id |  hostname  | port | status | lb_weight |  role   
---------+------------+------+--------+-----------+---------
 0       | 10.1.19.8  | 5432 | 2      | 0.500000  | primary
 1       | 10.1.19.10 | 5432 | 2      | 0.500000  | standby
(2 rows)

5. 可能出现的深坑错误

psql: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

2019-03-26 11:23:00 ERROR: pid 3494: pool_read_message_length: message length (8) in slot 1 does not match with slot 0(12)
2019-03-26 11:23:00 ERROR: pid 3494: Failed to read the authentication packet length. This is likely caused by the inconsistency of auth method among DB nodes. In this case you can check the previous error messages (hint: length field) from pool_read_message_length and recheck the pg_hba.conf settings.

解决方法

pgpool -C -D

6. 阶段性总结

到目前位置实现了

2个pgpool的高可用，其中1个宕机之后另外一个重启之后自动加入工作

主备数据库的读写分离，主库写，备库读，但是利用pgpool的负载均衡可以实现主库写，主库备库一起读

6. 主节点数据恢复

1. 实现场景

当主数据库宕机之后，备数据库升级为主数据的角色，所以在此期间数据库正常使用，鉴于之前的双机热备功能，只有备库能复制主库的数据，所以当主库重新加入集群后，数据是不完全的，需要从备库获取更新，所以采用逆向思维的方法并且使用pg_rewind 命令能实现

2. 实现方法通过pg_rewind 命令

2.1 修改postgresql文件

1 full_page_writes 2 wal_log_hints 设置成 on 或者 PG 在初始化时开启 checksums 功能

vim postgresql.conf

full_page_writes = on
wal_log_hints = on

在主库上执行重启数据库操作，再将数据库停止状态

2.2 将要恢复的主数据库停掉

在主库执行

pg_ctl stop

2.3 提升备库状态

在备库执行

pg_ctl promote

2.4 将主数据库恢复为standby（降成备库）

在主库上执行重启数据库操作，再将数据库停止状态

pg_rewind --target-pgdata $PGDATA --source-server='host=10.1.19.10 port=5432 user=odoo password=bicon@123 dbname=as' -P

执行成功结果为

servers diverged at WAL position 0/284148D8 on timeline 1
rewinding from last common checkpoint at 0/28414868 on timeline 1
Done!

//将原先主节点 $PGDATA 下 recovery.done 改成recovery.conf

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=10.1.19.10 port=5432 user=odoo password=bicon@123'
#host=备库地址

检查postgresql.conf文件hot_standby=on有没有设置，如果不加上会报错

psql -h 10.1.19.8  -p 5432 -U odoo as
psql: FATAL:  the database system is starting up

2.5 启动主数据(10.1.19.8)

pg_ctl start

就可以看到数据库数据现在一样了

3. 恢复主数据库加入到pgpool集群

1.假如有A是主数据库，B是备数据库，A数据库停掉了，B升级为主库对吧，但是现在向B数据库添加数据，A因为停掉了没有启动，现在想重启A库加入集群恢复主库身份，用pcp_attach_node方法重新加入
2.但是A库现在虽然恢复主库身份了，但是数据是丢失的，B库才是完整的，由于双机热备的特性，A暂时无法从B库里恢复数据

场景一：现在集群中只有一个数据库存活，直接重启.10(仅存的主节点，由备数据库切换回来)数据库，会造成pgpool中没有数据库pgpool宕机,只能重启
场景二：想先将.8数据库恢复为主节点，再把数据库身份换回来，使用pcp_attach_node命令无效，对。8数据库使用 pg_ctl promote 提示错误
猜想：由于集群中仅存的10数据库现在身份还是主数据库身份，所以已经是主的身份，所以无法成功

结果。只能关闭pgpool，将主备身份换回来之后再 重启pgpool

4. 关闭pgpool

pgpool stop

5. 恢复10数据库为standby

关闭数据库并在备节点执行

pg_rewind --target-pgdata $PGDATA --source-server='host=10.1.19.8 port=5432 user=odoo password=bicon@123 dbname=as' -P

执行结果

The serversdiverged at WAL position 0/12ACCC30 on timeline 2.
No rewindrequired.

mv recovery.done recovery.conf

vim recovery.conf
将host改回主数据库10.19.1.8

pg_ctl start

5. 重新加入集群

分别启动主节点，备节点pgpool

 pgpool -n -d > /var/log/pgpool/pgpool.log 2>&1 &

6. 检查所有机器上 $PGDATA 目录

有recovery.done的是主机

有recovery.conf的是备机

都是recovery.done说明现在是双主节点，需要修正其中一个

posted on 2019-04-15 19:05 爱屋及乌88 阅读(815) 评论(0) 收藏举报

刷新页面返回顶部

2019美丽的梦