High Availability for httpd on CentOS 7 with Corosync + Pacemaker + pcs

I. Introduction

　　When it comes to high availability, people tend to think of the relatively simple Keepalived, the older Heartbeat, or Corosync + Pacemaker. So how do they differ?

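　　Starting with v3, Heartbeat was split into several subprojects: Heartbeat, cluster-glue, Resource Agents, and Pacemaker.

　　　　Heartbeat: only maintains membership information about the cluster nodes and handles the communication between them.

　　　　Cluster-glue: acts as a middle layer that connects Heartbeat with the CRM (Pacemaker). It consists mainly of two parts, the LRM (Local Resource Manager) and STONITH.

　　　　Resource Agents: a collection of scripts that start and stop services and monitor their status. The LRM calls these scripts to start, stop, and monitor the various resources.

　　　　Pacemaker: the resource manager split out of the original Heartbeat. It is the control center of the whole HA cluster; clients configure, manage, and monitor the entire cluster through Pacemaker. It does not provide low-level heartbeat messaging itself: to communicate with other nodes it relies on the heartbeat service of an underlying layer (the newly split-out Heartbeat, or Corosync) to deliver its messages.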

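　　Pacemaker's configuration is not easy to edit by hand. It can be managed with command-line tools such as crmsh and pcs, or with graphical tools such as pygui and hawk, depending on personal preference.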

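　　Differences between Heartbeat and Corosync:

　　　　1. Having installed Heartbeat myself, I found its configuration fairly simple: you mainly edit three files, ha.cf, haresources, and authkeys. With more than two nodes, however, I ran into severe split-brain, whether because of my own configuration or something else (many blog posts say it supports only two nodes). It also ships with few service scripts, so many service monitoring scripts have to be written yourself.

　　　　2. Heartbeat can assign only one primary for all resources, whereas Corosync allows different resource groups to have different primaries. Corosync supports clusters of more than two nodes and lets you group resources, manage them per group, set a primary for each group, and start and stop them automatically.

　　　　3. Flexibility in managing resources: Corosync handles synchronization of its configuration file by itself, while Heartbeat has no such feature.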

II. Environment

　　1. OS: three machines running CentOS Linux release 7.4.1708 (Core)

　　2. hosts (add to /etc/hosts on all three nodes):

　　　　10.6.32.20　　ceph1

　　　　10.6.32.21　　ceph2

　　　　10.6.32.22　　ceph3
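A minimal sketch for adding these entries without hand-editing each node. The function name add_cluster_hosts is hypothetical; it appends an entry only when that hostname is not already in the file, so running it twice is harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the three cluster nodes to a hosts file
# (default /etc/hosts), skipping any hostname that is already listed.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry ip name
  for entry in "10.6.32.20 ceph1" "10.6.32.21 ceph2" "10.6.32.22 ceph3"; do
    read -r ip name <<<"$entry"
    # -w matches whole words only, so "ceph1" does not match "ceph10"
    if ! grep -qw -- "$name" "$hosts_file" 2>/dev/null; then
      printf '%s\t%s\n' "$ip" "$name" >>"$hosts_file"
    fi
  done
}
```

Run `add_cluster_hosts` as root on each node with no argument to update /etc/hosts; passing another file path is handy for trying it out first.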

　　3. Time synchronization with chrony

　　4. Disable the firewalld firewall and SELinux.
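Steps 3 and 4 as commands, as a sketch to run as root on each node. The function prepare_node and the DRY_RUN switch of its run helper are hypothetical conveniences: with DRY_RUN=1 the commands are only printed. Disabling SELinux entirely mirrors the text above; the config file change takes effect at the next boot, while setenforce 0 switches to permissive mode right away.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper covering steps 3 and 4; run as root on every node.
prepare_node() {
  # Step 3: time synchronization with chrony
  run yum install -y chrony
  run systemctl enable --now chronyd
  # Step 4: stop firewalld now and keep it off after reboots
  run systemctl disable --now firewalld
  # SELinux: permissive immediately (setenforce prints an error and
  # returns non-zero if SELinux is already disabled; the function continues)
  run setenforce 0
  # ...and disabled permanently from the next boot on
  run sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
}
```

`DRY_RUN=1 prepare_node` previews the commands; `prepare_node` runs them.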

　　5. Set up SSH trust between the nodes (run on ceph1).

　　　　　　# ssh-keygen  (accepting the defaults is fine)

　　　　　　# ssh-copy-id 127.0.0.1

　　　　　　# Copy the .ssh directory to the other nodes, overwriting theirs:

　　　　　　# scp -r .ssh/ root@ceph2:/root/

　　　　　　# scp -r .ssh/ root@ceph3:/root/
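To confirm the trust works before moving on, each node should reach every node without a password. A hedged sketch: check_ssh_trust is a hypothetical name, and BatchMode makes ssh fail instead of prompting. StrictHostKeyChecking=no is an assumption made for convenience; it accepts host keys that are not yet in known_hosts on first contact.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical check: every node should print its hostname without
# asking for a password; BatchMode turns a password prompt into an error.
check_ssh_trust() {
  local node
  for node in ceph1 ceph2 ceph3; do
    run ssh -o BatchMode=yes -o StrictHostKeyChecking=no "root@$node" hostname
  done
}
```

Running `check_ssh_trust` on each of the three nodes should print ceph1, ceph2 and ceph3 every time.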

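III. Installation and Configuration

　　1. Install (all nodes)

　　　　# yum install -y pacemaker pcs psmisc policycoreutils-python

　　2. Check that the packages are installed (corosync is pulled in as a dependency of pacemaker)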

　　　　# rpm -qa | grep corosync

　　　　# rpm -qa | grep pacemaker

　　　　# rpm -qa | grep pcs

　　　　# rpm -qa | grep psmisc

　　　　# rpm -qa | grep policycoreutils-python

　　　　After installation, a user named hacluster is created for the cluster to use.
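The five queries above can be collapsed into one check that returns non-zero if anything is missing. check_packages is a hypothetical name; it relies only on rpm -q, which exits non-zero for a package that is not installed.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: query every required package, report the missing
# ones on stderr, and return 1 if at least one is absent.
check_packages() {
  local pkg missing=0
  for pkg in corosync pacemaker pcs psmisc policycoreutils-python; do
    if ! run rpm -q "$pkg"; then
      echo "missing: $pkg" >&2
      missing=1
    fi
  done
  return "$missing"
}
```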

　　3. Start the pcsd service and enable it at boot (all nodes)

　　　　# systemctl start pcsd.service

　　　　# systemctl enable pcsd.service

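　　　　Enabling the service creates a symlink for the unit.

　　4. Set the hacluster password

　　　　The hacluster user created during installation is the account pcs uses to authenticate to the pcsd daemon on each node, so it needs a password, and the password must be the same on every node. Set it as root with passwd hacluster on each node.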
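Since SSH trust is already in place, the password can also be set on all three nodes from one machine. A sketch under that assumption: set_hacluster_password is hypothetical, and it relies on passwd --stdin, which is available on CentOS/RHEL. The password is fed to the remote passwd on stdin rather than passed as an argument to ssh or passwd.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: set the same hacluster password on every node.
set_hacluster_password() {
  local password="$1" node
  if [ -z "$password" ]; then
    echo "usage: set_hacluster_password PASSWORD" >&2
    return 1
  fi
  for node in ceph1 ceph2 ceph3; do
    # ssh forwards the here-string to the remote passwd's stdin
    run ssh "root@$node" passwd --stdin hacluster <<<"$password"
  done
}
```

Reading the password with `read -rs` keeps it out of the shell history: `read -rs -p 'hacluster password: ' pw; echo; set_hacluster_password "$pw"`.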

　　5. Authenticate the cluster nodes (run on one node only)

　　　　# pcs cluster auth ceph1 ceph2 ceph3
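pcs cluster auth prompts for the hacluster user name and password. For scripted setups, pcs 0.9 (the version shipped with CentOS 7) also accepts them as the -u and -p options; pcs 0.10 and later replaced this command with pcs host auth. auth_cluster_nodes is a hypothetical wrapper, and passing the password with -p leaves it briefly visible in the process list.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: authenticate pcs against pcsd on all three nodes
# without interactive prompts (pcs 0.9 syntax).
auth_cluster_nodes() {
  local password="$1"
  if [ -z "$password" ]; then
    echo "usage: auth_cluster_nodes PASSWORD" >&2
    return 1
  fi
  run pcs cluster auth ceph1 ceph2 ceph3 -u hacluster -p "$password"
}
```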

　　6. Generate the Corosync configuration file (run on one node only)

　　　　# pcs cluster setup --name openstack_cluster ceph{1,2,3}

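　　　　This creates the cluster openstack_cluster and generates the Corosync configuration file /etc/corosync/corosync.conf.

　　7. Start the cluster and enable it at boot (run on one node only)

　　　　# pcs cluster start --all  (--all applies the command to all nodes; without it only the local node is affected. You can also name a specific node, e.g. pcs cluster start ceph2.)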

　　　　# pcs cluster enable --all

　　8. Check the cluster

　　　　1. Cluster status

　　　　　　# pcs status

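　　　　　　　　Online: ceph1 ceph2 ceph3 means all three nodes are up, and 0 resources configured means no resources have been configured yet. A WARNING is also shown; it concerns STONITH, which is handled in step 9.

　　　　2. Check Corosync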

　　　　　　# corosync-cfgtool -s

　　　　　　# corosync-cmapctl | grep members  # show membership information

　　　　　　# pcs status corosync  # show Corosync status

　　　　3. Check Pacemaker

　　　　　　# ps axf | grep pacemaker

　　9. Check the configuration

　　　　# crm_verify -L -V

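　　　　　　To protect cluster data, Pacemaker enables STONITH by default. Since no fencing device has been configured yet, crm_verify reports errors, so disable STONITH for now:

　　　　# pcs property set stonith-enabled=false  # disable STONITH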

　　　　# crm_verify -L -V

　　10. Add a virtual IP (VIP)

　　　　# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=10.6.32.254 cidr_netmask=32 op monitor interval=30s

　　　　Once the resource is added, the VIP is running on ceph1, which ip add list confirms. (Virtual IP: 10.6.32.254; resource name: VIP; monitored every 30s.)

　　　　# ip add list
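To see where the VIP lives, Pacemaker can be asked directly and the address cross-checked on each node. locate_vip is a hypothetical helper that uses the SSH trust from step 5; crm_resource --locate reports the node Pacemaker placed the resource on, and only the node that actually holds 10.6.32.254 prints its hostname in the loop.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: where does Pacemaker think VIP runs, and which
# node really has the address configured?
locate_vip() {
  local node
  run crm_resource --resource VIP --locate
  for node in ceph1 ceph2 ceph3; do
    # the quoted pipeline is executed by the remote shell
    run ssh "root@$node" "ip -4 -o addr show | grep -F ' 10.6.32.254/' && hostname"
  done
}
```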

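　　11. httpd high availability

　　　　1. Enable Apache's server-status page, which the cluster uses for monitoring

　　　　　　# vim /etc/httpd/conf.d/status.conf

<Location /server-status>
    SetHandler server-status
    # Apache 2.4 access syntax (CentOS 7 ships httpd 2.4). The cluster's
    # monitor only queries http://localhost/server-status, so local access
    # is enough; use "Require all granted" to open the page to everyone.
    Require local
</Location>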
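The package list in step 1 does not include httpd, so it has to be installed on every node before this step works. A sketch to run on each node that installs httpd, validates the configuration (including status.conf), and fetches the status page once; check_status_page is hypothetical, and ?auto requests mod_status's machine-readable output. httpd is left running here; step 2 stops it again.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: make sure the status page the cluster will poll
# actually answers on this node.
check_status_page() {
  run yum install -y httpd
  # syntax check of httpd.conf plus everything in conf.d/
  run apachectl configtest
  run systemctl start httpd
  # -f makes curl fail on HTTP errors such as 403 or 404
  run curl -fsS "http://localhost/server-status?auto"
}
```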

　　　　2. Stop the httpd service. When the httpd resource is added, the cluster starts httpd itself; if httpd is still running, an error is reported.

　　　　　　# systemctl stop httpd
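Stopping httpd is not quite enough on its own: if the unit stays enabled, systemd starts httpd at boot behind Pacemaker's back. A sketch, assuming the SSH trust from step 5 and the hypothetical name hand_httpd_to_cluster, that stops and disables httpd on all three nodes so that only the cluster manages it.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: take httpd away from systemd on every node.
hand_httpd_to_cluster() {
  local node
  for node in ceph1 ceph2 ceph3; do
    # --now stops the service as well as disabling it at boot
    run ssh "root@$node" systemctl disable --now httpd
  done
}
```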

　　　　3. Add the httpd resource to the cluster.

　　　　　　# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s

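　　　　　　This creates a cluster resource for httpd named WebSite, which was started on ceph2. It is monitored through the status page http://localhost/server-status every 30s. But this raises a new problem: the VIP is on ceph1 while the httpd resource is on ceph2, so clients cannot reach the site. Likewise, if the VIP is not running on any node, WebSite should not be allowed to run either.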
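One common way to address both points, sketched under the assumption that the resources are named VIP and WebSite as created above, is a colocation constraint that keeps WebSite on the node running VIP plus an ordering constraint that starts VIP first. tie_website_to_vip is a hypothetical name; the pcs constraint subcommands follow the pcs 0.9 syntax shipped with CentOS 7.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: bind WebSite to VIP.
tie_website_to_vip() {
  # WebSite may only run where VIP runs; with a score of INFINITY,
  # WebSite cannot run at all if VIP is not active anywhere
  run pcs constraint colocation add WebSite with VIP INFINITY
  # start VIP before WebSite (ordering is symmetrical by default,
  # so WebSite is stopped before VIP)
  run pcs constraint order VIP then WebSite
  # list the resulting constraints
  run pcs constraint
}
```

After running it, pcs status should show VIP and WebSite on the same node.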

　　　　4. Set the default timeout for resource operations

　　　　　　# pcs resource op defaults timeout=120s

　　　　　　# pcs resource op defaults