High Availability Handbook (3): Configuration

On the command line, most configuration is done with crm.

Global Cluster Options

These are cluster-wide settings. The two main ones are:

no-quorum-policy

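Quorum means the minimum number of active nodes that Pacemaker needs in order to keep working. For a cluster of N nodes this number is N/2 + 1 (integer division), that is, a strict majority.

What happens when quorum cannot be reached?

ignore: keep running. In a two-node cluster, losing a single node already drops the cluster below quorum, so the policy has to be set to ignore.

freeze: resources that are already running keep running, but no new resources can be started.

stop: all resources in the affected partition are stopped.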
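The quorum arithmetic can be checked with a few lines of shell. This is only a sketch of the formula given above; the function name quorum is made up for the illustration and is not part of Pacemaker.

```shell
# quorum N: minimum number of active nodes for an N-node cluster,
# computed with integer division as N/2 + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

for n in 2 3 4 5; do
    echo "nodes=$n quorum=$(quorum "$n")"
done
```

For two nodes the result is 2, which is why a two-node cluster loses quorum as soon as one node fails.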

stonith-enabled

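STONITH stands for Shoot-The-Other-Node-In-The-Head, in other words a single shot that takes the node out.

When a node's heartbeat stops responding, that does not mean the machine has stopped accessing and writing data. Especially with DRBD, the unresponsive node may well write dirty data, so the best way to protect the data is to power the machine off directly through a power-management interface such as IPMI.

Setting

crm configure property no-quorum-policy=ignore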

crm configure property stonith-enabled=false

Viewing

crm configure show

Cluster Resources

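Services such as IP addresses, apache, and mysql are all cluster resources.

A resource is managed by a resource agent.

There are several classes of resource agent, which can be listed with the following command: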

# crm ra classes
lsb
ocf / heartbeat pacemaker redhat
service
stonith
upstart

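LSB stands for Linux Standard Base. LSB agents are the scripts that the operating system provides under /etc/init.d, implementing the start, stop, restart, reload, force-reload, and status actions. These scripts are tied to the operating system, however, and each operating system implements them differently.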

root@pacemaker01:/home/openstack# crm ra list lsb
acpid                    apache2                  apparmor                 apport                   atd
console-setup            corosync                 corosync-notifyd         cron                     dbus
dns-clean                friendly-recovery        grub-common              halt                     irqbalance
killprocs                kmod                     logd                     networking               ondemand
openhpid                 pacemaker                pppd-dns                 procps                   rc
rc.local                 rcS                      reboot                   resolvconf               rsync
rsyslog                  screen-cleanup           sendsigs                 single                   ssh
sudo                     udev                     umountfs                 umountnfs.sh             umountroot
unattended-upgrades      urandom

root@pacemaker01:/home/openstack# ls /etc/init.d/
acpid          corosync           grub-common networking rc          rsync           ssh           unattended-upgrades
apache2        corosync-notifyd   halt         ondemand    rc.local    rsyslog         sudo          urandom
apparmor       cron               irqbalance   openhpid    rcS         screen-cleanup udev
apport         dbus               killprocs    pacemaker   README      sendsigs        umountfs
atd            dns-clean          kmod         pppd-dns    reboot      single          umountnfs.sh
console-setup friendly-recovery logd         procps      resolvconf skeleton        umountroot

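OCF stands for Open Cluster Framework.

This class of resource agent hides the differences between operating systems and provides a standard implementation. The agents live in /usr/lib/ocf/resource.d/provider (one subdirectory per provider) and support the start, stop, status, monitor, and meta-data actions.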
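To make the action interface concrete, here is a minimal sketch of how an OCF-style agent dispatches on the action name. The agent name demo, its state file, and the one-line meta-data output are all hypothetical placeholders; a real agent sources the OCF shell functions and prints a full XML description for meta-data. Only the return codes (0 for success, 3 for an unimplemented action, 7 for not running) follow the OCF convention.

```shell
# OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical state file standing in for a real service.
STATE_FILE="${TMPDIR:-/tmp}/demo-agent.state"

demo_start() { touch "$STATE_FILE"; return $OCF_SUCCESS; }
demo_stop()  { rm -f "$STATE_FILE"; return $OCF_SUCCESS; }
demo_monitor() {
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Dispatch on the action name, which the cluster passes as the first argument.
demo_agent() {
    case "$1" in
        start)          demo_start ;;
        stop)           demo_stop ;;
        status|monitor) demo_monitor ;;
        meta-data)      echo '<resource-agent name="demo"/>' ;;
        *)              return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# Run an action and print its return code.
run() {
    rc=0
    demo_agent "$1" || rc=$?
    echo "$1 -> $rc"
}

run start
run monitor
run stop
run monitor
```

The cluster decides what to do next purely from these return codes, which is why monitor has to distinguish "not running" from a generic failure.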

root@pacemaker01:/home/openstack# crm ra list ocf heartbeat
AoEtarget           AudibleAlarm        CTDB                ClusterMon          Delay               Dummy
EvmsSCC             Evmsd               Filesystem          ICP                 IPaddr              IPaddr2
IPsrcaddr           IPv6addr            LVM                 LinuxSCSI           MailTo              ManageRAID
ManageVE            Pure-FTPd           Raid1               Route               SAPDatabase         SAPInstance
SendArp             ServeRAID           SphinxSearchDaemon Squid               Stateful            SysInfo
VIPArip             VirtualDomain       WAS                 WAS6                WinPopup            Xen
Xinetd              anything            apache              asterisk            conntrackd          db2
dhcpd               drbd                eDir88              ethmonitor          exportfs            fio
iSCSILogicalUnit    iSCSITarget         ids                 iscsi               jboss               ldirectord
lxc                 mysql               mysql-proxy         named               nfsserver           nginx
oracle              oralsnr             pgsql               pingd               portblock           postfix
pound               proftpd             rsyncd              rsyslog             scsi2reservation    sfex
slapd               symlink             syslog-ng           tomcat              varnish             vmware

root@pacemaker01:/home/openstack# ls /usr/lib/ocf/resource.d/heartbeat/
anything      Delay       Filesystem iSCSILogicalUnit ManageVE     pingd      rsyslog             Squid          vmware
AoEtarget     dhcpd       fio         iSCSITarget       mysql        portblock SAPDatabase         Stateful       WAS
apache        drbd        ICP         jboss             mysql-proxy postfix    SAPInstance         symlink        WAS6
asterisk      Dummy       ids         ldirectord        named        pound      scsi2reservation    SysInfo        WinPopup
AudibleAlarm eDir88      IPaddr      LinuxSCSI         nfsserver    proftpd    SendArp             syslog-ng      Xen
ClusterMon    ethmonitor IPaddr2     LVM               nginx        Pure-FTPd ServeRAID           tomcat         Xinetd
conntrackd    Evmsd       IPsrcaddr   lxc               oracle       Raid1      sfex                varnish
CTDB          EvmsSCC     IPv6addr    MailTo            oralsnr      Route      slapd               VIPArip
db2           exportfs    iscsi       ManageRAID        pgsql        rsyncd     SphinxSearchDaemon VirtualDomain

As an example, look at IPaddr2 in /usr/lib/ocf/resource.d/heartbeat/IPaddr2.

start calls ip_start.

ip_start calls add_interface $OCF_RESKEY_ip $NETMASK $BRDCAST $NIC $IFLABEL

add_interface calls $IP2UTIL -f inet addr add $ipaddr/$netmask brd $broadcast dev $iface

Here $IP2UTIL is just a shell variable:

root@pacemaker01:/usr/lib/ocf/lib/heartbeat# ls
apache-conf.sh ocf-binaries ocf-rarun ocf-shellfuncs sapdb-nosha.sh
http-mon.sh ocf-directories ocf-returncodes ora-common.sh sapdb.sh

root@pacemaker01:/usr/lib/ocf/lib/heartbeat# grep -r "IP2UTIL" *
ocf-binaries:: ${IP2UTIL:=ip}

It is defined here, with ip as the default value, because the command may differ between operating systems.
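The `: ${IP2UTIL:=ip}` line in ocf-binaries uses the shell's assign-default expansion. A small sketch of how that idiom behaves (the /sbin/ip override is just an example value):

```shell
# With the variable unset, := assigns the default.
unset IP2UTIL
: "${IP2UTIL:=ip}"
echo "default:  $IP2UTIL"

# A value set beforehand is left untouched by the same line.
IP2UTIL=/sbin/ip
: "${IP2UTIL:=ip}"
echo "override: $IP2UTIL"
```

The leading colon is the shell's no-op command; it exists only so that the expansion, and with it the assignment, is evaluated.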

Resources come in several types.

The most commonly used type is the primitive, the basic type.

Before configuring a primitive resource, you can look at its help first:

crm ra info ocf:heartbeat:IPaddr2

This lists every parameter that can be set:

Manages virtual IPv4 addresses (Linux specific version) (ocf:heartbeat:IPaddr2)

Parameters (* denotes required, [] the default):

ip* (string): IPv4 address
The IPv4 address to be configured in dotted quad notation, for example
"192.168.1.1".

cidr_netmask (string): CIDR netmask

broadcast (string): Broadcast address

mac (string): Cluster IP MAC address

Operations' defaults (advisory minimum):

    start         timeout=20s
    stop          timeout=20s
    status        timeout=20s interval=10s
    monitor       timeout=20s interval=10s

So we can configure a resource like this:

crm configure primitive myIP ocf:heartbeat:IPaddr2 params ip=127.0.0.99 op monitor interval=60s

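The second resource type is the group resource.

Some resources are tied together: they either all run on one node, or all run on another node.

In the figure, Web Server is a group resource containing three child resources: IP Addr, Apache, and Filesystem.

A group has the following properties:

Start and Stop: resources are started in the order in which they are listed and stopped in the reverse order.
Dependency: all child resources must run on the same node; if one of them cannot run, none of the resources listed after it can run either.
Contents: a group contains at least one resource.
Constraints: constraints include colocation, which specifies that two resources must run on the same machine. If a resource has to run on the same machine as a group, you could colocate it with one of the group's child resources, since by the definition of a group the whole group would then run with that resource. It is better, however, to name the group itself in the colocation rather than one of its children.
Stickiness: the stickiness of a group is the sum of the stickiness values of all its active child resources.
Resource monitoring: a group cannot be monitored as a whole; each child resource has to be monitored individually.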
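As a sketch of the colocation advice above, assuming a primitive named Database and a group named WebServer (both names are hypothetical, as is the constraint id db-with-web), the constraint would reference the group rather than one of its members:

```shell
# Hypothetical names: "Database" is a primitive, "WebServer" is a group.
# Colocate with the group itself, not with one of its child resources.
crm configure colocation db-with-web inf: Database WebServer
```

Naming the group keeps the constraint valid even if members are later added to or removed from the group.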

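To configure a group resource, first define the primitive resources. In crm the name given after primitive is the resource id, so no separate id parameter is needed:

crm configure primitive Public-IP ocf:heartbeat:IPaddr2 params ip=1.2.3.4

crm configure primitive Email lsb:exim

Then create a group:

crm configure group mygroup Public-IP Email

To change the group, for example to add Email before Public-IP (assuming Email is not already a member):

crm configure modgroup mygroup add Email before Public-IP

To remove a child resource:

crm configure modgroup mygroup remove Email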

Clones

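The purpose of a clone is to deploy a resource in active-active fashion, so that it runs on several machines at the same time.

There are three kinds of clone:

The anonymous clone is the simplest kind: several copies of the resource run in several places, every copy is exactly identical, and each machine can run only one active copy.

A read-only apache is a good example: because the instances are read-only, they can work side by side without conflicts.

For example, we create an apache resource without adding any constraint: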

crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/apache2/apache2.conf statusurl="http://127.0.0.1/server-status" op monitor interval=1min

crm configure op_defaults timeout=240s

# crm_mon -1
Last updated: Sat Aug 2 13:24:46 2014
Last change: Sat Aug 2 13:24:38 2014 via cibadmin on pacemaker01
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
2 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started pacemaker01
WebSite (ocf::heartbeat:apache): Started pacemaker03

At this point WebSite is running on pacemaker03. Run ps aux on every node:

root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# ps aux | grep apache
root 32560 0.0 0.0 11744 900 pts/0 S+ 13:25 0:00 grep --color=auto apache

root@pacemaker02:/home/openstack# ps aux | grep apache
root 4504 0.0 0.0 11744 900 pts/0 S+ 13:25 0:00 grep --color=auto apache

root@pacemaker03:/home/openstack# ps aux | grep apache
root      4455 0.0 0.1 71300 2564 ?        Ss   13:24   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 4456 0.0 0.2 360464 4252 ?        Sl   13:24   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 4457 0.0 0.2 557136 4928 ?        Sl   13:24   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
root      4592 0.0 0.0 11744   900 pts/0    S+   13:25   0:00 grep --color=auto apache

Next we create apache-clone:

crm configure clone apache-clone WebSite

This relies on several defaults: clone-max defaults to starting a copy on every node, and clone-node-max defaults to at most one copy per node.
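For comparison, the same clone could be written with the defaults spelled out. This is a sketch for the three-node cluster used in this article; the values simply restate the defaults and would replace the shorter command rather than follow it.

```shell
# Equivalent explicit form for a three-node cluster:
# three copies in total, at most one per node.
crm configure clone apache-clone WebSite \
    meta clone-max=3 clone-node-max=1
```

Lowering clone-max to 2 would leave one node without an apache instance.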

With this, apache is started on all three nodes:

root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# ps aux | grep apache
root       410 0.0 0.0 11744   900 pts/0    S+   13:27   0:00 grep --color=auto apache
root     32754 0.0 0.1 71300 2560 ?        Ss   13:27   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 32755 0.0 0.4 360464 8332 ?        Sl   13:27   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 32756 0.0 0.4 491600 9008 ?        Sl   13:27   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf

root@pacemaker02:/home/openstack# ps aux | grep apache
root      4533 0.0 0.1 71300 2564 ?        Ss   13:27   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 4534 0.0 0.2 360464 4252 ?        Sl   13:27   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 4535 0.0 0.2 491600 4928 ?        Sl   13:27   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
root      4647 0.0 0.0 11744   900 pts/0    S+   13:27   0:00 grep --color=auto apache

root@pacemaker03:/home/openstack# ps aux | grep apache
root      4455 0.0 0.1 71300 2564 ?        Ss   13:24   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 4456 0.0 0.2 491600 4924 ?        Sl   13:24   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
www-data 4457 0.0 0.2 622672 4928 ?        Sl   13:24   0:00 /usr/sbin/apache2 -DSTATUS -f /etc/apache2/apache2.conf
root      4694 0.0 0.0 11744   900 pts/0    S+   13:27   0:00 grep --color=auto apache

root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# crm_mon -1
Last updated: Sat Aug 2 13:27:20 2014
Last change: Sat Aug 2 13:27:14 2014 via cibadmin on pacemaker01
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
4 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started pacemaker01
Clone Set: apache-clone [WebSite]
Started: [ pacemaker01 pacemaker02 pacemaker03 ]

Moving the IPaddr2 resource between the three nodes shows that apache has been started on each of them:

root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker01</body>
</html>
root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# crm resource move ClusterIP pacemaker02
root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker02</body>
</html>
root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# crm resource move ClusterIP pacemaker03
root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker03</body>
</html>

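Of course, we could also create three IPaddr2 resources, put each one in a group with one of the three apache instances, and place an haproxy in front to get load balancing.

An anonymous clone works with any resource agent. No special handling is needed; the resource only has to be started.

The second kind is the globally unique clone.

With this kind of clone, a resource is cloned into several copies, but each copy is different. For example, three apache instances are started, one serving financial news, one political news, and one entertainment news.

Supporting a globally unique clone requires a resource agent written for the purpose; an lsb agent, at the very least, will not do.

Copies of a clone are identified by appending a colon and a numerical offset, e.g. apache:2. This number is called the clone id.

The resource agent has to behave differently depending on the clone id.

A resource with globally-unique='true' makes the following two things possible:

Because every clone instance is unique and distinct, two clone instances can run on the same machine.
The resource agent can feed the clone id into a hash function to implement load balancing.

Among the stock resource agents, the one that implements this kind of clone is IPaddr2.

In IPaddr2 we find the following code: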

$IPTABLES -I INPUT -d $OCF_RESKEY_ip -i $NIC -j CLUSTERIP \
                --new \
                --clustermac $IF_MAC \
                --total-nodes $IP_INC_GLOBAL \
                --local-node $IP_INC_NO \
                --hashmode $IP_CIP_HASH

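This uses the CLUSTERIP target of iptables.

Normally an IP address is unique on a network and owned by exactly one machine; when an ARP request looks for that IP, only one machine replies.

To implement a load balancer, iptables lets several machines own the same IP, together with a shared clustermac for that IP. A hash is computed over the source IP and source port. total-nodes is the total number of nodes that own the IP, local-node is the index of this node among them, and hashmode is the hashing method, sourceip-sourceport by default. Besides the iptables rule, /proc/net/ipt_CLUSTERIP/VIP_ADDRESS is also created. When a request for the IP arrives, the hash determines which node should respond, and that host answers.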
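The selection by hash can be illustrated in shell. This is only a sketch of the idea: cksum stands in for the kernel's real hash function, and the function name responsible_node and the client address are made up for the example.

```shell
# Illustration only: cksum is a stand-in for the kernel's hash function.
# responsible_node SRC_IP SRC_PORT TOTAL_NODES
# Prints the 1-based number of the node that should answer.
responsible_node() {
    h=$(printf '%s:%s' "$1" "$2" | cksum | cut -d ' ' -f 1)
    echo $(( h % $3 + 1 ))
}

# Every node runs the same computation and compares the result
# with its own local_node value.
local_node=1
total_nodes=2
for port in 40001 40002 40003 40004; do
    owner=$(responsible_node 192.168.100.11 "$port" "$total_nodes")
    if [ "$owner" -eq "$local_node" ]; then
        echo "port $port: handled here (node $local_node)"
    else
        echo "port $port: left to node $owner"
    fi
done
```

Because all nodes compute the same value for the same source address and port, exactly one of them accepts a given connection and no coordination between the nodes is needed.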

First we configure a clone-vip:

root@pacemaker01:~# crm configure clone clone-vip ClusterIP meta clone-max='2' clone-node-max='1' globally-unique='true'
root@pacemaker01:~# crm_mon -1
Last updated: Sat Aug 2 18:46:40 2014
Last change: Sat Aug 2 18:46:31 2014 via cibadmin on pacemaker01
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
3 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

WebSite        (ocf::heartbeat:apache):        Started pacemaker02
Clone Set: clone-vip [ClusterIP] (unique)
     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started pacemaker01
     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started pacemaker03

Unlike the apache case, two distinct ClusterIP instances are configured, ClusterIP:0 and ClusterIP:1, each carrying its clone id as a suffix.

Let's look at pacemaker01 first:

root@pacemaker01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:9b:d5:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.10/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.100.100/24 brd 192.168.100.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe9b:d511/64 scope link
       valid_lft forever preferred_lft forever
root@pacemaker01:~# iptables -nvL
Chain INPUT (policy ACCEPT 489 packets, 47504 bytes)
pkts bytes target     prot opt in     out     source               destination
    0     0 CLUSTERIP all -- eth0   *       0.0.0.0/0            192.168.100.100      CLUSTERIP hashmode=sourceip-sourceport clustermac=31:B3:55:6F:CE:87 total_nodes=2 local_node=1 hash_init=0

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 708 packets, 87757 bytes)
pkts bytes target prot opt in out source destination

Now look at pacemaker03:

root@pacemaker03:/home/openstack# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:9b:d5:33 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.12/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.100.100/24 brd 192.168.100.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe9b:d533/64 scope link
       valid_lft forever preferred_lft forever
root@pacemaker03:/home/openstack# iptables -nvL
Chain INPUT (policy ACCEPT 610 packets, 78996 bytes)
pkts bytes target     prot opt in     out     source               destination
    0     0 CLUSTERIP all -- eth0   *       0.0.0.0/0            192.168.100.100      CLUSTERIP hashmode=sourceip-sourceport clustermac=31:B3:55:6F:CE:87 total_nodes=2 local_node=2 hash_init=0

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 390 packets, 44182 bytes)
pkts bytes target prot opt in out source destination

The difference between the two iptables rules is visible: only local_node differs.

root@pacemaker03:/home/openstack# crm configure clone clone-apache WebSite
root@pacemaker03:/home/openstack# crm_mon -1
Last updated: Sat Aug 2 18:49:40 2014
Last change: Sat Aug 2 18:49:33 2014 via cibadmin on pacemaker03
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
5 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

Clone Set: clone-vip [ClusterIP] (unique)
     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started pacemaker01
     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started pacemaker03
Clone Set: clone-apache [WebSite]
     Started: [ pacemaker01 pacemaker02 pacemaker03 ]

Now when we access the apache site, the reply sometimes comes from pacemaker01 and sometimes from pacemaker03.

root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker03</body>
</html>
root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker03</body>
</html>
root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker03</body>
</html>
root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker03</body>
</html>
root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker01</body>
</html>
root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker01</body>
</html>
root@pacemaker02:/home/openstack# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker03</body>
</html>

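The last kind of clone is the stateful clone.

Here every clone instance has a state. There are two main states, active and passive, but unlike an ordinary active/passive setup, more than one instance can be active.

master-max: How many copies of the resource can be promoted to master status; default 1.

To support a stateful clone, the resource agent needs the promote and demote actions.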
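A stateful clone might be declared roughly as follows. This sketch assumes a crm shell from the same Pacemaker 1.1 era as the rest of this article, where the ms command creates a master/slave set; p-mysql and ms-mysql are hypothetical names, and the primitive itself would need parameters that match the local MySQL installation.

```shell
# Hypothetical primitive "p-mysql" wrapped in a master/slave set.
# notify=true makes the cluster send notify actions to the agent.
crm configure ms ms-mysql p-mysql \
    meta master-max=1 master-node-max=1 \
         clone-max=2 clone-node-max=1 notify=true
```

With master-max=1, at most one of the two instances is promoted and the other stays a slave.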

mysql is a good example of a stateful resource.

The mysql resource agent is built on MySQL replication: some mysql instances are masters and others are slaves.

In the file /usr/lib/ocf/resource.d/heartbeat/mysql:

start) mysql_start

The start action of the resource agent calls mysql_start.

mysql_start runs the following command:

${OCF_RESKEY_binary} --defaults-file=$OCF_RESKEY_config \
--pid-file=$OCF_RESKEY_pid \
--socket=$OCF_RESKEY_socket \
--datadir=$OCF_RESKEY_datadir \
--user=$OCF_RESKEY_user $OCF_RESKEY_additional_parameters \
$mysql_extra_params >/dev/null 2>&1 &
rc=$?

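Here OCF_RESKEY_binary_default="/usr/local/bin/mysqld_safe" is the command that starts the mysql process.

After the mysql process has been started, the agent waits for a while and then checks ocf_is_ms, that is, whether the resource runs in master/slave mode.

If so, it first puts the current mysql into read-only state with set_read_only on. Since it is not known whether a master is already running, the instance starts as a slave first.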

master_host=`echo $OCF_RESKEY_CRM_meta_notify_master_uname|tr -d " "`
if [ "$master_host" -a "$master_host" != ${HOSTNAME} ]; then
    ocf_log info "Changing MySQL configuration to replicate from $master_host."
    set_master
    start_slave
    if [ $? -ne 0 ]; then
        ocf_log err "Failed to start slave"
        return $OCF_ERR_GENERIC
    fi
else
    ocf_log info "No MySQL master present - clearing replication state"
    unset_master
fi

Next, look at OCF_RESKEY_CRM_meta_notify_master_uname, which is supplied by Pacemaker's notify action:

http://www.linux-ha.org/doc/dev-guides/_literal_notify_literal_action.html

$OCF_RESKEY_CRM_meta_notify_master_uname — node name of the node where the resource currently is in the Master role

$OCF_RESKEY_CRM_meta_notify_promote_uname — node name of the node where the resource currently is being promoted to the Master role (promote notifications only)

$OCF_RESKEY_CRM_meta_notify_demote_uname — node name of the node where the resource currently is being demoted to the Slave role (demote notifications only)

When another mysql has been elected master, set_master is called, which runs

ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
    -e "CHANGE MASTER TO MASTER_HOST='$new_master', \
    MASTER_USER='$OCF_RESKEY_replication_user', \
    MASTER_PASSWORD='$OCF_RESKEY_replication_passwd' $master_params"

to point the slave at the master.

start_slave() {

ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
-e "START SLAVE"
}

If there is no master yet, unset_master is called:

# Now, stop all slave activity and unset the master host
ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
    -e "STOP SLAVE"
if [ $? -gt 0 ]; then
    ocf_log err "Error stopping rest slave threads"
    exit $OCF_ERR_GENERIC
fi

ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
    -e "RESET SLAVE;"
if [ $? -gt 0 ]; then
    ocf_log err "Failed to reset slave"
    exit $OCF_ERR_GENERIC
fi

If there is currently no master, but "$master_host" == ${HOSTNAME}, this node itself has been chosen as master.

Finally, $CRM_MASTER -v 1 is run:

CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot "

crm_master - A convenience wrapper for crm_attribute Set, update or delete a resource's promotion score

-l, --lifetime=value
Until when should the setting take affect. Valid values: reboot, forever

When this node is the master, the promote action is invoked on it.

That calls mysql_promote, which first stops the slave, ending the node's role as a slave:

ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
-e "STOP SLAVE"

# Set Master Info in CIB, cluster level attribute
update_data_master_status
master_info="$(get_local_ip)|$(get_master_status File)|$(get_master_status Position)"
${CRM_ATTR_REPL_INFO} -v "$master_info"

The node is about to become master. In update_data_master_status:

update_data_master_status() {

master_status_file="${HA_RSCTMP}/master_status.${OCF_RESOURCE_INSTANCE}"

$MYSQL $MYSQL_OPTIONS_REPL -e "SHOW MASTER STATUS\G" > $master_status_file
}

This saves the master status to a file.

CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name ${INSTANCE_ATTR_NAME}_REPL_INFO -s mysql_replication"

This writes the master information to the CIB.

set_read_only off

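The node is now officially the master.

$CRM_MASTER -v $((${OCF_RESKEY_max_slave_lag}+1)) gives the current master a higher score, so that when the previous master comes back the cluster does not switch back to it.

When the other slaves receive the notification that a new master has appeared, they switch over to it at once.

In the mysql_notify function:

post-promote calls unset_master, set_master, and start_slave.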

Resource Templates

If you want to define several resources with similar configuration, you can use resource templates:

crm configure rsc_template BigVM ocf:heartbeat:Xen params allow_mem_management="true" op monitor timeout=60s interval=15s op stop timeout=10m op start timeout=10m

We can create a resource based on it:

crm configure primitive MyVM1 @BigVM params xmfile="/etc/xen/shared-vm/MyVM1" name="MyVM1"

Parameters from the template can also be overridden.
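As a sketch of such an override, a second resource based on the same template could set its own monitor operation. MyVM2 and its xmfile path are hypothetical, chosen only to mirror the MyVM1 example.

```shell
# Hypothetical second VM based on the BigVM template. The op line
# overrides the template's monitor settings for this resource only.
crm configure primitive MyVM2 @BigVM \
    params xmfile="/etc/xen/shared-vm/MyVM2" name="MyVM2" \
    op monitor timeout=120s interval=30s
```

Everything not mentioned here, such as the start and stop timeouts, is still inherited from the template.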

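Resource Parameters

A resource has the following kinds of parameters.

The first kind is Resource Options (Meta Attributes). In a definition they are usually set with meta, and inside the resource agent they mostly appear as OCF_RESKEY_CRM_meta_XXX.

They can be managed with crm_resource: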

List the configured resources:

# crm_resource --list

root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# crm_resource --list
Clone Set: clone-vip [ClusterIP] (unique)
     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started
     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started
Clone Set: clone-apache [WebSite]
     Started: [ pacemaker01 pacemaker02 pacemaker03 ]

List the available OCF agents:

# crm_resource --list-agents ocf

List the available OCF agents from the linux-ha project:

# crm_resource --list-agents ocf:heartbeat

Display the current location of 'myResource':

# crm_resource --resource myResource --locate

root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# crm_resource --resource clone-apache --locate
resource clone-apache is running on: pacemaker03
resource clone-apache is running on: pacemaker01
resource clone-apache is running on: pacemaker02
root@pacemaker01:/usr/lib/ocf/resource.d/heartbeat# crm_resource --resource clone-vip --locate
resource clone-vip is running on: pacemaker01
resource clone-vip is running on: pacemaker03

Move 'myResource' to another machine:

# crm_resource --resource myResource --move

Move 'myResource' to a specific machine:

# crm_resource --resource myResource --move --node altNode

Allow (but not force) 'myResource' to move back to its original location:

# crm_resource --resource myResource --un-move

Tell the cluster that 'myResource' failed:

# crm_resource --resource myResource --fail

Stop a 'myResource' (and anything that depends on it):

# crm_resource --resource myResource --set-parameter target-role --meta --parameter-value Stopped

Tell the cluster not to manage 'myResource':

The cluster will not attempt to start or stop the resource under any circumstances.
Useful when performing maintenance tasks on a resource.

# crm_resource --resource myResource --set-parameter is-managed --meta --parameter-value false

Erase the operation history of 'myResource' on 'aNode':

The cluster will 'forget' the existing resource state (including any errors) and attempt to recover the resource.
Useful when a resource had failed permanently and has been repaired by an administrator.

# crm_resource --resource myResource --cleanup --node aNode

crm_resource --meta --resource Email --set-parameter priority --parameter-value 100
crm_resource --meta --resource Email --set-parameter multiple-active --parameter-value block

The second kind is Instance Attributes (Parameters), usually given with params. crm ra info IPaddr2 shows all of them. These parameters are passed into the resource agent, for example as OCF_RESKEY_cidr_netmask.

crm_resource --resource Public-IP --set-parameter ip --parameter-value 1.2.3.4

The third kind is Resource Operations, usually given with op. The action is typically monitor, start, or stop, and the usual settings are interval, timeout, and so on: the resource agent's monitor is invoked once every interval to check the state, and a start or stop must not take longer than timeout.

requires specifies the condition under which the operation may run: nothing, quorum, or fencing.

on-fail specifies what to do when the resource fails: ignore, stop, restart, fence, or standby.

role specifies the role in which the operation applies: Stopped, Started, or Master. For example:

op monitor interval="300s" role="Stopped" timeout="10s"

op monitor interval="30s" timeout="10s"

This monitors every 30s while the resource is running and every 300s while it is stopped.
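Put together in a full definition, the two monitor operations might look like this. The resource name and address reuse the IPaddr2 examples from earlier in this article and are only illustrative.

```shell
# Illustrative primitive with two monitor operations:
# every 30s while started, every 300s while stopped.
crm configure primitive myIP ocf:heartbeat:IPaddr2 \
    params ip=192.168.100.100 \
    op monitor interval=30s timeout=10s \
    op monitor interval=300s role=Stopped timeout=10s
```

The two operations need different intervals, since the cluster tells them apart by operation name and interval.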

Setting Global Defaults for Operations

crm_attribute --type op_defaults --attr-name timeout --attr-value 20s

When Resources Take a Long Time to Start/Stop

There are a number of implicit operations that the cluster will always perform - start, stop and a non-recurring monitor operation (used at startup to check the resource isn't already active). If one of these is taking too long, then you can create an entry for them and simply specify a new value.

  <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
    <operations>
     <op id="public-ip-startup" name="monitor" interval="0" timeout="90s"/>
     <op id="public-ip-start" name="start" interval="0" timeout="180s"/>
     <op id="public-ip-stop" name="stop" interval="0" timeout="15min"/>
    </operations>
    <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="1.2.3.4"/>
    </instance_attributes>
  </primitive>

Multiple Monitor Operations

To tell the resource agent what kind of check to perform, you need to provide each monitor with a different value for a common parameter. The OCF standard creates a special parameter called OCF_CHECK_LEVEL for this purpose and dictates that it is made available to the resource agent without the normal OCF_RESKEY_ prefix.

  <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
    <operations>
     <op id="public-ip-health-60" name="monitor" interval="60">
       <instance_attributes id="params-public-ip-depth-60">
           <nvpair id="public-ip-depth-60" name="OCF_CHECK_LEVEL" value="10"/>
       </instance_attributes>
     </op>
     <op id="public-ip-health-300" name="monitor" interval="300">
       <instance_attributes id="params-public-ip-depth-300">
           <nvpair id="public-ip-depth-300" name="OCF_CHECK_LEVEL" value="20"/>
       </instance_attributes>
     </op>
    </operations>
    <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-level" name="ip" value="1.2.3.4"/>
    </instance_attributes>
  </primitive>
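On the agent side, a monitor action that honours OCF_CHECK_LEVEL could be sketched as below. The two check functions are placeholders invented for this example; a real agent would probe the actual service at each depth.

```shell
# Placeholder checks; a real agent would test the service here.
basic_check()    { echo "basic check";    return 0; }
thorough_check() { echo "thorough check"; return 0; }

demo_monitor() {
    # OCF_CHECK_LEVEL arrives without the OCF_RESKEY_ prefix; default to 0.
    case "${OCF_CHECK_LEVEL:-0}" in
        20) basic_check && thorough_check ;;
        *)  basic_check ;;
    esac
}

# Simulate the two monitor operations from the XML above.
OCF_CHECK_LEVEL=10
demo_monitor
OCF_CHECK_LEVEL=20
demo_monitor
```

With the configuration above, the cheap check would run every 60 seconds and the deeper one every 300 seconds.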

Posted 2014-08-03 20:16 by popsuper1982
