DRBD

http://www.drbd.org  
http://drbd.linbit.org
http://www.drbd.org/en/doc/users-guide-84

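What is DRBD
  Distributed Replicated Block Device (DRBD) is software that works at the block-device level and synchronizes and mirrors data between a pair of high-availability servers over a TCP/IP network.
  With it, two servers on a network can mirror each other's block devices, either synchronously in real time or asynchronously. Conceptually it is similar to an rsync+inotify setup,
  except that DRBD replicates below the filesystem, at the block layer, whereas rsync+inotify copies the actual files on top of the filesystem. As a result, DRBD is more efficient and works better.
  Tip: the block device mentioned above can be a disk partition, an LVM logical volume, or a whole disk, but not a directory.

How DRBD works
  DRBD is a block device that can be used for high availability (HA). It behaves like a network RAID-1: when you write data to the local filesystem, the data is also sent to another host on the network and recorded there in the same form, in a filesystem of its own.
  Data on the local host (primary node) and the remote host (standby node) is kept synchronized in real time. If the local system fails, the remote host still holds an identical copy of the data and can keep using it.
  In an HA setup, DRBD can replace a shared disk array: because the data exists on both the local and the remote host, on failover the remote host simply uses its own copy and continues serving.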
  [Figure: DRBD architecture diagram]


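DRBD data mirroring modes:
  1. Real-time (synchronous) mode
  A write returns success only after the data has been written successfully to both the local disk and the remote server's disk. This is DRBD's protocol C. It prevents data loss and inconsistency between the local and remote copies, and it is the mode most commonly used in production.
  2. Asynchronous mode
  A write returns success as soon as the data has been written on the local server, regardless of whether the remote write has succeeded.
    a. Success is returned once the data is written locally and placed in the local TCP buffer: this is DRBD protocol A.
    b. Success is returned once the data is written locally and has reached the remote node: this is DRBD protocol B.
  Tip: NFS server exports likewise have sync and async options, and so does mount.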

 

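DRBD's 3 replication protocols: performance A > B > C, data consistency A < B < C
  Protocol A: asynchronous replication. Returns as soon as the local write succeeds. The data still sits in the TCP/IP send buffer and may be lost.
  Protocol B: memory-synchronous (semi-synchronous) replication. Returns once the local write succeeds and the data has reached the peer's receive buffer. If both nodes lose power, data may be lost.
  Protocol C: synchronous replication. Returns only after both the local and the peer write have been confirmed. Data can still be lost if both nodes (or their storage) are destroyed at the same time.
Protocol C is the usual choice. It costs throughput and adds network latency, but for data reliability we still use protocol C in production.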

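Tip: protocol C is what is normally used in practice. The choice of protocol affects both data consistency and network latency.
With protocols A and B, consider the risk of data loss: if the system crashes while data is still in a buffer and has not yet reached disk, that data can be lost. Some disk controllers reduce this risk, e.g. Dell PERC RAID cards with a battery backup unit, which have a write cache plus an onboard battery.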
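The three protocols differ only in the point at which a write is reported as complete. The hypothetical helper below (not part of drbd-utils) encodes the descriptions above as a quick-reference lookup:

```shell
#!/bin/sh
# Hypothetical helper: print the point at which DRBD reports a write as
# complete for replication protocol A, B or C (per the notes above).
drbd_ack_point() {
    case "$1" in
        A) echo "local disk write done + data in local TCP send buffer" ;;
        B) echo "local disk write done + data received by the peer" ;;
        C) echo "local disk write done + peer disk write confirmed" ;;
        *) echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

for proto in A B C; do
    printf 'protocol %s: %s\n' "$proto" "$(drbd_ack_point "$proto")"
done
```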

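DRBD split brain
    When the heartbeat link fails temporarily, both ends may each promote themselves to Primary. When the two ends reconnect, this has to be resolved manually.
  Split brain is a state caused by a network interruption between all cluster nodes. Possible causes include intervention by the cluster manager or a kernel error. In this state, both nodes get promoted to Primary.
  This is a potentially very harmful state: each side accepts writes that are not synchronized to the other side, so the two sides can end up with different data that cannot be merged.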


 

IP and partition plan

IP plan for this lab
data-1-1(master)
    eth0: 192.168.0.91
    eth1: 192.168.1.91         ==> heartbeat; no DNS or gateway configured
    VIP:  192.168.0.191
data-1-2(backup)
    eth0: 192.168.0.92
    eth1: 192.168.1.92        ==> heartbeat; no DNS or gateway configured
    VIP:  192.168.0.191

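On both servers, configure a host route for the heartbeat link so that each machine checks its peer over the heartbeat line. Add the command to rc.local (in testing, the route was still in effect after a reboot).
(Add the NIC first, then configure the heartbeat IP and the heartbeat route.)
On data-1-1:    /sbin/route add -host 192.168.1.92 dev eth1
On data-1-2:    /sbin/route add -host 192.168.1.91 dev eth1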
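To keep the route across reboots without adding it twice when the setup is repeated, append the line to rc.local only if it is not already there. This is a minimal sketch. The helper name append_once is hypothetical, and /etc/rc.d/rc.local is assumed to be the CentOS 6 location of rc.local:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so re-running the setup does not duplicate it.
append_once() {
    # $1 = file, $2 = line; grep -x matches whole lines, -F disables regex
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (assumes rc.local lives at /etc/rc.d/rc.local, as on CentOS 6):
#   on data-1-1: append_once /etc/rc.d/rc.local "/sbin/route add -host 192.168.1.92 dev eth1"
#   on data-1-2: append_once /etc/rc.d/rc.local "/sbin/route add -host 192.168.1.91 dev eth1"
```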


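Partition plan
data-1-1/data-1-2
Device     Mount point   Size   Purpose
/dev/sdb1  /data         1G     Data; can be formatted now, or later by formatting drbd0
/dev/sdb2  (meta data)   1G     Stores DRBD synchronization state (metadata)
The metadata includes the DRBD device size, generation identifiers, the activity log, and the quick-sync bitmap.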





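Notes:
1. The meta partition must NOT be formatted with a filesystem.
2. Do not mount the partitions at this point.
3. From experience, the meta partition is usually 1-2G in production.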

Installing and configuring DRBD

Install DRBD
To install ELRepo for RHEL-6, SL-6 or CentOS-6:

  #rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
  #yum install drbd kmod-drbd84 -y      # userspace tools + kernel module
  #rpm -qa |grep drbd
  drbd84-utils-8.9.8-1.el6.elrepo.x86_64
  kmod-drbd84-8.4.9-1.el6.elrepo.x86_64
  #modprobe drbd
  #lsmod | grep drbd
  drbd                  372759  0
  libcrc32c               1246  1 drbd
  #echo "modprobe drbd" >>/etc/sysconfig/modules/drbd.modules    # optional: skip it if the module should not load automatically at boot
  #chmod 755 /etc/sysconfig/modules/drbd.modules
Alternatively, compile DRBD 8.4.4 from source:
wget  http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
export LC_ALL=C
(to make it permanent, add export LC_ALL=C to /etc/profile and run source /etc/profile)
tar -zxf drbd-8.4.4.tar.gz && cd  drbd-8.4.4
 ./configure --prefix=/application/drbd8.4.4 --with-km --with-heartbeat --sysconfdir=/etc/
#--with-km  Enable kernel module
#--with-heartbeat  Enable Heartbeat integration
make KDIR=/usr/src/kernels/$(uname -r)/    # path to the kernel source tree (if it is missing: yum install kernel-devel)
make install
modprobe drbd
lsmod | grep drbd
drbd                  372759  0 
libcrc32c               1246  1 drbd
echo "modprobe drbd" >>/etc/sysconfig/modules/drbd.modules
chmod 755 /etc/sysconfig/modules/drbd.modules
(Above: building from source, for reference)
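You can script the `lsmod | grep drbd` check above. This version matches only the module-name column, so libcrc32c's "Used by drbd" entry cannot produce a false positive. The helper name module_listed is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if module $1 appears in the first column of
# lsmod output read from stdin (ignores the "Used by" column).
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on a node: lsmod | module_listed drbd && echo "drbd is loaded"
```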

 

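Edit the configuration file

#vi /etc/drbd.conf             #the original content can be deleted; /etc/hosts entries should preferably point to the heartbeat NIC; the configuration must be identical on both nodes. For reference global and common sections, see /etc/drbd.d/global_common.conf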
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
#include "drbd.d/global_common.conf";
#include "drbd.d/*.res";
global {
        usage-count no;
}
#whether to take part in DRBD's usage statistics; the default is yes

common {
        syncer {
                rate 100M;            #maximum network rate used for synchronization between the nodes; the unit is bytes (100M = 100 MByte/s)
                verify-alg crc32c;    #checksum algorithm for online verification, e.g. sha1, md5 or crc32c
        }
}
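#"common" holds settings shared by all resources DRBD manages: parameters that can be the same for every resource, such as protocol and syncer, go here. To change parameters online:
#edit the configuration of the existing resource (keeping both peers identical), then run drbdadm adjust <resource> on both nodes.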

#primary for drbd1
#more resources (e.g. data2) can be defined the same way
resource data {                #resource name: ASCII characters, no spaces
        protocol C;
        disk {
                on-io-error    detach;
        }
on data-1-1  {
        device /dev/drbd0;            #DRBD device backed by /dev/sdb1
        disk   /dev/sdb1;
        address 192.168.1.91:7788;     #IP and listening port used to communicate with the peer
        meta-disk /dev/sdb2[0];       
}
on data-1-2 {
        device /dev/drbd0;
        disk   /dev/sdb1;
        address 192.168.1.92:7788;
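        meta-disk /dev/sdb2[0];     #external metadata
        #meta-disk internal;        #internal metadata: data and metadata share one partition (inside sdb1, no separate partition; problematic if sdb1 already holds data)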
}
}

global {
# minor-count 64;
# dialog-refresh 5; # 5 seconds
# disable-ip-verification;
usage-count no;
}

common {
protocol C;
disk {
on-io-error   detach;
#size 454G;
no-disk-flushes;
no-md-flushes;
}

net {
sndbuf-size 512k;
# timeout       60;    #  6 seconds  (unit = 0.1 seconds)
# connect-int   10;    # 10 seconds  (unit = 1 second)
# ping-int      10;    # 10 seconds  (unit = 1 second)
# ping-timeout   5;    # 500 ms (unit = 0.1 seconds)
max-buffers     8000;
unplug-watermark   1024;
max-epoch-size  8000;
# ko-count 4;
# allow-two-primaries;
cram-hmac-alg "sha1";
shared-secret "hdhwXes23sYEhart8t";
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
# data-integrity-alg "md5";
# no-tcp-cork;
}

syncer {
rate 120M;
al-extents 517;
}

}

resource data {
on data-1-1 {
device     /dev/drbd0;
disk       /dev/sdb1;
address    192.168.1.91:7788;
meta-disk  /dev/sdb2[0];
}

on data-1-2 {
device     /dev/drbd0;
disk       /dev/sdb1;
address    192.168.1.92:7788;
meta-disk  /dev/sdb2[0];
}
}
(Above: drbd.conf with tuning parameters)
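Both nodes must use the same configuration, and online changes are applied with drbdadm adjust on each node. The sketch below assumes that the peer's copy has already been fetched locally (e.g. with scp), that root SSH works between the nodes, and that the helper names same_config and adjust_everywhere (both hypothetical) are acceptable. Setting RUN=echo turns the adjust step into a dry run that only prints the commands:

```shell
#!/bin/sh
# Hypothetical helpers for keeping /etc/drbd.conf in step on both nodes.
RUN=${RUN:-}   # set RUN=echo to print commands instead of running them

# Compare two copies of the config byte for byte; fail if they differ.
same_config() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "DIFFERENT: $1 vs $2" >&2
        return 1
    fi
}

# Run "drbdadm adjust <resource>" on every listed node over SSH
# (assumes root SSH access and resolvable hostnames).
adjust_everywhere() {
    res=$1; shift
    for node in "$@"; do
        $RUN ssh "root@$node" drbdadm adjust "$res" || return 1
    done
}

# Usage:
#   scp data-1-2:/etc/drbd.conf /tmp/drbd.conf.peer
#   same_config /etc/drbd.conf /tmp/drbd.conf.peer &&
#       adjust_everywhere data data-1-1 data-1-2
```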

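degr-wfc-timeout: the wait-for-connection timeout. When only the primary side is started, the startup script waits for the standby to connect: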

***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - If this node was already a degraded cluster before the
   reboot, the timeout is 0 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot, the timeout
   is 0 seconds. [wfc-timeout]
   (These values are for resource 'data'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [ -- ]: [  11]:[  12]:

Start DRBD

#drbdadm create-md data      #create the metadata DRBD uses to record its state; data is the resource name
#/etc/init.d/drbd start      #or: drbdadm up data, or drbdadm up/down all
                        #drbdadm up all is roughly drbdadm attach all followed by drbdadm connect all
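The first-time bring-up above has to run on both nodes, with create-md before up. The sketch below supports a dry run. RES defaults to the resource name data from drbd.conf, and init_resource is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical wrapper for the first-time start of a DRBD resource.
# Run on BOTH nodes. RUN=echo prints the commands instead of running them.
RUN=${RUN:-}
RES=${RES:-data}

init_resource() {
    # write DRBD metadata to the meta-disk (/dev/sdb2 in this setup)
    $RUN drbdadm create-md "$RES" || return 1
    # attach the backing disk and connect to the peer
    $RUN drbdadm up "$RES"
}

# Usage (dry run): RUN=echo init_resource
```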

[root@data-1-1 ~]# cat /proc/drbd     # check the status
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1028132

[root@data-1-1 ~]# drbd-overview     # or /etc/init.d/drbd status
 0:data/0  Connected Secondary/Secondary Inconsistent/Inconsistent

[root@data-1-2 ~]# cat /proc/drbd
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1028132

==> At this point both nodes are Secondary, and the data is Inconsistent on both
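Reading cs/ro/ds by eye works for one node, but scripts need a small parser for the key:value tokens. This sketch assumes the DRBD 8.4 /proc/drbd layout shown above and looks only at device minor 0. The name drbd_field is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key:value field (cs, ro, ds,
# ns, oos, ...) for DRBD minor 0 from /proc/drbd-style text on stdin.
drbd_field() {
    awk -v key="$1" '
        # a line starting with "N:" opens the block for device minor N
        $1 ~ /^[0-9]+:$/ { in_dev = ($1 == "0:") }
        in_dev {
            for (i = 1; i <= NF; i++) {
                n = index($i, ":")
                if (n > 1 && substr($i, 1, n - 1) == key) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }'
}

# Usage on a node: drbd_field ro < /proc/drbd
```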

 

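Synchronize the DRBD data to the peer server so that both sides are consistent
Note: on empty disks this can be run freely, since there is no data to protect. If the two sides hold different data, think carefully about the direction of synchronization.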

Make one side Primary
[root@data-1-1 ~]#  drbdadm primary data    # fails: the data is not consistent yet
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup-84 primary 0' terminated with exit code 17

[root@data-1-1 ~]# drbdadm -- --overwrite-data-of-peer primary data
[root@data-1-1 ~]# cat /proc/drbd    # check the status: data blocks are being synchronized, and data-1-1 is now Primary
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:323584 nr:0 dw:0 dr:324241 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:704548
        [=====>..............] sync'ed: 31.8% (704548/1028132)K
        finish: 0:00:17 speed: 40,448 (40,448) K/sec
 The resource-specific output from /proc/drbd contains various pieces of information about the resource:
*cs (connection state). Status of the network connection. See the section called “Connection states” for details about the various connection states.
*ro (roles). Roles of the nodes. The role of the local node is displayed first, followed by the role of the partner node shown after the slash. See the section called “Resource roles” for details about the possible resource roles.
*ds (disk states). State of the hard disks. Prior to the slash the state of the local node is displayed, after the slash the state of the hard disk of the partner node is shown. See the section called “Disk states” for details about the various disk states.
*ns (network send). Volume of net data sent to the partner via the network connection; in Kibyte.
*nr (network receive). Volume of net data received from the partner via the network connection; in Kibyte.
*dw (disk write). Net data written on local hard disk; in Kibyte.
*dr (disk read). Net data read from local hard disk; in Kibyte.
*al (activity log). Number of updates of the activity log area of the metadata.
*bm (bit map). Number of updates of the bitmap area of the metadata.
*lo (local count). Number of open requests to the local I/O sub-system issued by DRBD.
*pe (pending). Number of requests sent to the partner, but that have not yet been answered by the latter.
*ua (unacknowledged). Number of requests received by the partner via the network connection, but that have not yet been answered.
*ap (application pending). Number of block I/O requests forwarded to DRBD, but not yet answered by DRBD.
*ep (epochs). Number of epoch objects. Usually 1. Might increase under I/O load when using either the barrier or the none write ordering method. Since 8.2.7.
*wo (write order). Currently used write ordering method: b (barrier), f (flush), d (drain) or n (none). Since 8.2.7.
*oos (out of sync). Amount of storage currently out of sync; in Kibibytes. Since 8.2.6.
(Above: meaning of each status field)
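During the initial sync, the oos counter (KiB still out of sync) falls to 0, so a script can poll it instead of re-running cat /proc/drbd. In this sketch, the polling interval (INTERVAL, default 5 s), the retry limit and the helper names are all assumptions:

```shell
#!/bin/sh
# Hypothetical helpers: poll a /proc/drbd-style file until oos reaches 0.
# On a real node the file is /proc/drbd.

# Print the first oos:<KiB> value found in the file.
oos_kib() {
    sed -n 's/.*oos:\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# $1 = status file, $2 = maximum number of polls (default 120).
# Returns 0 once oos is 0, 1 if it is still non-zero after the last poll.
wait_for_sync() {
    status_file=$1
    tries=${2:-120}
    while [ "$tries" -gt 0 ]; do
        left=$(oos_kib "$status_file")
        [ "$left" = "0" ] && return 0
        echo "still out of sync: ${left:-unknown} KiB"
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "${INTERVAL:-5}"
        fi
    done
    return 1
}

# Usage on the SyncSource: wait_for_sync /proc/drbd && echo "in sync"
```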

[root@data-1-1 ~]#  drbdadm cstate data  # check the connection state
Connected

A resource may have one of the following connection states:
*StandAlone. No network configuration available. The resource has not yet been connected, or has been administratively disconnected (using drbdadm disconnect), or has dropped its connection due to failed authentication or split brain.
*Disconnecting. Temporary state during disconnection. The next state is StandAlone.
*Unconnected. Temporary state, prior to a connection attempt. Possible next states: WFConnection and WFReportParams.
*Timeout. Temporary state following a timeout in the communication with the peer. Next state: Unconnected.
*BrokenPipe. Temporary state after the connection to the peer was lost. Next state: Unconnected.
*NetworkFailure. Temporary state after the connection to the partner was lost. Next state: Unconnected.
*ProtocolError. Temporary state after the connection to the partner was lost. Next state: Unconnected.
*TearDown. Temporary state. The peer is closing the connection. Next state: Unconnected.
*WFConnection. This node is waiting until the peer node becomes visible on the network.
*WFReportParams. TCP connection has been established, this node waits for the first network packet from the peer.
*Connected. A DRBD connection has been established, data mirroring is now active. This is the normal state.
*StartingSyncS. Full synchronization, initiated by the administrator, is just starting. The next possible states are: SyncSource or PausedSyncS.
*StartingSyncT. Full synchronization, initiated by the administrator, is just starting. Next state: WFSyncUUID.
*WFBitMapS. Partial synchronization is just starting. Next possible states: SyncSource or PausedSyncS.
*WFBitMapT. Partial synchronization is just starting. Next possible state: WFSyncUUID.
*WFSyncUUID. Synchronization is about to begin. Next possible states: SyncTarget or PausedSyncT.
*SyncSource. Synchronization is currently running, with the local node being the source of synchronization.
*SyncTarget. Synchronization is currently running, with the local node being the target of synchronization.
*PausedSyncS. The local node is the source of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.
*PausedSyncT. The local node is the target of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.
*VerifyS. On-line device verification is currently running, with the local node being the source of verification.
*VerifyT. On-line device verification is currently running, with the local node being the target of verification.
cstate
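For monitoring, the connection states above fall into a few groups. Below is a hypothetical classifier. The grouping is an interpretation of the list above, not something drbdadm reports:

```shell
#!/bin/sh
# Hypothetical classifier for "drbdadm cstate <res>" output; the grouping is
# an interpretation of the connection-state list above.
cstate_kind() {
    case "$1" in
        Connected)
            echo "normal" ;;
        StandAlone)
            echo "standalone: not connected (check for split brain)" ;;
        WFConnection|WFReportParams)
            echo "waiting for peer" ;;
        StartingSyncS|StartingSyncT|WFBitMapS|WFBitMapT|WFSyncUUID|SyncSource|SyncTarget|PausedSyncS|PausedSyncT)
            echo "resynchronizing" ;;
        VerifyS|VerifyT)
            echo "online verify" ;;
        Disconnecting|Unconnected|Timeout|BrokenPipe|NetworkFailure|ProtocolError|TearDown)
            echo "transient" ;;
        *)
            echo "unrecognized: $1" >&2; return 1 ;;
    esac
}

# Usage on a node: cstate_kind "$(drbdadm cstate data)"
```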

[root@data-1-1 ~]# drbdadm dstate all  # check the disk states
UpToDate/UpToDate

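The local and peer disks can each be in one of the following states:
Diskless:      No local block device is assigned to DRBD. Either no usable device exists, it was detached manually with drbdadm, or it was detached automatically after a lower-level I/O error.
Attaching:     Transient state while reading metadata.
Failed:        Transient state after the local block device reports an I/O error. Next state: Diskless.
Negotiating:   Transient state prior to attach on an already connected DRBD device.
Inconsistent:  The data is inconsistent. This state occurs on both nodes immediately after a new resource is created (before the initial full sync), and on one node (the sync target) during synchronization.
Outdated:      The resource data is consistent, but outdated.
DUnknown:      Used for the peer disk when no network connection is available.
Consistent:    Consistent data on a node without a connection. When the connection is established, it is decided whether the data is UpToDate or Outdated.
UpToDate:      Consistent, up-to-date state of the data. This is the normal state.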
dstate
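The failed drbdadm primary data earlier ("Need access to UpToDate data") shows why it is worth checking the disk state before promoting a node. This conservative sketch inspects only the local half of the drbdadm dstate output. The name local_uptodate is hypothetical, and the check ignores cases where DRBD could still use the peer's UpToDate data:

```shell
#!/bin/sh
# Hypothetical check: succeed only if the LOCAL disk state (the part before
# the slash in "drbdadm dstate <res>" output) is UpToDate.
local_uptodate() {
    [ "${1%%/*}" = "UpToDate" ]
}

# Usage on a node:
#   local_uptodate "$(drbdadm dstate data)" && drbdadm primary data
```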
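[root@data-1-1 ~]# drbdadm role all  # check the roles
Primary/Secondary
If a state shows Unknown:
1. Check the physical link between the two servers, and that the IPs, hostnames, routes and /etc/hosts are correct.
2. Check iptables and SELinux.
3. Secondary/Unknown may also be the result of a split brain.
4. A bad initial configuration, or some other unexpected operation in progress, can also cause a split brain.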

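Resolving split brain
a. On the secondary node data-1-2:
  drbdadm secondary data
  drbdadm -- --discard-my-data connect data   # discard the local data and connect
b. On the primary node data-1-1, check the state with cat /proc/drbd. If it is not in the WFConnection state, connect manually:
  drbdadm disconnect data
  drbdadm connect data
Check the DRBD state again.
If the secondary still has problems:
  drbdadm disconnect data
  drbdadm -- --discard-my-data connect data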
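The recovery steps above can be wrapped so that you can review them with a dry run before touching a real cluster. In this sketch, the function names are hypothetical and RES defaults to data. The script cannot decide which node's changes to discard; you must make that choice yourself:

```shell
#!/bin/sh
# Hypothetical wrappers for manual split-brain recovery of one resource.
# RUN=echo prints the commands instead of executing them.
RUN=${RUN:-}
RES=${RES:-data}

# On the node whose changes will be DISCARDED (data-1-2 in the notes).
# Unmount /dev/drbd0 there first if it is mounted.
split_brain_victim() {
    $RUN drbdadm secondary "$RES" || return 1
    $RUN drbdadm -- --discard-my-data connect "$RES"
}

# On the node whose data is kept, if it is not already in WFConnection.
split_brain_survivor() {
    $RUN drbdadm disconnect "$RES"
    $RUN drbdadm connect "$RES"
}
```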

 

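Mounting the DRBD data disk in practice

While DRBD is running normally, only the drbd0 device on the Primary can be mounted and used. (To look at the data on the Secondary, stop DRBD there and mount the underlying partition.)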
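Because only the Primary can mount /dev/drbd0, a mount step in a script should check the role first. This sketch assumes the device and mount point used in this setup. mount_if_primary is hypothetical, and RUN=echo gives a dry run:

```shell
#!/bin/sh
# Hypothetical guard: mount the DRBD device only when the local role
# (the part before the slash in "drbdadm role <res>") is Primary.
RUN=${RUN:-}
DEV=${DEV:-/dev/drbd0}
MNT=${MNT:-/data}

mount_if_primary() {
    local_role=${1%%/*}
    if [ "$local_role" = "Primary" ]; then
        $RUN mount "$DEV" "$MNT"
    else
        echo "local role is $local_role, not mounting $DEV" >&2
        return 1
    fi
}

# Usage on a node: mount_if_primary "$(drbdadm role data)"
```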

[root@data-1-1 ~]# mkdir /data
[root@data-1-1 ~]# mount /dev/drbd0 /data/
[root@data-1-1 ~]# cd /data
[root@data-1-1 data]# cat /proc/drbd 
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1240 nr:4 dw:1244 dr:2053 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@data-1-1 data]# touch `seq 1000`
[root@data-1-1 data]# cat /proc/drbd 
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1528 nr:4 dw:1532 dr:2053 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Verify on data-1-2 that the data has arrived
[root@data-1-2 ~]# drbdadm down data
[root@data-1-2 ~]# mkdir /data
[root@data-1-2 ~]# mount /dev/sdb1 /data
[root@data-1-2 ~]# ls /data/
(output omitted; confirm that the files are present)
Start DRBD on the secondary again:
[root@data-1-2 /]# umount /data
[root@data-1-2 /]# drbdadm up data
[root@data-1-2 /]# cat /proc/drbd
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

After writing data on the primary, observe again:
[root@data-1-2 /]# cat /proc/drbd
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by mockbuild@Build64R6, 2016-12-13 18:38:15
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:288 dw:288 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

 

 

 

Another set of study notes

 

posted @ 2017-05-01 02:25  黑色月牙