Learning GlusterFS (7)
Initial environment:
OS: CentOS 7, kernel 3.10.0-514.26.2.el7.x86_64
Number of machines: two
Disks: at least two per machine, one for the OS and one reserved for GlusterFS
Naming convention: node1, node2
IP plan: 192.168.238.129 node1
         192.168.238.130 node2
1. Disk setup (after installing the OS, partition the extra disk; required on both nodes)
[root@node1 ~]# fdisk /dev/sdb    # partition the disk
2. Create the filesystem and mount the disk (on both nodes). Note: these examples assume the brick will reside on /dev/sdb1.
[root@node1 ~]# mkfs.xfs -i size=512 /dev/sdb1
[root@node1 ~]# mkdir -p /export/sdb1 && mount /dev/sdb1 /export/sdb1    # create the mount point and mount
[root@node1 ~]# echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0" >> /etc/fstab    # mount at boot
[root@node1 ~]# mount -a && mount    # test the fstab entry and check the result
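Before moving on, it is worth verifying on each node that the brick filesystem really is mounted and will survive a reboot; a minimal check, assuming the layout above:
[root@node1 ~]# df -hT /export/sdb1    # should show an xfs filesystem backed by /dev/sdb1
[root@node1 ~]# grep sdb1 /etc/fstab    # confirm the boot-time entry exists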
3. Install GlusterFS
[root@node1 ~]# yum install centos-release-gluster    # install the repository
[root@node1 ~]# yum install glusterfs-server    # install GlusterFS
4. Remember to turn off the firewall
[root@node1 ~]# systemctl stop firewalld    # temporary; it will start again after a reboot
[root@node1 ~]# systemctl disable firewalld    # disable permanently
5. Start GlusterFS
[root@node1 ~]# systemctl start glusterd    # start the gluster daemon
[root@node1 ~]# systemctl enable glusterd    # start at boot
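A quick sanity check that the daemon is really running and which version was installed (both commands only read state):
[root@node1 ~]# systemctl status glusterd    # expect active (running)
[root@node1 ~]# glusterfs -V    # print the installed GlusterFS version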
6. Build the cluster
Replace nodename with the hostname of the other server in the cluster, or its IP address if you don't have DNS or /etc/hosts entries.
Set up name resolution in /etc/hosts or in DNS (ideally both). There is no DNS server here, so the bindings are done only in /etc/hosts; do not forget the entry for the machine itself.
For example:
[root@node1 gluster]#cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.238.129 node1
192.168.238.130 node2
192.168.238.133 node3
192.168.238.132 node4
192.168.238.134 node5
# Run the peer probe command on node1
[root@node1 ~]# gluster peer probe node2    # if the probe fails, check that host resolution is in place and the firewall is off
# Check the node that was just added
[root@node1 ~]# gluster peer status
[root@node1 ~]# gluster peer probe node2
peer probe: success.
[root@node1 ~]# gluster peer status
Number of Peers: 1
Configure your Gluster volume
To detach a node again (only if you want to remove it from the pool):
[root@node1 ~]# gluster peer detach node2
Run on both hosts:
[root@node1 ~]# mkdir -p /export/sdb1/brick
[root@node1 ~]# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick
volume create: testvol: success: please start the volume to access data
[root@node1 log]# gluster peer status
Number of Peers: 1
Hostname: node2
Uuid: 61fe987a-99ff-419d-8018-90603ea16fe7
State: Peer in Cluster (Connected)
[root@node1 log]# gluster volume info
Volume Name: testvol
Type: Replicate
Volume ID: bc637d83-0273-4373-9d00-d794a3a3d2e7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node1:/export/sdb1/brick
Brick2: node2:/export/sdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
[root@node1 ~]# gluster volume start testvol    # start the volume
volume start: testvol: success
[root@node1 gluster]# gluster volume info    # check the volume information
Volume Name: testvol
Type: Replicate
Volume ID: bc637d83-0273-4373-9d00-d794a3a3d2e7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node1:/export/sdb1/brick
Brick2: node2:/export/sdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
[root@node2 ~]# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       10774
Brick node2:/export/sdb1/brick              N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       998
Self-heal Daemon on node1                   N/A       N/A        Y       10794
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
Note: the environment above has a single data disk per server. With multiple disks per server, each extra disk is formatted and mounted separately, for example:
mkfs.xfs -i size=512 /dev/sdc1
mkdir -p /export/sdc1
mount /dev/sdc1 /export/sdc1
echo "/dev/sdc1 /export/sdc1 xfs defaults 0 0" >> /etc/fstab
mkdir -p /export/sdc1/brick    # run on all nodes
Run on node1:
With one disk per host (the layout configured above):
# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick
With two disks per host:
# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick node1:/export/sdc1/brick node2:/export/sdc1/brick
And so on for more disks.
Mount test
[root@node1 ~]# mkdir /mnt/gluster/
[root@node1 ~]# mount -t glusterfs node1:/testvol /mnt/gluster/
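To convince yourself that replication works, a small test modeled on the official quickstart quoted later in this article can be run from the client mount; the file count here is arbitrary:
[root@node1 ~]# for i in `seq -w 1 10`; do cp -p /var/log/messages /mnt/gluster/copy-test-$i; done
[root@node1 ~]# ls /mnt/gluster | wc -l    # 10 files visible through the volume
[root@node1 ~]# ls /export/sdb1/brick | wc -l    # with replica 2, all 10 should also appear in each node's brick directory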
Expanding Volumes
To expand a volume
Prerequisite: the cluster starts with node1 and node2; node3 and node4 are prepared exactly as above. Hosts must be added a pair at a time (to match the replica count), and /etc/hosts must be updated on every host in the cluster for the newly added nodes.
1) On the first server in the cluster, probe the server to which you want to add the new brick using the following command:
# gluster peer probe <server>    # the commands run here:
[root@node1 ~]# gluster peer probe node3
peer probe: success.
[root@node1 ~]# gluster peer probe node4
peer probe: success.
2) Add the brick using the following command:
# gluster volume add-brick <volname> <new-brick>
[root@node1 ~]# gluster volume add-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick
volume add-brick: success    # success
3) Check the volume information using the following command (as below):
[root@node1 ~]#gluster volume info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 09363405-1c7c-4eb1-b815-b97822c1f274
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: node1:/export/sdb1/brick
Brick2: node2:/export/sdb1/brick
Brick3: node3:/export/sdb1/brick
Brick4: node4:/export/sdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Rebalance the volume to ensure that all files are distributed to the new brick.
You can use the rebalance command as described in Rebalancing Volumes.
[root@node1 ~]# gluster volume rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: c9d052e8-2b6c-40b0-8d77-52290bcdb61
To shrink a volume (online shrinking)
The operation used in this example:
[root@node1 gluster]# gluster volume remove-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success
The full procedure is documented below.
1. Remove the brick using the following command:
# gluster volume remove-brick <volname> <brick> start
For example, to remove server2:/exp2:
# gluster volume remove-brick test-volume server2:/exp2 force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n)
2. Enter "y" to confirm the operation. The command displays the following message indicating that the remove brick operation has started successfully:
3. Remove Brick successful
4. (Optional) View the status of the remove brick operation using the following command:
# gluster volume remove-brick <volname> <brick> status
For example, to view the status of the remove brick operation on the server2:/exp2 brick:
# gluster volume remove-brick test-volume server2:/exp2 status
Node                                    Rebalanced-files    size    scanned    status
-----------------------------------------------
617c923e-6450-4065-8e33-865e28d9428f    34                  340     162        in progress
5.Check the volume information using the following command:
# gluster volume info
The command displays information similar to the following:
# gluster volume info
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 3
Bricks:
Brick1: server1:/exp1
Brick3: server3:/exp3
Brick4: server4:/exp4
6. Rebalance the volume to ensure that all files are distributed to the new brick.
You can use the rebalance command as described in Rebalancing Volumes.
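For this cluster, a graceful shrink that migrates data off the bricks before removing them would look like the sketch below (brick names taken from the expansion above; the start/status/commit variant is the official alternative to the force removal shown earlier):
[root@node1 ~]# gluster volume remove-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick start
[root@node1 ~]# gluster volume remove-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick status
[root@node1 ~]# gluster volume remove-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick commit    # only after status shows completed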
Handling a failed disk on a host
Method 1: the host still has a spare disk available
The failure:
[root@node2 ~]# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2684
Brick node2:/export/sdb1/brick              N/A       N/A        N       N/A    # sdb1 is offline: failed
Brick node1:/export/sdc1/brick              49153     0          Y       2703
Brick node2:/export/sdc1/brick              49153     0          Y       2704
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       1393
Self-heal Daemon on node1                   N/A       N/A        Y       3090
Self-heal Daemon on node4                   N/A       N/A        Y       2246
Self-heal Daemon on node3                   N/A       N/A        Y       2236
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 8b3a04a0-0449-4424-a458-29f602571ea2
Status               : completed
From the output above, Brick node2:/export/sdb1/brick is offline; something has gone wrong.
Solution:
1. Create a new data directory: format the spare disk and mount it (run on the failed host)
[root@node2 ~]# mkfs.xfs -i size=512 /dev/sdd1    # format
[root@node2 ~]# mkdir -p /export/sdd1/brick    # create the directories
[root@node2 ~]# mount /dev/sdd1 /export/sdd1    # mount
[root@node2 ~]# echo "/dev/sdd1 /export/sdd1 xfs defaults 0 0" >> /etc/fstab    # mount at boot
2. Query the extended attributes of the failed brick's directory on a healthy replica (run on a healthy host)
[root@node1 brick]# getfattr -d -m. -e hex /export/sdb1/brick/
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/brick/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
trusted.glusterfs.dht.commithash=0x3000
trusted.glusterfs.volume-id=0xe107222fa1134606a9a7fcb16e4c0709
3. Mount the volume and trigger self-heal (run on the failed host)
[root@node2 ~]# mount -t glusterfs node2:/testvol /mnt    # any unused mount point works; node2:/testvol is the volume created earlier
[root@node2 ~]# mkdir /mnt/test    # create a directory that does not yet exist in the volume (adjust the path to your mount point)
[root@node2 ~]# rmdir /mnt/test    # remove the directory just created
[root@node2 ~]# setfattr -n trusted.non-existent-key -v abc /mnt    # set an extended attribute to trigger self-heal
[root@node2 ~]# setfattr -x trusted.non-existent-key /mnt    # remove the extended attribute again
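On the GlusterFS versions used here, the heal can also be triggered directly through the CLI instead of the xattr trick; these are the same commands documented in the self-heal section later in this article:
[root@node2 ~]# gluster volume heal testvol    # heal only the files that need healing
[root@node2 ~]# gluster volume heal testvol full    # heal every file in the volume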
4. Check that the healthy replica shows pending changes
Run on the healthy host:
[root@node1 gluster]# getfattr -d -m. -e hex /export/sdb1/brick/
# /export/sdb1/brick/ is the location where you created the brick
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/brick/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x000000000000000400000004    <<---- xattrs are marked from the source brick node1:/export/sdb1/brick --->>
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
trusted.glusterfs.dht.commithash=0x3334343336363233303800
trusted.glusterfs.volume-id=0xe107222fa1134606a9a7fcb16e4c0709
Run on the failed host (a healthy host works too):
[root@node2 gluster]# gluster volume heal testvol info    # view the heal information for testvol
Brick node1:/export/sdb1/brick
/
Status: Connected
Number of entries: 1
Brick node2:/export/sdb1/brick
Status: Transport endpoint is not connected
Number of entries: -    # the status shows the transport endpoint is not connected
Brick node1:/export/sdc1/brick
Status: Connected
Number of entries: 0
Brick node2:/export/sdc1/brick
Status: Connected
Number of entries: 0
Brick node3:/export/sdb1/brick
Status: Connected
Number of entries: 0
Brick node4:/export/sdb1/brick
Status: Connected
Number of entries: 0
Brick node3:/export/sdc1/brick
Status: Connected
Number of entries: 0
Brick node4:/export/sdc1/brick
Status: Connected
Number of entries: 0
5. Complete the repair with a forced commit
Run on the failed host:
[root@node2 ~]# gluster volume replace-brick testvol node2:/export/sdb1/brick node2:/export/sdd1/brick commit force
volume replace-brick: success: replace-brick commit force operation successful    # success
[root@node2 ~]# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2684
Brick node2:/export/sdd1/brick              49154     0          Y       10298    # the online brick is now sdd1; it has replaced sdb1
Brick node1:/export/sdc1/brick              49153     0          Y       2703
Brick node2:/export/sdc1/brick              49153     0          Y       2704
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       10307
Self-heal Daemon on node3                   N/A       N/A        Y       9728
Self-heal Daemon on node1                   N/A       N/A        Y       3284
Self-heal Daemon on node4                   N/A       N/A        Y       9736
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 8b3a04a0-0449-4424-a458-29f602571ea2
Status               : not started
Run on a healthy host:
[root@node1 gluster]#getfattr -d -m. -e hex /export/sdb1/brick/
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/brick/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x000000000000000000000000    <<---- Pending changelogs are cleared.
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
trusted.glusterfs.dht.commithash=0x3334343336363233303800
trusted.glusterfs.volume-id=0xe107222fa1134606a9a7fcb16e4c0709
[root@node2 ~]#gluster volume heal testvol info
Brick node1:/export/sdb1/brick
Status: Connected
Number of entries: 0
Brick node2:/export/sdd1/brick
Status: Connected
Number of entries: 0
Brick node1:/export/sdc1/brick
Status: Connected
Number of entries: 0
Brick node2:/export/sdc1/brick
Status: Connected
Number of entries: 0
Brick node3:/export/sdb1/brick
Status: Connected
Number of entries: 0
Brick node4:/export/sdb1/brick
Status: Connected
Number of entries: 0
Brick node3:/export/sdc1/brick
Status: Connected
Number of entries: 0
Brick node4:/export/sdc1/brick
Status: Connected
Number of entries: 0
Alternative (the steps above follow the official documentation; the shortcut below also works):
When a disk fails it has to be replaced. If the cluster happens to have a spare disk available, simply replace the failed brick with one on the spare disk; the single command below is enough.
Format and mount the spare disk as shown above (mount point /export/sdd1), then run:
[root@node2 ~]# gluster volume replace-brick testvol node2:/export/sdb1/brick node2:/export/sdd1/brick commit force    # the first brick is the failed one, the second the replacement
Method 2: replacing the brick across hosts
Assume node2's sdb1 has failed.
Prerequisite: add a new host, node5 (two disks: one system disk, one data disk). Disk formatting, mounting, GlusterFS installation and the rest of the preparation on node5 are the same as above. Remember to update /etc/hosts on every host in the cluster for the new node.
Add node5 to the trusted pool:
[root@node1 brick]#gluster peer probe node5
peer probe: success.
Mount the disk:
[root@node5 ~]# mkdir -p /export/sdb1 && mount /dev/sdb1 /export/sdb1
[root@node5 ~]# echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0" >> /etc/fstab
[root@node5 ~]# mount -a && mount
Run the following command:
[root@node5 ~]# gluster volume replace-brick testvol node2:/export/sdb1/brick node5:/export/sdb1/brick commit force
volume replace-brick: success: replace-brick commit force operation successful
The volume can keep running in this state. Alternatively, once node2's sdb1 disk has been replaced, the data can be moved back with:
[root@node2 ~]# gluster volume replace-brick testvol node5:/export/sdb1/brick node2:/export/sdb1/brick commit force
volume replace-brick: success: replace-brick commit force operation successful
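After moving the brick back to node2, it is sensible to confirm that self-heal has caught up before relying on that replica; a minimal check using commands already shown above:
[root@node2 ~]# gluster volume heal testvol info    # every brick should report Number of entries: 0
[root@node2 ~]# gluster volume status testvol    # the replaced brick should show Online = Y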
Status before the replacement:
[root@node1 brick]# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2085
Brick node5:/export/sdb1/brick              49152     0          Y       18229
Brick node1:/export/sdc1/brick              49153     0          Y       2076
Brick node2:/export/sdc1/brick              49153     0          Y       2131
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       10565
Self-heal Daemon on node2                   N/A       N/A        Y       2265
Self-heal Daemon on node3                   N/A       N/A        Y       10416
Self-heal Daemon on node4                   N/A       N/A        Y       10400
Self-heal Daemon on node5                   N/A       N/A        Y       18238
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 8b3a04a0-0449-4424-a458-29f602571ea2
Status               : not started
Status after the replacement:
[root@node1 gluster]# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2085
Brick node2:/export/sdb1/brick (back)       49153     0          Y       10208
Brick node1:/export/sdc1/brick              49153     0          Y       2076
Brick node2:/export/sdc1/brick              49152     0          Y       3474
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       10684
Self-heal Daemon on node3                   N/A       N/A        Y       10498
Self-heal Daemon on node5                   N/A       N/A        Y       10075
Self-heal Daemon on node4                   N/A       N/A        Y       10488
Self-heal Daemon on node2                   N/A       N/A        Y       10201
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 8b3a04a0-0449-4424-a458-29f602571ea2
Status               : not started
Rebalancing data
There are two common rebalancing scenarios:
Fix Layout: recompute the layout so that new data is written to the new nodes; existing data stays where it is.
Fix Layout and Migrate Data: recompute the layout and also migrate the existing data.
Note that after a new node joins, a fix-layout is mandatory; otherwise newly written data still goes only to the old nodes.
1. To rebalance a volume to fix layout changes (fix layout)
Start the rebalance operation on any one of the servers using the following command:
# gluster volume rebalance <volname> fix-layout start
For example:
# gluster volume rebalance test-volume fix-layout start
Starting rebalance on volume test-volume has been successful
On this cluster the commands are:
[root@node1 gluster]# gluster volume rebalance testvol fix-layout start
volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 0ea5aa16-b349-44ca-a51b-d5fcf47e1272
[root@node1 gluster]# gluster volume rebalance testvol status
Node         status                  run time in h:m:s
--------------------------------
localhost    fix-layout completed    0:0:0
node2        fix-layout completed    0:0:0
node3        fix-layout completed    0:0:0
node4        fix-layout completed    0:0:0
volume rebalance: testvol: success
2. To rebalance a volume to fix layout and migrate the existing data (Fix Layout and Migrate Data)
Start the rebalance operation on any one of the servers using the following command:
# gluster volume rebalance <volname> start
For example:
# gluster volume rebalance test-volume start
Starting rebalancing on volume test-volume has been successful
Start the migration operation forcefully on any one of the servers using the following command:
# gluster volume rebalance <volname> start force
For example:
# gluster volume rebalance test-volume start force
Starting rebalancing on volume test-volume has been successful
On this cluster:
[root@node1 gluster]# gluster volume rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 2a47d454-fdc3-4d95-81ac-6981577d26e9
[root@node1 gluster]# gluster volume rebalance testvol status
Node         Rebalanced-files   size     scanned   failures   skipped   status      run time in h:m:s
------------------------------------------------------------------------------------------
localhost    0                  0Bytes   37        0          1         completed   0:00:00
node2        0                  0Bytes   0         0          0         completed   0:00:00
node3        0                  0Bytes   27        0          0         completed   0:00:00
node4        0                  0Bytes   0         0          0         completed   0:00:00
volume rebalance: testvol: success
3. Stopping a rebalance operation (if needed)
You can stop the rebalance operation, as needed.
Stop the rebalance operation using the following command:
# gluster volume rebalance <volname> stop
For example:
# gluster volume rebalance test-volume stop
Node                                    Rebalanced-files   size   scanned   status
-----------------------------------------------
617c923e-6450-4065-8e33-865e28d9428f    59                 590    244       stopped
Stopped rebalance process on volume test-volume
Stopping Volumes
Stop the volume using the following command:
# gluster volume stop <volname>
For example, to stop test-volume:
# gluster volume stop test-volume
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
Enter y to confirm the operation. The output of the command displays the following:
Stopping volume test-volume has been successful
Deleting Volumes
Delete the volume using the following command:
# gluster volume delete <volname>
For example, to delete test-volume:
# gluster volume delete test-volume
Deleting volume will erase all information about the volume. Do you want to continue? (y/n)
Enter y to confirm the operation. The command displays the following:
Deleting volume test-volume has been successful
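Applied to the test volume built earlier in this article, the stop-and-delete sequence would look roughly like this (unmount any clients first; the names assume the testvol setup above):
[root@node1 ~]# umount /mnt/gluster
[root@node1 ~]# gluster volume stop testvol
[root@node1 ~]# gluster volume delete testvol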
Triggering Self-Heal on Replicate
In the replicate module, previously you had to manually trigger a self-heal when a brick went offline and came back online, to bring all the replicas back in sync. Now the pro-active self-heal daemon runs in the background, diagnoses issues and automatically initiates self-healing every 10 minutes on the files which require healing.
You can view the list of files that need healing, the list of files which were recently healed, and the list of files which are in split-brain state, and you can manually trigger self-heal on the entire volume or only on the files which need healing.
·Trigger self-heal only on the files which require healing:
# gluster volume heal <volname>
For example, to trigger self-heal on files which require healing on test-volume:
# gluster volume heal test-volume
Heal operation on volume test-volume has been successful
·Trigger self-heal on all the files of a volume:
# gluster volume heal <volname> full
For example, to trigger self-heal on all the files of test-volume:
# gluster volume heal test-volume full
Heal operation on volume test-volume has been successful
·View the list of files that need healing:
# gluster volume heal <volname> info
For example, to view the list of files on test-volume that need healing:
# gluster volume heal test-volume info
Brick server1:/gfs/test-volume_0
Number of entries: 0
Brick server2:/gfs/test-volume_1
Number of entries: 101
/95.txt
/32.txt
/66.txt
/35.txt
/18.txt
/26.txt
/47.txt
/55.txt
/85.txt
...
·View the list of files that are self-healed:
# gluster volume heal <volname> info healed
For example, to view the list of files on test-volume that are self-healed:
# gluster volume heal test-volume info healed
Brick Server1:/gfs/test-volume_0
Number of entries: 0
Brick Server2:/gfs/test-volume_1
Number of entries: 69
/99.txt
/93.txt
/76.txt
/11.txt
/27.txt
/64.txt
/80.txt
/19.txt
/41.txt
/29.txt
/37.txt
/46.txt
...
·View the list of files of a particular volume on which the self-heal failed:
# gluster volume heal <volname> info failed
For example, to view the list of files of test-volume that are not self-healed:
# gluster volume heal test-volume info failed
Brick Server1:/gfs/test-volume_0
Number of entries: 0
Brick Server2:/gfs/test-volume_3
Number of entries: 72
/90.txt
/95.txt
/77.txt
/71.txt
/87.txt
/24.txt
...
·View the list of files of a particular volume which are in split-brain state:
# gluster volume heal <volname> info split-brain
For example, to view the list of files of test-volume which are in split-brain state:
# gluster volume heal test-volume info split-brain
Brick Server1:/gfs/test-volume_2
Number of entries: 12
/83.txt
/28.txt
/69.txt
...
Brick Server2:/gfs/test-volume_3
Number of entries: 12
/83.txt
/28.txt
/69.txt
...
Troubleshooting:
1. Peer status shows State: Peer in Cluster (Disconnected)
[root@node1 gluster]# gluster peer status
Number of Peers: 1
Hostname: node2
Uuid: 61fe987a-99ff-419d-8018-90603ea16fe7
State: Peer in Cluster (Disconnected)
Solution: check the firewall status and the /etc/hosts file. The firewall can also be opened with specific rules, but for performance it is simpler to keep it off.
2. Error when recreating a volume
[root@node1 sdb1]# gluster volume start testvol
volume start: testvol: failed: Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /export/sdb1/brick. Reason : No data available    # failure when recreating the volume
Resolution: check the volume information, delete the volume, clear the stale brick attributes, and recreate the volume.
[root@node1 sdb1]# gluster volume info
Volume Name: testvol
Type: Replicate
Volume ID:57a60503-c5ae-4671-b213-6f2a2f913615
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node1:/export/sdb1/brick
Brick2: node2:/export/sdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
[root@node1 sdb1]# gluster volume delete testvol
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: testvol: success
[root@node1 sdb1]# setfattr -x trusted.glusterfs.volume-id /export/sdb1/brick && setfattr -x trusted.gfid /export/sdb1/brick && rm -rf /export/sdb1/brick/.glusterfs
setfattr: /export/sdb1/brick: No such attribute
[root@node1 sdb1]# setfattr -x trusted.glusterfs.volume-id /export/sdb1/brick && setfattr -x trusted.gfid /export/sdb1/brick && rm -rf /export/sdb1/brick/.glusterfs
setfattr: /export/sdb1/brick: No such attribute
[root@node1 sdb1]# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick
volume create: testvol: success: please start the volume to access data
[root@node1 sdb1]# gluster volume start testvol
volume start: testvol: success
[root@node1 sdb1]# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2429
Brick node2:/export/sdb1/brick              49152     0          Y       2211
Self-heal Daemon on localhost               N/A       N/A        Y       2449
Self-heal Daemon on node2                   N/A       N/A        Y       2231
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
The two articles below cover the same problem:
glusterfs volume create: testvol: failed: /data/brick1 or a prefix of it is already part of a volume
The following error appears when creating a volume:
[root@gluster-node1 ~]# gluster volume create testvol 192.168.11.139:/data/brick1 192.168.11.140:/data/brick2 force
volume create: testvol: failed: /data/brick1 or a prefix of it is already part of a volume
A blog post from abroad explains it: starting with GlusterFS 3.3, a check was added to detect whether a directory is already part of a volume, and this has generated a lot of support questions.
If you remove a brick from a volume, keep using the volume, and then re-add the former brick, the file state can change in ways that cause a series of problems, most of which lead to data loss.
If you are going to reuse a brick, make sure you know what you are doing.
The fix is:
setfattr -x trusted.glusterfs.volume-id $brick_path
setfattr -x trusted.gfid $brick_path
rm -rf $brick_path/.glusterfs
[root@gluster-node1 data]# setfattr -x trusted.glusterfs.volume-id /data/ctdb/
[root@gluster-node1 data]# setfattr -x trusted.gfid /data/ctdb/
[root@gluster-node1 data]# rm -rf /data/ctdb/.
./ ../ .glusterfs/
[root@gluster-node1 data]# rm -rf /data/ctdb/.glusterfs
[root@gluster-node1 data]# service glusterd restart
Starting glusterd: [ OK ]
Don't worry if it says the attribute does not exist; as long as it doesn't exist, you're in good shape.
Finally, restart glusterd to make sure it does not "remember" the old bricks.
Some of the translation above may be imprecise; see the original post:
https://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/
GlusterFS: {path} or a prefix of it is already part of avolume
Starting with GlusterFS 3.3, one change has been the check to see if a directory (or any of its ancestors) is already part of a volume. This is causing many support questions in #gluster.
This was implemented because if you remove a brick from a volume and continue to use the volume, you can get files into a state where re-adding a former brick can cause all sorts of problems, many of which can result in data loss.
If you're going to reuse a brick, make sure you know what you're doing.
The Solution
For the directory (or any parent directories) that was formerly part of a volume, simply:
setfattr -x trusted.glusterfs.volume-id $brick_path
setfattr -x trusted.gfid $brick_path
rm -rf $brick_path/.glusterfs
Don't worry if it says that the attribute does not exist. As long as it doesn't exist, you're in good shape.
Finally, restart glusterd to ensure it's not "remembering" the old bricks.
See the bugzilla entry for more details, and see Jeff Darcy's article for more information about how GlusterFS uses extended attributes.
[root@test glusterfs]# gluster volume create hello replica 3 test.144:/data0/glusterfs test.145:/data0/glusterfs test.146:/data0/glusterfs
volume create: hello: success: please start the volume to access data
[root@test glusterfs]# gluster volume delete hello
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: hello: success
[root@test glusterfs]# gluster volume create gfs replica 3 test.144:/data0/glusterfs test.145:/data0/glusterfs test.146:/data0/glusterfs
volume create: gfs: failed: Staging failed on test.144. Error: /data0/glusterfs is already part of a volume
Staging failed on test.146. Error: /data0/glusterfs is already part of a volume
Staging failed on test.145. Error: /data0/glusterfs is already part of a volume
[root@test glusterfs]# ssh test.144 'setfattr -x trusted.glusterfs.volume-id /data0/glusterfs/ && setfattr -x trusted.gfid /data0/glusterfs/ && rm -rf /data0/glusterfs/.glusterfs'
setfattr: /data0/glusterfs/: No such attribute
[root@test glusterfs]# ssh test.145 'setfattr -x trusted.glusterfs.volume-id /data0/glusterfs/ && setfattr -x trusted.gfid /data0/glusterfs/ && rm -rf /data0/glusterfs/.glusterfs'
setfattr: /data0/glusterfs/: No such attribute
[root@test glusterfs]# ssh test.146 'setfattr -x trusted.glusterfs.volume-id /data0/glusterfs/ && setfattr -x trusted.gfid /data0/glusterfs/ && rm -rf /data0/glusterfs/.glusterfs'
setfattr: /data0/glusterfs/: No such attribute
[root@test glusterfs]#
[root@test glusterfs]# gluster volume create gfs replica 3 test.144:/data0/glusterfs test.145:/data0/glusterfs test.146:/data0/glusterfs
volume create: gfs: success: please start the volume to access data
[root@test glusterfs]#
Also check whether node2's firewall rules are appropriate and whether the network is reachable.
3. Error when re-adding a brick that was previously part of the cluster and then removed
[root@node1 brick]# gluster volume add-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick node3:/export/sdc1/brick node4:/export/sdc1/brick
volume add-brick: failed: Pre Validation failed on node3. /export/sdb1/brick is already part of a volume
Pre Validation failed on node4. /export/sdb1/brick is already part of a volume
After clearing the stale volume-id/gfid attributes on node3 and node4, as in problem 2 above, the same command succeeds:
[root@node1 brick]# gluster volume add-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick node3:/export/sdc1/brick node4:/export/sdc1/brick
volume add-brick: success
Official installation guide
Step 1 – Have at least two nodes
·Fedora 22 (or later) on two nodes named "server1" and "server2"
·A working network connection
·At least two virtual disks, one for the OS installation, and one to be used to serve GlusterFS storage (sdb). This will emulate a real world deployment, where you would want to separate GlusterFS storage from the OS install.
·Note: GlusterFS stores its dynamically generated configuration files at /var/lib/glusterd. If at any point in time GlusterFS is unable to write to these files (for example, when the backing filesystem is full), it will at minimum cause erratic behavior for your system; or worse, take your system offline completely. It is advisable to create separate partitions for directories such as /var/log to ensure this does not happen.
Step 2 - Format and mount the bricks
(on both nodes): Note: These examples are going to assume the brick is going to reside on /dev/sdb1.
mkfs.xfs -i size=512 /dev/sdb1
mkdir -p /data/brick1
echo '/dev/sdb1 /data/brick1 xfs defaults 1 2' >> /etc/fstab
mount -a && mount
You should now see sdb1 mounted at /data/brick1
Step 3 - Installing GlusterFS
(on both servers) Install the software
yum install glusterfs-server
Start the GlusterFS management daemon:
service glusterd start
service glusterd status
glusterd.service - LSB: glusterfs server
Loaded: loaded (/etc/rc.d/init.d/glusterd)
Active: active (running) since Mon, 13 Aug 2012 13:02:11 -0700; 2s ago
Process: 19254 ExecStart=/etc/rc.d/init.d/glusterd start (code=exited, status=0/SUCCESS)
CGroup: name=systemd:/system/glusterd.service
├ 19260 /usr/sbin/glusterd -p /run/glusterd.pid
├ 19304 /usr/sbin/glusterfsd --xlator-option georep-server.listen-port=24009 -s localhost...
└ 19309 /usr/sbin/glusterfs -f /var/lib/glusterd/nfs/nfs-server.vol -p /var/lib/glusterd/...
Step 4 - Configure the trusted pool
From "server1"
gluster peer probe server2
Note: When using hostnames, the first server needs to be probed from one other server to set its hostname.
From "server2"
gluster peer probe server1
Note: Once this pool has been established, only trusted members may probe new servers into the pool. A new server cannot probe the pool, it must be probed from the pool.
Step 5 - Set up a GlusterFS volume
On both server1 and server2:
mkdir -p /data/brick1/gv0
From any single server:
gluster volume create gv0 replica 2 server1:/data/brick1/gv0 server2:/data/brick1/gv0
gluster volume start gv0
Confirm that the volume shows "Started":
gluster volume info
Note: If the volume is not started, clues as to what went wrong will be in log files under /var/log/glusterfs on one or both of the servers - usually in etc-glusterfs-glusterd.vol.log
Step 6 - Testing the GlusterFS volume
For this step, we will use one of the servers to mount the volume. Typically, you would do this from an external machine, known as a "client". Since using this method would require additional packages to be installed on the client machine, we will use one of the servers as a simple place to test first, as if it were that "client".
mount -t glusterfs server1:/gv0 /mnt
for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/copy-test-$i; done
First, check the mount point:
ls -lA /mnt | wc -l
You should see 100 files returned. Next, check the GlusterFS mount points on each server:
ls -lA /data/brick1/gv0
You should see 100 files on each server using the method we listed here. Without replication, in a distribute only volume (not detailed here), you should see about 50 files on each one.
1. Configure Firewall (it is simplest to keep the firewall off)
For Gluster to communicate within a cluster, either the firewalls have to be turned off or communication has to be enabled for each server.
iptables -I INPUT -p all -s <ip-address> -j ACCEPT
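If you would rather keep firewalld running on the CentOS 7 nodes used earlier, an alternative sketch is to open the standard GlusterFS ports instead of disabling the service; the exact brick port range depends on how many bricks each node hosts, so treat the range below as an assumption:
[root@node1 ~]# firewall-cmd --permanent --add-port=24007-24008/tcp    # glusterd management ports
[root@node1 ~]# firewall-cmd --permanent --add-port=49152-49251/tcp    # brick ports, one per brick starting at 49152 (assumed range)
[root@node1 ~]# firewall-cmd --reload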
2. Configure the trusted pool
Remember that the trusted pool is the term used to define a cluster of nodes in Gluster. Choose a server to be your "primary" server. This is just to keep things simple; you will generally want to run all commands in this tutorial from that one server. Keep in mind that running many Gluster-specific commands (like gluster volume create) on one server in the cluster will execute the same command on all other servers. (They only need to be run on one machine.)
3. Replace nodename with the hostname of the other server in the cluster, or its IP address if you don't have DNS or /etc/hosts entries. Let's say we want to connect to node02 (set this up in DNS and/or /etc/hosts):
gluster peer probe node02
Notice that running gluster peer status from the second node shows that the first node has already been added.
4. Partition the disk (disk preparation)
4.1 Assuming you have an empty disk at /dev/sdb:
fdisk /dev/sdb
4.2 Then create a single XFS partition using fdisk
Format the partition:
mkfs.xfs -i size=512 /dev/sdb1
4.3 Add an entry to /etc/fstab
echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0" >> /etc/fstab
4.4 Mount the partition as a Gluster "brick"
mkdir -p /export/sdb1 && mount -a && mkdir -p /export/sdb1/brick
Set up a Gluster volume
The most basic Gluster volume type is a "Distribute only" volume (also referred to as a "pure DHT" volume if you want to impress the folks at the water cooler). This type of volume simply distributes the data evenly across the available bricks in a volume. So, if I write 100 files, on average, fifty will end up on one server, and fifty will end up on another. This is faster than a "replicated" volume, but isn't as popular since it doesn't give you two of the most sought-after features of Gluster: multiple copies of the data, and automatic failover if something goes wrong.
1. To set up a replicated volume:
gluster volume create gv0 replica 2 node01.mydomain.net:/export/sdb1/brick node02.mydomain.net:/export/sdb1/brick
Breaking this down into pieces:
the first part says to create a gluster volume named gv0 (the name is arbitrary, gv0 was chosen simply because it's less typing than gluster_volume_0).
make the volume a replica volume
keep a copy of the data on at least 2 bricks at any given time. Since we only have two bricks total, this means each server will house a copy of the data.
we specify which nodes to use, and which bricks on those nodes. The order here is important when you have more bricks.
It is possible (as of the most current release as of this writing, Gluster 3.3) to specify the bricks in such a way that you would make both copies of the data reside on a single node. This would make for an embarrassing explanation to your boss when your bulletproof, completely redundant, always-on super cluster comes to a grinding halt when a single point of failure occurs.
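To make the ordering point concrete, here is a sketch with four bricks and replica 2 (hostnames follow the node01/node02 naming above): bricks are grouped into replica sets in the order listed, so keeping each pair spread across two hosts is what gives you real redundancy.
gluster volume create gv0 replica 2 node01:/export/sdb1/brick node02:/export/sdb1/brick node01:/export/sdc1/brick node02:/export/sdc1/brick    # good: each replica pair spans both nodes
gluster volume create gv0 replica 2 node01:/export/sdb1/brick node01:/export/sdc1/brick node02:/export/sdb1/brick node02:/export/sdc1/brick    # bad: the first pair lives entirely on node01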
2.Now, we can check to make sure things are working as expected:
# gluster volume info
And you should see results similar to the following:
Volume Name: gv0
Type: Replicate
Volume ID: 8bc3e96b-a1b6-457d-8f7a-a91d1d4dc019
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node01.yourdomain.net:/export/sdb1/brick
Brick2: node02.yourdomain.net:/export/sdb1/brick
This shows us essentially what we just specified during the volume creation. The one thing to mention is the Status. A status of Created means that the volume has been created but hasn't yet been started, which would cause any attempt to mount the volume to fail.
3.Now, we should start the volume.
# gluster volume start gv0
Additional notes:
Supported data types:
Gluster does not support so-called "structured data", meaning live SQL databases. Of course, using Gluster to back up and restore the database would be fine. Gluster is traditionally better when using file sizes of at least 16KB (with a sweet spot around 128KB or so).
Structured data is not supported directly, but Gluster can still be used to back up and restore such data.
Hardware requirements (can the hardware differ between nodes?)
If you want to test on bare metal, since Gluster is built with commodity hardware in mind, and because there is no centralized meta-data server, a very simple cluster can be deployed with two basic servers (2 CPUs, 4GB of RAM each, 1 Gigabit network). This is sufficient to have a nice file share or a place to put some nightly backups. Gluster is deployed successfully on all kinds of disks, from the lowliest 5200 RPM SATA to the mightiest 1.21 gigawatt SSDs. The more performance you need, the more consideration you will want to put into how much hardware to buy, but the great thing about Gluster is that you can start small, and add on as your needs grow.
Do the hosts need identical configurations?
OK, but if I add servers on later, don't they have to be exactly the same?
In a perfect world, sure. Having the hardware be the same means less troubleshooting when the fires start popping up. But plenty of people deploy Gluster on mix-and-match hardware, and successfully.
So no, identical hardware is not required.
Filesystem requirements: XFS is recommended
Typically, XFS is recommended but Gluster can be used with other filesystems as well. Most commonly EXT4 is used when XFS isn't, but you can (and many, many people do) use another filesystem that suits you. Now that we understand that, we can define a few of the common terms used in Gluster.
Notes:
We can define a few of the common terms used in Gluster.
·A trusted pool refers collectively to the hosts in a given Gluster cluster.
·A node or "server" refers to any server that is part of a trusted pool. In general, this assumes all nodes are in the same trusted pool.
·A brick is used to refer to any device (really this means filesystem) that is being used for Gluster storage.
·An export refers to the mount path of the brick(s) on a given server, for example, /export/brick1
·The term Global Namespace is a fancy way of saying a Gluster volume
·A Gluster volume is a collection of one or more bricks (of course, typically this is two or more). This is analogous to /etc/exports entries for NFS.
·GNFS and kNFS. GNFS is how we refer to our inline NFS server. kNFS stands for kernel NFS, or, as most people would say, just plain NFS. Most often, you will want kNFS services disabled on the Gluster nodes. Gluster NFS doesn't take any additional configuration and works just like you would expect with NFSv3. It is possible to configure Gluster and NFS to live in harmony if you want to.
Other notes:
·For this test, if you do not have DNS set up, you can get away with using /etc/hosts entries for the two nodes. However, when you move from this basic setup to using Gluster in production, correct DNS entries (forward and reverse) and NTP are essential.
·When you install the Operating System, do not format the Gluster storage disks! We will use specific settings with the mkfs command later on when we set up Gluster. If you are testing with a single disk (not recommended), make sure to carve out a free partition or two to be used by Gluster later, so that you can format or reformat at will during your testing.
·Firewalls are great, except when they aren't. For storage servers, being able to operate in a trusted environment without firewalls can mean huge gains in performance, and is recommended. In case you absolutely need to set up a firewall, have a look at Setting up clients for information on the ports used.
Hardware requirements (minimum)
64-bit OS, 1 CPU, 1GB RAM, 8GB storage
You will need to have at least two nodes with a 64-bit OS and a working network connection. At least one gig of RAM is the bare minimum recommended for testing, and you will want at least 8GB in any system you plan on doing any real work on. A single CPU is fine for testing, as long as it is 64-bit.
####################################################################################
2. GlusterFS installation and configuration
2.1 Preparation before installing GlusterFS
Server plan (VMware lab):
OS | IP | Hostname | Data disks (2) |
---|---|---|---|
CentOS 6.8 x86_64 | 10.1.0.151 | mystorage1 | sdb:10G sdc:10G |
CentOS 6.8 x86_64 | 10.1.0.152 | mystorage2 | sdb:10G sdc:10G |
CentOS 6.8 x86_64 | 10.1.0.153 | mystorage3 | sdb:10G sdc:10G |
CentOS 6.8 x86_64 | 10.1.0.154 | mystorage4 | sdb:10G sdc:10G |
2.2 Installing GlusterFS
2.2.1 Change the hostnames
# vim /etc/sysconfig/network
Then run: hostname <new-hostname>
The hostname change is now complete.
2.2.2 Add hosts entries so the cluster nodes can resolve each other
# vim /etc/hosts
10.1.0.151 mystorage1
10.1.0.152 mystorage2
10.1.0.153 mystorage3
10.1.0.154 mystorage4
2.2.3 Disable SELinux and the firewall
# sed -i 's#SELINUX=enforcing#SELINUX=disabled#' /etc/selinux/config
# chkconfig iptables off
# reboot
2.2.4 Install the EPEL repository
Some packages in the GlusterFS yum repository depend on EPEL.
# Remove the original repos under /etc/yum.repos.d and switch to the Aliyun mirrors, then:
# yum install epel-release -y
2.2.5 Install the GlusterFS repository and related packages
# yum install centos-release-gluster37.noarch -y
# yum --enablerepo=centos-gluster*-test install glusterfs-server glusterfs-cli glusterfs-geo-replication -y
# Packages present after the installation:
rpm -qa | grep gluster
centos-release-gluster37-1.0-4.el6.centos.noarch
glusterfs-api-3.7.13-1.el6.x86_64
glusterfs-3.7.13-1.el6.x86_64
glusterfs-client-xlators-3.7.13-1.el6.x86_64
glusterfs-fuse-3.7.13-1.el6.x86_64
glusterfs-server-3.7.13-1.el6.x86_64
glusterfs-libs-3.7.13-1.el6.x86_64
glusterfs-cli-3.7.13-1.el6.x86_64
glusterfs-geo-replication-3.7.13-1.el6.x86_64
2.3 Configuring GlusterFS
2.3.1 Check the GlusterFS version
Use the glusterfs -V command:
[root@mystorage1 ~]# glusterfs -V
glusterfs 3.7.20 built on Jan 30 2017 15:39:27
2.3.2 Start and stop the service
# Run on all four VMs
# /etc/init.d/glusterd start
# /etc/init.d/glusterd status
# chkconfig glusterd on
2.3.3 Add the storage hosts to the trusted pool
Run on one host to add the others; here it is run on mystorage1:
[root@mystorage1 ~]# gluster peer probe mystorage2
peer probe: success.
[root@mystorage1 ~]# gluster peer probe mystorage3
peer probe: success.
[root@mystorage1 ~]# gluster peer probe mystorage4
peer probe: success.
2.3.4 Check the status
Check the peer status from another machine:
[root@mystorage2 ~]# gluster peer status
Number of Peers: 3
Hostname: mystorage1
Uuid: 6e6a84af-ac7a-44eb-85c9-50f1f46acef1
State: Peer in Cluster (Connected)
Hostname: mystorage3
Uuid: 36e4c45c-466f-47b0-b829-dcd4a69ca2e7
State: Peer in Cluster (Connected)
Hostname: mystorage4
Uuid: c607f6c2-bdcb-4768-bc82-4bc2243b1b7a
State: Peer in Cluster (Connected)
2.3.5 Preparation before configuration
Install the XFS userspace tools (ext4 supports filesystems up to 16TB, while XFS scales to the PB level):
# Run on all four nodes
# yum install xfsprogs -y
Check the disk devices:
# fdisk -l
# fdisk /dev/sdb
n
p
1
w
** In production this partitioning step can be skipped.
# Notes:
If a disk is larger than 2TB, use parted to partition it; here we do not partition at all (partitioning is optional).
For a distributed filesystem the data disks generally do not need RAID; the system disks are usually RAID 1.
If a RAID card is available, use it: its cache improves disk IOPS, and RAID 5 is the preferred level in that case.
Even without any RAID, GlusterFS can still keep the data safe.
Create the filesystem:
# Run on all four nodes
# mkfs.xfs -f /dev/sdb
Create the mount directories on all four machines and mount the disk:
# Run on all four nodes
# mkdir -p /storage/brick{1..2}
# mount /dev/sdb /storage/brick1
# df -h
# Mount automatically at boot
# echo "/dev/sdb /storage/brick1 xfs defaults 0 0" >> /etc/fstab
# mount -a
2.3.6 Creating volumes and other operations
The five GlusterFS volume types:
- Distributed: distributed volume; files are spread across the bricks of the volume by a hash algorithm.
- Replicated: replicated volume, similar to RAID 1; the replica count must equal the number of bricks in the volume; high availability.
- Striped: striped volume, similar to RAID 0; the stripe count must equal the number of bricks in the volume; files are split into chunks stored round-robin across the bricks, the unit of concurrency is the chunk, and large files perform well.
- Distributed Striped: distributed striped volume; the number of bricks must be a multiple (at least 2x) of the stripe count; combines the distributed and striped behaviour.
- Distributed Replicated: distributed replicated volume; the number of bricks must be a multiple (at least 2x) of the replica count; combines the distributed and replicated behaviour.
For a distributed replicated volume, the brick order determines where files land: in general each consecutive pair of bricks forms one replica set, and files are then distributed across the replica sets.
The distributed replicated volume is the most commonly used GlusterFS volume type.
Striping exists to improve performance, in particular read speed.
Enterprises generally use the last two types, mostly distributed replicated (usable capacity = total capacity / replica count). For the network transport, 10GbE switches and NICs are recommended, since all data travels over the network; this removes one performance bottleneck.
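As a quick reference before the walkthroughs below, the create commands for the main volume types differ only in the keywords and brick count; the volume and host names here are placeholders, and the concrete commands used in this lab follow in the next sections:
# gluster volume create dist-vol host1:/storage/brick1 host2:/storage/brick1    # Distributed
# gluster volume create rep-vol replica 2 host1:/storage/brick1 host2:/storage/brick1    # Replicated
# gluster volume create str-vol stripe 2 host1:/storage/brick1 host2:/storage/brick1    # Striped
# gluster volume create dr-vol replica 2 host1:/storage/brick1 host2:/storage/brick1 host3:/storage/brick1 host4:/storage/brick1    # Distributed-Replicate (2 x 2)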
1) Distributed volume
# 创建分布式卷 [root@mystorage1 ~]# gluster volume create gv1 mystorage1:/storage/brick1 mystorage2:/storage/brick1 force volume create: gv1: success: please start the volume to access data # 启动创建的卷 [root@mystorage1 ~]# gluster volume start gv1 volume start: gv1: success # 在另一台机器(mystorage4)查看卷信息 [root@mystorage4 ~]# gluster volume info Volume Name: gv1 Type: Distribute Volume ID: b6ec2f8a-d1f0-4d1b-806b-238efb6dcb84 Status: Started Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: mystorage1:/storage/brick1 Brick2: mystorage2:/storage/brick1 Options Reconfigured: performance.readdir-ahead: on # 挂载卷到目录 [root@mystorage4 ~]# mount -t glusterfs 127.0.0.1:/gv1 /mnt [root@mystorage4 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 33G 1.3G 30G 5% / tmpfs 242M 0 242M 0% /dev/shm /dev/sda1 976M 38M 888M 5% /boot /dev/sdb 10G 33M 10G 1% /storage/brick1 127.0.0.1:/gv1 20G 65M 20G 1% /mnt # 在 mystorage1 创建测试文件 [root@mystorage1 ~]# touch /mnt/{a..d} [root@mystorage1 ~]# ll /mnt total 0 -rw-r--r-- 1 root root 0 Jul 30 00:54 a -rw-r--r-- 1 root root 0 Jul 30 00:54 b -rw-r--r-- 1 root root 0 Jul 30 00:54 c -rw-r--r-- 1 root root 0 Jul 30 00:54 d # 在 mystorage4 也可看到新创建的文件,信任存储池中的每一台主机挂载这个卷后都可以看到 [root@mystorage4 ~]# ll /mnt/ total 0 -rw-r--r-- 1 root root 0 Jul 30 00:54 a -rw-r--r-- 1 root root 0 Jul 30 00:54 b -rw-r--r-- 1 root root 0 Jul 30 00:54 c -rw-r--r-- 1 root root 0 Jul 30 00:54 d # 文件实际存在位置 [root@mystorage1 ~]# ls /storage/brick1 a b c e [root@mystorage2 ~]# ls /storage/brick1 d # 上面可以看到文件根据 hash 算法随机分布到由不同的 brick 上
Mounting via NFS
[root@mystorage3 ~]# mount -o mountproto=tcp -t nfs mystorage1:/gv1 /mnt/
[root@mystorage3 ~]# ll /mnt
total 0
-rw-r--r-- 1 root root 0 Jul 30 00:54 a
-rw-r--r-- 1 root root 0 Jul 30 00:54 b
-rw-r--r-- 1 root root 0 Jul 30 00:54 c
-rw-r--r-- 1 root root 0 Jul 30 00:54 d
[root@mystorage2 ~]# mount -o mountproto=tcp -t nfs 192.168.56.13:/gv1 /mnt/    # the host can be given as an IP (here mystorage3's IP), which shows gv1 is shared with every host in the trusted pool
[root@mystorage2 ~]# ll /mnt/
total 0
-rw-r--r-- 1 root root 0 Jul 30 00:54 a
-rw-r--r-- 1 root root 0 Jul 30 00:54 b
-rw-r--r-- 1 root root 0 Jul 30 00:54 c
-rw-r--r-- 1 root root 0 Jul 30 00:54 d
2) Replicated volume
# 创建复制式卷 [root@mystorage1 ~]# gluster volume create gv2 replica 2 mystorage3:/storage/brick1 mystorage4:/storage/brick1 force volume create: gv2: success: please start the volume to access data # 启动创建的卷 [root@mystorage1 ~]# gluster volume start gv2 volume start: gv2: success # 查看卷信息 [root@mystorage1 ~]# gluster volume info gv2 Volume Name: gv2 Type: Replicate Volume ID: 11928696-263a-4c7a-a155-5115af29221f Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: mystorage3:/storage/brick1 Brick2: mystorage4:/storage/brick1 Options Reconfigured: performance.readdir-ahead: on # 挂载卷到目录,创建测试文件 [root@mystorage1 ~]# mount -t glusterfs 127.0.0.1:/gv2 /opt [root@mystorage1 ~]# touch /opt/{a..d} [root@mystorage1 ~]# ls /opt a b c d # 在 mystorage3,4 可看到新创建的文件 [root@mystorage3 ~]# mount -t glusterfs 127.0.0.1:/gv2 /opt [root@mystorage3 ~]# ls /opt/ a b c d [root@mystorage4 ~]# mount -t glusterfs 127.0.0.1:/gv2 /opt [root@mystorage4 ~]# ls /opt/ a b c d # 文件实际存在位置 [root@mystorage3 ~]# ls /storage/brick1 a b c d [root@mystorage4 ~]# ls /storage/brick1 a b c d # 上面可以看到文件根据在 2 台机器上的 brick 上都有
Format and mount the second disk:
# mkfs.xfs -f /dev/sdc
# mkdir -p /storage/brick2
# echo "/dev/sdc /storage/brick2 xfs defaults 0 0" >> /etc/fstab
# mount -a
# df -h
3) Distributed striped volume
# 创建分布式条带卷 [root@mystorage1 ~]# gluster volume create gv3 stripe 2 mystorage3:/storage/brick2 mystorage4:/storage/brick2 force volume create: gv3: success: please start the volume to access data # 启动创建的卷 [root@mystorage1 ~]# gluster volume start gv3 volume start: gv3: success # 查看卷信息 [root@mystorage1 ~]# gluster volume info gv3 Volume Name: gv3 Type: Stripe Volume ID: 2871801f-b125-465c-be3a-4eeb2fb44916 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: mystorage3:/storage/brick2 Brick2: mystorage4:/storage/brick2 Options Reconfigured: performance.readdir-ahead: on # 挂载卷到目录,创建测试文件 mkdir /gv1 /gv2 /gv3 mount -t glusterfs 127.0.0.1:gv1 /gv1 mount -t glusterfs 127.0.0.1:gv2 /gv2 mount -t glusterfs 127.0.0.1:gv3 /gv3 df -h dd if=/dev/zero bs=1024 count=10000 of=/gv3/10M.file dd if=/dev/zero bs=1024 count=20000 of=/gv3/20M.file # 查看新创建的文件 [root@mystorage1 ~]# ll /gv3/ total 30000 -rw-r--r-- 1 root root 10240000 Jul 30 02:26 10M.file -rw-r--r-- 1 root root 20480000 Jul 30 02:26 20M.file # 文件实际存放位置 [root@mystorage3 ~]# ll -h /storage/brick2/ total 15M -rw-r--r-- 2 root root 4.9M Jul 30 02:26 10M.file -rw-r--r-- 2 root root 9.8M Jul 30 02:26 20M.file [root@mystorage4 ~]# ll -h /storage/brick2/ total 15M -rw-r--r-- 2 root root 4.9M Jul 30 02:25 10M.file -rw-r--r-- 2 root root 9.8M Jul 30 02:26 20M.file # 上面可以看到 10M 20M 的文件分别分成了 2 块(这是条带的特点),每块又分别在同的 brick 下(这是分布式的特点)
4) Distributed replicated volume
# 查看复制式卷的效果 cd /gv2 rm -f * dd if=/dev/zero bs=1024 count=10000 of=/gv2/10M.file dd if=/dev/zero bs=1024 count=20000 of=/gv2/20M.file dd if=/dev/zero bs=1024 count=30000 of=/gv2/30M.file [root@mystorage3 ~]# ll -h /storage/brick1/ total 59M -rw-r--r-- 2 root root 9.8M Jul 30 02:41 10M.file -rw-r--r-- 2 root root 20M Jul 30 02:41 20M.file -rw-r--r-- 2 root root 30M Jul 30 02:41 30M.file [root@mystorage4 ~]# ll -h /storage/brick1 total 59M -rw-r--r-- 2 root root 9.8M Jul 30 02:40 10M.file -rw-r--r-- 2 root root 20M Jul 30 02:40 20M.file -rw-r--r-- 2 root root 30M Jul 30 02:40 30M.file # gv2 添加 brick 进行扩容 [root@mystorage1 ~]# gluster volume stop gv2 Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y [root@mystorage1 ~]# gluster volume add-brick gv2 replica 2 mystorage1:/storage/brick2 mystorage2:/storage/brick2 force volume add-brick: success [root@mystorage1 ~]# gluster volume start gv2 volume start: gv2: success [root@mystorage1 ~]# gluster volume info gv2 Volume Name: gv2 Type: Distributed-Replicate # 这里显示是分布式复制卷,是在 gv2 复制卷的基础上增加 2 块 brick 形成的 Volume ID: 11928696-263a-4c7a-a155-5115af29221f Status: Stopped Number of Bricks: 2 x 2 = 4 Transport-type: tcp Bricks: Brick1: mystorage3:/storage/brick1 Brick2: mystorage4:/storage/brick1 Brick3: mystorage1:/storage/brick2 Brick4: mystorage2:/storage/brick2 Options Reconfigured: performance.readdir-ahead: on
Note: when you add bricks to a distributed replicated or distributed striped volume, the number of bricks added must be a multiple of the replica or stripe count. For example, for a distributed replicated volume with replica 2, bricks must be added 2, 4, 6, 8... at a time. Testing after the expansion shows that existing files all remain on the bricks that existed before the expansion.
Best practice for distributed replicated volumes:
1) Requirements
- The number of brick servers must be a multiple of the replica count
- Bricks listed next to each other (in the order given) become replicas of each other
For example, with 8 servers:
- With replica 2, following the server list order, servers 1 and 2 form one replica set, 3 and 4 another, 5 and 6 another, and 7 and 8 another
- With replica 4, following the server list order, servers 1/2/3/4 form one replica set and 5/6/7/8 form another
2) Creating a distributed replicated volume
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data
Reference blog:
http://cmdschool.blog.51cto.com/2420395/1828450
Rebalancing storage
Note: rebalancing the layout is necessary because the layout structure is static. When new bricks join an existing volume, newly created files would otherwise still land on the old bricks, so the layout must be rebalanced for the new bricks to take effect. A layout rebalance only makes the new layout effective; it does not move existing data into the new layout. If you also want the existing data rebalanced after the new layout takes effect, you must additionally rebalance the data itself.
# 再在 /gv2 下创建 2 个新的文件 10M.file1 20M.file1 [root@mystorage1 ~]# dd if=/dev/zero bs=1024 count=10000 of=/gv2/10M.file1 [root@mystorage1 ~]# dd if=/dev/zero bs=1024 count=20000 of=/gv2/20M.file1 [root@mystorage1 ~]# ll -rht /gv2/ total 88M -rw-r--r-- 1 root root 9.8M Jul 30 02:40 10M.file -rw-r--r-- 1 root root 20M Jul 30 02:40 20M.file -rw-r--r-- 1 root root 30M Jul 30 02:40 30M.file -rw-r--r-- 1 root root 9.8M Jul 30 03:10 10M.file1 -rw-r--r-- 1 root root 20M Jul 30 03:10 20M.file1 [root@mystorage1 ~]# ll /storage/brick2 total 0 [root@mystorage2 ~]# ll /storage/brick2 total 0 [root@mystorage3 ~]# ll -hrt /storage/brick1 total 88M -rw-r--r-- 2 root root 9.8M Jul 30 02:41 10M.file -rw-r--r-- 2 root root 20M Jul 30 02:41 20M.file -rw-r--r-- 2 root root 30M Jul 30 02:41 30M.file -rw-r--r-- 2 root root 9.8M Jul 30 03:12 10M.file1 -rw-r--r-- 2 root root 20M Jul 30 03:13 20M.file1 [root@mystorage4 ~]# ll -hrt /storage/brick1 total 88M -rw-r--r-- 2 root root 9.8M Jul 30 02:40 10M.file -rw-r--r-- 2 root root 20M Jul 30 02:40 20M.file -rw-r--r-- 2 root root 30M Jul 30 02:40 30M.file -rw-r--r-- 2 root root 9.8M Jul 30 03:10 10M.file1 -rw-r--r-- 2 root root 20M Jul 30 03:10 20M.file1 # 从上面可以看到,新创建的文件还是在之前的 bricks 中,并没有分布中新加的 bricks 中 # 下面进行磁盘存储平衡 [root@mystorage1 ~]# gluster volume rebalance gv2 start volume rebalance: gv2: success: Rebalance on gv2 has been started successfully. Use rebalance status command to check status of the rebalance process. ID: e23213be-7771-4a2b-87b4-259fd048ec46 [root@mystorage1 ~]# gluster volume rebalance gv2 status Node Rebalanced-files size scanned failures skipped status run time in h:m:s --------- ----------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 0 0 0 completed 0:0:1 mystorage2 0 0Bytes 0 0 0 completed 0:0:0 mystorage3 2 39.1MB 5 0 0 completed 0:0:2 mystorage4 0 0Bytes 0 0 0 completed 0:0:1 volume rebalance: gv2: success # 查看磁盘存储平衡后文件在 bricks 中的分布情况 [root@mystorage1 ~]# ll /storage/brick2 total 40000 -rw-r--r-- 2 root root 20480000 Jul 30 02:41 20M.file -rw-r--r-- 2 root root 20480000 Jul 30 03:13 20M.file1 [root@mystorage2 ~]# ll /storage/brick2 total 40000 -rw-r--r-- 2 root root 20480000 Jul 30 02:41 20M.file -rw-r--r-- 2 root root 20480000 Jul 30 03:13 20M.file1 [root@mystorage3 ~]# ll -hrt /storage/brick1 total 49M -rw-r--r-- 2 root root 9.8M Jul 30 02:41 10M.file -rw-r--r-- 2 root root 30M Jul 30 02:41 30M.file -rw-r--r-- 2 root root 9.8M Jul 30 03:12 10M.file1 [root@mystorage4 ~]# ll -hrt /storage/brick1 total 49M -rw-r--r-- 2 root root 9.8M Jul 30 02:40 10M.file -rw-r--r-- 2 root root 30M Jul 30 02:40 30M.file -rw-r--r-- 2 root root 9.8M Jul 30 03:10 10M.file1 # 从上面可以看到 20M.file 20M.file1 2 个文件 平衡到 新加的 2 个 brick 中了
A rebalance is needed after every expansion. In practice, rebalancing is a last resort; it is usually simpler to just create an additional volume.
Removing bricks
You may want to shrink a volume online, for example to remove bricks affected by a hardware failure or a network fault.
Note: while bricks are being removed, their data is no longer accessible from the Gluster mount point; only after the bricks have been removed from the configuration can the data left on them be accessed directly. When removing bricks from a distributed replicated or distributed striped volume, the number of bricks removed must be a multiple of the replica or stripe count.
For example, for a distributed striped volume with stripe 2, bricks must be removed 2, 4, 6, 8... at a time.
# 下面移除 gv2 卷的 2 个 bricks [root@mystorage1 ~]# gluster volume stop gv2 Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: gv2: success [root@mystorage1 ~]# gluster volume remove-brick gv2 replica 2 mystorage3:/storage/brick1 mystorage4:/storage/brick1 force Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y volume remove-brick commit force: success [root@mystorage1 ~]# gluster volume start gv2 volume start: gv2: success [root@mystorage1 ~]# ll /gv2/ total 40000 -rw-r--r-- 1 root root 20480000 Jul 30 02:41 20M.file -rw-r--r-- 1 root root 20480000 Jul 30 03:13 20M.file1 # 如果误操作删除了后,其实文件还在 /storage/brick1 里面的,加回来就可以了 [root@mystorage1 ~]# gluster volume stop gv2 Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: gv2: success [root@mystorage1 ~]# gluster volume add-brick gv2 replica 2 mystorage3:/storage/brick1 mystorage4:/storage/brick1 force volume add-brick: success [root@mystorage1 ~]# gluster volume info gv2 Volume Name: gv2 Type: Distributed-Replicate Volume ID: 11928696-263a-4c7a-a155-5115af29221f Status: Stopped Number of Bricks: 2 x 2 = 4 Transport-type: tcp Bricks: Brick1: mystorage1:/storage/brick2 Brick2: mystorage2:/storage/brick2 Brick3: mystorage3:/storage/brick1 Brick4: mystorage4:/storage/brick1 Options Reconfigured: performance.readdir-ahead: on [root@mystorage1 ~]# gluster volume start gv2 volume start: gv2: success [root@mystorage1 ~]# ll /gv2/ # 文件还在 total 90000 -rw-r--r-- 1 root root 10240000 Jul 30 02:40 10M.file -rw-r--r-- 1 root root 10240000 Jul 30 03:10 10M.file1 -rw-r--r-- 1 root root 20480000 Jul 30 02:41 20M.file -rw-r--r-- 1 root root 20480000 Jul 30 03:13 20M.file1 -rw-r--r-- 1 root root 30720000 Jul 30 02:40 30M.file
Deleting a volume
Usually a volume is deleted only when it was named incorrectly in the first place.
[root@mystorage1 ~]# umount /gv1
[root@mystorage1 ~]# gluster volume stop gv1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: gv1: success
[root@mystorage1 ~]# gluster volume delete gv1
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: gv1: success
[root@mystorage1 ~]# gluster volume info gv1
Volume gv1 does not exist
Adding a disk online in VMware Workstation:
# echo "- - -" > /sys/class/scsi_host/host2/scan
# fdisk -l