Learning GlusterFS (Part 4)

#################################################################################################

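Term	Description
Brick	The basic unit of storage: an exported directory on a server in the trusted storage pool. Volumes built from bricks are what clients mount.
Volume	A volume, logically made up of N bricks.
FUSE	A loadable module for Unix-like OSes that lets users create their own filesystems without modifying the kernel.
Glusterd	The Gluster management daemon; it must run on every server in the trusted storage pool.
POSIX	A standard that GlusterFS is compatible with.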

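Command	Purpose
gluster peer probe	Add a node to the pool
gluster peer detach	Remove a node from the pool
gluster volume create	Create a volume
gluster volume start	Start a volume
gluster volume stop	Stop a volume
gluster volume delete	Delete a volume
gluster volume quota VOLNAME enable	Enable quota on a volume
gluster volume quota VOLNAME disable	Disable quota on a volume
gluster volume quota VOLNAME limit-usage	Set a quota limit on a volume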

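GlusterFS

GlusterFS is a scalable network filesystem. Using common off-the-shelf hardware, you can build large distributed storage solutions for media streaming, data analysis, and other data-related tasks. GlusterFS is free and open-source software.

For details, see the official website: gluster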

Terminology

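Brick: the basic unit of storage, an exported directory on a server in the trusted storage pool
Volume: a logical collection of bricks
Cluster: a group of computers working together as a cluster
Distributed File System: a filesystem that multiple clients can access concurrently over the network
GFID: every file and directory in GlusterFS has a 128-bit numeric identifier called its GFID
glusterd: the management daemon, which must run on every server in the trusted storage pool
Namespace: an abstract container or environment created to hold a unique set of identifiers
Quorum: determines how many host failures a trusted storage pool can tolerate
Quota: limits disk space usage per directory or per volume
POSIX: the Portable Operating System Interface, a family of related API standards defined by the IEEE
Vol File: the configuration file used by GlusterFS processes
Distributed: distributed volume
Replicated: replicated volume
Distributed Replicated: distributed replicated volume
Geo-Replication: provides a continuous, asynchronous, incremental replication service between sites over LANs, WANs, and the Internet
Metadata: data that describes other data; GlusterFS has no dedicated metadata store
Extended Attributes: a filesystem feature that attaches extra name/value metadata to files and directories
FUSE: Filesystem in Userspace, a loadable kernel module for Unix-like operating systems that lets non-privileged users create their own filesystems without editing kernel code; the filesystem code runs in user space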

Lab environment

10.1.1.11   c6-vm1   centos6.6  glusterfs-3.10 second disk 10G
10.1.1.12   c6-vm2   centos6.6  glusterfs-3.10 second disk 10G
10.1.1.13   c6-vm3   centos6.6  client

Configure the yum repository

[root@c6-vm1 yum.repos.d]# cat glusterfs.repo
[glusterfs]
name=glusterfs
baseurl=http://buildlogs.centos.org/centos/6/storage/x86_64/gluster-3.10/
enabled=1
gpgcheck=0
[debuginfo]
name=glusterfs debuginfo
baseurl=http://debuginfo.centos.org/centos/6/storage/x86_64/
enabled=1
gpgcheck=0

install

yum  install -y glusterfs-server glusterfs-cli glusterfs-geo-replication

Start the service and enable it at boot

[root@c6-vm1]# /etc/init.d/glusterd start
[root@c6-vm1 ~]# chkconfig glusterd on

Add the storage hosts to the trusted storage pool

# Run on one host and probe every host other than itself
[root@c6-vm1]#  gluster peer probe c6-vm2
peer probe: success.
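The comment above says a node only has to probe the other hosts. Below is a minimal sketch of that loop. It assumes `hostname -s` returns the same short names used in this lab (c6-vm1, c6-vm2). `probe_peers` is a hypothetical helper, not part of GlusterFS.

```shell
#!/bin/sh
# probe_peers HOST...: run "gluster peer probe" for every host in the list
# except the local one. Hypothetical helper; assumes "hostname -s" prints
# the same short name that appears in the list (e.g. c6-vm1).
probe_peers() {
    self=$(hostname -s)
    for host in "$@"; do
        if [ "$host" = "$self" ]; then
            continue    # a node never needs to probe itself
        fi
        gluster peer probe "$host" || return 1
    done
}
```

For example, `probe_peers c6-vm1 c6-vm2` run on c6-vm1 probes only c6-vm2.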

Check the status

[root@c6-vm1 ~]# gluster peer status
Number of Peers: 1

Hostname: c6-vm2
Uuid: 60eac2bb-ae30-4c3d-8c14-b93a3cb0ad98
State: Peer in Cluster (Connected)
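To script this check, you can parse the `State:` lines of `gluster peer status`. The sketch below uses a hypothetical helper, `all_peers_connected`. It succeeds only when every listed peer shows `Peer in Cluster (Connected)`, the state seen above.

```shell
#!/bin/sh
# all_peers_connected: succeed only if "gluster peer status" lists at least
# one peer and every "State:" line reads "Peer in Cluster (Connected)".
# Hypothetical helper built on the output format shown above.
all_peers_connected() {
    gluster peer status | awk '
        /^State:/ { total++; if ($0 ~ /Peer in Cluster \(Connected\)/) ok++ }
        END { if (total > 0 && ok == total) exit 0; exit 1 }'
}
```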

Partition the disk

fdisk -l lists the block devices; /dev/sdb is the new second disk

[root@c6-vm1 ~]# fdisk -l

Disk /dev/sda: 37.6 GB, 37580963840 bytes
255 heads, 63 sectors/track, 4568 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f1817

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          26      204800   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              26        3851    30720000   83  Linux
/dev/sda3            3851        4569     5774336   82  Linux swap / Solaris

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

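Note: the ext4 tools on CentOS 6 support filesystems of at most 16 TB. fdisk (MBR partition tables) cannot create partitions larger than 2 TB, so partition large disks with parted instead.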
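Here is a sketch of the parted route for a large disk. It assumes the whole disk is dedicated to one brick. It also assumes the first partition shows up as `${disk}1`, which holds for /dev/sdX names. `make_gpt_brick_fs` is a hypothetical helper and destroys everything on the disk. ext4 is used only to match the rest of this walkthrough.

```shell
#!/bin/sh
# make_gpt_brick_fs DISK: write a GPT label, create one partition spanning
# the whole disk, and format it as ext4. Hypothetical helper; it DESTROYS
# all data on DISK and assumes the partition is named ${DISK}1.
make_gpt_brick_fs() {
    disk=$1
    # -s = script mode: parted runs the listed commands without prompting
    parted -s "$disk" mklabel gpt mkpart primary 0% 100% || return 1
    mkfs.ext4 "${disk}1"
}
```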

# Partition with fdisk; run this on both storage servers
# fdisk /dev/sdb => n => p => 1 => Enter (accept defaults) => w (w writes the table, m shows help)
[root@c6-vm1 ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xf2a1a3ec.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1305, default 1305):
Using default value 1305
[root@c6-vm1 ~]# mkfs.ext4  /dev/sdb1

# On both servers: create the mount directory, mount the new partition, and add it to fstab
[root@c6-vm1 ~]# mkdir -p /data/gfs1
[root@c6-vm1 ~]# mount /dev/sdb1  /data/gfs1
[root@c6-vm1 ~]# mount -a
[root@c6-vm1 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        29G  874M   27G   4% /
tmpfs           358M     0  358M   0% /dev/shm
/dev/sda1       190M   25M  156M  14% /boot
/dev/sdb1       9.8G   23M  9.2G   1% /data/gfs1
[root@c6-vm1 ~]# echo "/dev/sdb1   /data/gfs1 ext4 defaults 1 2" >> /etc/fstab
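`mount -a` only mounts what is already listed in /etc/fstab. The entry is therefore best appended first and then checked with `mount -a`. Re-running an `echo ... >> /etc/fstab` line also adds duplicate entries. Below is a minimal idempotent sketch. `add_fstab_entry` is hypothetical and writes the same `defaults 1 2` options used above.

```shell
#!/bin/sh
# add_fstab_entry FILE DEVICE MOUNTPOINT FSTYPE
# Append "DEVICE   MOUNTPOINT FSTYPE defaults 1 2" to FILE unless a
# non-comment line already mounts something on MOUNTPOINT.
# Hypothetical helper; pass /etc/fstab as FILE on a real system.
add_fstab_entry() {
    file=$1 dev=$2 mnt=$3 fstype=$4
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$file"; then
        return 0    # entry already present, nothing to do
    fi
    printf '%s   %s %s defaults 1 2\n' "$dev" "$mnt" "$fstype" >> "$file"
}
```

Usage: `add_fstab_entry /etc/fstab /dev/sdb1 /data/gfs1 ext4 && mount -a`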

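If your servers have a RAID controller, you can build a RAID 5 array underneath for disk-level redundancy and better read throughput, then run GlusterFS on top of it. Without a RAID card, running GlusterFS directly on the disks works fine too.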

Create a volume

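Distributed: distributed volume; files are spread across the bricks of the volume by a hash algorithm.
Replicated: replicated volume, similar to RAID 1; the replica count must equal the number of bricks in the volume. High availability.
Striped: striped volume, similar to RAID 0; the stripe count must equal the number of bricks in the volume. Files are split into chunks that are stored on the bricks in round-robin order, so concurrency is per chunk, which gives good performance for large files.
Distributed Striped: distributed striped volume; the number of bricks must be a multiple (at least 2x) of the stripe count. Combines distribution and striping.
Distributed Replicated: distributed replicated volume; the number of bricks must be a multiple (at least 2x) of the replica count. Combines distribution and replication.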
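The brick-count rules above reduce to simple arithmetic. `gluster volume info` reports the result as `Number of Bricks: D x R = N`, for example `1 x 2 = 2` for the replicated volume created next. The hypothetical helper below checks the rule and prints that layout before you run `gluster volume create`.

```shell
#!/bin/sh
# replica_layout BRICKS REPLICA: print "D x R = N" (D distribute
# subvolumes of R replicas each), or fail if BRICKS is not a positive
# multiple of REPLICA. Hypothetical helper; pure arithmetic, no gluster call.
replica_layout() {
    bricks=$1 replica=$2
    if [ "$replica" -lt 1 ] || [ "$bricks" -lt "$replica" ] || [ $((bricks % replica)) -ne 0 ]; then
        echo "brick count $bricks is not a multiple of replica $replica" >&2
        return 1
    fi
    echo "$((bricks / replica)) x $replica = $bricks"
}
```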

Create a replicated volume

[root@c6-vm1 ~]# gluster volume create gv2   replica 2 c6-vm1:/data/gfs1 c6-vm2:/data/gfs1 force
volume create: gv2: success: please start the volume to access data
[root@c6-vm1 ~]# gluster volume info

Volume Name: gv2
Type: Replicate
Volume ID: fa64409f-814b-4191-b9e4-8c317bf63a94
Status: Created          # the volume has been created but not started yet
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: c6-vm1:/data/gfs1
Brick2: c6-vm2:/data/gfs1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
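The `force` at the end of the create command is needed because each brick here is the root of a mount point (/data/gfs1). Without it, gluster refuses and suggests using a sub-directory of the mount point instead. Below is a sketch of that alternative. It assumes a directory such as /data/gfs1/brick has first been created on both servers. `create_replica_volume` is a hypothetical wrapper around the real `gluster volume create`.

```shell
#!/bin/sh
# create_replica_volume VOLUME REPLICA BRICK_DIR HOST...
# Build one "HOST:BRICK_DIR" brick per host and create a replica volume
# without "force". Hypothetical helper; BRICK_DIR should be a
# sub-directory of the mounted disk, created beforehand on every host.
create_replica_volume() {
    vol=$1 replica=$2 dir=$3
    shift 3
    bricks=""
    for host in "$@"; do
        bricks="$bricks $host:$dir"
    done
    # $bricks is deliberately unquoted so each brick becomes its own argument
    gluster volume create "$vol" replica "$replica" $bricks
}
```

For example: `create_replica_volume gv2 2 /data/gfs1/brick c6-vm1 c6-vm2`.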

Start the volume

[root@c6-vm1 gfs1]#  gluster volume start gv2
volume start: gv2: success
[root@c6-vm1 gfs1]# gluster volume info

Volume Name: gv2
Type: Replicate
Volume ID: fa64409f-814b-4191-b9e4-8c317bf63a94
Status: Started      # after starting, the status changes to Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: c6-vm1:/data/gfs1
Brick2: c6-vm2:/data/gfs1
Options Reconfigured:
transport.address-family: inet
nfs.disable: off
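Note that `nfs.disable` is `on` right after creation but `off` here. The status output further down also lists NFS servers on port 2049. GlusterFS 3.8 and later disable the built-in NFS server for new volumes by default, which matches the `on` value after creation. It was therefore switched on at some point that the walkthrough does not show. `gluster volume set gv2 nfs.disable off` is one way to do that. Below is a hypothetical helper that changes the option only when needed.

```shell
#!/bin/sh
# enable_gnfs VOLUME: turn on Gluster's built-in NFS server for VOLUME if
# "gluster volume info" lists "nfs.disable: on". Hypothetical helper.
enable_gnfs() {
    vol=$1
    if gluster volume info "$vol" | grep -q '^nfs.disable: on$'; then
        gluster volume set "$vol" nfs.disable off
    fi
}
```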

Mount on the client

c6-vm3

Install the GlusterFS client packages with yum

yum install -y glusterfs glusterfs-fuse

Mount with the native glusterfs (FUSE) client

[root@c6-vm3 yum.repos.d]# mount -t glusterfs c6-vm1:/gv2 /mnt
[root@c6-vm3 mnt]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        29G  923M   27G   4% /
tmpfs           358M     0  358M   0% /dev/shm
/dev/sda1       190M   25M  156M  14% /boot
c6-vm1:/gv2     9.8G   23M  9.2G   1% /mnt
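The server named in the mount command (c6-vm1 here) is only used to fetch the volume file. After that, the FUSE client connects to every brick directly, which is why the mount survives the failure drill below. If that server is down at mount time, though, the mount fails. mount.glusterfs accepts a `backup-volfile-servers` option for this case. The sketch below assumes that option name; check `man mount.glusterfs` for your version. `mount_gluster` is a hypothetical helper.

```shell
#!/bin/sh
# mount_gluster SERVER BACKUP VOLUME DIR: mount VOLUME with the native FUSE
# client, fetching the volume file from SERVER, or from BACKUP if SERVER
# is unreachable at mount time. Hypothetical helper.
mount_gluster() {
    mount -t glusterfs -o "backup-volfile-servers=$2" "$1:/$3" "$4"
}
# Equivalent /etc/fstab line (_netdev delays the mount until the network is up):
# c6-vm1:/gv2  /mnt  glusterfs  defaults,_netdev,backup-volfile-servers=c6-vm2  0 0
```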

Mount over NFS (NFSv3 over TCP; shown here on c6-vm2)

[root@c6-vm2 gfs1]# mount -t nfs -o mountproto=tcp,vers=3 c6-vm2:gv2  /mnt/
[root@c6-vm2 gfs1]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        29G  875M   27G   4% /
tmpfs           358M     0  358M   0% /dev/shm
/dev/sda1       190M   25M  156M  14% /boot
/dev/sdb1       9.8G   23M  9.2G   1% /data/gfs1
c6-vm2:gv2      9.8G   33M   14G   1% /mnt

# Create some files
[root@c6-vm3 mnt]# cd /mnt/
[root@c6-vm3 mnt]# touch file{1..10}

Verify on the storage side

c6-vm1

[root@c6-vm1 gfs1]# ls
file1  file10  file2  file3  file4  file5  file6  file7  file8  file9  lost+found

c6-vm2

[root@c6-vm2 gfs1]# ls
file1  file10  file2  file3  file4  file5  file6  file7  file8  file9  lost+found
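Comparing the two `ls` outputs by eye works for ten files. The sketch below diffs a local brick against the same path on the other node. It assumes passwordless root ssh between the storage servers. Both helpers are hypothetical, and they skip `.glusterfs` (GlusterFS's internal directory) and `lost+found`.

```shell
#!/bin/sh
# filter_brick_names: read a brick listing (one name per line) on stdin and
# print it sorted, without GlusterFS's .glusterfs directory or lost+found.
filter_brick_names() {
    grep -v -x -F -e .glusterfs -e 'lost+found' | sort
}

# compare_with_peer PEER DIR: diff the local brick DIR against the same
# path on PEER (assumes passwordless ssh). Returns 0 when both bricks list
# the same top-level names; otherwise prints the differences.
compare_with_peer() {
    peer=$1 dir=$2
    mine=$(mktemp) theirs=$(mktemp)
    ls -A "$dir" | filter_brick_names > "$mine"
    ssh "$peer" ls -A "$dir" | filter_brick_names > "$theirs"
    diff "$mine" "$theirs"
    rc=$?
    rm -f "$mine" "$theirs"
    return $rc
}
```

On c6-vm1, `compare_with_peer c6-vm2 /data/gfs1` prints nothing and returns 0 when the replicas hold the same names.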

Check the volume status

[root@c6-vm1 gfs1]#  gluster volume status gv2
Status of volume: gv2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c6-vm1:/data/gfs1                     49152     0          Y       1976
Brick c6-vm2:/data/gfs1                     49152     0          Y       1883
NFS Server on localhost                     2049      0          Y       1996
Self-heal Daemon on localhost               N/A       N/A        Y       2007
NFS Server on c6-vm2                        2049      0          Y       1903
Self-heal Daemon on c6-vm2                  N/A       N/A        Y       1914
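In a two-brick replica, the `Online` column is the first thing to watch. Below is a hypothetical filter that prints the bricks marked `N`. It assumes each brick fits on one line, which may not hold for long brick paths that gluster wraps.

```shell
#!/bin/sh
# offline_bricks: read "gluster volume status" output on stdin and print
# HOST:PATH for every "Brick" row whose Online column (second-to-last
# field) is N. Hypothetical helper.
offline_bricks() {
    awk '$1 == "Brick" && $(NF-1) == "N" { print $2 }'
}
```

Usage: `gluster volume status gv2 | offline_bricks`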

Task Status of Volume gv2
------------------------------------------------------------------------------
There are no active volume tasks

Failure drill

Shut down c6-vm1

When we mounted the volume on the client c6-vm3 earlier, we specified the host c6-vm1.

So let's shut c6-vm1 down and see what happens.

Check the volume status

Operations on c6-vm3 clearly still work.

Note: while c6-vm1 is down, running ls in the mount directory hangs briefly before returning.
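The brief hang is expected. When a brick stops responding, the client waits for `network.ping-timeout` (42 seconds by default) before treating it as down and carrying on with the remaining replica. You can read the value with `gluster volume get` and change it with `gluster volume set`. Below is a hypothetical wrapper that rejects non-numeric input.

```shell
#!/bin/sh
# set_ping_timeout VOLUME SECONDS: change how long clients wait for an
# unresponsive brick before giving up on it, then show the new value.
# Hypothetical wrapper around the real "gluster volume set/get" commands.
set_ping_timeout() {
    vol=$1 secs=$2
    case $secs in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    gluster volume set "$vol" network.ping-timeout "$secs" &&
        gluster volume get "$vol" network.ping-timeout
}
```

A lower value shortens the stall but makes spurious disconnects more likely on a busy network.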

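Then check the files on c6-vm2.

The newly created file bds is already there, and file0, which we deleted, was not deleted successfully either.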

Now power c6-vm1 back on.

The boot stops with a filesystem error. We have managed to break this machine, so let's repair it.

We have ended up in a read-only root filesystem, so we cannot even comment out the offending line in /etc/fstab.

The fix is to remount the root filesystem read-write:

# Remount the root filesystem read-write (a lifesaver)
mount -o remount,rw /

Once the problematic line is commented out, reboot.

Log in to c6-vm1

[root@c6-vm1 gfs1]# mount /dev/sdb1 /data/gfs1/
[root@c6-vm1 gfs1]# gluster volume status
Status of volume: gv2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c6-vm1:/data/gfs1                     N/A       N/A        N       N/A
Brick c6-vm2:/data/gfs1                     49152     0          Y       1883
NFS Server on localhost                     2049      0          Y       1082
Self-heal Daemon on localhost               N/A       N/A        Y       1146
NFS Server on c6-vm2                        2049      0          Y       1903
Self-heal Daemon on c6-vm2                  N/A       N/A        Y       1914

Task Status of Volume gv2
------------------------------------------------------------------------------
There are no active volume tasks

On the healthy node c6-vm2, dump the extended attributes of the brick root

[root@c6-vm2 gfs1]# getfattr -d -m '.*' /data/gfs1/
getfattr: Removing leading '/' from absolute path names
# file: data/gfs1/
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.gv2-client-0=0sAAAAAAAAAAAAAAAD
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAA/////w==
trusted.glusterfs.volume-id=0s+mRAn4FLQZG55Iwxe/Y6lA==

On c6-vm1, the node that was shut down, write the same attributes to its brick root

[root@c6-vm1 gfs1]# setfattr -n trusted.glusterfs.volume-id -v 0s+mRAn4FLQZG55Iwxe/Y6lA== /data/gfs1/ 
[root@c6-vm1 gfs1]# setfattr -n trusted.gfid -v 0sAAAAAAAAAAAAAAAAAAAAAQ== /data/gfs1/
[root@c6-vm1 gfs1]# setfattr -n trusted.afr.dirty -v 0sAAAAAAAAAAAAAAAA  /data/gfs1/
[root@c6-vm1 gfs1]# setfattr -n trusted.glusterfs.dht -v 0sAAAAAQAAAAAAAAAA/////w== /data/gfs1/
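Copying these values by hand is error-prone. The sketch below turns the healthy node's `getfattr -d -m '.*'` output into `setfattr` commands for the keys you name. `xattrs_to_setfattr` is hypothetical and only prints the commands, so they can be reviewed before running them on c6-vm1. The `0s` prefix marks a base64 value, which setfattr accepts as is.

```shell
#!/bin/sh
# xattrs_to_setfattr BRICK KEY...: read "getfattr -d" output on stdin and
# print one setfattr command per requested KEY, targeting BRICK.
# Hypothetical helper; it only prints, it never changes any attribute.
xattrs_to_setfattr() {
    brick=$1
    shift
    keys=" $* "
    while IFS= read -r line; do
        case $line in
            '#'*|'') continue ;;          # skip "# file:" headers and blanks
        esac
        key=${line%%=*}                   # text before the first "="
        value=${line#*=}                  # everything after it (keeps "==")
        case $keys in
            *" $key "*) printf 'setfattr -n %s -v %s %s\n' "$key" "$value" "$brick" ;;
        esac
    done
}
```

For example, on c6-vm2: `getfattr -d -m '.*' /data/gfs1/ 2>/dev/null | xattrs_to_setfattr /data/gfs1/ trusted.glusterfs.volume-id trusted.gfid trusted.afr.dirty trusted.glusterfs.dht`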

Restart glusterd

[root@c6-vm1 gfs1]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]

After the restart, wait a while (how long depends on the amount of data) for the sync to complete, and watch the directory change:
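Instead of guessing how long to wait, you can check the pending self-heal backlog with `gluster volume heal gv2 info`. It prints a `Number of entries:` line per brick. Below is a hypothetical polling loop that assumes that output format. A brick that cannot be reached reports `-`, which is counted here as pending.

```shell
#!/bin/sh
# heal_pending_total VOLUME: sum the "Number of entries:" lines printed by
# "gluster volume heal VOLUME info"; a non-numeric count such as "-"
# (brick unreachable) adds 1. Prints the total. Hypothetical helper.
heal_pending_total() {
    gluster volume heal "$1" info | awk -F': *' '
        /^Number of entries:/ { if ($2 ~ /^[0-9]+$/) total += $2; else total += 1 }
        END { print total + 0 }'
}

# wait_for_heal VOLUME [INTERVAL]: poll every INTERVAL seconds (default 10)
# until no entries are pending.
wait_for_heal() {
    while [ "$(heal_pending_total "$1")" -ne 0 ]; do
        sleep "${2:-10}"
    done
}
```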

[root@c6-vm1 gfs1]# ll
total 32
-rw-r--r-- 2 root root     5 Oct  9 23:08 a
-rw-r--r-- 2 root root    29 Oct  9 19:53 a.txt
-rw-r--r-- 2 root root     0 Oct  2 13:29 ba
-rw-r--r-- 2 root root     0 Oct  2 12:54 file1
-rw-r--r-- 2 root root     0 Oct  2 12:54 file10
-rw-r--r-- 2 root root     5 Oct  9 19:14 file2
-rw-r--r-- 2 root root     0 Oct  2 12:54 file3
-rw-r--r-- 2 root root     0 Oct  2 12:54 file4
-rw-r--r-- 2 root root     0 Oct  2 12:54 file5
-rw-r--r-- 2 root root     0 Oct  2 12:54 file6
-rw-r--r-- 2 root root     0 Oct  2 12:54 file7
-rw-r--r-- 2 root root     0 Oct  2 12:54 file8
-rw-r--r-- 2 root root     0 Oct  2 12:54 file9
drwx------ 2 root root 16384 Oct  9 03:12 lost+found

The hidden .glusterfs directory in the brick holds GlusterFS's internal bookkeeping, including sync information

[root@c6-vm1 gfs1]# cd .glusterfs/
[root@c6-vm1 .glusterfs]# ls
00  1c  31  3e  3f  46  4c  56  64  82  89  a3  c7  changelogs  eb  gfs1.db  health_check  indices  landfill  quanrantine  unlink
[root@c6-vm1 .glusterfs]# ll
total 100
drwx------ 3 root root  4096 Oct  9 23:06 00
drwx------ 3 root root  4096 Oct  9 23:08 1c
drwx------ 3 root root  4096 Oct  9 23:08 31
drwx------ 3 root root  4096 Oct  9 23:08 3e
drwx------ 3 root root  4096 Oct  9 23:08 3f
drwx------ 3 root root  4096 Oct  9 23:08 46
drwx------ 3 root root  4096 Oct  9 23:08 4c
drwx------ 3 root root  4096 Oct  9 23:07 56
drwx------ 3 root root  4096 Oct  9 23:08 64
drwx------ 3 root root  4096 Oct  9 23:08 82
drwx------ 3 root root  4096 Oct  9 23:08 89
drwx------ 3 root root  4096 Oct  9 23:08 a3
drwx------ 4 root root  4096 Oct  9 23:08 c7
drw------- 4 root root  4096 Oct  9 22:30 changelogs
drwx------ 3 root root  4096 Oct  9 23:08 eb
-rw-r--r-- 1 root root 20480 Oct  9 22:30 gfs1.db
-rw-r--r-- 1 root root    19 Oct  9 23:17 health_check
drw------- 5 root root  4096 Oct  9 22:30 indices
drwxr-xr-x 2 root root  4096 Oct  9 23:06 landfill
drw------- 2 root root  4096 Oct  9 22:30 quanrantine
drw------- 2 root root  4096 Oct  9 23:06 unlink
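The two-character directories (00, 1c, 31, ...) form a GFID index. For each regular file, the brick keeps a hard link at `.glusterfs/<first two hex digits>/<next two>/<GFID>`; directories get a symlink instead. This is why the files in the listings show a link count of 2. Below is a hypothetical helper that maps the hex value printed by `getfattr -n trusted.gfid -e hex FILE` to that path. The example value in the test is made up.

```shell
#!/bin/sh
# gfid_path HEX: turn a trusted.gfid value such as 0x60eac2bb... (32 hex
# digits) into the relative path of its entry under the brick's .glusterfs
# directory. Hypothetical helper.
gfid_path() {
    hex=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    case $hex in
        *[!0-9a-f]*) echo "not a hex GFID: $1" >&2; return 1 ;;
    esac
    if [ ${#hex} -ne 32 ]; then
        echo "GFID must be 32 hex digits: $1" >&2
        return 1
    fi
    # insert dashes in the usual 8-4-4-4-12 UUID layout
    uuid=$(printf '%s' "$hex" | sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    printf '.glusterfs/%s/%s/%s\n' "$(printf '%s' "$hex" | cut -c1-2)" "$(printf '%s' "$hex" | cut -c3-4)" "$uuid"
}
```

For example, the brick root's `trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==` shown earlier is the base64 form of 00000000-0000-0000-0000-000000000001. It therefore maps to `.glusterfs/00/00/00000000-0000-0000-0000-000000000001`.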

After the sync, the brick directory matches the one on the healthy node c6-vm2

[root@c6-vm1 .glusterfs]# cd ..
[root@c6-vm1 gfs1]# ls
bds  file1  file10  file2  file3  file4  file5  file6  file7  file8  file9  lost+found
[root@c6-vm1 gfs1]# ll
total 24
-rw-r--r-- 2 root root     8 Oct  9 23:21 bds
-rw-r--r-- 2 root root     0 Oct  2 12:54 file1
-rw-r--r-- 2 root root     0 Oct  2 12:54 file10
-rw-r--r-- 2 root root     5 Oct  9 19:14 file2
-rw-r--r-- 2 root root     0 Oct  2 12:54 file3
-rw-r--r-- 2 root root     0 Oct  2 12:54 file4
-rw-r--r-- 2 root root     0 Oct  2 12:54 file5
-rw-r--r-- 2 root root     0 Oct  2 12:54 file6
-rw-r--r-- 2 root root     0 Oct  2 12:54 file7
-rw-r--r-- 2 root root     0 Oct  2 12:54 file8
-rw-r--r-- 2 root root     0 Oct  2 12:54 file9
drwx------ 2 root root 16384 Oct  9 03:12 lost+found

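Implementing distributed storage for a Docker cluster with GlusterFS

Introduction to the GlusterFS distributed filesystem

GlusterFS overview

Figure 1. Overall GlusterFS architecture

Table 1. Common GlusterFS terms

GlusterFS volume types

Figure 2. Distribute volume diagram

Figure 3. Stripe volume diagram

Figure 4. Replica volume diagram

Figure 5. Distribute stripe volume diagram

Figure 6. Distribute replica volume diagram

Figure 7. Stripe replica volume diagram

Figure 8. Distribute stripe replica volume diagram

Common GlusterFS commands

Table 2. Common GlusterFS client commands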