Building a Highly Available NFS and S3 Cluster on Ceph
At the chief engineer's request, set up a Ceph cluster that provides highly available NFS and highly available S3 services. It is mainly for testing; after gaining experience, the same setup can be used to build backup storage for our own use.
1. Environment setup
Create three virtual machines, each with three disks (two of which will be used as OSDs) and two NICs on two separate subnets. The planned IPs are:
- node1 public:192.168.40.61; cluster:172.18.0.61
- node2 public:192.168.40.62; cluster:172.18.0.62
- node3 public:192.168.40.63; cluster:172.18.0.63
Edit /etc/hosts on all VMs:
[root@node1 ~]# vim /etc/hosts
192.168.40.61 node1
192.168.40.62 node2
192.168.40.63 node3
Set up passwordless SSH login
[root@node1 ~]# ssh-copy-id root@node1 # if there is no key yet, generate one first with ssh-keygen
[root@node1 ~]# ssh-copy-id root@node2
[root@node1 ~]# ssh-copy-id root@node3
Kernel upgrade
Upgrade the kernel as follows.
[root@node1 ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org # import the ELRepo repository's public key
[root@node1 ~]# yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm # install the ELRepo yum repository
[root@node1 ~]# yum --disablerepo="*" --enablerepo="elrepo-kernel" list available # list available kernel packages; versions 5.4 and 5.16 should show up
[root@node1 ~]# yum --enablerepo=elrepo-kernel install kernel-ml # --enablerepo enables the named repository; elrepo is enabled by default, elrepo-kernel is used here instead
# After installation, the new kernel must be set as the default boot entry and the host rebooted before it takes effect
[root@node1 ~]# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg # list all kernels available on the system
[root@node1 ~]# grub2-set-default 0 # 0 is the index of the desired kernel from the listing above
[root@node1 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg # regenerate the grub configuration
[root@node1 ~]# reboot # reboot
[root@node1 ~]# uname -r # verify the running kernel version
Disable the firewall and SELinux
[root@node1 ~]# systemctl stop firewalld
[root@node1 ~]# systemctl disable firewalld
[root@node1 ~]# setenforce 0
[root@node1 ~]# vi /etc/selinux/config
Set SELINUX=disabled:
SELINUX=disabled
Or simply run:
[root@node1 ~]# sed -i 's/=enforcing/=disabled/' /etc/selinux/config
Time synchronization
Install chrony on all nodes:
yum -y install chrony
Configure the chrony service on node1:
[root@node1 ~]# vim /etc/chrony.conf
server ntp.aliyun.com iburst # comment out the other server lines
......
#allow 192.168.0.0/16
allow 192.168.40.0/24 # add the subnet allowed to query this server
[root@node1 ~]# systemctl enable chronyd
[root@node1 ~]# systemctl start chronyd
On node2 and node3, remove the other server lines, leaving only one that points at node1:
[root@node2 ~]# vim /etc/chrony.conf
......
server 192.168.40.61 iburst
[root@node2 ~]# systemctl enable chronyd
[root@node2 ~]# systemctl start chronyd
[root@node2 ~]# chronyc sources -v
210 Number of sources = 1
...
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? node1 0 8 0 - +0ns[ +0ns] +/- 0ns
Configure yum repositories
Configure these on all nodes.
[root@node1 ~]# yum install -y epel-release
[root@node1 ~]# vim /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[root@node1 ~]# yum clean all && yum makecache
[root@node1 ~]# yum update
[root@node1 ~]# yum install ceph-deploy -y # install on the admin node only
[root@node1 ~]# yum install -y ceph ceph-mon ceph-mgr ceph-mgr-dashboard ceph-radosgw ceph-mds ceph-osd # install on every node
2. Install and configure Ceph
[root@node1 ~]# mkdir -p /root/my-cluster # directory for the initial Ceph configuration files
[root@node1 ~]# cd ~/my-cluster
[root@node1 my-cluster]# ceph-deploy new --public-network 192.168.40.0/24 --cluster-network 172.18.0.0/24 node1 # create a ceph cluster with node1 as the initial mon
[root@node1 my-cluster]# ceph-deploy mon create-initial # initialize the monitor
[root@node1 my-cluster]# ceph-deploy admin node1 node2 node3 # distribute keys and config files to the nodes
[root@node1 my-cluster]# ceph-deploy mgr create node1 # deploy the Manager daemons
[root@node1 my-cluster]# ceph-deploy mgr create node2 node3
[root@node1 my-cluster]# ceph-deploy mon create node2 # add more mon nodes
[root@node1 my-cluster]# ceph-deploy mon create node3
[root@node1 my-cluster]# ceph -s
[root@node1 my-cluster]# ceph-deploy osd create node1 --data /dev/sdb # add OSDs; confirm the device names with lsblk first
[root@node1 my-cluster]# ceph-deploy osd create node2 --data /dev/sdb
[root@node1 my-cluster]# ceph-deploy osd create node3 --data /dev/sdb
[root@node1 my-cluster]# ceph-deploy osd create node1 --data /dev/sdc
[root@node1 my-cluster]# ceph-deploy osd create node2 --data /dev/sdc
[root@node1 my-cluster]# ceph-deploy osd create node3 --data /dev/sdc
[root@node1 my-cluster]# ceph osd tree # check OSD status
[root@node1 my-cluster]# ceph -s # check the cluster status
[root@node1 my-cluster]# ceph mgr module enable dashboard # enable the dashboard
[root@node1 my-cluster]# ceph dashboard create-self-signed-cert # create a self-signed certificate
[root@node1 my-cluster]# ceph dashboard set-login-credentials admin 123456 # set the web login credentials
[root@node1 my-cluster]# ceph mgr services # show how to reach the service
With the dashboard enabled, the cluster status can be viewed at https://192.168.40.61:8443/.
3. Install and configure nfs-ganesha
Install nfs-ganesha on all nodes.
[root@node1 my-cluster]# vim /etc/yum.repos.d/nfs-ganesha.repo
[nfsganesha]
name=nfsganesha
baseurl=https://mirrors.cloud.tencent.com/ceph/nfs-ganesha/rpm-V2.8-stable/nautilus/x86_64/
gpgcheck=0
enabled=1
[root@node1 my-cluster]# yum makecache
[root@node1 my-cluster]# yum install -y nfs-ganesha nfs-ganesha-ceph nfs-ganesha-rados-grace nfs-ganesha-rgw nfs-utils rpcbind haproxy keepalived
[root@node1 my-cluster]# ceph-deploy mds create node1 node2 node3
[root@node1 my-cluster]# ceph osd pool create fs-meta 32
[root@node1 my-cluster]# ceph osd pool create fs-data 128
[root@node1 my-cluster]# ceph fs new cephfs fs-meta fs-data # create the cephfs filesystem
[root@node1 my-cluster]# ceph fs ls # list filesystems
[root@node1 my-cluster]# ceph-deploy --overwrite-conf admin node1 node2 node3 # push the updated configuration
[root@node1 my-cluster]# mkdir /mnt/cephfs
[root@node1 my-cluster]# mount -t ceph 192.168.40.61:/ /mnt/cephfs/ -o name=admin,secret=AQDXrtNhaD/VOBAAuVtilymuHIkb9elyH6bCVQ== # test mount; the secret value comes from /etc/ceph/ceph.client.admin.keyring
[root@node1 my-cluster]# mkdir -p /mnt/cephfs/nfs1 # create two directories to export
[root@node1 my-cluster]# mkdir -p /mnt/cephfs/nfs2
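As a side note (a sketch; /etc/ceph/admin.secret is an arbitrary path chosen here, not from the original setup), the admin key can be pulled with `ceph auth get-key` and passed via `secretfile=`, so it never appears on the command line or in the shell history:

```shell
# Extract the admin key into a root-only file (the path is an arbitrary choice)
ceph auth get-key client.admin > /etc/ceph/admin.secret
chmod 600 /etc/ceph/admin.secret
# Mount using secretfile= instead of pasting the key inline
mount -t ceph 192.168.40.61:/ /mnt/cephfs/ -o name=admin,secretfile=/etc/ceph/admin.secret
```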
[root@node1 my-cluster]# vim /etc/ganesha/ganesha.conf # configure the following
NFS_CORE_PARAM {
Enable_NLM = false;
NFS_Port = 52049;
Enable_RQUOTA = false;
}
EXPORT_DEFAULTS {
Access_Type = RW;
Anonymous_uid = 65534;
Anonymous_gid = 65534;
}
LOG {
Default_Log_Level = INFO;
Facility {
name = FILE;
destination = "/var/log/ganesha/ganesha.log";
enable = active;
}
}
NFSv4 {
# Delegations = false;
# RecoveryBackend = 'rados_cluster';
# Minor_Versions = 1,2;
}
EXPORT
{
Export_Id = 1;
Path = /nfs1;
Pseudo = /nfs1;
Squash = no_root_squash;
Access_Type = RW;
FSAL {
secret_access_key = "AQDXrtNhaD/VOBAAuVtilymuHIkb9elyH6bCVQ==";
user_id = "admin";
name = "CEPH";
filesystem = "cephfs";
}
}
EXPORT
{
Export_Id = 2;
Path = /nfs2;
Pseudo = /nfs2;
Squash = no_root_squash;
Access_Type = RW;
FSAL {
secret_access_key = "AQDXrtNhaD/VOBAAuVtilymuHIkb9elyH6bCVQ==";
user_id = "admin";
name = "CEPH";
filesystem = "cephfs";
}
}
[root@node1 my-cluster]# systemctl start nfs-ganesha
[root@node1 my-cluster]# systemctl enable nfs-ganesha
[root@node1 my-cluster]# scp /etc/ganesha/ganesha.conf root@node2:/etc/ganesha/ganesha.conf # copy the config to the other two nodes
[root@node1 my-cluster]# scp /etc/ganesha/ganesha.conf root@node3:/etc/ganesha/ganesha.conf
[root@node1 my-cluster]# ssh root@node2 systemctl start nfs-ganesha # start nfs-ganesha on node2 and node3
[root@node1 my-cluster]# ssh root@node3 systemctl start nfs-ganesha
[root@node1 my-cluster]# ssh root@node2 systemctl enable nfs-ganesha
[root@node1 my-cluster]# ssh root@node3 systemctl enable nfs-ganesha
[root@node1 my-cluster]# systemctl status nfs-ganesha
[root@node1 my-cluster]# showmount -e node1
Export list for node1:
/nfs1 (everyone)
/nfs2 (everyone)
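As a quick sketch of how a client would consume these exports: before the haproxy/keepalived layer from the later sections is in place, mount a node directly on Ganesha's non-standard port 52049 (/mnt/nfs1 is an arbitrary mount point chosen here):

```shell
# On a test client: mount the nfs1 export straight from node1
mkdir -p /mnt/nfs1
mount -t nfs -o port=52049 node1:/nfs1 /mnt/nfs1
# Once the VIP is up (sections 5 and 6), clients instead mount via
# 192.168.40.64 on the standard port 2049 that haproxy exposes:
# mount -t nfs 192.168.40.64:/nfs1 /mnt/nfs1
```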
4. Configure the S3 service
[root@node1 my-cluster]# ceph-deploy rgw create node1 node2
[root@node1 my-cluster]# systemctl status ceph-radosgw@rgw.node1.service
[root@node1 my-cluster]# ceph -s
While configuring S3, the rgw service failed to start.
After some googling, the likely cause turned up: pg_num < pgp_num or mon_max_pg_per_osd exceeded. Debug with the following command:
[root@node1 my-cluster]# /usr/bin/radosgw -d --cluster ceph --name client.rgw.node1 --setuser ceph --setgroup ceph --debug-rgw=2
It seems there were too many PGs: several pools had been created by hand earlier, but the rgw service automatically creates the pools it needs (except the data pool), so creating them manually was unnecessary.
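To make the limit concrete: the failed check compares the number of PG replicas each OSD carries against mon_max_pg_per_osd (default 250 in Nautilus). A small sketch of the arithmetic, with a made-up pool layout:

```python
def pg_replicas_per_osd(pools, num_osds):
    """Average PG replicas per OSD: sum(pg_num * replica_size) / num_osds."""
    return sum(pg_num * size for pg_num, size in pools) / float(num_osds)

# Hypothetical layout of (pg_num, replica size) pairs on the 6 OSDs above:
pools = [(32, 3), (128, 3), (64, 3), (32, 3), (32, 3), (32, 3)]
print(pg_replicas_per_osd(pools, 6))  # 160.0 -- under the default limit of 250
```

Every pool rgw creates on its own adds to this total, which is why a handful of manually pre-created pools can push a small cluster over the threshold.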
Delete the redundant pools:
[root@node1 my-cluster]# ceph osd pool delete .rgw.control .rgw.control --yes-i-really-really-mean-it # pool deletion may first require: ceph config set mon mon_allow_pool_delete true
[root@node1 my-cluster]# systemctl start ceph-radosgw@rgw.node1.service
[root@node1 my-cluster]# systemctl status ceph-radosgw@rgw.node1.service # the rgw service is healthy now
[root@node1 my-cluster]# ceph osd crush class ls
[
"hdd"
]
[root@node1 my-cluster]# ceph osd crush rule create-replicated rule-hdd default host hdd
[root@node1 my-cluster]# ceph osd crush rule ls
replicated_rule
rule-hdd
[root@node1 my-cluster]# ceph osd pool create default.rgw.buckets.data 64 # create the data pool; default.rgw.buckets.index already exists and need not be created again
[root@node1 my-cluster]# ceph osd pool application enable default.rgw.buckets.data rgw
[root@node1 my-cluster]# ceph osd pool application enable default.rgw.buckets.index rgw
Change the crush rule of all pools; run on node1:
[root@node1 my-cluster]# for i in `ceph osd lspools | grep -v data | awk '{print $2}'`; do ceph osd pool set $i crush_rule rule-hdd; done
[root@node1 my-cluster]# ceph osd pool set default.rgw.buckets.data crush_rule rule-hdd
[root@node1 my-cluster]# radosgw-admin user create --uid="testuser" --display-name="First User" # create an S3 user
{
"user_id": "testuser",
"display_name": "First User",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"subusers": [],
"keys": [
{
"user": "testuser",
"access_key": "LOLDHV9L1CS12586AQ4Y",
"secret_key": "9aHAOD8vOTwTI5OpBAbVPL35QqJA8yfZLQOI7jHJ"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"default_storage_class": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw",
"mfa_ids": []
}
[root@node1 my-cluster]# yum install python-boto # install the python-boto module
[root@node1 my-cluster]# vi s3test.py # a simple script that creates a bucket
import boto.s3.connection
access_key = 'LOLDHV9L1CS12586AQ4Y'
secret_key = '9aHAOD8vOTwTI5OpBAbVPL35QqJA8yfZLQOI7jHJ'
conn = boto.connect_s3(
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
host='node1', port=7480,
is_secure=False, calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.create_bucket('my-new-bucket')
for bucket in conn.get_all_buckets():
print "{name} {created}".format(
name=bucket.name,
created=bucket.creation_date,
)
[root@node1 my-cluster]# python s3test.py
my-new-bucket 2022-01-21T01:39:11.186Z # bucket created successfully
S3 Browser can also be used to test bucket creation/deletion and object upload/download.
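Beyond creating buckets, objects can be uploaded and read back through the same boto connection. A sketch reusing the testuser keys from above (it must run on a host that can reach rgw; 'hello.txt' and its contents are arbitrary choices):

```python
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='LOLDHV9L1CS12586AQ4Y',
    aws_secret_access_key='9aHAOD8vOTwTI5OpBAbVPL35QqJA8yfZLQOI7jHJ',
    host='node1', port=7480,
    is_secure=False, calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.get_bucket('my-new-bucket')     # the bucket created by s3test.py
key = bucket.new_key('hello.txt')             # object name is arbitrary
key.set_contents_from_string('hello rgw')     # upload
print(key.get_contents_as_string())           # read it back
key.delete()                                  # clean up
```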
5. Configure haproxy
[root@node1 my-cluster]# vim /etc/haproxy/haproxy.cfg
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 8000
user haproxy
group haproxy
daemon
stats socket /var/lib/haproxy/stats
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
# option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 8000
listen stats
bind *:9090
mode http
stats enable
stats uri /
stats refresh 5s
stats realm Haproxy\ Stats
stats auth admin:admin
frontend nfs-in
bind 192.168.40.64:2049
mode tcp
option tcplog
default_backend nfs-back
frontend s3-in
bind 192.168.40.64:58080
mode tcp
option tcplog
default_backend s3-back
frontend dashboard-in
bind 192.168.40.64:8888
mode tcp
option tcplog
default_backend dashboard-back
backend nfs-back
balance source
mode tcp
log /dev/log local0 debug
server node1 192.168.40.61:52049 check
server node2 192.168.40.62:52049 check
server node3 192.168.40.63:52049 check
backend s3-back
balance source
mode tcp
log /dev/log local0 debug
server node1 192.168.40.61:7480 check
server node2 192.168.40.62:7480 check
backend dashboard-back
balance source
mode tcp
log /dev/log local0 debug
server node1 192.168.40.61:8443 check
server node2 192.168.40.62:8443 check
server node3 192.168.40.63:8443 check
[root@node1 my-cluster]# systemctl start haproxy
[root@node1 my-cluster]# systemctl enable haproxy
[root@node1 my-cluster]# scp /etc/haproxy/haproxy.cfg root@node2:/etc/haproxy/haproxy.cfg
[root@node1 my-cluster]# scp /etc/haproxy/haproxy.cfg root@node3:/etc/haproxy/haproxy.cfg
[root@node1 my-cluster]# ssh root@node2 systemctl start haproxy
[root@node1 my-cluster]# ssh root@node2 systemctl enable haproxy
[root@node1 my-cluster]# ssh root@node3 systemctl start haproxy
[root@node1 my-cluster]# ssh root@node3 systemctl enable haproxy
Note: haproxy binds the virtual IP 192.168.40.64, which only exists on the node currently holding it; on the other nodes haproxy will fail to bind unless nonlocal binds are allowed (sysctl -w net.ipv4.ip_nonlocal_bind=1 on all three nodes, persisted in /etc/sysctl.conf).
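The `balance source` choice matters here: it hashes the client's source address so a given client keeps landing on the same backend, which keeps NFS and S3 sessions sticky. A toy model of the idea (not HAProxy's actual hash function):

```python
import hashlib

def pick_backend(client_ip, servers):
    """Deterministically map a client IP to a backend, like 'balance source'."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["192.168.40.61", "192.168.40.62", "192.168.40.63"]
# The same client always reaches the same backend:
assert pick_backend("10.0.0.5", servers) == pick_backend("10.0.0.5", servers)
```

If a backend fails its health check, HAProxy redistributes that client's hash to the remaining servers, which is exactly the failover behavior tested in section 7.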
6. Configure keepalived
Configure keepalived in preemptive mode: node1 is MASTER with priority 200; node2 and node3 are BACKUP with priorities 150 and 100 respectively.
[root@node1 my-cluster]# vim /etc/keepalived/keepalived.conf
global_defs {
router_id CEPH_NFS # identifier; any string will do
}
vrrp_script check_haproxy { # health check: succeeds while an haproxy process exists
script "killall -0 haproxy"
weight -20 # subtracted from the priority while the check fails
interval 2
rise 2
fall 2
}
vrrp_instance VI_0 {
state MASTER # MASTER on node1; change to BACKUP on the other nodes
priority 200 # higher priority wins the VIP; use 150 and 100 on the other two nodes
interface ens192 # network interface to bind
virtual_router_id 51 # must be 51 on all three nodes
advert_int 1
authentication {
auth_type PASS
auth_pass 1234
}
virtual_ipaddress {
192.168.40.64/24 dev ens192 # virtual IP
}
track_script {
check_haproxy
}
}
[root@node1 my-cluster]# systemctl start keepalived
[root@node1 my-cluster]# systemctl enable keepalived
Remember to adjust the keepalived config on node2 and node3 (state and priority) before starting:
[root@node1 my-cluster]# ssh root@node2 systemctl start keepalived
[root@node1 my-cluster]# ssh root@node2 systemctl enable keepalived
[root@node1 my-cluster]# ssh root@node3 systemctl start keepalived
[root@node1 my-cluster]# ssh root@node3 systemctl enable keepalived
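How the election behaves with these numbers can be sketched as a toy model (not keepalived's real implementation). It also exposes a caveat: with weight -20, node1's effective priority on a failed check is 180, which still beats node2's 150, so an haproxy failure alone would not move the VIP; only losing node1 entirely does. A penalty larger than the priority gap (e.g. -60) would change that.

```python
def vip_holder(priorities, failed_checks, weight=-20):
    """Return the node with the highest effective priority.
    priorities: {node: base_priority}; failed_checks: nodes whose
    track_script failed, so their priority is adjusted by `weight`."""
    eff = {n: p + (weight if n in failed_checks else 0)
           for n, p in priorities.items()}
    return max(eff, key=eff.get)

nodes = {"node1": 200, "node2": 150, "node3": 100}
print(vip_holder(nodes, set()))        # node1
print(vip_holder(nodes, {"node1"}))    # still node1: 200-20=180 beats 150
alive = {n: p for n, p in nodes.items() if n != "node1"}  # node1 rebooted
print(vip_holder(alive, set()))        # node2
```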
7. Test and verify high availability
- NFS
On a test machine, mount nfs1 and write some files. At this point the virtual IP is on node1.
Reboot node1: the process writing the file hangs. This is expected, because NFS v4.1 is not in use, so the session cannot be resumed.
Ctrl-C the hung process and write again; the writes succeed, which shows the VIP migrated correctly and NFS has reached basic high availability.
- S3
Upload a file through S3 Browser, then reboot node1; the upload stalls for a few seconds and then continues.
The S3 service has reached basic high availability.