CentOS 7 Ceph Cluster Installation
1. Server Planning
Hostname | Host IP | Disk | Roles
---|---|---|---
node3 | public-ip: 172.18.112.20, cluster-ip: 172.18.112.20 | vdb | ceph-deploy, monitor, mgr, osd
node4 | public-ip: 172.18.112.19, cluster-ip: 172.18.112.19 | vdb | monitor, mgr, osd
node5 | public-ip: 172.18.112.18, cluster-ip: 172.18.112.18 | vdb | monitor, mgr, osd
2. Set Hostnames
Prerequisite: if the old urllib3/requests packages are not cleaned up, the dashboard will later fail with: ceph dashboard Cannot import name UnrewindableBodyError
sudo pip uninstall urllib3 -y
sudo pip uninstall requests -y
sudo yum remove python-urllib3 -y
sudo yum remove python-requests -y
Then reinstall both packages via pip:
sudo pip install --upgrade urllib3
sudo pip install --upgrade requests
or, alternatively, via yum:
sudo yum install python-urllib3
sudo yum install python-requests
Set the hostnames; run on each of the three hosts the commands for that host.
node3
[root@localhost ~]# hostnamectl set-hostname node3
[root@localhost ~]# hostname node3
node4
[root@localhost ~]# hostnamectl set-hostname node4
[root@localhost ~]# hostname node4
node5
[root@localhost ~]# hostnamectl set-hostname node5
[root@localhost ~]# hostname node5
To see the new hostname take effect, close the current terminal session and open a new one.
Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
3. Configure the hosts File
Run the following commands on all three machines to add the name mappings:
echo "172.18.112.20 node3 " >> /etc/hosts
echo "172.18.112.19 node4 " >> /etc/hosts
echo "172.18.112.18 node5 " >> /etc/hosts
4. Create a User and Set Up Passwordless SSH
Create the Ceph-related directories and configure the hosts entries and system limits (the hostnames and IPs in the block below come from a different example environment; adjust them to your own nodes):
cat >> /etc/hosts <<EOF
10.167.21.129 ceph-1
10.167.21.130 ceph-2
10.167.21.131 ceph-3
EOF
cat >>/etc/security/limits.conf <<EOF
* hard nofile 655360
* soft nofile 655360
* hard nproc 655360
* soft nproc 655360
* soft core 655360
* hard core 655360
EOF
cat >>/etc/security/limits.d/20-nproc.conf <<EOF
* soft nproc unlimited
root soft nproc unlimited
EOF
mkdir -p /data/ceph/{admin,etc,lib,logs,osd}
ln -s /data/ceph/etc /etc/ceph
ln -s /data/ceph/lib /var/lib/ceph
ln -s /data/ceph/logs /var/log/ceph
Create the user (run on all three machines):
useradd -d /home/admin -m admin
echo "123456" | passwd admin --stdin
#grant sudo privileges
echo "admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/admin
sudo chmod 0440 /etc/sudoers.d/admin
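Optionally, verify that the sudoers drop-in parses cleanly (a hedged extra check; visudo -cf only validates the given file):
sudo visudo -cf /etc/sudoers.d/admin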
Set up passwordless SSH (run only on node3):
[root@node3 ~]# su - admin
[admin@node3 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/admin/.ssh/id_rsa):
Created directory '/home/admin/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/admin/.ssh/id_rsa.
Your public key has been saved in /home/admin/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:qfWhuboKeoHQOOMLOIB5tjK1RPjgw/Csl4r6A1FiJYA admin@admin.ops5.bbdops.com
The key's randomart image is:
+---[RSA 2048]----+
|+o.. |
|E.+ |
|*% |
|X+X . |
|=@.+ S . |
|X.* o + . |
|oBo. . o . |
|ooo. . |
|+o....oo. |
+----[SHA256]-----+
[admin@node3 ~]$ ssh-copy-id admin@node3
[admin@node3 ~]$ ssh-copy-id admin@node4
[admin@node3 ~]$ ssh-copy-id admin@node5
Note: if the ssh-copy-id command is not available, the public key can be copied to the target machine manually (replace host3 below with the actual target host):
cat ~/.ssh/id_*.pub | ssh admin@host3 'cat >> .ssh/authorized_keys'
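Once the key is distributed, a minimal check from node3 confirms passwordless login works to every node (BatchMode makes ssh fail instead of prompting if key authentication is broken):
for h in node3 node4 node5; do
  ssh -o BatchMode=yes admin@"$h" hostname
done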
5. Configure Time Synchronization
Run on all three nodes:
[root@node3 ~]$ timedatectl #check the local time
[root@node3 ~]$ timedatectl set-timezone Asia/Shanghai #set the timezone to Asia/Shanghai
[root@node3 ~]$ yum install -y chrony #install the sync tool
[root@node3 ~]$ systemctl enable chronyd #enable at boot
[root@node3 ~]$ systemctl start chronyd.service
[root@node3 ~]$ sed -i -e '/^server/s/^/#/' -e '1a server ntp.aliyun.com iburst' /etc/chrony.conf
[root@node3 ~]$ systemctl restart chronyd.service
[root@node3 ~]$ chronyc -n sources -v #list the sync sources
[root@node3 ~]$ chronyc tracking #sync service status
[root@node3 ~]$ timedatectl status #check the local time
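To confirm all three nodes are actually synchronizing, a small sketch run from node3 (it assumes the passwordless admin SSH configured earlier):
for h in node3 node4 node5; do
  echo "== $h =="
  ssh admin@"$h" "chronyc tracking | grep -E 'Reference ID|System time'"
done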
The following operations are performed on the OSD nodes.
# inspect the disk
[root@ceph-node1 ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
# format the disk
[root@ceph-node1 ~]# parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
[root@ceph-node1 ~]# mkfs.xfs /dev/sdb -f
meta-data=/dev/sdb isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mount /dev/sdb /data/ceph/osd
Check the filesystem type
[root@ceph-node1 ~]# blkid -o value -s TYPE /dev/sdb
xfs
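Before creating OSDs it can help to confirm the data disk is present and unmounted on every OSD node; a minimal sketch (it assumes the admin SSH access set up earlier and that the data disk is /dev/vdb, as in the planning table):
for h in node3 node4 node5; do
  echo "== $h =="
  ssh admin@"$h" "lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/vdb"
done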
6. Install ceph-deploy and the Ceph Packages
Configure the Aliyun Ceph repository:
cat > /etc/yum.repos.d/ceph.repo <<'EOF'
[ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-octopus/el7/x86_64
enabled=1
gpgcheck=0
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-octopus/el7/noarch
enabled=1
gpgcheck=0
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-octopus/el7/SRPMS
enabled=1
gpgcheck=0
EOF
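After writing the repo file, refresh the yum metadata and confirm the Ceph packages are visible (run as root; exact package versions depend on the mirror):
yum clean all
yum makecache
yum list available ceph ceph-deploy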
Install ceph-deploy
[admin@node3 ~]$ cd /data/ceph/admin/
[admin@node3 ~]$ sudo yum install -y python2-pip
[admin@node3 ~]$ sudo pip install --upgrade remoto
[admin@node3 ~]$ sudo yum install ceph ceph-deploy
Initialize the mon nodes
Ceph needs packages from the EPEL repository, so every node being installed needs: yum install -y epel-release && yum install -y lttng-ust
[admin@node3 ~]$ mkdir my-cluster
[admin@node3 ~]$ cd my-cluster
# new
[admin@node3 my-cluster]$ ceph-deploy new node3 node4 node5
Traceback (most recent call last):
File "/bin/ceph-deploy", line 18, in <module>
from ceph_deploy.cli import main
File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 1, in <module>
import pkg_resources
ImportError: No module named pkg_resources
#The error above appears because pip/setuptools (which provides pkg_resources) is missing; install pip:
[admin@node3 my-cluster]$ sudo yum install epel-release
[admin@node3 my-cluster]$ sudo yum install python-pip
#re-run the initialization
[admin@node3 my-cluster]$ ceph-deploy new node3 node4 node5
[admin@node3 my-cluster]$ ls
ceph.conf ceph-deploy-ceph.log ceph.mon.keyring
[admin@node3 my-cluster]$ cat ceph.conf
[global]
fsid = 3a2a06c7-124f-4703-b798-88eb2950361e
mon_initial_members = node3, node4, node5
mon_host = 172.18.112.20,172.18.112.19,172.18.112.18
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
Modify ceph.conf and add the following configuration:
public network = 172.18.112.0/24
cluster network = 172.18.112.0/24
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 128
osd pool default pgp num = 128
osd pool default crush rule = 0
osd crush chooseleaf type = 1
max open files = 131072
ms bind ipv6 = false
[mon]
mon clock drift allowed = 10
mon clock drift warn backoff = 30
mon osd full ratio = .95
mon osd nearfull ratio = .85
mon osd down out interval = 600
mon osd report timeout = 300
mon allow pool delete = true
[osd]
osd recovery max active = 3
osd max backfills = 5
osd max scrubs = 2
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=1024
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog
filestore max sync interval = 5
osd op threads = 2
Install the Ceph packages on the specified nodes
[admin@node3 my-cluster]$ ceph-deploy install --no-adjust-repos node3 node4 node5
--no-adjust-repos makes ceph-deploy use the locally configured repositories instead of generating the official upstream repo files.
Deploy the initial monitors and gather the keys
[admin@node3 my-cluster]$ ceph-deploy mon create-initial
After this step, the following keyrings appear in the current directory:
[admin@node3 my-cluster]$ ls
ceph.bootstrap-mds.keyring ceph.bootstrap-osd.keyring ceph.client.admin.keyring ceph-deploy-ceph.log
ceph.bootstrap-mgr.keyring ceph.bootstrap-rgw.keyring ceph.conf ceph.mon.keyring
Copy the configuration file and admin key to every cluster node
The configuration file is the generated ceph.conf, and the key is ceph.client.admin.keyring, the default key a ceph client needs when connecting to the cluster. Copy them to all nodes with the following command:
[admin@node3 my-cluster]$ ceph-deploy admin node3 node4 node5
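A quick way to confirm the conf and keyring actually landed on every node (a sketch that reuses the admin SSH access set up earlier):
for h in node3 node4 node5; do
  ssh admin@"$h" "ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring"
done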
7. Deploy ceph-mgr
#The manager daemon was introduced in Ceph Luminous; the following command deploys one Manager daemon
[admin@node3 my-cluster]$ ceph-deploy mgr create node3
8. Create OSDs
#Usage: ceph-deploy osd create --data {device} {ceph-node}
ceph-deploy osd create --data /dev/vdb node3
ceph-deploy osd create --data /dev/vdb node4
ceph-deploy osd create --data /dev/vdb node5
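If a create command fails because the disk was used before (for example the Can't open /dev/vdb exclusively error discussed later in this document), the disk can be wiped with ceph-deploy disk zap and the create re-run; a hedged sketch, which destroys all data on the device:
ceph-deploy disk zap node3 /dev/vdb
ceph-deploy disk zap node4 /dev/vdb
ceph-deploy disk zap node5 /dev/vdb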
Check the OSD status
[admin@node3 my-cluster]$ sudo ceph health
HEALTH_OK
[admin@node3 my-cluster]$ sudo ceph -s
cluster:
id: 3a2a06c7-124f-4703-b798-88eb2950361e
health: HEALTH_OK
services:
mon: 3 daemons, quorum node5,node4,node3
mgr: node3(active)
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 MiB
usage: 3.2 GiB used, 597 GiB / 600 GiB avail
pgs:
By default the ceph.client.admin.keyring file has mode 600 and is owned by root:root, so if the admin user on a cluster node runs the ceph command directly it complains that /etc/ceph/ceph.client.admin.keyring cannot be found, because of insufficient permissions.
Using sudo ceph avoids the problem; to run ceph directly without sudo, set the permission to 644. Run the following as the admin user on the cluster nodes:
[admin@node3 my-cluster]$ ceph -s
2021-12-28 07:59:36.062 7f52d08e0700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-28 07:59:36.062 7f52d08e0700 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
[errno 2] error connecting to the cluster
[admin@node3 my-cluster]$ sudo chmod 644 /etc/ceph/ceph.client.admin.keyring
[admin@node3 my-cluster]$ ceph -s
cluster:
id: 3a2a06c7-124f-4703-b798-88eb2950361e
health: HEALTH_OK
services:
mon: 3 daemons, quorum node5,node4,node3
mgr: node3(active)
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 MiB
usage: 3.2 GiB used, 597 GiB / 600 GiB avail
pgs:
List the OSDs
[admin@node3 my-cluster]$ sudo ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.58589 root default
-3 0.19530 host node3
3 hdd 0.19530 osd.3 up 1.00000 1.00000
-5 0.19530 host node4
4 hdd 0.19530 osd.4 up 1.00000 1.00000
-7 0.19530 host node5
5 hdd 0.19530 osd.5 up 1.00000 1.00000
Note: if any step of the installation fails, purge and reinstall:
ceph-deploy purge CENTOS1 CENTOS2 CENTOS3
ceph-deploy purgedata CENTOS1 CENTOS2 CENTOS3
ceph-deploy forgetkeys
Also remove the mon data on all three nodes:
rm -rf /var/run/ceph/
yum clean all && yum check-update
9. Enable the MGR Dashboard Module
Option 1: via commands
ceph mgr module enable dashboard
If the command above fails with:
Error ENOENT: all mgr daemons do not support module 'dashboard', pass --force to force enablement
it is because ceph-mgr-dashboard is not installed; install it on the mgr node(s):
yum install ceph-mgr-dashboard
Option 2: via the configuration file
# edit the ceph.conf file
vi ceph.conf
[mon]
mgr initial modules = dashboard
#push the configuration
[admin@node3 my-cluster]$ ceph-deploy --overwrite-conf config push node3 node4 node5
#restart mgr
sudo systemctl restart ceph-mgr@node3
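Whichever option is used, the module can be confirmed afterwards (the exact output format varies by Ceph release):
ceph mgr module ls | grep dashboard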
Web login configuration. By default, all HTTP connections to the dashboard are secured with SSL/TLS.
#To get the dashboard up and running quickly, generate and install a self-signed certificate with the following built-in command:
[root@node3 my-cluster]# ceph dashboard create-self-signed-cert
Self-signed certificate created
Create a directory for the certificate
mkdir -p /usr/local/cephcluster/mgr-dashboard
Generate a key pair:
openssl req -new -nodes -x509 -subj "/O=IT/CN=ceph-mgr-dashboard" -days 3650 -keyout dashboard.key -out dashboard.crt -extensions v3_ca
Restart the dashboard module
ceph mgr module disable dashboard
ceph mgr module enable dashboard
Set the IP and port (use the address of your mgr node)
ceph config set mgr mgr/dashboard/server_addr 192.168.100.131
ceph config set mgr mgr/dashboard/server_port 9001
Disable HTTPS
ceph config set mgr mgr/dashboard/ssl false
Check the service information
ceph mgr services
#Create a user with the administrator role:
[root@node3 my-cluster]# ceph dashboard set-login-credentials admin admin
Username and password updated
#Check the ceph-mgr services:
[root@node3 my-cluster]# ceph mgr services
{
"dashboard": "https://node3:8443/"
}
After the configuration above, open https://node3:8443 in a browser and log in with username admin and password admin.
The name node3 must resolve locally, so add it to your local hosts file.
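A command-line reachability check before using the browser (the -k flag skips verification of the self-signed certificate):
curl -k -I https://node3:8443/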
CephFS access
After the cluster is created there is no filesystem by default; create a CephFS filesystem that can be accessed externally.
1) Create two storage pools by running these two commands:
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 64
With fewer than 5 OSDs, pg_num can be set to 128.
With 5 to 10 OSDs, pg_num can be set to 512.
With 10 to 50 OSDs, pg_num can be set to 4096.
With more than 50 OSDs, pg_num needs to be calculated (a rule-of-thumb sketch follows this list).
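The usual rule of thumb (an assumption, not stated in this document) is roughly 100 PGs per OSD divided by the replica count, rounded up to a power of two; with 3 OSDs and size=3 that gives 128, matching cephfs_data above:
# rough pg_num estimate: osds*100/size, rounded up to the next power of two
osds=3; size=3; target=$(( osds * 100 / size ))
pg=1; while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
echo "suggested pg_num: $pg"   # prints 128 for 3 OSDs with size 3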
The pools that currently exist can be listed with:
ceph osd lspools
Create a filesystem named fs_test:
ceph fs new fs_test cephfs_metadata cephfs_data
Check the status; output like the following indicates everything is normal:
ceph fs ls
ceph mds stat
FUSE mount
First make sure the ceph-fuse command can be run; if not, install it:
yum -y install ceph-fuse
Create the mount directory
mkdir -p /usr/local/file/cephfs_directory
Mount CephFS (point -m at one of your monitor addresses):
ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m 192.168.0.131:6789 /usr/local/file/cephfs_directory
If the output shows starting fuse, the mount succeeded.
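To confirm the mount works end to end, a minimal sketch using the example mount point above:
df -h /usr/local/file/cephfs_directory
echo hello > /usr/local/file/cephfs_directory/test.txt
cat /usr/local/file/cephfs_directory/test.txt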
If the installation reports error messages like the following:
Error: Package: 2:ceph-radosgw-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0()(64bit)
Error: Package: 2:librbd1-14.2.11-0.el7.x86_64 (ceph)
Requires: liblttng-ust.so.0()(64bit)
Error: Package: 2:ceph-base-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0()(64bit)
Error: Package: 2:librados2-14.2.11-0.el7.x86_64 (ceph)
Requires: liblttng-ust.so.0()(64bit)
Error: Package: 2:ceph-base-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
Error: Package: 2:ceph-base-14.2.11-0.el7.x86_64 (ceph)
Requires: liblttng-ust.so.0()(64bit)
Error: Package: 2:ceph-common-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
Error: Package: 2:ceph-mon-14.2.11-0.el7.x86_64 (ceph)
Requires: libleveldb.so.1()(64bit)
Error: Package: 2:ceph-osd-14.2.11-0.el7.x86_64 (ceph)
Requires: libleveldb.so.1()(64bit)
Error: Package: 2:ceph-base-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0(LIBOATH_1.2.0)(64bit)
Error: Package: 2:ceph-common-14.2.11-0.el7.x86_64 (ceph)
Requires: libleveldb.so.1()(64bit)
Error: Package: 2:ceph-common-14.2.11-0.el7.x86_64 (ceph)
Requires: libbabeltrace.so.1()(64bit)
Error: Package: 2:ceph-common-14.2.11-0.el7.x86_64 (ceph)
Requires: libbabeltrace-ctf.so.1()(64bit)
Error: Package: 2:ceph-mgr-14.2.11-0.el7.x86_64 (ceph)
Requires: python-bcrypt
Error: Package: 2:ceph-common-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0(LIBOATH_1.2.0)(64bit)
Error: Package: 2:ceph-base-14.2.11-0.el7.x86_64 (ceph)
Requires: libleveldb.so.1()(64bit)
Error: Package: 2:librgw2-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0()(64bit)
Error: Package: 2:ceph-mgr-14.2.11-0.el7.x86_64 (ceph)
Requires: python-pecan
Error: Package: 2:ceph-base-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0(LIBOATH_1.12.0)(64bit)
Error: Package: 2:librgw2-14.2.11-0.el7.x86_64 (ceph)
Requires: liblttng-ust.so.0()(64bit)
Error: Package: 2:ceph-common-14.2.11-0.el7.x86_64 (ceph)
Requires: liboath.so.0()(64bit)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
In that case, install or update the missing dependency packages and the problem is resolved:
yum install -y yum-utils && yum-config-manager --add-repo https://dl.fedoraproject.org/pub/epel/7/x86_64/ && yum install --nogpgcheck -y epel-release && rpm --import /etc/pki/rpm-gpg/R
If OSD initialization reports the error: ceph Can't open /dev/vdb exclusively. Mounted filesystem
the disk has to be unmounted first: kill the tasks that occupy it, then unmount.
lsof /data #check whether any task is using the mount
sudo fuser -k /data #kill the tasks occupying the disk, as root
sudo umount /data #unmount
Error [2]: No data was received after 300 seconds, disconnecting... The cause is a slow network hitting the 5-minute timeout.
The workaround is:
install Ceph on each node directly with yum -y install ceph.
3. Error [3]: [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph --version. The fix is the same as for error [2] above.
4. While installing Ceph, an error roughly reading "over-write" appears. The cause is that ceph.conf was modified but the updated file was not pushed to the other nodes, so the information must be refreshed with one of two commands:
either: ceph-deploy --overwrite-conf config push node1-4
or: ceph-deploy --overwrite-conf mon create node1-4
5. Error: RuntimeError: Failed to execute command: yum -y install epel-release
Fix: yum -y remove ceph-release
6. If ceph osd tree shows node names other than node1-4, check whether the hostname change actually succeeded; the command to change a hostname is: hostnamectl set-hostname <name>
7. When installing or preparing a node, the error [Errno 2] No such file or directory means Ceph was uninstalled before but its configuration files were not fully cleaned up, so the old configuration must be deleted:
rm -rf /etc/ceph/*
rm -rf /var/lib/ceph/*/*
rm -rf /var/log/ceph/*
rm -rf /var/run/ceph/*
8. Error: /var/run/yum.pid is locked, another program with PID xxxx is running. Solutions:
Option 1: just wait a minute or so and it resolves itself.
Option 2: rm -f /var/run/yum.pid
9. Error: "you must have a tty to run sudo", or entering the root password under the ceph2 user does not work (ceph2 is my own username).
Fix: on the command line run: echo "ceph2 ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
Then run visudo to check that ceph2 ALL = (root) NOPASSWD:ALL was added, and make sure the sudoers configuration contains these two lines:
ceph2 ALL=(ALL) NOPASSWD: ALL
Defaults:ceph2 !requiretty
10. When preparing a node, the following error appears: ERROR: error creating empty object store in /var/local/osd0: (13) Permission denied
[admin-node][ERROR ] RuntimeError: command returned non-zero exitstatus: 1
[ceph_deploy][ERROR ] RuntimeError: Failedto execute command: /usr/sbin/ceph-disk -v activate --mark-init systemd --mount/var/local/osd0
Cause: the permissions on the OSD directory are insufficient. Fix: run chmod 777 /var/local/osd0 on each node.