[Ceph Operations] Adding MON Nodes

This note walks through manually adding two new MON nodes (node02 and node03) to an existing cluster that starts with a single MON on node01.

Initial state of the cluster:

[root@node01 ~]# ceph -s
  cluster:
    id:     33af1a28-8923-4d40-af06-90c376ed74b0
    health: HEALTH_WARN
            Degraded data redundancy: 418/627 objects degraded (66.667%), 47 pgs degraded, 136 pgs undersized
            mon is allowing insecure global_id reclaim
 
  services:
    mon: 1 daemons, quorum node01 (age 23m)
    mgr: node01(active, since 19m)
    mds: cephfs:1 {0=node01=up:active}
    osd: 3 osds: 3 up (since 16m), 3 in (since 16m)
    rgw: 1 daemon active (node01)
 
  task status:
 
  data:
    pools:   6 pools, 136 pgs
    objects: 209 objects, 3.4 KiB
    usage:   3.0 GiB used, 57 GiB / 60 GiB avail
    pgs:     418/627 objects degraded (66.667%)
             89 active+undersized
             47 active+undersized+degraded

 

1. Stop the firewall and disable SELinux:

systemctl stop firewalld && systemctl disable firewalld
setenforce 0 && sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
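
If you prefer to keep firewalld running instead of disabling it, a minimal alternative (assuming the stock firewalld service definition for the Ceph monitor is present on your distribution) is to open only the MON ports:

# Open the MON ports (v1 6789/tcp, v2 3300/tcp) instead of stopping firewalld
sudo firewall-cmd --zone=public --add-service=ceph-mon --permanent
sudo firewall-cmd --reload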

2. Create the monitor data directories on node02 and node03 respectively:

sudo -u ceph mkdir /var/lib/ceph/mon/ceph-node02
sudo -u ceph mkdir /var/lib/ceph/mon/ceph-node03
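
Optionally, verify that the new directories are owned by the ceph user before continuing (a quick sanity check; ceph-mon --mkfs in step 10 will fail on a directory it cannot write to):

# Expect drwxr-xr-x ... ceph ceph for each directory
ls -ld /var/lib/ceph/mon/ceph-node02   # on node02
ls -ld /var/lib/ceph/mon/ceph-node03   # on node03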

3. On node02 and node03, copy the admin keyring and ceph.conf over from node01 (192.168.19.101):

scp root@192.168.19.101:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
scp root@192.168.19.101:/etc/ceph/ceph.conf /etc/ceph/

4. On node02 and node03, edit /etc/ceph/ceph.conf so that it contains the following:

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 0.0.0.0/0
fsid = 33af1a28-8923-4d40-af06-90c376ed74b0
mon host = [v2:192.168.19.101:3300,v1:192.168.19.101:6789],[v2:192.168.19.102:3300,v1:192.168.19.102:6789],[v2:192.168.19.103:3300,v1:192.168.19.103:6789]
mon initial members = node01,node02,node03
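
To confirm each node parses the new configuration correctly, you can query it with ceph-conf (a small verification sketch; run it on node02 and node03 after editing the file):

# Both values must match what node01 reports in `ceph -s`
ceph-conf -c /etc/ceph/ceph.conf --lookup fsid
ceph-conf -c /etc/ceph/ceph.conf --lookup mon_host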

5. On node01, run the following command to export the mon keyring (it authenticates with the local /etc/ceph/ceph.client.admin.keyring in order to fetch the ceph-mon keyring):

sudo ceph auth get mon. -o /tmp/ceph.mon.keyring

The mon key obtained this way is written to /tmp/ceph.mon.keyring; this key file has the same contents as /var/lib/ceph/mon/ceph-node01/keyring on the initial server node01.

Output:

[root@node01 tmp]# cat ceph.mon.keyring 
[mon.]
	key = AQBQYt9kQr8SDBAA05KJPJ7wj/cojuhu6rvVWQ==
	caps mon = "allow *"
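
The exported keyring can also be inspected with ceph-authtool to make sure it really contains the mon. entity and its caps (a quick verification sketch):

# List the entities in the exported keyring; expect a single [mon.] entry
ceph-authtool -l /tmp/ceph.mon.keyring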

6. Fetch the current monmap:

sudo ceph mon getmap -o /tmp/monmap

Printing it shows the following:

[root@node01 tmp]# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 1
fsid 33af1a28-8923-4d40-af06-90c376ed74b0
last_changed 2023-08-18 20:32:23.733383
created 2023-08-18 20:32:23.733383
min_mon_release 14 (nautilus)
0: [v2:192.168.19.101:3300/0,v1:192.168.19.101:6789/0] mon.node01

7. Note, however, that the monmap obtained from the cluster contains only the first server, node01; the new nodes node02 and node03 still need to be added:

monmaptool --addv node02 [v2:192.168.19.102:3300,v1:192.168.19.102:6789] --fsid 33af1a28-8923-4d40-af06-90c376ed74b0  /tmp/monmap
monmaptool --addv node03 [v2:192.168.19.103:3300,v1:192.168.19.103:6789] --fsid 33af1a28-8923-4d40-af06-90c376ed74b0  /tmp/monmap

Printing it again now shows:

[root@node01 ~]# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 1
fsid 33af1a28-8923-4d40-af06-90c376ed74b0
last_changed 2023-08-18 20:32:23.733383
created 2023-08-18 20:32:23.733383
min_mon_release 14 (nautilus)
0: [v2:192.168.19.101:3300/0,v1:192.168.19.101:6789/0] mon.node01
1: [v2:192.168.19.102:3300/0,v1:192.168.19.102:6789/0] mon.node02
2: [v2:192.168.19.103:3300/0,v1:192.168.19.103:6789/0] mon.node03
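
If an entry was added with the wrong name or address, monmaptool can also remove it again before the map is injected anywhere (a sketch for fixing a typo; node02 here stands for whichever entry is wrong):

# Remove the mistaken entry, then re-add it with the correct address
monmaptool --rm node02 /tmp/monmap
monmaptool --addv node02 [v2:192.168.19.102:3300,v1:192.168.19.102:6789] /tmp/monmap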

/tmp/monmap is now the complete, up-to-date monmap, but node01's own monitor store does not yet contain node02 and node03, so this monmap needs to be injected into node01's monitor directory. With the updated /tmp/monmap on node01, run the following command to update it:

Note: before updating node01's monmap, stop the ceph-mon daemon with sudo systemctl stop ceph-mon@node01; otherwise the injection fails with an error because it cannot obtain the lock on the mon's database.

sudo ceph-mon -i node01 --inject-monmap /tmp/monmap
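
Before starting the daemon again, you can extract the monmap back out of node01's store to confirm the injection took effect (a sketch using ceph-mon's --extract-monmap, the counterpart of --inject-monmap; /tmp/monmap.check is just an example path, and the mon must still be stopped):

# Dump the monmap currently stored by node01 and print it; it should list all three mons
sudo ceph-mon -i node01 --extract-monmap /tmp/monmap.check
monmaptool --print /tmp/monmap.check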

8. With /etc/ceph/ceph.conf updated as shown above, start the mon:

sudo systemctl start ceph-mon@node01

At this point ceph -s gets no response: the single running mon cannot form a quorum among the three members now listed in the monmap. It recovers as soon as the mons on node02 and node03 are brought up.
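
The hang can be confirmed through the mon's admin socket, which answers even without quorum (a diagnostic sketch; mon_status shows the monitor stuck probing/electing until a second mon joins):

# Query node01's mon directly over its admin socket (does not require quorum)
sudo ceph daemon mon.node01 mon_status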

9. On node02 and node03, copy over the monmap and keyrings from node01:

scp root@192.168.19.101:/tmp/monmap /tmp/
scp root@192.168.19.101:/tmp/ceph.mon.keyring /tmp/
scp root@192.168.19.101:/etc/ceph/ceph.client.admin.keyring /etc/ceph/

10. On node02 and node03 respectively, initialize the mon data directories (and then start the daemons, as shown below):

sudo -u ceph ceph-mon --mkfs -i node02 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
sudo -u ceph ceph-mon --mkfs -i node03 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
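
With the data directories initialized, start the mon daemon on each node in the same way node01 was started in step 8 (this is the "start node02 and node03" that step 8 refers to); enabling the units is optional but keeps the mons running across reboots:

# On node02
sudo systemctl start ceph-mon@node02 && sudo systemctl enable ceph-mon@node02
# On node03
sudo systemctl start ceph-mon@node03 && sudo systemctl enable ceph-mon@node03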

11. Check the cluster status:

[root@node01 ~]# ceph -s
  cluster:
    id:     33af1a28-8923-4d40-af06-90c376ed74b0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum node01,node02,node03 (age 19m)
    mgr: node01(active, since 19m)
    mds: cephfs:1 {0=node01=up:active(laggy or crashed)}
    osd: 9 osds: 9 up (since 19m), 9 in (since 38h)
    rgw: 1 daemon active (node01)
 
  task status:
 
  data:
    pools:   7 pools, 140 pgs
    objects: 209 objects, 3.4 KiB
    usage:   9.6 GiB used, 170 GiB / 180 GiB avail
    pgs:     140 active+clean
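
The monitor membership can also be checked on its own, independent of the rest of the ceph -s output (a quick sketch):

# Expect all three mons listed and in quorum
ceph mon stat
ceph quorum_status --format json-pretty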

 

