Installing ceph-luminous on CentOS 7 (1 mon + 2 osd)
Note: because the environment is limited, only a single machine is used here.
I. Deployment environment
- VMware Workstation 10
- CentOS 7
II. Host configuration
Hostname | IP | CPU | RAM |
---|---|---|---|
master | 192.168.137.10 | 2 | 3G |
1. Add the following entry to /etc/hosts:
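The entry itself is not reproduced in the original; based on the host table above, it is presumably the single line below (adjust if your IP or hostname differ):

192.168.137.10 master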
2. Disable the firewall, SELinux, and swap
systemctl stop firewalld
systemctl disable firewalld
Then edit the SELinux config: vim /etc/selinux/config (the change, along with the swap step, is sketched below)
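The config change and the swap step are not shown in the original; a minimal sketch of the usual CentOS 7 approach, assuming the stock SELINUX=enforcing setting and swap entries in /etc/fstab:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # persistent; takes effect after reboot
setenforce 0                                                          # go permissive for the current boot
swapoff -a                                                            # turn swap off immediately
sed -ri 's/.*swap.*/#&/' /etc/fstab                                   # comment out swap entries so it stays off after reboot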
3. Set up passwordless SSH for the host(s)
1) CentOS 7 does not enable key-based passwordless SSH login by default; remove the comment from this line in /etc/ssh/sshd_config:
#PubkeyAuthentication yes
Then restart the ssh service:
systemctl restart sshd
2) In /root on master, run ssh-keygen -t rsa and press Enter at every prompt. (In a multi-node deployment this is run on every machine; here there is only master.)
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:aMUO8b/EkylqTMb9+71ePnQv0CWQohsaMeAbMH+t87M root@master
The key's randomart image is:
+---[RSA 2048]----+
| o ... . |
| = o= . o |
| + oo=. . . |
| =.Boo o . .|
| . OoSoB . o |
| =.+.+ o. ...|
| + o o .. +|
| . o . ..+.|
| E ....+oo|
+----[SHA256]-----+
3) On master, append the public key to the authorized_keys file
[root@master ~]# cd /root/.ssh/
[root@master .ssh]# cat id_rsa.pub >> authorized_keys
Test it: on master you can now log in by IP without a password; logging in by hostname still asks for a one-time yes to accept the host key, after which it works without prompting.
[root@master]# ssh master
The authenticity of host 'master (192.168.137.10)' can't be established.
ECDSA key fingerprint is 5c:c6:69:04:26:65:40:7c:d0:c6:24:8d:ff:bd:5f:ef.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.137.10' (ECDSA) to the list of known hosts.
Last login: Mon Dec 10 15:34:51 2018 from 192.168.137.1
4. Configure Aliyun yum mirrors and the Ceph repository
cp -r /etc/yum.repos.d/ /etc/yum-repos-d-bak
yum install -y wget
rm -rf /etc/yum.repos.d/*
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel-7.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum clean all
yum makecache
cat <<EOF > /etc/yum.repos.d/ceph.repo
[ceph]
name=Ceph packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/x86_64/
enabled=1
gpgcheck=1
priority=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
priority=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/SRPMS
enabled=0
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1
EOF
5. Prepare two disks for the two OSDs
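It is worth confirming the disks are visible before moving on; a quick check, assuming they end up as the /dev/sda1 and /dev/sdb1 partitions used in the OSD steps later:

lsblk      # the two disks prepared for the OSDs should be listed here
fdisk -l   # confirm the partitions (/dev/sda1 and /dev/sdb1 in this walkthrough)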
III. Install Ceph
1. Install the ceph-deploy tool
yum install -y ceph-deploy
2. Configure the monitor node
ceph-deploy new master
The current working directory now contains several new files (typically ceph.conf, ceph.mon.keyring, and the ceph-deploy-ceph.log deploy log).
3. Edit ceph.conf and add the public network:
[global]
fsid = d81b3ce4-bcbc-4b43-870e-430950652315
mon_initial_members = cluster9
public network = 192.168.137.0/24
mon_host = 192.168.137.10
mon allow pool delete = true
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
4. Install Ceph. It must be installed on both the monitor node and the OSD nodes; since there is only one node here, everything goes on master.
ceph-deploy install master
[root@master ceph]# ceph -v
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
5. Create the initial monitor(s) and gather all keys
ceph-deploy mon create-initial
6. Copy the configuration file and admin keyring to the admin node and your Ceph nodes
ceph-deploy admin master
7. Deploy a manager daemon
ceph-deploy mgr create master
8. Add OSDs
Step 1: format the disks
mkfs.xfs /dev/sda1 -f
If you get: mkfs.xfs: cannot open /dev/sda1: Device or resource busy
find what is holding the device (optional; only needed when the Device or resource busy error appears). The lsblk output will show old Ceph LVM volumes occupying the partitions, for example:
sda      8:0    0  3.7T  0 disk
└─sda1   8:1    0  3.7T  0 part
  └─ceph--90f7ab20--7120--4ad7--a1a2--0cc510aa78cc-osd--block--9841f92e--9877--43af--8a93--6c798079d8c0 253:4 0 3.7T 0 lvm
sdc      8:32   0  3.7T  0 disk
└─sdc1   8:33   0  3.7T  0 part
  └─ceph--becd0c65--b2f7--4b47--b833--a2b141c44c51-osd--block--dd47485c--871b--41bc--862f--b2c9ebfe8684 253:5 0 3.7T 0 lvm
sdd      8:48   0  3.7T  0 disk
└─sdd1   8:49   0  3.7T  0 part
  └─ceph--98ae6926--4336--43ae--84d2--9bb7b99d5456-osd--block--4160272c--7e60--4cab--ba77--cabedcce2f54
Then remove the device-mapper mapping that holds the partition (again, only needed when Device or resource busy appears):
dmsetup remove ceph--90f7ab20--7120--4ad7--a1a2--0cc510aa78cc-osd--block--9841f92e--9877--43af--8a93--6c798079d8c0
Step 2: create the OSDs
ceph-deploy --overwrite-conf osd create --fs-type xfs --data /dev/sda1 master
ceph-deploy --overwrite-conf osd create --fs-type xfs --data /dev/sdb1 master
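If both commands succeed, the OSDs should come up right away; a quick sanity check with the standard ceph CLI before looking at the full status in the next step:

ceph osd stat    # expect something like: 2 osds: 2 up, 2 in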
9. Check the cluster status
[root@master ceph]# ceph -s
  cluster:
    id:     aca21f57-bcd2-4eb5-bf47-3ad1e5bb1e7a
    health: HEALTH_OK
  services:
    mon: 1 daemons, quorum master
    mgr: master(active)
    osd: 2 osds: 2 up, 2 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0B
    usage:   2.00GiB used, 5.00GiB / 7GiB avail
    pgs:
[root@master ceph]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       0.00679 root default
-3       0.00679     host master
 0   hdd 0.00389         osd.0       up  1.00000 1.00000
 1   hdd 0.00290         osd.1       up  1.00000 1.00000
IV. Common Ceph commands
1. Check the version
[root@cluster9 ~]# ceph --version
ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
2. Start, stop, restart, or check the MON process
sudo systemctl [start/stop/restart/status] ceph-mon@<mon_id>.service
[root@cluster9 ~]# systemctl status ceph-mon@cluster9.service
● ceph-mon@cluster9.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-07-06 00:48:14 EDT; 1 weeks 1 days ago
 Main PID: 3545 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@cluster9.service
           └─3545 /usr/bin/ceph-mon -f --cluster ceph --id cluster9 --setuser ceph --setgroup ceph

Jul 14 19:53:20 cluster9.com ceph-mon[3545]: 2020-07-14 19:53:20.925077 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 48 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 20:08:15 cluster9.com ceph-mon[3545]: 2020-07-14 20:08:15.638488 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 49 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 20:23:17 cluster9.com ceph-mon[3545]: 2020-07-14 20:23:17.278952 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 50 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 20:38:15 cluster9.com ceph-mon[3545]: 2020-07-14 20:38:15.082665 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 51 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 20:53:15 cluster9.com ceph-mon[3545]: 2020-07-14 20:53:15.826228 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 52 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 21:08:17 cluster9.com ceph-mon[3545]: 2020-07-14 21:08:17.342280 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 53 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 21:23:21 cluster9.com ceph-mon[3545]: 2020-07-14 21:23:21.269260 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 54 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 21:38:15 cluster9.com ceph-mon[3545]: 2020-07-14 21:38:15.985947 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 55 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 21:53:20 cluster9.com ceph-mon[3545]: 2020-07-14 21:53:20.710455 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 56 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Jul 14 22:08:17 cluster9.com ceph-mon[3545]: 2020-07-14 22:08:17.439460 7fa1fc598700 -1 log_channel(cluster) log [ERR] : Health check update: 57 stuck requests are blocked > 4096 sec. Implicat...REQUEST_STUCK)
Hint: Some lines were ellipsized, use -l to show in full.
3. List the Ceph services running on the mon node
[root@cluster9 ~]# systemctl list-units --type=service | grep ceph
ceph-mgr@cluster9.service  loaded active running Ceph cluster manager daemon
ceph-mon@cluster9.service  loaded active running Ceph cluster monitor daemon
ceph-osd@0.service         loaded active running Ceph object storage daemon osd.0
ceph-osd@1.service         loaded active running Ceph object storage daemon osd.1
ceph-osd@2.service         loaded active running Ceph object storage daemon osd.2
4. List the Ceph services enabled to start automatically on the node
[root@cluster9 ~]# systemctl list-unit-files | grep enabled | grep ceph
ceph-mgr@.service     enabled
ceph-mon@.service     enabled
ceph-osd@.service     enabled-runtime
ceph-volume@.service  enabled
ceph-mds.target       enabled
ceph-mgr.target       enabled
ceph-mon.target       enabled
ceph-osd.target       enabled
ceph-radosgw.target   enabled
ceph.target           enabled
5. Start, stop, restart, or check all OSD processes or a single one
systemctl [start/stop/restart/status] ceph-osd@*.service or ceph-osd@<osd_id>.service
systemctl status ceph-osd@*.service
systemctl status ceph-osd@0.service
6. View all OSD data directories and mounted disks on an OSD node
[root@cluster9 ~]# ls /var/lib/ceph/osd/
ceph-0  ceph-1  ceph-2
[root@cluster9 ~]# mount | grep osd
tmpfs on /var/lib/ceph/osd/ceph-0 type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/ceph/osd/ceph-1 type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/ceph/osd/ceph-2 type tmpfs (rw,relatime,seclabel)
7. View detailed cluster health
[root@cluster9 ~]# ceph health detail
HEALTH_WARN Reduced data availability: 84 pgs inactive; Degraded data redundancy: 128 pgs undersized; 1 slow requests are blocked > 32 sec. Implicated osds 2
PG_AVAILABILITY Reduced data availability: 84 pgs inactive
    pg 1.0 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.1 is stuck inactive for 771037.239650, current state undersized+peered, last acting [2]
    pg 1.2 is stuck inactive for 771037.239650, current state undersized+peered, last acting [0]
    pg 1.3 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.4 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.5 is stuck inactive for 771037.239650, current state undersized+peered, last acting [2]
    pg 1.6 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.7 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.8 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.9 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.a is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.b is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.c is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.d is stuck inactive for 771037.239650, current state undersized+peered, last acting [2]
    pg 1.e is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.16 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.17 is stuck inactive for 771037.239650, current state undersized+peered, last acting [0]
    pg 1.18 is stuck inactive for 771037.239650, current state undersized+peered, last acting [2]
    pg 1.19 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.1a is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.1b is stuck inactive for 771037.239650, current state undersized+peered, last acting [0]
    pg 1.1c is stuck inactive for 771037.239650, current state undersized+peered, last acting [0]
    pg 1.1d is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.1e is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.1f is stuck inactive for 771037.239650, current state undersized+peered, last acting [0]
    pg 1.20 is stuck inactive for 771037.239650, current state undersized+peered, last acting [2]
    pg 1.21 is stuck inactive for 771037.239650, current state undersized+peered, last acting [1]
    pg 1.22 is stuck inactive for 771037.239650, current state undersized+peered, last acting [2]
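The undersized/inactive warnings above are typical of a single-host lab: the default CRUSH rule wants to place replicas on different hosts, which is impossible on one machine. A hedged workaround (not part of the original walkthrough; the rule name replicated-osd is just an example) is to let CRUSH pick replicas per OSD instead of per host and point the pool at that rule:

ceph osd crush rule create-replicated replicated-osd default osd   # failure domain = osd instead of host
ceph osd pool set k8s-volumes crush_rule replicated-osd            # apply it to the k8s-volumes pool from the next step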
8. Create a pool
ceph osd pool create k8s-volumes 128 128
ceph osd pool application enable k8s-volumes k8s-volumes
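The pg_num of 128 matches the common rule of thumb of roughly (number of OSDs × 100) / replica size, rounded to a power of two; a back-of-the-envelope check for the 3-OSD cluster used in these command examples (an illustration, not a sizing recommendation):

num_osds=3
replica_size=3
echo $(( num_osds * 100 / replica_size ))   # = 100 -> round to the nearest power of two, i.e. 128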
9. View pool details
[root@cluster9 ~]# ceph osd pool ls detail
pool 1 'k8s-volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 21 lfor 0/17 flags hashpspool stripe_width 0
10. Delete a pool
[root@cluster9 ~]# ceph osd pool delete k8s-volumes k8s-volumes --yes-i-really-really-mean-it
pool 'k8s-volumes' removed
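This only succeeds because mon allow pool delete = true was set in ceph.conf earlier; if the monitor still refuses the delete, the setting can be injected at runtime (a hedged sketch using the standard injectargs mechanism), after which the delete command above is re-run:

ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'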
V. Ceph PG states explained
A healthy cluster shows 100% of its PGs as active + clean, meaning every PG is accessible and every replica of every PG is available.
Ceph may also report other warning or error states for PGs; they are summarized in the table below, followed by a short sketch of commands for inspecting PG states.
State | Description |
---|---|
Activating | Peering has completed and the PG is waiting for all PG instances to synchronize and persist the Peering result (Info, Log, etc.) |
Active | Active. The PG can service client read and write requests normally |
Backfilling | Backfill in progress. Backfill is a special case of recovery: after peering, if some PG instances in the Up Set cannot be brought up to date incrementally from the current authoritative log (for example the OSDs hosting them were offline for too long, or a newly added OSD caused a whole PG instance to migrate), they are synchronized in full by copying every object from the current Primary |
Backfill-toofull | A PG instance that needs backfilling sits on an OSD that is short of free space, so the backfill is currently suspended |
Backfill-wait | Waiting for backfill resource reservation |
Clean | Clean. The PG has no objects awaiting repair, the Acting Set and the Up Set are identical, and their size equals the pool's replica count |
Creating | The PG is being created |
Deep | The PG is undergoing, or about to undergo, a deep object-consistency scrub |
Degraded | Degraded. After Peering, the PG found objects in some PG instance that are inconsistent (need synchronization/repair), or the current Acting Set is smaller than the pool's replica count |
Down | During Peering the PG found an Interval that cannot be skipped (for example, during that Interval the PG completed Peering and switched to Active, so it may have serviced client reads and writes), and the OSDs still online are not enough to complete data repair |
Incomplete | During Peering, either an authoritative log could not be selected, or the Acting Set chosen by choose_acting is not sufficient to complete data repair, so Peering cannot finish normally |
Inconsistent | Inconsistent. Scrub or deep scrub detected replicas of an object in the PG that do not match, for example differing object sizes, or a replica missing after Recovery finished |
Peered | Peering has completed, but the PG's current Acting Set is smaller than the minimum replica count (min_size) configured for the pool |
Peering | Peering in progress. The PG is synchronizing state between its instances |
Recovering | Recovery in progress. The cluster is migrating or synchronizing objects and their replicas |
Recovering-wait | Waiting for recovery resource reservation |
Remapped | Remapped. Any change to the PG's acting set triggers data migration from the old acting set to the new one. While the migration runs, the old acting set's primary OSD keeps handling client requests; once it completes, the new acting set's primary OSD takes over |
Repair | While scrubbing, the PG found inconsistent objects that it can fix and is repairing them automatically |
Scrubbing | The PG is undergoing, or about to undergo, an object-consistency scrub |
Unactive | Inactive. The PG cannot service read or write requests |
Unclean | Unclean. The PG cannot recover from a previous failure |
Stale | Stale. The PG's state has not been updated by any OSD, which suggests that all OSDs storing the PG may be down, or the Mon has not received the Primary's statistics (for example due to network flapping) |
Undersized | The PG's current Acting Set is smaller than the pool's replica count |
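To see which PGs are in a given state and why, a few standard commands (not part of the original table) are commonly used:

ceph pg stat                  # one-line summary of PG states
ceph pg dump_stuck inactive   # list PGs stuck inactive (also accepts unclean, stale, undersized, degraded)
ceph pg 1.0 query             # detailed peering/recovery information for a single PG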