Enabling the Ceph dashboard and monitoring the Ceph cluster with Prometheus
Ceph dashboard
Introduction to the dashboard
The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components.
Enable the dashboard plugin
https://docs.ceph.com/en/mimic/mgr/
https://docs.ceph.com/en/latest/mgr/dashboard/
https://packages.debian.org/unstable/ceph-mgr-dashboard (version 15 has dependencies that need to be resolved separately)
Ceph mgr is a modular (plugin-based) component whose modules can be enabled or disabled individually. The steps below are performed on the ceph-deploy server.
Newer versions require installing the dashboard package, and it must be installed on the mgr nodes, otherwise you will hit an error like:
The following packages have unmet dependencies:
 ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1~bpo10+1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Install ceph-mgr-dashboard on the ceph-mgr nodes
root@ceph-mgr1:~# apt-cache madison ceph-mgr-dashboard
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main amd64 Packages
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main i386 Packages
root@ceph-mgr1:~# apt install ceph-mgr-dashboard
root@ceph-mgr2:~# apt install ceph-mgr-dashboard
Operations on the ceph-deploy node
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module -h   # show help for ceph mgr module
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls   # list all ceph mgr modules
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [        # modules already enabled; note that dashboard is not among them
        "iostat",
        "nfs",
        "restful"
    ],
    "disabled_modules": [       # modules currently disabled
        {
            "name": "alerts",
            "can_run": true,    # whether the module can be enabled
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",
                    "level": "advanced",
...
Enable the dashboard module
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
Note: after the module is enabled the dashboard is still not accessible; you must either disable SSL or enable SSL, and specify a listening address.
Configure the dashboard module
Disable SSL for the Ceph dashboard as follows:
# disable SSL for the dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl false
Option 1: enable the dashboard on the mgr1 node only
# bind the dashboard to the IP of the mgr1 node
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 172.16.100.38
# set the dashboard listening port on the mgr1 node to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9009
Option 2 (recommended): configure the listener for all mgr daemons, so that if the mgr service on mgr1 (172.16.100.38) goes down, the dashboard can still be reached on another mgr node, making the dashboard highly available.
# set the dashboard listening address (a global option read by all mgr daemons)
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_addr 172.16.100.38
# set the dashboard listening port to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_port 9009
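Note that mgr/dashboard/server_addr set this way is a cluster-wide option that every mgr daemon reads, so pinning it to a single node's IP may prevent a standby mgr from creating its listening socket after a failover (the "No socket could be created" error mentioned below). A hedged alternative, in line with the upstream dashboard docs (whose default binds to all interfaces), is:

# sketch: bind the dashboard on every mgr daemon to all interfaces
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_addr 0.0.0.0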
This guide uses option 2. After configuring, disable and re-enable the module so the new settings are loaded:
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
Check the Ceph status
cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     5372c074-edf7-45dd-b635-16422165c17c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 107m)
    mgr: ceph-mgr2(active, since 18m), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 20 osds: 20 up (since 6h), 20 in (since 7d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 337 pgs
    objects: 304 objects, 68 MiB
    usage:   1.1 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     337 active+clean
If you see the following error, check whether the mgr service is running properly; restarting the mgr service once usually resolves it:
Module 'dashboard' has failed: error('No socket could be created',)
The first time the dashboard plugin is enabled it can take a while (a few minutes) to come up; wait before verifying on the mgr1 node.
If nothing is listening on port 9009 on the mgr1 node after a long wait, restart the mgr service manually:
root@ceph-mgr1:~# ss -lntup | grep 9009
root@ceph-mgr1:~# systemctl restart ceph-mgr@ceph-mgr1.service
If restarting ceph-mgr@ceph-mgr1.service fails with:
Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Start request repeated too quickly.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Failed with result 'start-limit-hit'.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: Failed to start Ceph cluster manager daemon.
edit the ceph-mgr@.service unit file and comment out the start rate limit:
root@ceph-mgr1:/var/log/ceph# vim /lib/systemd/system/ceph-mgr@.service
#StartLimitInterval=30min
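After editing the unit file, reload systemd and restart the daemon; reset-failed is a standard way to clear the start-limit counter immediately (these are generic systemd steps, not Ceph-specific):

root@ceph-mgr1:~# systemctl daemon-reload
root@ceph-mgr1:~# systemctl reset-failed ceph-mgr@ceph-mgr1.service
root@ceph-mgr1:~# systemctl restart ceph-mgr@ceph-mgr1.service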
Access in a browser via the mgr1 node IP: 172.16.100.38:9009
Stop the mgr service on the mgr1 node to verify the dashboard's high availability
root@ceph-mgr1:~# lsof -i :9009
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr  16144 ceph   32u  IPv4 159123      0t0  TCP ceph-mgr1.example.local:9009 (LISTEN)
root@ceph-mgr1:~# systemctl stop ceph-mgr@ceph-mgr1.service
root@ceph-mgr1:~# lsof -i :9009
root@ceph-mgr1:~# ceph -s
Check that the dashboard port is now listening on the mgr2 node, then access the mgr2 node at 172.16.100.39:9009.
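A quick shell-level check (a sketch; curl merely fetches the HTTP status line):

root@ceph-mgr2:~# ss -lntup | grep 9009
root@ceph-mgr2:~# curl -Is http://172.16.100.39:9009 | head -n 1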
The dashboard is reachable on mgr2.
Set the dashboard username and password
Method 1 (recommended): supply the password via a file
cephadmin@ceph-deploy:~/ceph-cluster$ touch pass.txt
cephadmin@ceph-deploy:~/ceph-cluster$ echo "123456" > pass.txt
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh -i pass.txt
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
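As the warning says, set-login-credentials is deprecated in favor of the ac-user-* commands. A sketch of the equivalent operations (administrator is one of the dashboard's built-in role names):

# create a user with the built-in administrator role, reading the password from a file
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard ac-user-create lxh -i pass.txt administrator
# or update an existing user's password
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard ac-user-set-password lxh -i pass.txt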
Method 2: specify the username and password directly on the command line
Note: this method is deprecated and no longer usable as of Ceph Pacific 16.x (see the warning above).
Command format:
dashboard set-login-credentials <username> <password>    Set the login credentials

# create the user and set its password
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh 123456
Dashboard HTTPS (SSL) configuration
To access the dashboard over SSL, a signed certificate must be configured. The certificate can be generated with the ceph command or with openssl. For production it is recommended to put an nginx reverse proxy in front of the dashboard and terminate HTTPS on nginx; a sketch follows the documentation link below.
https://docs.ceph.com/en/latest/mgr/dashboard/
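For the nginx reverse-proxy approach mentioned above, a minimal sketch (the server name, certificate paths, and backend address are assumptions for illustration; TLS terminates at nginx while the dashboard itself stays on plain HTTP):

server {
    listen 443 ssl;
    server_name dashboard.example.local;                # hypothetical name

    ssl_certificate     /etc/nginx/ssl/dashboard.crt;   # hypothetical paths
    ssl_certificate_key /etc/nginx/ssl/dashboard.key;

    location / {
        # forward to the dashboard on the active mgr node
        proxy_pass http://172.16.100.38:9009;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}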
Ceph self-signed certificate
1. Create a self-signed certificate with ceph dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard create-self-signed-cert
Self-signed certificate created
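Alternatively, as noted above, the certificate can be generated with openssl and installed into the dashboard; a sketch following the upstream dashboard documentation (subject fields and file names are illustrative):

# generate a self-signed certificate and key
cephadmin@ceph-deploy:~/ceph-cluster$ openssl req -new -nodes -x509 \
    -subj "/O=IT/CN=ceph-mgr-dashboard" -days 3650 \
    -keyout dashboard.key -out dashboard.crt -extensions v3_ca
# install the certificate and key for the dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-ssl-certificate -i dashboard.crt
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-ssl-certificate-key -i dashboard.key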
2. Enable the SSL protocol for the dashboard and set the HTTPS port to 9443 (the default is 8443)
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl true
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl_server_port 9443
3. Disable and re-enable the module to load the configuration
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
4. Check the mgr dashboard service status
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
{
    "dashboard": "https://172.16.100.39:9443/"
}
5. Check the Ceph status
cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     5372c074-edf7-45dd-b635-16422165c17c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 94m)
    mgr: ceph-mgr2(active, since 115s), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 20 osds: 20 up (since 9h), 20 in (since 7d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 337 pgs
    objects: 306 objects, 68 MiB
    usage:   1.2 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     337 active+clean
6. Verify access in a browser:
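The HTTPS endpoint reported by ceph mgr services can also be spot-checked from the shell (-k skips verification of the self-signed certificate):

cephadmin@ceph-deploy:~/ceph-cluster$ curl -kIs https://172.16.100.39:9443 | head -n 1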
Ceph monitoring
Monitoring the Ceph node hosts with Prometheus
Deploy node_exporter
Deploy node_exporter on each node host of the Ceph cluster:
root@ceph-node1:/usr/local# tar xf node_exporter-1.3.1.linux-amd64.tar.gz
root@ceph-node1:/usr/local# mv node_exporter-1.3.1.linux-amd64 node_exporter
Create the systemd unit file
root@ceph-node1:/usr/local# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
Start node_exporter
root@ceph-node1:/usr/local# systemctl daemon-reload && systemctl restart node-exporter && systemctl enable node-exporter.service
Verify node_exporter on each node
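For example, a quick curl against one of the nodes should return metrics (the IP is taken from the scrape targets configured below):

root@ceph-node1:/usr/local# curl -s http://172.16.100.31:9100/metrics | head -n 5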
Prometheus server scrapes node_exporter
Add a scrape job for the ceph-node hosts:
root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-node"
    static_configs:
      - targets: ["172.16.100.31:9100","172.16.100.32:9100","172.16.100.33:9100","172.16.100.34:9100"]
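Optionally, the edited file can be validated with promtool before restarting (promtool ships with the Prometheus distribution; the path here assumes the same layout):

root@prometheus:/usr/local/prometheus# ./promtool check config prometheus.yml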
# restart prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus
Verify on the Prometheus server
Monitoring the Ceph service with Prometheus
The Ceph manager includes a built-in prometheus module that listens on port 9283 of every manager node; this HTTP endpoint serves the collected metrics to Prometheus. https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus
Enable the prometheus monitoring module
Enable the prometheus module for the mgr nodes:
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable prometheus
Verify the module is enabled
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls | less
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "dashboard",
        "iostat",
        "nfs",
        "prometheus",
        "restful"
    ],
    "disabled_modules": [
        {
            "name": "alerts",
            "can_run": true,
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",
...
Verify the listening port on the mgr node
root@ceph-mgr1:~# ss -lntup | grep 9283
tcp    LISTEN   0    5    *:9283    *:*    users:(("ceph-mgr",pid=1247,fd=36))
Access the mgr metrics in a browser
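From the shell, a curl against the module's endpoint should likewise return Ceph metrics, e.g. the ceph_health_status gauge exposed by the prometheus module:

root@ceph-mgr1:~# curl -s http://172.16.100.38:9283/metrics | grep ^ceph_health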
Configure Prometheus to scrape the data
Add a scrape job for the mgr node metrics:
root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-mgr"
    static_configs:
      - targets: ["172.16.100.38:9283"]

# restart prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus
Verify on the Prometheus server
Displaying the monitoring data in Grafana
Use Grafana to display the Ceph cluster monitoring data and the node data.
Configure the data source
In Grafana, add the Prometheus server that scrapes the Ceph cluster as a data source; besides the web UI, this can be done with a provisioning file, as sketched below.
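A minimal provisioning sketch (the file path, data source name, and Prometheus address are assumptions for illustration):

# /etc/grafana/provisioning/datasources/ceph.yaml   (hypothetical path)
apiVersion: 1
datasources:
  - name: ceph-prometheus                    # hypothetical name
    type: prometheus
    access: proxy
    url: http://<prometheus-server-ip>:9090  # replace with the Prometheus server address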
Import dashboards
1. Import the Ceph OSD dashboard: https://grafana.com/grafana/dashboards/5336
If the three panels on the home page show no data, edit the value mappings and change the N/A range to 0.
2. Import the Ceph pool dashboard
ceph-pool https://grafana.com/grafana/dashboards/5342
3. Import the cluster dashboard
ceph cluster https://grafana.com/grafana/dashboards/7056
10045:https://grafana.com/grafana/dashboards/10045-ceph-cluster/
9966:https://grafana.com/grafana/dashboards/9966-ceph-multicluster
This post is from cnblogs (博客园), by PunchLinux. When reposting, please credit the original link: https://www.cnblogs.com/punchlinux/p/17073011.html