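Enabling the Ceph dashboard and monitoring the Ceph cluster with Prometheus
Ceph dashboard
Dashboard overview
The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components.
Enabling the dashboard module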
https://docs.ceph.com/en/mimic/mgr/
https://docs.ceph.com/en/latest/mgr/dashboard/
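https://packages.debian.org/unstable/ceph-mgr-dashboard (the 15.x package has dependencies that must be resolved separately)
Ceph mgr is a modular, plugin-based component whose modules can be enabled or disabled individually. Unless the prompt shows a different host, the commands below are run on the ceph-deploy server.
In recent releases the dashboard is a separate package and must be installed on the mgr nodes; installing it elsewhere fails as follows: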
The following packages have unmet dependencies:
ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1~bpo10+1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Install ceph-mgr-dashboard on the ceph-mgr nodes
root@ceph-mgr1:~# apt-cache madison ceph-mgr-dashboard
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main amd64 Packages
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main i386 Packages
root@ceph-mgr1:~# apt install ceph-mgr-dashboard
root@ceph-mgr2:~# apt install ceph-mgr-dashboard
On the ceph-deploy node
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module -h #show help for ceph mgr module
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls #list all ceph mgr modules
{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [ #modules already enabled; the dashboard module is not among them
"iostat",
"nfs",
"restful"
],
"disabled_modules": [ #modules currently disabled
{
"name": "alerts",
"can_run": true, #whether the module can be enabled
"error_string": "",
"module_options": {
"interval": {
"name": "interval",
"type": "secs",
"level": "advanced",
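The module listing can also be checked from a script. The following is a minimal Python sketch, not part of the original procedure, that reads JSON in the shape printed by `ceph mgr module ls` and reports the state of one module. The function name `module_state` and the trimmed `SAMPLE` document are illustrative assumptions; only the key names come from the output above.

```python
import json


def module_state(ls_json: str, name: str) -> str:
    """Classify a mgr module from `ceph mgr module ls` JSON output."""
    data = json.loads(ls_json)
    if name in data.get("always_on_modules", []):
        return "always_on"
    if name in data.get("enabled_modules", []):
        return "enabled"
    # disabled_modules holds objects with a "name" key, not plain strings
    for module in data.get("disabled_modules", []):
        if module.get("name") == name:
            return "disabled" if module.get("can_run") else "cannot_run"
    return "unknown"


# Trimmed, hand-written sample in the same shape as the real output.
SAMPLE = """
{
  "always_on_modules": ["balancer", "crash"],
  "enabled_modules": ["iostat", "nfs", "restful"],
  "disabled_modules": [
    {"name": "dashboard", "can_run": true, "error_string": ""}
  ]
}
"""

if __name__ == "__main__":
    print(module_state(SAMPLE, "dashboard"))
```

Note that the `#` annotations in the listing above are the author's comments; real output would have to be passed to the function without them.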
Enable the dashboard module
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
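Note: the dashboard is not reachable immediately after the module is enabled. First either disable SSL or enable SSL with a certificate, and set the listen address.
Configuring the dashboard module
Disable SSL for the Ceph dashboard as follows:
#disable SSL for the dashboard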
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl false
Method one: configure the dashboard listener on the mgr1 node only
#set the dashboard listen address for ceph-mgr1 to that node's IP
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 172.16.100.38
#set the dashboard listen port on ceph-mgr1 to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9009
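Method two (recommended): configure the listener for all mgr daemons. If the mgr service on mgr1 (172.16.100.38) goes down, the dashboard can still be reached through another mgr node, which makes the dashboard highly available.
#set the dashboard listen address without a mgr name, so it applies to every mgr instance (if a standby mgr cannot bind this address, a wildcard such as 0.0.0.0 is the usual choice)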
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_addr 172.16.100.38
#set the dashboard listen port for every mgr instance to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_port 9009
Method two is the configuration used here. Once the settings are in place, restart the module so that it loads them:
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
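The only difference between the two methods is whether the config key carries a mgr name. As a rough illustration, the Python sketch below builds the argument lists for both variants. The helper name `dashboard_listen_commands` is hypothetical, and the sketch only prints the commands rather than running them.

```python
from typing import List, Optional


def dashboard_listen_commands(
    addr: str, port: int, mgr_name: Optional[str] = None
) -> List[List[str]]:
    """Return `ceph config set` argument lists for the dashboard listener.

    With mgr_name the keys are per-instance (method one); without it the
    keys apply to every mgr instance (method two).
    """
    prefix = "mgr/dashboard/" + (f"{mgr_name}/" if mgr_name else "")
    return [
        ["ceph", "config", "set", "mgr", prefix + "server_addr", addr],
        ["ceph", "config", "set", "mgr", prefix + "server_port", str(port)],
    ]


if __name__ == "__main__":
    # Method one: keys scoped to ceph-mgr1
    for cmd in dashboard_listen_commands("172.16.100.38", 9009, "ceph-mgr1"):
        print(" ".join(cmd))
    # Method two: keys shared by all mgr instances
    for cmd in dashboard_listen_commands("172.16.100.38", 9009):
        print(" ".join(cmd))
```

Keeping the commands as argument lists would make it straightforward to hand them to `subprocess.run` later, should automation be wanted.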
Check the Ceph status
cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
cluster:
id: 5372c074-edf7-45dd-b635-16422165c17c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 107m)
mgr: ceph-mgr2(active, since 18m), standbys: ceph-mgr1
mds: 2/2 daemons up, 2 standby
osd: 20 osds: 20 up (since 6h), 20 in (since 7d)
rgw: 2 daemons active (2 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 12 pools, 337 pgs
objects: 304 objects, 68 MiB
usage: 1.1 GiB used, 2.0 TiB / 2.0 TiB avail
pgs: 337 active+clean
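Since the dashboard is served by whichever mgr is currently active, the `mgr:` line of the status output is the part that matters here. The sketch below, a hypothetical helper rather than anything Ceph ships, extracts the active mgr and the standbys from text in the format shown above. For scripted use, `ceph -s --format json` is probably a more robust source than parsing the human-readable text.

```python
import re
from typing import List, Tuple

# Matches e.g. "mgr: ceph-mgr2(active, since 18m), standbys: ceph-mgr1"
MGR_LINE = re.compile(
    r"mgr:\s+(?P<active>[^\s(]+)\(active[^)]*\)"
    r"(?:,\s+standbys:\s+(?P<standbys>.+))?"
)


def parse_mgr_line(status_text: str) -> Tuple[str, List[str]]:
    """Return (active_mgr, [standby, ...]) from `ceph -s` plain-text output."""
    for line in status_text.splitlines():
        match = MGR_LINE.search(line)
        if match:
            standbys = match.group("standbys") or ""
            names = [name.strip() for name in standbys.split(",") if name.strip()]
            return match.group("active"), names
    raise ValueError("no mgr line found in status output")


if __name__ == "__main__":
    line = "mgr: ceph-mgr2(active, since 18m), standbys: ceph-mgr1"
    print(parse_mgr_line(line))
```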
If the following error appears, check that the mgr service is running properly; restarting the mgr service may clear it:
Module 'dashboard' has failed: error('No socket could be created',)
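The first time the dashboard module is enabled it can take a few minutes to start; after that, verify it on the mgr1 node.
If mgr1 still has nothing listening on port 9009 after a long wait, restart the mgr service manually: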
root@ceph-mgr1:~# ss -lntup|grep 9009
root@ceph-mgr1:~# systemctl restart ceph-mgr@ceph-mgr1.service
If restarting ceph-mgr@ceph-mgr1.service fails with:
Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Start request repeated too quickly.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Failed with result 'start-limit-hit'.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: Failed to start Ceph cluster manager daemon.
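Edit the ceph-mgr@.service unit file and comment out the start-limit interval; run systemctl daemon-reload afterwards so that systemd picks up the change:
root@ceph-mgr1:/var/log/ceph# vim /lib/systemd/system/ceph-mgr@.service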
#StartLimitInterval=30min
Open the mgr1 node address in a browser: 172.16.100.38:9009
Stop the mgr service on mgr1 to verify that the dashboard is highly available
root@ceph-mgr1:~# lsof -i :9009
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 16144 ceph 32u IPv4 159123 0t0 TCP ceph-mgr1.example.local:9009 (LISTEN)
root@ceph-mgr1:~# systemctl stop ceph-mgr@ceph-mgr1.service
root@ceph-mgr1:~# lsof -i :9009
root@ceph-mgr1:~# ceph -s
Check that the dashboard port is listening on the mgr2 node, then open the mgr2 address 172.16.100.39:9009
The dashboard loads successfully.
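The failover check above can be approximated from any client machine by probing each mgr address in turn. The following is a small sketch under the assumption that a successful TCP connection is a good enough signal; it does not verify that the dashboard itself answers. The function name `first_reachable` is made up for this example.

```python
import socket
from typing import Iterable, Optional, Tuple


def first_reachable(
    endpoints: Iterable[Tuple[str, int]], timeout: float = 2.0
) -> Optional[Tuple[str, int]]:
    """Return the first (host, port) that accepts a TCP connection, else None."""
    for host, port in endpoints:
        try:
            # create_connection raises OSError on refusal or timeout
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            continue
    return None


# Example call with the two mgr addresses used in this article:
#   first_reachable([("172.16.100.38", 9009), ("172.16.100.39", 9009)])
```

With mgr1 stopped as above, the call would be expected to skip 172.16.100.38 and return the mgr2 address, provided the dashboard port is listening there.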
Setting the dashboard account and password
Method 1 (recommended): read the password from a file
cephadmin@ceph-deploy:~/ceph-cluster$ touch pass.txt
cephadmin@ceph-deploy:~/ceph-cluster$ echo "123456" > pass.txt
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh -i pass.txt
******************************************************************
*** WARNING: this command is deprecated. ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
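Because the password sits in a plain file until the command has run, it may be worth creating that file with owner-only permissions. The sketch below shows one way to do that in Python. The helper `write_password_file` is hypothetical, and the file name `pass.txt` simply mirrors the example above.

```python
import os


def write_password_file(path: str, password: str) -> None:
    """Create `path` readable only by its owner and write the password to it."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Trailing newline matches what `echo "..." > pass.txt` produces.
        handle.write(password + "\n")
    # os.open applies the mode only when it creates the file, so enforce it
    # again in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


if __name__ == "__main__":
    write_password_file("pass.txt", "123456")
```

The file can then be passed to `ceph dashboard set-login-credentials lxh -i pass.txt` as shown above and deleted once the credentials are set.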
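Method 2: pass the user name and password directly
This method is no longer available in Ceph Pacific 16.x.
Command format:
dashboard set-login-credentials <username> <password> Set the login credentials
Create the user and set the password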
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh 123456
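Dashboard HTTPS/SSL configuration
SSL access requires a certificate. The certificate can be generated with the ceph command or with openssl. In production, it is advisable to put an nginx reverse proxy in front of the dashboard and configure HTTPS on nginx.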
https://docs.ceph.com/en/latest/mgr/dashboard/
Ceph self-signed certificate
1. Create a self-signed certificate with ceph dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard create-self-signed-cert
Self-signed certificate created
2. Enable SSL for the dashboard and set the HTTPS port to 9443 (the default is 8443)
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl true
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl_server_port 9443
3. Restart the module to load the configuration
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
4. Check the mgr dashboard status
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
{
"dashboard": "https://172.16.100.39:9443/"
}
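The URL reported by `ceph mgr services` tells you which mgr is serving the dashboard and on which port. As an illustration, the Python sketch below splits that URL into its parts. The function name `dashboard_endpoint` is an assumption of this example, and `SAMPLE` repeats the output shown above.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlsplit


def dashboard_endpoint(services_json: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (scheme, host, port) of the dashboard from `ceph mgr services`."""
    url = json.loads(services_json)["dashboard"]
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port


SAMPLE = '{"dashboard": "https://172.16.100.39:9443/"}'

if __name__ == "__main__":
    scheme, host, port = dashboard_endpoint(SAMPLE)
    print(f"dashboard served over {scheme} by {host} on port {port}")
```

A scheme of `https` together with port 9443 would confirm that the SSL settings from step 2 took effect.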
5. Check the Ceph status
cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
cluster:
id: 5372c074-edf7-45dd-b635-16422165c17c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 94m)
mgr: ceph-mgr2(active, since 115s), standbys: ceph-mgr1
mds: 2/2 daemons up, 2 standby
osd: 20 osds: 20 up (since 9h), 20 in (since 7d)
rgw: 2 daemons active (2 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 12 pools, 337 pgs
objects: 306 objects, 68 MiB
usage: 1.2 GiB used, 2.0 TiB / 2.0 TiB avail
pgs: 337 active+clean
6. Verify access in a browser.
Ceph monitoring
Monitoring Ceph nodes with Prometheus
Deploying node_exporter
Deploy node_exporter on every node of the Ceph cluster
root@ceph-node1:/usr/local# tar xf node_exporter-1.3.1.linux-amd64.tar.gz
root@ceph-node1:/usr/local# mv node_exporter-1.3.1.linux-amd64 node_exporter
Create the systemd unit file
root@ceph-node1:/usr/local# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
Start node_exporter
root@ceph-node1:/usr/local# systemctl daemon-reload && systemctl restart node-exporter && systemctl enable node-exporter.service
Verify node_exporter on each node
Scraping node_exporter from the Prometheus server
Add a scrape job for the ceph-node hosts
root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-node"
    static_configs:
      - targets: ["172.16.100.31:9100","172.16.100.32:9100","172.16.100.33:9100","172.16.100.34:9100"]
#restart prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus
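Typing long target lists by hand is error-prone, so the scrape job can also be generated from a list of hosts. The following sketch renders a `static_configs` job as text using plain string formatting, with no YAML library. The function name `render_scrape_job` and the default two-space indentation are assumptions for illustration.

```python
from typing import Iterable


def render_scrape_job(
    job_name: str, hosts: Iterable[str], port: int, indent: int = 2
) -> str:
    """Render one Prometheus scrape job with a static target list."""
    pad = " " * indent
    targets = ",".join(f'"{host}:{port}"' for host in hosts)
    return (
        f'{pad}- job_name: "{job_name}"\n'
        f"{pad}  static_configs:\n"
        f"{pad}    - targets: [{targets}]\n"
    )


if __name__ == "__main__":
    nodes = ["172.16.100.31", "172.16.100.32", "172.16.100.33", "172.16.100.34"]
    print(render_scrape_job("ceph-node", nodes, 9100), end="")
```

The rendered text is meant to be pasted under `scrape_configs:` in prometheus.yml; the indentation may need adjusting to match the rest of that file.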
Verify on the Prometheus server
Monitoring Ceph services with Prometheus
Ceph manager ships with a built-in prometheus module that listens on port 9283 of each manager node and exposes the collected metrics to Prometheus over HTTP. https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus
Enabling the prometheus module
Enable the prometheus module on the mgr nodes
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable prometheus
Verify that the module is enabled
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls |less
{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"dashboard",
"iostat",
"nfs",
"prometheus",
"restful"
],
"disabled_modules": [
{
"name": "alerts",
"can_run": true,
"error_string": "",
"module_options": {
"interval": {
"name": "interval",
"type": "secs",
Verify that the port is listening on the mgr node
root@ceph-mgr1:~# ss -lntup | grep 9283
tcp LISTEN 0 5 *:9283 *:* users:(("ceph-mgr",pid=1247,fd=36))
Open the mgr metrics endpoint in a browser
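The metrics page is plain text in the Prometheus exposition format, so individual values are easy to pick out. As a sketch, the Python below looks for `ceph_health_status`, which the mgr prometheus module reports as 0, 1 or 2 for HEALTH_OK, HEALTH_WARN and HEALTH_ERR. The function name `ceph_health` and the `SAMPLE` text are illustrative; real output contains many more metrics.

```python
from typing import Optional

HEALTH_NAMES = {0: "HEALTH_OK", 1: "HEALTH_WARN", 2: "HEALTH_ERR"}


def ceph_health(metrics_text: str) -> Optional[str]:
    """Return the health name encoded in ceph_health_status, or None if absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes "name value" lines without a trailing timestamp.
        name, _, value = line.rpartition(" ")
        # Drop an optional {label="..."} suffix before comparing names.
        if name.split("{", 1)[0] == "ceph_health_status":
            return HEALTH_NAMES.get(int(float(value)), "UNKNOWN")
    return None


# Hand-written sample, not captured from the cluster in this article.
SAMPLE = """\
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 0.0
"""

if __name__ == "__main__":
    print(ceph_health(SAMPLE))
```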
Configuring Prometheus to scrape the data
Add a scrape job for the mgr metrics
root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-mgr"
    static_configs:
      - targets: ["172.16.100.38:9283"]
#restart prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus
Verify on the Prometheus server
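Besides the Targets page of the web UI, Prometheus reports target health through its HTTP API at `/api/v1/targets`. The sketch below parses a response of that shape and lists the targets of one job that are not up. The function name `unhealthy_targets` is hypothetical, and `SAMPLE` is a hand-written, heavily trimmed response rather than output from the server in this article.

```python
import json
from typing import List, Tuple


def unhealthy_targets(targets_json: str, job: str) -> List[Tuple[str, str]]:
    """Return (instance, health) for targets of `job` whose health is not 'up'."""
    payload = json.loads(targets_json)
    result = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        if labels.get("job") == job and target.get("health") != "up":
            result.append((labels.get("instance", ""), target.get("health", "unknown")))
    return result


# Illustrative response body; real responses carry many more fields.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "ceph-node", "instance": "172.16.100.31:9100"}, "health": "up"},
            {"labels": {"job": "ceph-mgr", "instance": "172.16.100.38:9283"}, "health": "down"},
        ]
    },
})

if __name__ == "__main__":
    print(unhealthy_targets(SAMPLE, "ceph-mgr"))
```

An empty list for both the ceph-node and ceph-mgr jobs would correspond to every target showing as UP in the web UI.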
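Displaying the monitoring data in Grafana
Use Grafana to display the monitoring data for the Ceph cluster and its nodes
Configuring the data source
In Grafana, add the Prometheus instance that scrapes the Ceph cluster as a data source
Importing dashboards
1. Import the Ceph OSD dashboard: https://grafana.com/grafana/dashboards/5336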
If the three panels at the top of the dashboard show no data, edit their value mappings and set the N/A range to 0.
2. Import the Ceph pool dashboard
ceph-pool https://grafana.com/grafana/dashboards/5342
3. Import the cluster dashboards
ceph cluster https://grafana.com/grafana/dashboards/7056
10045:https://grafana.com/grafana/dashboards/10045-ceph-cluster/
9966:https://grafana.com/grafana/dashboards/9966-ceph-multicluster
This article comes from cnblogs (博客园). Author: PunchLinux. When reposting, please cite the original link: https://www.cnblogs.com/punchlinux/p/17073011.html