Enabling the Ceph Dashboard and Monitoring Ceph Cluster Status with Prometheus

  ceph dashboard

  Dashboard overview

  The Ceph dashboard provides a web interface for viewing the status of a running Ceph cluster and for performing configuration tasks. Early Ceph releases relied on third-party dashboard components.

 

  Enabling the dashboard plugin

 https://docs.ceph.com/en/mimic/mgr/

   https://docs.ceph.com/en/latest/mgr/dashboard/

   https://packages.debian.org/unstable/ceph-mgr-dashboard (version 15 has dependencies that must be resolved separately)

 

  Ceph mgr is a modular (multi-plugin) component whose modules can be enabled or disabled individually. The following operations are performed on the ceph-deploy server.

  Newer releases require installing the dashboard package, and it must be installed on the mgr nodes; otherwise you get an error like:

The following packages have unmet dependencies:

ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1~bpo10+1) but it is not going to be installed

E: Unable to correct problems, you have held broken packages.

 

  Install ceph-mgr-dashboard on the ceph-mgr nodes

root@ceph-mgr1:~# apt-cache madison ceph-mgr-dashboard
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main amd64 Packages
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main i386 Packages

root@ceph-mgr1:~# apt install ceph-mgr-dashboard
root@ceph-mgr2:~# apt install ceph-mgr-dashboard

 

  Operations on the ceph-deploy node

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module -h   # show help for ceph mgr module

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls   # list all ceph mgr modules
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [    # modules currently enabled; note that dashboard is not among them
        "iostat",
        "nfs",
        "restful"
    ],
    "disabled_modules": [    # modules currently disabled
        {
            "name": "alerts",
            "can_run": true,    # whether the module can be enabled
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",
                    "level": "advanced",

 

  Enable the dashboard module

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard

  Note: after the module is enabled, the dashboard is not yet reachable; you still need to disable SSL (or enable it) and set the listening address.

 

  Configure the dashboard module

  Disable SSL for the Ceph dashboard:

# disable SSL for the dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl false

 

  Option 1: enable the dashboard on the mgr1 node only

# bind the dashboard on node ceph-mgr1 to that node's IP
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 172.16.100.38

# set the dashboard port on mgr1 to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9009

 

  Option 2 (recommended): configure the listener for all mgr daemons, so that if the mgr service on mgr1 (172.16.100.38) goes down, the dashboard can be reached on another mgr node, making the dashboard highly available.

# set the dashboard listening address for all mgr daemons (here the IP of mgr1)
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_addr 172.16.100.38

# set the dashboard listening port for all mgr daemons to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_port 9009

 

  This guide uses option 2. After the configuration is done, restart the module to load it:

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard

 

  Check Ceph status

cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     5372c074-edf7-45dd-b635-16422165c17c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 107m)
    mgr: ceph-mgr2(active, since 18m), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 20 osds: 20 up (since 6h), 20 in (since 7d)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 337 pgs
    objects: 304 objects, 68 MiB
    usage:   1.1 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     337 active+clean

  

  If you hit the following error, check whether the mgr service is running properly; restarting the mgr service usually resolves it:

Module 'dashboard' has failed: error('No socket could be created',)

  

  The first time the dashboard plugin is enabled it can take a while (a few minutes) to come up; then verify on the mgr1 node where it was enabled.

  If after a long wait there is still no service listening on port 9009 on mgr1, restart the mgr service manually:

root@ceph-mgr1:~# ss -lntup|grep 9009
root@ceph-mgr1:~# systemctl restart ceph-mgr@ceph-mgr1.service

 

  If restarting ceph-mgr@ceph-mgr1.service fails with:

Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Start request repeated too quickly.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Failed with result 'start-limit-hit'.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: Failed to start Ceph cluster manager daemon.

  Edit the ceph-mgr@.service unit file and comment out the start rate-limit interval. Alternatively, systemctl reset-failed ceph-mgr@ceph-mgr1.service clears the rate-limit counter without editing the unit file.

root@ceph-mgr1:/var/log/ceph# vim /lib/systemd/system/ceph-mgr@.service
#StartLimitInterval=30min
root@ceph-mgr1:/var/log/ceph# systemctl daemon-reload    # reload systemd after editing the unit

 

  Browse to the mgr1 node IP: 172.16.100.38:9009

 

  Stop the mgr service on mgr1 to verify the dashboard's high availability

root@ceph-mgr1:~# lsof -i :9009
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 16144 ceph   32u  IPv4 159123      0t0  TCP ceph-mgr1.example.local:9009 (LISTEN)

root@ceph-mgr1:~# systemctl stop ceph-mgr@ceph-mgr1.service 

root@ceph-mgr1:~# lsof -i :9009

root@ceph-mgr1:~# ceph -s

 

 

  Check that the dashboard port is listening on the mgr2 node, then browse to 172.16.100.39:9009

 

 

  The dashboard is reachable.

 

  Set the dashboard username and password

  Method 1 (recommended): set the password from a file

cephadmin@ceph-deploy:~/ceph-cluster$ touch pass.txt

cephadmin@ceph-deploy:~/ceph-cluster$ echo "123456" > pass.txt 

cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh -i pass.txt 
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
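
Since the password sits in a plain-text file, it is worth keeping the file owner-only readable while it exists. A minimal sketch, reusing the example password from above:

```shell
umask 077                      # files created below are readable only by the owner
echo "123456" > pass.txt       # the example password used in this guide
chmod 600 pass.txt             # explicit, in case the file already existed
ls -l pass.txt
```

Once ceph dashboard set-login-credentials lxh -i pass.txt has been run, the file can be deleted.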

 

  Method 2: specify the username and password directly on the command line

  Note: this method no longer works in Ceph Pacific 16.x; use method 1 instead.

  Command format:

dashboard set-login-credentials <username> <password>    Set the login credentials

# create the user and set its password
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh 123456

 

  Dashboard HTTPS/SSL configuration

  To access the dashboard over SSL, a signed certificate must be configured. The certificate can be generated with the ceph command or with openssl. For production it is advisable to put the dashboard behind an nginx reverse proxy and terminate HTTPS in nginx.
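
A sketch of the reverse-proxy approach (the server name, certificate paths, and upstream address are illustrative; the upstream assumes the non-SSL dashboard listener configured earlier in this guide):

```nginx
server {
    listen 443 ssl;
    server_name ceph.example.local;                   # hypothetical dashboard hostname

    ssl_certificate     /etc/nginx/ssl/ceph.crt;      # hypothetical certificate paths
    ssl_certificate_key /etc/nginx/ssl/ceph.key;

    location / {
        proxy_pass http://172.16.100.38:9009;         # dashboard listener (SSL disabled)
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```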

  https://docs.ceph.com/en/latest/mgr/dashboard/
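
The openssl route mentioned above can be sketched as follows (the subject CN, validity, and file names are placeholders; ceph dashboard set-ssl-certificate / set-ssl-certificate-key are the documented commands for loading a custom certificate):

```shell
# generate a self-signed key pair (placeholder CN; adjust validity as needed)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout dashboard.key -out dashboard.crt -subj "/CN=ceph-dashboard"

# then load it into the dashboard (run on the ceph-deploy node):
#   ceph dashboard set-ssl-certificate -i dashboard.crt
#   ceph dashboard set-ssl-certificate-key -i dashboard.key
```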

 

  Ceph self-signed certificate

  1. Create a self-signed certificate with the ceph dashboard command

cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard create-self-signed-cert 
Self-signed certificate created

 

  2. Enable SSL for the dashboard and set the HTTPS port to 9443 (the default is 8443)

cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl true
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl_server_port 9443

 

  3. Restart the module to load the configuration

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard

 

  4. Check the mgr dashboard service status

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
{
    "dashboard": "https://172.16.100.39:9443/"
}

 

  5. Check Ceph status

cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     5372c074-edf7-45dd-b635-16422165c17c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 94m)
    mgr: ceph-mgr2(active, since 115s), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 20 osds: 20 up (since 9h), 20 in (since 7d)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 337 pgs
    objects: 306 objects, 68 MiB
    usage:   1.2 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     337 active+clean

 

  6. Verify in a browser:

 

 

  Ceph monitoring

 

  Monitoring Ceph nodes with Prometheus

   https://prometheus.io/

  Deploy node_exporter

  Deploy node_exporter on each node of the Ceph cluster:

root@ceph-node1:/usr/local# wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz    # download from the official GitHub releases
root@ceph-node1:/usr/local# tar xf node_exporter-1.3.1.linux-amd64.tar.gz
root@ceph-node1:/usr/local# mv node_exporter-1.3.1.linux-amd64 node_exporter

 

  Create the systemd unit file

root@ceph-node1:/usr/local# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target

 

  Start node_exporter

root@ceph-node1:/usr/local# systemctl daemon-reload && systemctl restart node-exporter && systemctl enable node-exporter.service

 

  Verify node_exporter on each node

 

  Configure the Prometheus server to scrape node_exporter

  Add a scrape job for the Ceph nodes

root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-node"
    static_configs:
      - targets: ["172.16.100.31:9100","172.16.100.32:9100","172.16.100.33:9100","172.16.100.34:9100"]
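
For reference, the job belongs under the scrape_configs section of prometheus.yml; a minimal sketch (the global settings shown are illustrative, not from this guide):

```yaml
global:
  scrape_interval: 15s          # illustrative default

scrape_configs:
  - job_name: "ceph-node"
    static_configs:
      - targets: ["172.16.100.31:9100","172.16.100.32:9100","172.16.100.33:9100","172.16.100.34:9100"]
```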

 

# restart prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus

 

  Verify on the Prometheus server

 

  Monitoring Ceph services with Prometheus

  The Ceph manager ships with a built-in Prometheus module that listens on port 9283 of each manager node; this endpoint serves the collected metrics to Prometheus over HTTP. https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus

 

  Enable the prometheus monitoring module

  Enable the prometheus module on the mgr nodes:

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable prometheus

 

  Verify the module is enabled

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls |less
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "dashboard",
        "iostat",
        "nfs",
        "prometheus",
        "restful"
    ],
    "disabled_modules": [
        {
            "name": "alerts",
            "can_run": true,
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",

 

  Verify the listening port on the mgr node

root@ceph-mgr1:~# ss -lntup | grep 9283
tcp   LISTEN  0       5                          *:9283                 *:*      users:(("ceph-mgr",pid=1247,fd=36))  

 

  Browse to the mgr metrics endpoint

 

 

  Configure Prometheus to scrape the data

  Add a scrape job for the mgr metrics

root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-mgr"
    static_configs:
      - targets: ["172.16.100.38:9283"]

# restart prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus

 

   Verify on the Prometheus server

 

  Displaying the monitoring data in Grafana

  Use Grafana to display the Ceph cluster monitoring data and the node data.

 

 

  Configure the data source

  In Grafana, add the Prometheus server that scrapes the Ceph cluster as a data source.

 

 

 

  Import dashboards

  1. Import the Ceph OSD dashboard: https://grafana.com/grafana/dashboards/5336

 


 

  

  If the three panels on the first page show no data, edit the value mappings and change the N/A range to 0.

 

   2. Import the Ceph pool dashboard: https://grafana.com/grafana/dashboards/5342

 

 

 

  3. Import the Ceph cluster dashboard: https://grafana.com/grafana/dashboards/7056

 

 

 

  Other cluster dashboards worth trying: 10045: https://grafana.com/grafana/dashboards/10045-ceph-cluster/

 

  9966:https://grafana.com/grafana/dashboards/9966-ceph-multicluster

 

posted @ 2023-01-29 17:10  PunchLinux