Monitoring HP server RAID disk health with Zabbix

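This setup deals with failed disks in HP server RAID arrays. It works like a daily inspection of the disk status, so disk problems are found and handled promptly.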

System environment: CentOS 7.9

Download the ssacli package from HPE's repository: https://downloads.linux.hpe.com/SDR/repo/mcp/centos/7/x86_64/current/

  wget https://downloads.linux.hpe.com/SDR/repo/mcp/centos/7/x86_64/current/ssacli-5.10-44.0.x86_64.rpm

Install HPE's official tool, ssacli (the Smart Storage Administrator command-line interface):

  rpm -ivh ssacli-5.10-44.0.x86_64.rpm

Alternatively, install it straight from the URL in one step:

  rpm -ivh https://downloads.linux.hpe.com/SDR/repo/mcp/centos/7/x86_64/current/ssacli-5.10-44.0.x86_64.rpm
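Every ssacli command in this post addresses the controller as slot=0. On a server whose Smart Array controller sits in another slot, ssacli reports an error instead of listing drives, grep Failed matches nothing, and the monitor would quietly report 0 failed drives. Below is a minimal sketch for finding the slot number. It assumes that ssacli ctrl all show prints one line per controller containing "in Slot N" (for example "Smart Array P440ar in Slot 0 (Embedded)"; the exact wording depends on the model). The function name list_controller_slots is hypothetical.

```shell
#!/bin/bash
# list_controller_slots (hypothetical helper): read `ssacli ctrl all show`
# output on stdin and print the slot number of each controller, one per line.
# Assumes lines of the form "... in Slot N ..."; all other lines are ignored.
list_controller_slots() {
  sed -n 's/.* in Slot \([0-9][0-9]*\).*/\1/p'
}

# Usage on a real server (as root, with ssacli installed):
#   /usr/sbin/ssacli ctrl all show | list_controller_slots
```

If it prints a number other than 0, use that number in place of slot=0 in the commands that follow.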

Check the physical drive status (sample output from a server with four 300 GB drives):

  ssacli ctrl slot=0 pd all show status
   
  physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 300 GB): OK
  physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 300 GB): OK
  physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 300 GB): OK
  physicaldrive 1I:3:4 (port 1I:box 3:bay 4, 300 GB): OK
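The daily cron job used in this post counts only lines containing Failed. If you would rather have any drive state other than OK raise the alarm (for example a drive that ssacli flags as about to fail; this post does not list the possible states), here is a minimal sketch that counts every drive line not ending in ": OK". The function name count_bad_drives is hypothetical.

```shell
#!/bin/bash
# count_bad_drives (hypothetical helper): read the output of
# `ssacli ctrl slot=0 pd all show status` on stdin and print how many
# physical drives report a status other than "OK".
count_bad_drives() {
  # Keep only the per-drive lines, then count those that do not end in ": OK".
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the
  # exit status at 0 so callers running under `set -e` do not abort.
  grep '^[[:space:]]*physicaldrive' | grep -vc ': OK[[:space:]]*$' || true
}

# Usage (as root):
#   /usr/sbin/ssacli ctrl slot=0 pd all show status | count_bad_drives
```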

Configure zabbix_agentd.conf by adding a user parameter that maps the key diskbad.count to a script:

  echo "UserParameter=diskbad.count[*],/usr/local/sbin/diskinfo_z.sh" >> /etc/zabbix/zabbix_agentd.conf
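Running the echo ... >> command above a second time appends another identical UserParameter line, and the agent rejects a key that is defined twice. Here is a minimal sketch that appends the line only when it is not already present. The helper name add_user_parameter is hypothetical.

```shell
#!/bin/bash
# add_user_parameter (hypothetical helper): append LINE to the agent config
# file CONF only if exactly that line is not already present.
add_user_parameter() {
  local conf="$1" line="$2"
  # -q quiet, -x whole-line match, -F fixed string (so "[*]" is not a regex).
  if ! grep -qxF -- "$line" "$conf" 2>/dev/null; then
    printf '%s\n' "$line" >> "$conf"
  fi
}

# Usage (as root):
#   add_user_parameter /etc/zabbix/zabbix_agentd.conf \
#     'UserParameter=diskbad.count[*],/usr/local/sbin/diskinfo_z.sh'
```

Once the script described below exists, you can check the key locally with zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf -t diskbad.count.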

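Add a cron job (for example with crontab -e as root) that counts the drives reporting Failed once a day at 01:01 and saves the number to a file:

  01 01 * * * /usr/sbin/ssacli ctrl slot=0 pd all show status | grep Failed | wc -l > /usr/local/sbin/diskinfo.txt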
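This one-liner has two weak spots. The > redirection truncates diskinfo.txt before ssacli finishes, so the agent can briefly read an empty file. And if ssacli is missing or cannot query the controller, grep finds nothing and the file says 0, which looks healthy. Below is a minimal sketch of a wrapper the cron entry could call instead. It assumes ssacli exits with a non-zero status when the query fails (not verified here). It keeps the original counting rule (lines containing Failed). The script name, the STATUS_FILE and SSACLI variables and the sentinel value 99 are hypothetical.

```shell
#!/bin/bash
# Hypothetical replacement for the daily cron command (e.g. saved as
# /usr/local/sbin/diskcheck.sh). Writes the number of "Failed" drives to the
# status file atomically, or a sentinel value if ssacli itself fails.
set -u

STATUS_FILE="${STATUS_FILE:-/usr/local/sbin/diskinfo.txt}"
SSACLI="${SSACLI:-/usr/sbin/ssacli}"
SENTINEL=99  # hypothetical "check failed" value; any non-zero value fires the <>0 trigger

run_check() {
  local out count tmp
  # Assumes ssacli exits non-zero when it cannot query the controller.
  if ! out="$("$SSACLI" ctrl slot=0 pd all show status 2>&1)"; then
    count="$SENTINEL"
  else
    # grep -c prints 0 (and exits 1) when nothing matches; ignore that status.
    count="$(printf '%s\n' "$out" | grep -c 'Failed' || true)"
  fi
  # Write to a temporary file next to the target, then rename it over the
  # target: the agent sees either the old value or the new one, never an
  # empty file. mktemp creates mode 0600, so open it up for the zabbix user.
  tmp="$(mktemp "${STATUS_FILE}.XXXXXX")" || return 1
  printf '%s\n' "$count" > "$tmp"
  chmod 644 "$tmp"
  mv -f "$tmp" "$STATUS_FILE"
}

run_check
```

To use it, make the file executable and point the cron entry at it instead of the pipeline, e.g. 01 01 * * * /usr/local/sbin/diskcheck.sh.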

Create the script that the agent runs; it just prints the count saved by the cron job:

  cat > /usr/local/sbin/diskinfo_z.sh <<EOF
  #!/bin/bash
  cat /usr/local/sbin/diskinfo.txt
  EOF
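This script only repeats whatever the cron job last wrote. If cron stops running, Zabbix keeps receiving the last value, possibly 0, indefinitely. If the file is missing, the item most likely becomes unsupported rather than firing the trigger. Below is a minimal sketch of a more defensive body for diskinfo_z.sh that reports a sentinel when the file is missing or older than about a day. The 25-hour threshold, the MAX_AGE_MIN variable and the value 99 are hypothetical choices. Because this version contains $ variables, write it with a quoted heredoc delimiter (<<'EOF') or in an editor, so the shell does not expand them while creating the file.

```shell
#!/bin/bash
# Hypothetical defensive body for /usr/local/sbin/diskinfo_z.sh.
# Prints the cached count, or a sentinel if the file is missing or stale.
STATUS_FILE="${STATUS_FILE:-/usr/local/sbin/diskinfo.txt}"
MAX_AGE_MIN="${MAX_AGE_MIN:-1500}"  # 25 hours; the cron job runs every 24 hours
SENTINEL=99                         # hypothetical "data missing or stale" value

report() {
  # find -mmin +N prints the file only if it was modified more than N minutes ago.
  if [ ! -r "$STATUS_FILE" ] || [ -n "$(find "$STATUS_FILE" -mmin +"$MAX_AGE_MIN" 2>/dev/null)" ]; then
    echo "$SENTINEL"
  else
    cat "$STATUS_FILE"
  fi
}

report
```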

Make the script executable:

  chmod +x /usr/local/sbin/diskinfo_z.sh

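Configure the Zabbix item (key diskbad.count) in the web frontend, or import the template attached at the end of this post.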

Configure the trigger; in the attached template it fires when the value is not 0.

Restart the agent service:

  systemctl restart zabbix-agent

Test it by writing a fake failure count into the file:

  echo 1 > /usr/local/sbin/diskinfo.txt
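Writing 1 by hand leaves the fake failure in place until the next cron run at 01:01, so the alert stays open until then. Below is a minimal sketch that writes a test value, waits long enough for the agent to poll, and then restores the original content. The item in the attached template sets no explicit interval, so Zabbix's 1-minute default applies, and the trigger evaluates last() with a 30-second time shift. The function name simulate_failure and the 300-second default hold time are hypothetical.

```shell
#!/bin/bash
# simulate_failure (hypothetical helper): temporarily replace the cached
# count with a test value, wait, then put the original content back.
#   $1 status file, $2 test value (default 1), $3 seconds to hold (default 300)
simulate_failure() {
  local file="$1" value="${2:-1}" hold="${3:-300}" backup
  backup="$(cat "$file")" || return 1   # refuse to run if there is nothing to restore
  printf '%s\n' "$value" > "$file"
  sleep "$hold"
  printf '%s\n' "$backup" > "$file"
}

# Usage (as root):
#   simulate_failure /usr/local/sbin/diskinfo.txt 1 300
```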

 

The alert arrived in the DingTalk group, so the test succeeded.

Attached monitoring template, zbx_export_templates.xml:

  <?xml version="1.0" encoding="UTF-8"?>
  <zabbix_export>
  <version>5.0</version>
  <date>2023-03-23T08:00:53Z</date>
  <groups>
  <group>
  <name>Linux servers</name>
  </group>
  </groups>
  <templates>
  <template>
  <template>HPdiskbakcount</template>
  <name>HP server disk health monitoring</name>
  <description>HP server disk health monitoring</description>
  <groups>
  <group>
  <name>Linux servers</name>
  </group>
  </groups>
  <items>
  <item>
  <name>diskbadcount</name>
  <key>diskbad.count</key>
  <description>HP server disk health monitoring; the data is refreshed once a day</description>
  <valuemap>
  <name>diskbadcount</name>
  </valuemap>
  <triggers>
  <trigger>
  <expression>{last(,30)}&lt;&gt;0</expression>
  <name>Physical drive status abnormal, please check as soon as possible</name>
  <priority>HIGH</priority>
  <description>Physical drive status abnormal</description>
  <manual_close>YES</manual_close>
  </trigger>
  </triggers>
  </item>
  </items>
  </template>
  </templates>
  <graphs>
  <graph>
  <name>HP server disk health</name>
  <graph_items>
  <graph_item>
  <sortorder>1</sortorder>
  <color>1A7C11</color>
  <item>
  <host>HPdiskbakcount</host>
  <key>diskbad.count</key>
  </item>
  </graph_item>
  </graph_items>
  </graph>
  </graphs>
  <value_maps>
  <value_map>
  <name>diskbadcount</name>
  <mappings>
  <mapping>
  <value>0</value>
  <newvalue>OK</newvalue>
  </mapping>
  <mapping>
  <value>1</value>
  <newvalue>Failed1</newvalue>
  </mapping>
  <mapping>
  <value>2</value>
  <newvalue>Failed2</newvalue>
  </mapping>
  </mappings>
  </value_map>
  </value_maps>
  </zabbix_export>
posted @ 2023-11-04 23:19  呼长喜