Monitoring OpenShift Platform Components

Docker

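Docker is the most fundamental component of OpenShift. Docker health needs to be monitored across all master and node instances. The following should be monitored on each node: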

Each check below lists its name, a description, and sample alerting logic for each storage driver (devicemapper and overlay2).

Docker Daemon

Description: Check that docker is running on a system
Sample Alerting Logic (devicemapper):
    systemctl is-active docker
Sample Alerting Logic (overlay2):
    systemctl is-active docker

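Docker Storage

Description: Check that docker’s storage has adequate space. The overlay2 check assumes the logical volume is named dockerlv and the volume group is dockervg. The devicemapper check prints 1 while more than 30% of the space is free; the overlay2 check prints 1 once usage exceeds 70%.
Sample Alerting Logic (devicemapper):
    echo $(echo "$(docker info 2>/dev/null | awk '/Data Space Available/ {print $4}') / $(docker info 2>/dev/null | awk '/Data Space Total/ {print $4}')" | bc -l) '>' 0.3 | bc -l
Sample Alerting Logic (overlay2):
    echo "$(df -h | awk '/dockervg-dockerlv/ {print $5}' | awk -F% '{print $1}') > 70" | bc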

Docker Metadata Storage

Description: Check that docker’s metadata storage volume is not full
Sample Alerting Logic (devicemapper):
    echo $(echo "$(docker info 2>/dev/null | awk '/Metadata Space Available/ {print $4}') / $(docker info 2>/dev/null | awk '/Metadata Space Total/ {print $4}')" | bc -l) '>' 0.3 | bc -l
Sample Alerting Logic (overlay2):
    N/A with overlay2
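The devicemapper checks above divide the two numbers printed by `docker info` without looking at their units, so a pool reported as, say, 500 MB available out of 100 GB total would be misjudged. Below is a minimal sketch of a unit-aware variant. It assumes `docker info` prints lines of the form `Data Space Available: 7.44 GB`; the function name `check_dm_space` and its exit codes are illustrative and not part of Docker or OpenShift.

```shell
#!/bin/sh
# check_dm_space KIND MIN_FREE   (hypothetical helper)
# Reads `docker info` text on stdin. KIND is "Data" or "Metadata";
# MIN_FREE is the smallest acceptable free fraction (for example 0.3).
# Exit status: 0 = enough space, 1 = too full, 2 = fields not found.
check_dm_space() {
    awk -v kind="$1" -v min_free="$2" '
        # Convert a value with a decimal unit suffix to bytes.
        function to_bytes(value, unit) {
            if (unit == "kB") return value * 1e3
            if (unit == "MB") return value * 1e6
            if (unit == "GB") return value * 1e9
            if (unit == "TB") return value * 1e12
            return value            # plain bytes or unknown unit
        }
        $1 == kind && $2 == "Space" && $3 == "Available:" { avail = to_bytes($4, $5) }
        $1 == kind && $2 == "Space" && $3 == "Total:"     { total = to_bytes($4, $5) }
        END {
            if (total <= 0) exit 2
            exit (avail / total > min_free) ? 0 : 1
        }'
}
```

On a node it could be called as `docker info 2>/dev/null | check_dm_space Data 0.3`, and likewise with `Metadata`; a non-zero exit status is what the monitoring system would alert on.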

Nodes & Masters

Each check below lists its name, a description, the relevant hosts, and sample alerting logic per OCP version.

Etcd Service

Description: Check that etcd is active
Relevant Hosts: Masters
Sample Alerting Logic (OCP <= 3.9):
    systemctl is-active etcd
Sample Alerting Logic (OCP >= 3.10):
    oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "master-etcd" | grep -i "running" | if [ $(wc -l) -eq $(oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name | grep etcd | wc -l) ]; then exit 0; else exit 1; fi

Etcd Storage

Description: Check that the etcd volume is not too full. This check assumes the etcd storage (/var/lib/etcd) is provisioned with a separate logical volume.
Relevant Hosts: Masters
Sample Alerting Logic (OCP <= 3.9):
    echo "$(lvs | awk '/etcd/ {print $4}') > 70" | bc
    or
    echo "$(df -h | awk '/etcd/ {print $5}' | awk -F% '{print $1}') > 70" | bc
Sample Alerting Logic (OCP >= 3.10):
    echo "$(lvs | awk '/etcd/ {print $4}') > 70" | bc
    or
    echo "$(df -h | awk '/etcd/ {print $5}' | awk -F% '{print $1}') > 70" | bc

Master API Service (single master)

Description: Check that the Master API Service or pods are active
Relevant Hosts: Masters
Sample Alerting Logic (OCP <= 3.9):
    systemctl is-active atomic-openshift-master
Sample Alerting Logic (OCP >= 3.10):
    Same as multi-master check.

Master API Service (multi-master)

Description: Check that the Master API Service or pods are active
Relevant Hosts: Masters
Sample Alerting Logic (OCP <= 3.9):
    systemctl is-active atomic-openshift-master-api
Sample Alerting Logic (OCP >= 3.10):
    oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "master-api" | grep -i "running" | if [ $(wc -l) -eq $(oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name | grep master-api | wc -l) ]; then exit 0; else exit 1; fi

Master Controllers Service (multi-master)

Description: Check that the Master Controllers Service or pods are active
Relevant Hosts: Masters
Sample Alerting Logic (OCP <= 3.9):
    systemctl is-active atomic-openshift-master-controllers
Sample Alerting Logic (OCP >= 3.10):
    oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "master-controller" | grep -i "running" | if [ $(wc -l) -eq $(oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name | grep master-controller | wc -l) ]; then exit 0; else exit 1; fi

Node Service

Description: Check that the node service is active
Relevant Hosts: All Nodes
Sample Alerting Logic (OCP <= 3.9):
    systemctl is-active atomic-openshift-node
Sample Alerting Logic (OCP >= 3.10):
    systemctl is-active atomic-openshift-node

Node Storage

Description: Check that the node’s local data storage volume is not too full. This check assumes the node storage (/var/lib/origin) is provisioned with a separate logical volume.
Relevant Hosts: All Nodes
Sample Alerting Logic (OCP <= 3.9):
    echo "$(lvs | awk '/origin/ {print $4}') > 70" | bc
    or
    echo "$(df -h | awk '/origin/ {print $5}' | awk -F% '{print $1}') > 70" | bc
Sample Alerting Logic (OCP >= 3.10):
    echo "$(lvs | awk '/origin/ {print $4}') > 70" | bc
    or
    echo "$(df -h | awk '/origin/ {print $5}' | awk -F% '{print $1}') > 70" | bc

OpenVSwitch Service

Description: Check that the openvswitch service or pods are active
Relevant Hosts: All Nodes
Sample Alerting Logic (OCP <= 3.9):
    systemctl is-active openvswitch
Sample Alerting Logic (OCP >= 3.10):
    oc get pods -n openshift-sdn --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "ovs-" | grep -i "running" | if [ $(wc -l) -eq $(oc get nodes --no-headers | wc -l) ]; then exit 0; else exit 1; fi

SDN Service

Description: Check that all the SDN pods are active
Relevant Hosts: All Nodes
Sample Alerting Logic (OCP <= 3.9):
    N/A
Sample Alerting Logic (OCP >= 3.10):
    oc get pods -n openshift-sdn --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | grep -i "sdn-" | grep -i "running" | if [ $(wc -l) -eq $(oc get nodes --no-headers | wc -l) ]; then exit 0; else exit 1; fi
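The master pod checks above call `oc get pods` twice and compare two counts, which also reports success when no matching pod exists at all (0 equals 0). The following is a rough single-pass sketch of the same idea. It assumes input in the `POD STATUS` custom-columns format used above; the helper name `all_pods_running` is hypothetical.

```shell
# all_pods_running NAME_FRAGMENT   (hypothetical helper)
# Reads "POD STATUS" lines on stdin and succeeds only when at least one pod
# name contains NAME_FRAGMENT and every such pod is in the Running phase.
all_pods_running() {
    awk -v frag="$1" '
        index($1, frag) > 0 {
            total++
            if ($2 == "Running") running++
        }
        END { exit (total > 0 && running == total) ? 0 : 1 }'
}
```

A possible invocation is `oc get pods -n kube-system --no-headers -o=custom-columns=POD:.metadata.name,STATUS:.status.phase | all_pods_running master-api`. For the OpenVSwitch and SDN checks, which expect one pod per node, the comparison against `oc get nodes --no-headers | wc -l` shown above is still needed in addition.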

API Endpoints

Many OpenShift components expose HTTP endpoints for health checks and related operations. These should be monitored:

Each check below lists its name, a description, and sample alerting logic.

OpenShift Master API Server

Description: Check the health of a master API Endpoint
Sample Alerting Logic:
    curl -s https://console.c1-ocp.myorg.com:8443/healthz | grep ok

Router

Description: Check the health of the Router
Sample Alerting Logic:
    curl -s -o /dev/null -w '%{http_code}' http://router.default.svc.cluster.local:1936/healthz | grep 200

Registry

Description: Check the health of the Registry
Sample Alerting Logic:
    curl -sI https://docker-registry.default.svc.cluster.local:5000/healthz | grep 200

Logging

Description: Check the health of the EFK Logging Stack
Sample Alerting Logic: Because of the various components and complexities involved, we recommend the OpenShift Logging health check script.

Metrics

Description: Check the health of the Metrics Stack
Sample Alerting Logic: Because of the various components and complexities involved, we recommend the OpenShift Metrics health check script.
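The curl-based endpoint checks above can be wrapped in one small helper that looks only at the HTTP status code and gives up after a timeout. This is a sketch under stated assumptions: the function name `endpoint_healthy` and the 5-second timeout are illustrative choices, and the endpoint is assumed to answer 200 when healthy.

```shell
# endpoint_healthy URL   (hypothetical helper)
# Succeeds only when URL answers with HTTP status 200 within 5 seconds.
endpoint_healthy() {
    # -s: no progress output; -o /dev/null: discard the body;
    # -w "%{http_code}": print only the numeric status code.
    status=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || return 1
    [ "$status" = "200" ]
}
```

For example, `endpoint_healthy https://console.c1-ocp.myorg.com:8443/healthz || echo "master API unhealthy"` would feed an alert. Relying on the status code rather than the response body keeps the same helper usable for the master API, the router, and the registry.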




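Author: Petter Liu
Source: http://www.cnblogs.com/wintersun/
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be given in a prominent place on the page; otherwise the author reserves the right to pursue legal liability. This article is also published on my independent blog, Petter Liu Blog.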

posted on 2020-03-21 17:19 PetterLiu