Lesson 6: Deploying the Cluster Monitoring System
15. Deploying the Monitoring System
Components: node-exporter, alertmanager, grafana, kube-state-metrics, prometheus.
Component overview
MetricsServer: the aggregator of resource usage data for the Kubernetes cluster; it collects data for in-cluster consumers such as kubectl, the HPA, and the scheduler.
NodeExporter: exposes key metrics and state data for each node.
KubeStateMetrics: collects data about resource objects inside the Kubernetes cluster and is used to define alerting rules.
Prometheus-adapter: exposes custom monitoring metrics and container metrics.
Prometheus: collects data from the apiserver, scheduler, controller-manager, and kubelet components using a pull model over HTTP.
Grafana: a platform for data visualization and monitoring dashboards.
Alertmanager: delivers alerts via SMS or email.
15.1 Install the NFS Server
NFS is installed here to store data in the lab environment; for production, choose a storage backend that fits your requirements.
15.1.1 Install NFS on the master node
yum -y install nfs-utils
15.1.2 Create the NFS directory
mkdir -p /ifs/kubernetes
15.1.3 Adjust permissions
chmod -R 777 /ifs/kubernetes
15.1.4 Edit the exports file
vim /etc/exports
/ifs/kubernetes *(rw,no_root_squash,sync)
15.1.5 Modify the rpcbind socket unit file
cat >/usr/lib/systemd/system/rpcbind.socket<<EOF
[Unit]
Description=RPCbind Server Activation Socket
[Socket]
ListenStream=/var/run/rpcbind.sock
ListenStream=0.0.0.0:111
ListenDatagram=0.0.0.0:111
[Install]
WantedBy=sockets.target
EOF
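After editing the unit file, reload systemd so the change takes effect:
systemctl daemon-reload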
15.1.6 Apply the export configuration
exportfs -f
15.1.7 Start the services
systemctl start rpcbind
systemctl status rpcbind
systemctl enable rpcbind
systemctl start nfs
systemctl status nfs
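Optionally, also enable the NFS service itself so it starts on boot (the steps above only enable rpcbind):
systemctl enable nfs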
15.1.8 Test with showmount (master01)
showmount -e 192.168.68.146
[root@master01 sockets.target.wants]# showmount -e 192.168.68.146
Export list for 192.168.68.146:
/ifs/kubernetes *
15.1.9 Install the NFS client on all nodes
yum -y install nfs-utils
15.1.10 Check from the nodes (node01, node02)
Every node must be able to see the server's export list; otherwise the mount will fail.
[root@node01 cfg]# showmount -e 192.168.68.146
Export list for 192.168.68.146:
/ifs/kubernetes *
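Optionally, a quick manual mount from a node confirms read/write access end to end (using /mnt as a temporary mount point):
mount -t nfs 192.168.68.146:/ifs/kubernetes /mnt
touch /mnt/nfs-test && ls /mnt
umount /mnt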
15.2 Deploy the PVC Provisioner
Change the NFS server address (192.168.68.146 here) to match your environment.
kubectl apply -f nfs-class.yaml
# Update the NFS server IP address in nfs-deployment.yaml
kubectl apply -f nfs-deployment.yaml
kubectl apply -f nfs-rabc.yaml
Check the NFS provisioner pod status:
[root@master01 nfs]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-7d4864f8f-wbbsq 1/1 Running 0 12s
15.2.1 Verify the deployment
kubectl get StorageClass
[root@master01 nfs]# kubectl get StorageClass
NAME PROVISIONER AGE
managed-nfs-storage fuseim.pri/ifs 2m14s
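For reference, nfs-class.yaml defines the storage class shown above; a minimal sketch (the provisioner value must match the provisioner name configured in nfs-deployment.yaml, commonly set through a PROVISIONER_NAME environment variable):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: fuseim.pri/ifs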
15.2.2 Check from the dashboard
On the Kubernetes dashboard's StorageClass page you can see the NFS storage class we just deployed.
15.3 Deploy the Monitoring System
Note: the IPs in the following configuration files must be changed to match your environment:
ServiceMonitor/prometheus-EtcdService.yaml
ServiceMonitor/prometheus-kubeControllerManagerService.yaml
ServiceMonitor/prometheus-kubeSchedulerService.yaml
ServiceMonitor/prometheus-KubeProxyService.yaml
[root@master01 serviceMonitor]# ls | xargs grep 68
prometheus-EtcdService.yaml: - ip: 192.168.68.146
prometheus-EtcdService.yaml: - ip: 192.168.68.147
prometheus-EtcdService.yaml: - ip: 192.168.68.148
prometheus-kubeControllerManagerService.yaml: - ip: 192.168.68.146
prometheus-kubeControllerManagerService.yaml: - ip: 192.168.68.147
prometheus-kubeControllerManagerService.yaml: - ip: 192.168.68.148
prometheus-KubeProxyService.yaml: - ip: 192.168.68.149
prometheus-KubeProxyService.yaml: - ip: 192.168.68.151
prometheus-kubeSchedulerService.yaml: - ip: 192.168.68.146
prometheus-kubeSchedulerService.yaml: - ip: 192.168.68.147
prometheus-kubeSchedulerService.yaml: - ip: 192.168.68.148
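For reference, each of these files pairs a Service (typically headless, in kube-system) with a manually managed Endpoints object whose addresses are your host IPs; a sketch of prometheus-EtcdService.yaml under those assumptions (the namespace, labels, port name, and etcd client port 2379 are assumptions, not taken from the repo):
apiVersion: v1
kind: Service
metadata:
  name: kube-etcd
  namespace: kube-system
  labels:
    k8s-app: kube-etcd
spec:
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-etcd
  namespace: kube-system
  labels:
    k8s-app: kube-etcd
subsets:
- addresses:
  - ip: 192.168.68.146
  - ip: 192.168.68.147
  - ip: 192.168.68.148
  ports:
  - name: port
    port: 2379
    protocol: TCP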
Deployment order:
kubectl apply -f setup/               # create the monitoring namespace and deploy the prometheus-operator
kubectl apply -f alertmanager/        # deploy alertmanager
kubectl apply -f node-exporter/       # deploy node-exporter
kubectl apply -f kube-state-metrics/
kubectl apply -f grafana/
kubectl apply -f prometheus/
kubectl apply -f serviceMonitor/
Walkthrough:
cd /root/monitor/prometheus
kubectl apply -f setup/
[root@master01 prometheus]# kubectl apply -f setup/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
# Edit alertmanager/alertmanager-alertmanager.yaml and set the replica count to 2: replicas: 2
kubectl apply -f alertmanager/
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
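For reference, the replica count lives in the Alertmanager custom resource; a minimal sketch of alertmanager-alertmanager.yaml, assuming the stock kube-prometheus layout (other spec fields such as the image version are omitted here):
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
  labels:
    alertmanager: main
spec:
  replicas: 2
  serviceAccountName: alertmanager-main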
cd node-exporter
ls | xargs grep image
node-exporter-daemonset.yaml: image: prom/node-exporter:v0.18.1
node-exporter-daemonset.yaml: image: quay.io/coreos/kube-rbac-proxy:v0.4.1
# The required images can be pulled on the node hosts first, and the service deployed afterwards
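For example, assuming Docker is the container runtime on the nodes, the two images listed above can be pre-pulled like this:
docker pull prom/node-exporter:v0.18.1
docker pull quay.io/coreos/kube-rbac-proxy:v0.4.1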
[root@master01 prometheus]# kubectl apply -f node-exporter/
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
cd kube-state-metrics
[root@master01 kube-state-metrics]# ls | xargs grep image
kube-state-metrics-deployment.yaml: image: quay.io/coreos/kube-rbac-proxy:v0.4.1
kube-state-metrics-deployment.yaml: image: quay.io/coreos/kube-rbac-proxy:v0.4.1
kube-state-metrics-deployment.yaml: image: quay.io/coreos/kube-state-metrics:v1.8.0
[root@master01 prometheus]# kubectl apply -f kube-state-metrics/
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics-rbac created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
# The NFS storage deployed earlier backs the grafana-pvc and prometheus-pvc used here, via storageClassName: managed-nfs-storage
[root@master01 prometheus]# kubectl get storageClass
NAME PROVISIONER AGE
managed-nfs-storage fuseim.pri/ifs 3h28m
The storage class name here must match the storageClassName referenced by those PVCs.
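For reference, the PVC definitions under grafana/ and prometheus/ reference this storage class by name; a minimal sketch of the Grafana PVC under that assumption (size and access mode are illustrative, not taken from the repo):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: monitoring
spec:
  storageClassName: managed-nfs-storage
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi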
[root@master01 prometheus]# kubectl apply -f grafana/
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
persistentvolumeclaim/grafana created
clusterrolebinding.rbac.authorization.k8s.io/grafana-rbac created
service/grafana created
serviceaccount/grafana created
[root@master01 prometheus]# kubectl apply -f prometheus/
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
persistentvolumeclaim/prometheus-data created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-rbac created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
[root@master01 prometheus]# kubectl apply -f serviceMonitor/
servicemonitor.monitoring.coreos.com/alertmanager created
servicemonitor.monitoring.coreos.com/grafana created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
servicemonitor.monitoring.coreos.com/node-exporter created
service/kube-etcd created
endpoints/kube-etcd created
service/kube-proxy created
endpoints/kube-proxy created
service/kube-controller-manager created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
endpoints/kube-controller-manager configured
service/kube-scheduler created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
endpoints/kube-scheduler configured
servicemonitor.monitoring.coreos.com/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-etcd created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-proxy created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
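At this point it is worth checking that all monitoring pods come up before continuing (output varies by environment):
kubectl get pods -n monitoring -o wide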
If you run into permission errors, fix them as follows:
kubectl create serviceaccount kube-state-metrics -n monitoring
kubectl create serviceaccount grafana -n monitoring
kubectl create serviceaccount prometheus-k8s -n monitoring
Create the RBAC binding files:
#kube-state-metrics
[root@master01 prometheus]# cat kube-state-metrics/kube-state-metrics-rabc.yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics-rbac
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
#grafana
[root@master01 prometheus]# cat grafana/grafana-rabc.yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: grafana-rbac
subjects:
- kind: ServiceAccount
  name: grafana
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
#prometheus
[root@master01 prometheus]# cat prometheus/prometheus-rabc.yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus-rbac
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
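If you had to create these files by hand, apply them (paths as shown in the listings above), or simply re-apply the corresponding directories:
kubectl apply -f kube-state-metrics/kube-state-metrics-rabc.yaml
kubectl apply -f grafana/grafana-rabc.yaml
kubectl apply -f prometheus/prometheus-rabc.yaml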
15.3.1 Get the Grafana pod and svc information
Get Grafana's NodePort address and port for web access.
[root@master01 prometheus]# kubectl get pod,svc -A -o wide | grep grafana
monitoring pod/grafana-5dc77ff8cb-9lcgc 1/1 Running 0 14m 172.17.82.8 192.168.68.151 <none> <none>
monitoring service/grafana NodePort 10.0.0.56 <none> 3000:30093/TCP 14m app=grafana
15.3.2 Log in to the Grafana dashboard
Default username/password: admin/admin
http://192.168.68.151:30093/
15.3.3 Get the Prometheus pod and svc information
[root@master01 prometheus]# kubectl get pod,svc -A -o wide | grep prometheus
monitoring pod/prometheus-k8s-0 3/3 Running 1 29m 172.17.15.9 192.168.68.149 <none> <none>
monitoring pod/prometheus-k8s-1 3/3 Running 1 29m 172.17.82.9 192.168.68.151 <none> <none>
monitoring pod/prometheus-operator-6685db5c6-hszsn 1/1 Running 0 3h29m 172.17.82.5 192.168.68.151 <none> <none>
monitoring service/prometheus-k8s NodePort 10.0.0.65 <none> 9090:38883/TCP 29m app=prometheus,prometheus=k8s
15.3.4 Log in to the Prometheus dashboard
Prometheus is exposed on the NodePort shown above, e.g. http://192.168.68.149:38883/.
Detailed usage of Grafana and Prometheus is not covered here; the stack already ships with dashboards for the relevant Kubernetes cluster metrics, and you can also import additional dashboards yourself.