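Kubernetes monitoring with Prometheus
I. Stack: Prometheus + Alertmanager + Grafana + assorted metric collectors
II. Deployment
Reference: https://github.com/prometheus-operator/kube-prometheus
1. Clone the project
Note: each release of this project targets specific Kubernetes versions, so check out the release that matches your cluster.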
git clone -b v0.10.0 https://github.com/prometheus-operator/kube-prometheus.git
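2. Deploy the project
Note: before deploying, consider setting the replica count to 1 in alertmanager-alertmanager.yaml and prometheus-prometheus.yaml under the manifests directory. Skip this if the cluster has resources to spare.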
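The replica change can also be scripted rather than edited by hand. The following is a minimal sketch, assuming each manifest sets the count on a line of the form "replicas: N"; the helper name set_replicas_to_one is hypothetical and not part of kube-prometheus.

```shell
# Hypothetical helper: rewrite every "replicas: N" line in a manifest to "replicas: 1".
# Assumes the count sits alone on its own line, as in a typical YAML manifest.
set_replicas_to_one() {
  file=$1
  tmp=$(mktemp) || return 1
  # Keep the original indentation; only the number changes.
  sed 's/^\([[:space:]]*replicas:\)[[:space:]]*[0-9][0-9]*[[:space:]]*$/\1 1/' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (run from the directory that contains kube-prometheus):
#   set_replicas_to_one kube-prometheus/manifests/alertmanager-alertmanager.yaml
#   set_replicas_to_one kube-prometheus/manifests/prometheus-prometheus.yaml
```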
cd kube-prometheus
kubectl apply -f manifests/setup/
kubectl apply -f manifests/
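If the second apply runs before the CRDs from manifests/setup are registered, some resources may be rejected. The sketch below wraps the same steps in a function that waits for the CRDs in between. It uses server-side apply for the setup step, as recent versions of the upstream README do; that differs from the plain apply shown above. The function name and the KUBECTL override variable are assumptions made for illustration.

```shell
# Hypothetical wrapper around the deployment steps.
# KUBECTL may be overridden (for example KUBECTL=echo for a dry run); it defaults to kubectl.
deploy_kube_prometheus() {
  dir=${1:-manifests}
  k=${KUBECTL:-kubectl}
  # Namespace and CRDs first.
  $k apply --server-side -f "$dir/setup" || return 1
  # Block until every CRD reports the Established condition.
  $k wait --for=condition=Established --all CustomResourceDefinition --namespace=monitoring || return 1
  # Then the rest of the stack.
  $k apply -f "$dir/" || return 1
  # List the pods so that failures are visible early.
  $k get pods -n monitoring
}

# Example (from inside the kube-prometheus directory):
#   deploy_kube_prometheus manifests
```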
3. Create Ingress resources for Prometheus and Grafana. For deploying ingress-nginx, see the ingress article.
prometheus-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: "test.prometheus.com"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: "test.grafana.com"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
4. Add entries to the local hosts file so the browser can resolve the hostnames
192.168.152.10 test.grafana.com
192.168.152.10 test.prometheus.com
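The hosts entries follow a fixed "IP hostname" pattern, so they can be generated instead of typed. A small sketch, assuming a Unix-like workstation where the file is /etc/hosts; hosts_entries is a hypothetical helper name.

```shell
# Hypothetical helper: print one hosts-file line per hostname, all pointing at the same IP.
hosts_entries() {
  ip=$1
  shift
  for host in "$@"; do
    printf '%s %s\n' "$ip" "$host"
  done
}

# Example: append the two entries used in this article (requires root):
#   hosts_entries 192.168.152.10 test.grafana.com test.prometheus.com | sudo tee -a /etc/hosts
```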
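III. Open http://test.grafana.com in a browser
Note: the controller-manager, scheduler, and etcd dashboards show no data. This happens because the cluster was deployed with kubeadm, which binds these components to 127.0.0.1 and creates no Service for them, so Prometheus has nothing to scrape.
IV. Fix the missing data on the controller-manager, scheduler, and etcd dashboards
Reference: https://cloud.tencent.com/developer/article/1807805
1. Fix the controller-manager dashboard
In /etc/kubernetes/manifests/kube-controller-manager.yaml, change --bind-address=127.0.0.1 to --bind-address=0.0.0.0. After saving the file, wait for the kubelet to restart the component automatically.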
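The bind-address edit is a one-line substitution, so it can be scripted on each control-plane node. The sketch below assumes the flag appears literally as --bind-address=127.0.0.1 in the static pod manifest; set_bind_address is a hypothetical helper. It deliberately avoids leaving a backup copy in /etc/kubernetes/manifests, since the kubelet may treat extra files in that directory as additional static pod manifests.

```shell
# Hypothetical helper: switch a static pod manifest from 127.0.0.1 to 0.0.0.0.
# The edit goes through a temp file outside the manifests directory, then is
# written back in place so that ownership and permissions stay unchanged.
set_bind_address() {
  file=$1
  tmp=$(mktemp) || return 1
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (as root on a control-plane node):
#   set_bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml
```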
Create a Service and Endpoints for the controller-manager
kube-controller-manager-svc-ep.yml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  # Add or change the IPs to match your control-plane nodes
  - ip: 192.168.152.10
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
kubectl apply -f kube-controller-manager-svc-ep.yml
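2. Fix the scheduler dashboard
In /etc/kubernetes/manifests/kube-scheduler.yaml, change --bind-address=127.0.0.1 to --bind-address=0.0.0.0. After saving the file, wait for the kubelet to restart the component automatically.
Create a Service and Endpoints for the scheduler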
kube-scheduler-svc-ep.yml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  # Add or change the IPs to match your control-plane nodes
  - ip: 192.168.152.10
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
kubectl apply -f kube-scheduler-svc-ep.yml
3. Fix the etcd dashboard
Create a Service and Endpoints for etcd
kube-etcd-svc-ep.yml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: etcd
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd
  name: etcd-k8s
  namespace: kube-system
subsets:
- addresses:
  # Add or change the IPs to match your etcd nodes
  - ip: 192.168.152.10
  ports:
  - name: etcd
    port: 2379
    protocol: TCP
kubectl apply -f kube-etcd-svc-ep.yml
Create a Secret containing the certificates used to connect to etcd
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
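Mount the certificates into Prometheus: edit the Prometheus resource and list the etcd-certs Secret under spec.secrets. The operator then mounts it at /etc/prometheus/secrets/etcd-certs.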
kubectl edit prometheus k8s -n monitoring
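The same change can be made without an interactive editor by patching the Prometheus resource. A minimal sketch, assuming no other Secrets are already listed under spec.secrets (a merge patch replaces the whole list); mount_etcd_certs and the KUBECTL override variable are hypothetical names.

```shell
# Hypothetical helper: set spec.secrets on the "k8s" Prometheus resource to [etcd-certs].
# KUBECTL may be overridden (for example KUBECTL=echo for a dry run); it defaults to kubectl.
mount_etcd_certs() {
  k=${KUBECTL:-kubectl}
  # A JSON merge patch replaces the entire spec.secrets list,
  # so include every Secret that should stay mounted.
  $k patch prometheus k8s -n monitoring --type merge \
    -p '{"spec":{"secrets":["etcd-certs"]}}'
}

# Example:
#   mount_etcd_certs
# Afterwards, once the pod has restarted, the files should be visible inside it:
#   kubectl -n monitoring exec prometheus-k8s-0 -c prometheus -- ls /etc/prometheus/secrets/etcd-certs
```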
Create a ServiceMonitor for etcd
kube-etcd-sm.yml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd
spec:
  jobLabel: k8s-app
  endpoints:
  - port: etcd
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
kubectl apply -f kube-etcd-sm.yml
4. Verify
Open test.grafana.com in a browser
controller-manager dashboard
scheduler dashboard
etcd dashboard: the default dashboards do not include one for etcd, so find one at https://grafana.com/grafana/dashboards/ and import it.
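Besides looking at the dashboards, the scrape targets can be checked from the command line through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at test.prometheus.com through the Ingress created earlier, and that the job labels are kube-controller-manager, kube-scheduler, and etcd (derived from the jobLabel settings; adjust them if your targets page shows different names). Both function names are hypothetical.

```shell
# Hypothetical helper: build the PromQL expression that reports whether a job's targets are up.
promql_up() {
  printf 'up{job="%s"}' "$1"
}

# Hypothetical helper: run that expression as an instant query against Prometheus.
# Arguments: Prometheus host, job name. Prints the raw JSON response.
check_job_up() {
  host=$1 job=$2
  curl -fsS -G "http://$host/api/v1/query" \
    --data-urlencode "query=$(promql_up "$job")"
}

# Example: a sample value of "1" in the response means the target was scraped successfully.
#   check_job_up test.prometheus.com kube-controller-manager
#   check_job_up test.prometheus.com kube-scheduler
#   check_job_up test.prometheus.com etcd
```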