Prometheus Operator
1. Introduction
Repository: https://github.com/prometheus-operator/kube-prometheus
https://blog.csdn.net/choerodon/article/details/98587027
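Components of the Prometheus Operator architecture:
- Operator: deploys and manages Prometheus Server according to custom resources (Custom Resource Definitions, CRDs), and watches those custom resources for change events and reacts to them. It is the control center of the whole system.
- Prometheus Server: the Prometheus Server cluster that the Operator deploys according to the contents of the Prometheus custom resource. These custom resources can be thought of as the way to manage the StatefulSets behind the Prometheus Server cluster.
- ServiceMonitor: declares which services to monitor and describes a list of targets for Prometheus to scrape. It uses labels to select the matching Service Endpoints, so that Prometheus Server collects metrics through the selected Services.
- Service: simply put, the object that Prometheus monitors.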
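To make the ServiceMonitor-to-Service relationship concrete, here is a minimal sketch that writes a ServiceMonitor manifest to a local file. The names `example-app`, the `app` label, the `default` namespace and the port name `web` are hypothetical and only serve as illustration; the field names follow the `monitoring.coreos.com/v1` ServiceMonitor API.

```shell
# Write a minimal ServiceMonitor manifest to a local file.
# All names below (example-app, app label, web port, default namespace) are hypothetical.
cat > example-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  # Use the value of this Service label as the Prometheus job name
  jobLabel: app
  # Look for Services in these namespaces
  namespaceSelector:
    matchNames:
    - default
  # Select Services carrying this label
  selector:
    matchLabels:
      app: example-app
  # Scrape the Service port named "web"
  endpoints:
  - port: web
    interval: 30s
EOF

# Apply it once the operator is running:
#   kubectl apply -f example-servicemonitor.yaml
```

The Service itself only needs the matching label and a named port; the ServiceMonitor never refers to pods directly.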
2. Deployment
Deploying Prometheus Operator is straightforward:
# Download
# git clone https://github.com/prometheus-operator/kube-prometheus.git
# cd kube-prometheus
# Install the operator
# kubectl create -f manifests/setup
# Install prometheus
# kubectl create -f manifests/
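The second `kubectl create` can fail if the CRDs from the setup step are not registered yet. A possible way to guard against that is to wait for the CRDs between the two steps. The function name `deploy_kube_prometheus` and the 120s timeout are assumptions made for this sketch, not part of the original procedure.

```shell
# Deploy kube-prometheus in two phases, waiting for the CRDs in between.
# Run from the root of the cloned kube-prometheus repository.
deploy_kube_prometheus() {
  # Phase 1: the setup manifests (namespace, CRDs, operator)
  kubectl create -f manifests/setup || return 1
  # Block until the API server reports every CRD as Established
  kubectl wait --for=condition=Established --all customresourcedefinition --timeout=120s || return 1
  # Phase 2: Prometheus, Alertmanager, Grafana, exporters, ServiceMonitors
  kubectl create -f manifests/
}
```

Call `deploy_kube_prometheus` once; if any phase fails, the function stops and returns a non-zero status.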
The number of instances to start can be set through the replicas field in the manifests.
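Besides editing the manifests, the replica count can also be changed on a running cluster. The sketch below assumes the default kube-prometheus resource name `k8s` for the Prometheus custom resource in the `monitoring` namespace; the function name `set_prometheus_replicas` is made up for this example. Re-applying the manifests later would overwrite the value.

```shell
# Scale the Prometheus custom resource; the operator then adjusts the StatefulSet.
# Assumes the kube-prometheus default resource name "k8s" in namespace "monitoring".
set_prometheus_replicas() {
  count="$1"
  # Custom resources need a JSON merge patch (--type merge)
  kubectl -n monitoring patch prometheus k8s --type merge \
    -p "{\"spec\":{\"replicas\":${count}}}"
}

# Example:
#   set_prometheus_replicas 2
```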
Check the result:
# kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   10         8d
blackbox-86b7486879-w6n22              1/1     Running   0          18h
grafana-5cb8d5c55b-wplg4               1/1     Running   5          8d
kafka-exporter-5cf8fdd8f8-c4j5t        1/1     Running   0          20h
kube-state-metrics-65f69f9759-spcr6    3/3     Running   27         8d
node-exporter-rdjl9                    2/2     Running   2          24h
prometheus-adapter-865cc8dbcd-bc7v6    1/1     Running   34         8d
prometheus-k8s-0                       2/2     Running   3          76m
prometheus-operator-56d44459f7-vt2l9   2/2     Running   15         8d

# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.99.189.210   <none>        9093/TCP                     8d
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   8d
blackbox                ClusterIP   10.108.47.141   <none>        9115/TCP                     18h
grafana                 ClusterIP   10.104.30.183   <none>        3000/TCP                     8d
kafka-exporter          ClusterIP   10.98.228.115   <none>        9308/TCP                     20h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            8d
node-exporter           ClusterIP   None            <none>        9100/TCP                     8d
prometheus-adapter      ClusterIP   10.108.67.0     <none>        443/TCP                      8d
prometheus-k8s          ClusterIP   10.96.50.138    <none>        9090/TCP                     8d
prometheus-operated     ClusterIP   None            <none>        9090/TCP                     16h
prometheus-operator     ClusterIP   None            <none>        8443/TCP                     8d
Define an Ingress for accessing alertmanager, grafana and prometheus:
prom-monitor.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prom-monitor
  namespace: monitoring
spec:
  rules:
  - host: alert.test.com
    http:
      paths:
      - backend:
          serviceName: alertmanager-main
          servicePort: 9093
        path: /
  - host: grafana.test.com
    http:
      paths:
      - backend:
          serviceName: grafana
          servicePort: 3000
        path: /
  - host: prom.test.com
    http:
      paths:
      - backend:
          serviceName: prometheus-k8s
          servicePort: 9090
        path: /
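The `extensions/v1beta1` Ingress API was removed in Kubernetes 1.22. On newer clusters the same three rules would have to be written against `networking.k8s.io/v1`. The following is a sketch of that translation; the file name `prom-monitor-v1.yaml` is an assumption, and depending on your ingress controller you may also need to set `spec.ingressClassName`, which is left out here.

```shell
# Write the same three rules using the networking.k8s.io/v1 schema.
# The output file name is hypothetical; hosts, services and ports match the manifest above.
cat > prom-monitor-v1.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prom-monitor
  namespace: monitoring
spec:
  rules:
  - host: alert.test.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.test.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.test.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
EOF

# Apply with:
#   kubectl apply -f prom-monitor-v1.yaml
```

The main differences from the older schema are the mandatory `pathType` and the nested `backend.service` block that replaces `serviceName` and `servicePort`.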
Edit the local hosts file so that grafana.test.com, prom.test.com and alert.test.com resolve to the address of the Ingress entry point.
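A small helper can produce the hosts entry. The function name `hosts_line` is made up for this sketch, and `192.0.2.10` is a placeholder documentation address; substitute the address your ingress controller is actually reachable on.

```shell
# Print a hosts-file line mapping the three Ingress hostnames to one address.
# The IP passed in is whatever address your ingress controller is reachable on.
hosts_line() {
  printf '%s grafana.test.com prom.test.com alert.test.com\n' "$1"
}

# Example (192.0.2.10 is a placeholder address):
#   hosts_line 192.0.2.10 | sudo tee -a /etc/hosts
```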
Visit grafana.test.com; it ships with many dashboards out of the box.
3. ServiceMonitor
# List the ServiceMonitors
# kubectl get servicemonitor -n monitoring
NAME                      AGE
alertmanager              7d2h
coredns                   7d2h
grafana                   7d2h
kube-apiserver            7d2h
kube-controller-manager   7d2h
kube-scheduler            7d2h
kube-state-metrics        7d2h
kubelet                   7d2h
node-exporter             7d2h
prometheus                7d2h
prometheus-adapter        7d2h
prometheus-operator       7d2h
Inspect the ServiceMonitor for kube-controller-manager:
# kubectl get servicemonitor kube-controller-manager -n monitoring -o yaml | tail -15
...
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: false
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
- It needs to match a Service in kube-system that carries the label k8s-app=kube-controller-manager
- Its scheme is changed to http here; the default is https
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
subsets:
- addresses:
  - ip: 192.168.10.240
  - ip: 192.168.10.241
  - ip: 192.168.10.242
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  ports:
  - port: 10252
    name: http-metrics
    protocol: TCP
  type: ClusterIP
Create them:
# kubectl create -f .
Check:
# kubectl get svc,ep -n kube-system -l k8s-app=kube-controller-manager
NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/kube-controller-manager-monitoring   ClusterIP   10.102.204.13   <none>        10252/TCP   44m

NAME                                           ENDPOINTS                                                        AGE
endpoints/kube-controller-manager-monitoring   192.168.10.240:10252,192.168.10.241:10252,192.168.10.242:10252   44m
Also modify the controller-manager startup configuration file:
/usr/lib/systemd/system/kube-controller-manager.service
# Change the listen address
--address=0.0.0.0
Restart controller-manager.
Test:
# curl 127.0.0.1:10252
404 page not found
# curl 10.102.204.13:10252
404 page not found
Accessing the local port and the controller-manager Service port returns the same result.
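The 404 above is the response for the root path; metrics are served under /metrics, which is also the path a ServiceMonitor endpoint scrapes by default. A rough check along those lines follows. The function name `check_metrics` is an assumption for this sketch, and it only looks for the `# HELP` lines of the Prometheus text format.

```shell
# Fetch /metrics from a host:port and report whether it looks like Prometheus text format.
check_metrics() {
  target="$1"
  # -s: silent, -f: treat HTTP errors such as 404 as failures
  if curl -sf "http://${target}/metrics" | grep -q '^# HELP'; then
    echo "ok: ${target} serves metrics"
  else
    echo "fail: no metrics at ${target}/metrics"
    return 1
  fi
}

# Example, using the addresses from the test above:
#   check_metrics 127.0.0.1:10252
#   check_metrics 10.102.204.13:10252
```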
Check in Prometheus
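Apart from the web UI, the target state can also be read from the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at prom.test.com through the Ingress defined earlier, and that the job name is `kube-controller-manager`, i.e. the value of the `k8s-app` label named by `jobLabel`. The function name `query_up` is hypothetical.

```shell
# Ask the Prometheus HTTP API for the "up" series of one job.
# prom.test.com is the Ingress host defined earlier in this article.
query_up() {
  job="$1"
  # -G sends the url-encoded data as a GET query string
  curl -sG "http://prom.test.com/api/v1/query" \
    --data-urlencode "query=up{job=\"${job}\"}"
}

# Example:
#   query_up kube-controller-manager
```

A value of 1 for each of the three endpoint addresses in the JSON response would indicate that the scrape is working.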