Prometheus监控神器-Kubernetes篇(一)
在Kubernetes中手动部署Statefulset类型的Prometheus、Alertmanager集群,并使用StorageClass来持久化数据。
本篇使用StorageClass来持久化数据,搭建Statefulset的Prometheus联邦集群,对于数据持久化,方案众多,如Thanos、M3DB、InfluxDB、VictorMetric等,根据自己的需求进行选择,后面会详细讲解针对数据持久化的具体细节。
部署一个对外可以访问的Prometheus,首先要创建Prometheus所在的Namespace,然后在创建Prometheus使用的RBAC规则,创建Prometheus的 ConfigMap 来保存配置文件。
创建SVC绑定固定集群IP,创建Statefulset有状态的Prometheus容器的Pod,最后创建Ingress 实现外部域名访问Prometheus。
如果Kubernetes版本比较旧的话,为了便于测试,可以进行升级一下,使用 sealos
自动部署工具快速一键部署高可用集群,对于是否使用kuboard,针对自己需求去部署。
环境
我的本地环境使用的 sealos
一键部署,主要是为了便于测试。
OS | Kubernetes | HostName | IP | Service |
---|---|---|---|---|
Ubuntu 18.04 |
1.17.7 |
sealos-k8s-m1 |
192.168.1.151 |
node-exporter prometheus-federate-0 |
Ubuntu 18.04 |
1.17.7 |
sealos-k8s-m2 |
192.168.1.152 |
node-exporter grafana alertmanager-0 |
Ubuntu 18.04 |
1.17.7 |
sealos-k8s-m3 |
192.168.1.150 |
node-exporter alertmanager-1 |
Ubuntu 18.04 |
1.17.7 |
sealos-k8s-node1 |
192.168.1.153 |
node-exporter prometheus-0 kube-state-metrics |
Ubuntu 18.04 |
1.17.7 |
sealos-k8s-node2 |
192.168.1.154 |
node-exporter prometheus-1 |
Ubuntu 18.04 |
1.17.7 |
sealos-k8s-node2 |
192.168.1.155 |
node-exporter prometheus-2 |
# 给master跟node加标签
# prometheus
kubectl label node sealos-k8s-node1 k8s-app=prometheus
kubectl label node sealos-k8s-node2 k8s-app=prometheus
kubectl label node sealos-k8s-node3 k8s-app=prometheus
# federate
kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
# alertmanager
kubectl label node sealos-k8s-m2 k8s-app=alertmanager
kubectl label node sealos-k8s-m3 k8s-app=alertmanager
#创建对应的部署目录
mkdir /data/manual-deploy/ && cd /data/manual-deploy/
mkdir alertmanager grafana ingress-nginx kube-state-metrics node-exporter prometheus
部署 Prometheus
创建Prometheus的storageclass配置文件
cat prometheus-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: prometheus-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
创建Prometheus的sc的pv配置文件,同时指定了调度节点。
# 在需要调度的Prometheus的node上创建目录与赋权
mkdir /data/prometheus
chown -R 65534:65534 /data/prometheus
cat prometheus-federate-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus-lpv-0
spec:
capacity:
storage: 10Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: prometheus-lpv
local:
path: /data/prometheus
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- sealos-k8s-node1
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus-lpv-1
spec:
capacity:
storage: 20Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: prometheus-lpv
local:
path: /data/prometheus
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- sealos-k8s-node2
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus-lpv-2
spec:
capacity:
storage: 10Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: prometheus-lpv
local:
path: /data/prometheus
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- sealos-k8s-node3
创建Prometheus的RBAC文件。
cat prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1 # api的version
kind: ClusterRole # 类型
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources: # 资源
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus # 自定义名字
namespace: kube-system # 命名空间
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef: # 选择需要绑定的Role
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects: # 对象
- kind: ServiceAccount
name: prometheus
namespace: kube-system
创建Prometheus的configmap配置文件。
cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
data:
prometheus.yml: |
global:
scrape_interval: 30s
evaluation_interval: 30s
external_labels:
cluster: "01"
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
metric_relabel_configs:
- action: replace
source_labels: [id]
regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
target_label: rkt_container_name
replacement: '${2}-${1}'
- action: replace
source_labels: [id]
regex: '^/system\.slice/(.+)\.service$'
target_label: systemd_service_name
replacement: '${1}'
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- source_labels: [__address__]
action: replace
target_label: instance
regex: (.+):(.+)
replacement: $1
创建Prometheus的Statefulset配置文件。
cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
namespace: kube-system
labels:
k8s-app: prometheus
kubernetes.io/cluster-service: "true"
spec:
serviceName: "prometheus"
podManagementPolicy: "Parallel"
replicas: 3
selector:
matchLabels:
k8s-app: prometheus
template:
metadata:
labels:
k8s-app: prometheus
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values:
- prometheus
topologyKey: "kubernetes.io/hostname"
priorityClassName: system-cluster-critical
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: prometheus-server-configmap-reload
image: "jimmidyson/configmap-reload:v0.4.0"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9090/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
- image: prom/prometheus:v2.20.0
imagePullPolicy: IfNotPresent
name: prometheus
command:
- "/bin/prometheus"
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--storage.tsdb.retention=24h"
- "--web.console.libraries=/etc/prometheus/console_libraries"
- "--web.console.templates=/etc/prometheus/consoles"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
protocol: TCP
volumeMounts:
- mountPath: "/prometheus"
name: prometheus-data
- mountPath: "/etc/prometheus"
name: config-volume
readinessProbe:
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 1000m
memory: 2500Mi
securityContext:
runAsUser: 65534
privileged: true
serviceAccountName: prometheus
volumes:
- name: config-volume
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: prometheus-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "prometheus-lpv"
resources:
requests:
storage: 5Gi
创建Prometheus的svc配置文件
cat prometheus-service-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: kube-system
spec:
ports:
- name: prometheus
port: 9090
targetPort: 9090
selector:
k8s-app: prometheus
clusterIP: None
部署创建好的Prometheus的相关资源文件
cd /data/manual-deploy/prometheus
ls
prometheus-configmap.yaml # Configmap
prometheus-data-pv.yaml # PVC
prometheus-data-storageclass.yaml # SC
prometheus-rbac.yaml # RBAC
prometheus-service-statefulset.yaml # SVC
prometheus-statefulset.yaml # Statefulset
# 部署应用
kubectl apply -f .
验证已经部署的Prometheus的pv与pvc的绑定关系以及部署状态
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
prometheus-lpv-0 10Gi RWO Retain Available prometheus-lpv 6m28s
prometheus-lpv-1 10Gi RWO Retain Available prometheus-lpv 6m28s
prometheus-lpv-2 10Gi RWO Retain Available prometheus-lpv 6m28s
kubectl -n kube-system get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-data-prometheus-0 Bound prometheus-lpv-0 10Gi RWO prometheus-lpv 2m16s
prometheus-data-prometheus-1 Bound prometheus-lpv-2 10Gi RWO prometheus-lpv 2m16s
prometheus-data-prometheus-2 Bound prometheus-lpv-1 10Gi RWO prometheus-lpv 2m16s
kubectl -n kube-system get pod prometheus-{0..2}
NAME READY STATUS RESTARTS AGE
prometheus-0 2/2 Running 0 3m16s
prometheus-1 2/2 Running 0 3m16s
prometheus-2 2/2 Running 0 3m16s
部署 Node Exporter
创建Demonset的node-exporter文件
cd /data/manual-deploy/node-exporter/
cat node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: kube-system
labels:
k8s-app: node-exporter
spec:
selector:
matchLabels:
k8s-app: node-exporter
template:
metadata:
labels:
k8s-app: node-exporter
spec:
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
containers:
- image: quay.io/prometheus/node-exporter:v1.0.0
imagePullPolicy: IfNotPresent
name: prometheus-node-exporter
ports:
- containerPort: 9100
hostPort: 9100
protocol: TCP
name: metrics
volumeMounts:
- mountPath: /host/proc
name: proc
- mountPath: /host/sys
name: sys
- mountPath: /host
name: rootfs
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: rootfs
hostPath:
path: /
hostNetwork: true
hostPID: true
---
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
k8s-app: node-exporter
name: node-exporter
namespace: kube-system
spec:
ports:
- name: http
port: 9100
protocol: TCP
selector:
k8s-app: node-exporter
部署
cd /data/manual-deploy/node-exporter/
kubectl apply -f node-exporter.yaml
验证状态
kubectl -n kube-system get pod |grep node-exporter
node-exporter-45s2q 2/2 Running 0 6h43m
node-exporter-f4rrw 2/2 Running 0 6h43m
node-exporter-hvtzj 2/2 Running 0 6h43m
node-exporter-nlvfq 2/2 Running 0 6h43m
node-exporter-qbd2q 2/2 Running 0 6h43m
node-exporter-zjrh4 2/2 Running 0 6h43m
部署 kube-state-metrics
kubelet已经集成了cAdvisor已知可以收集系统级别的CPU、Memory、Network、Disk、Container等指标信息,但是却不能采集到Kubernetes的资源对象的指标信息,如:Pod的数量以及状态等等。
因此我们需要kube-state-metrics,来帮助我们完成这些采集操作。
kube-state-metrics是通过轮询的方式对Kubernetes API进行操作,然后返回有关资源对象指标的Metrics信息:CronJob、DaemonSet、Deployment、Job、LimitRange、Node、PersistentVolume 、PersistentVolumeClaim、 Pod、Pod Disruption Budget、ReplicaSet、ReplicationController、ResourceQuota、Service、StatefulSet、Namespace、Horizontal Pod Autoscaler、Endpoint、Secret、ConfigMap、Ingress、CertificateSigningRequest
。
cd /data/manual-deploy/kube-state-metrics/
cat kube-state-metrics-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: kube-system
name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
resources:
- pods
verbs: ["get"]
- apiGroups: ["apps"]
resources:
- deployments
resourceNames: ["kube-state-metrics"]
verbs: ["get", "update"]
- apiGroups: ["extensions"]
resources:
- deployments
resourceNames: ["kube-state-metrics"]
verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kube-state-metrics
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-state-metrics
rules:
- apiGroups: [""]
resources:
- configmaps
- secrets
- nodes
- pods
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
- ingresses
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- daemonsets
- deployments
- replicasets
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
- apiGroups: ["policy"]
resources:
- poddisruptionbudgets
verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
resources:
- certificatesigningrequests
verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-state-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-state-metrics
namespace: kube-system
创建kube-state-metrics的deployment文件
cat kube-state-metrics-deloyment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-state-metrics
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: kube-state-metrics
replicas: 1
template:
metadata:
labels:
k8s-app: kube-state-metrics
spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.6.0
ports:
- name: http-metrics
containerPort: 8080
- name: telemetry
containerPort: 8081
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
timeoutSeconds: 5
- name: addon-resizer
image: k8s.gcr.io/addon-resizer:1.8.4
resources:
limits:
cpu: 150m
memory: 50Mi
requests:
cpu: 150m
memory: 50Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
command:
- /pod_nanny
- --container=kube-state-metrics
- --cpu=100m
- --extra-cpu=1m
- --memory=100Mi
- --extra-memory=2Mi
- --threshold=5
- --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
k8s-app: kube-state-metrics
annotations:
prometheus.io/scrape: 'true'
spec:
ports:
- name: http-metrics
port: 8080
targetPort: http-metrics
protocol: TCP
- name: telemetry
port: 8081
targetPort: telemetry
protocol: TCP
selector:
k8s-app: kube-state-metrics
部署
kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deloyment.yaml
验证
kubectl -n kube-system get pod |grep kube-state-metrics
kube-state-metrics-657d8d6669-bqbs8 2/2 Running 0 4h
kube-state-metrics的service中指定了annotation: prometheus.io/scrape: "true", job: kubernetes-service-endpoints可以自动发现
kube-state-metrics在svc填写配置的时候指定annotation: prometheus.io/scrape: "true", job: kubernetes-service-endpoints可以实现自动发现。
部署 Alertmanager 集群
创建目录、赋权
k8s-m2
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
k8s-m3
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
cd /data/manual-deploy/alertmanager/
cat alertmanager-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: alertmanager-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
创建Alertmanager的pv配置文件
cat alertmanager-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: alertmanager-pv-0
spec:
capacity:
storage: 10Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: alertmanager-lpv
local:
path: /data/alertmanager
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- sealos-k8s-m2
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: alertmanager-pv-1
spec:
capacity:
storage: 10Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: alertmanager-lpv
local:
path: /data/alertmanager
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- sealos-k8s-m3
创建Alertmanager的configmap配置文件
cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
alertmanager.yml: |
global:
resolve_timeout: 5m
smtp_smarthost: 'smtp.qq.com:465'
smtp_from: 'yo@qq.com'
smtp_auth_username: '345@qq.com'
smtp_auth_password: 'bhgb'
smtp_hello: '警报邮件'
smtp_require_tls: false
route:
group_by: ['alertname', 'cluster']
group_wait: 30s
group_interval: 30s
repeat_interval: 12h
receiver: default
routes:
- receiver: email
group_wait: 10s
match:
team: ops
receivers:
- name: 'default'
email_configs:
- to: '9935226@qq.com'
send_resolved: true
- name: 'email'
email_configs:
- to: '9935226@qq.com'
send_resolved: true
创建Alertmanager的statefulset文件,我这里部署的是集群模式,如果需要单体库模式,将replicas改为1,去掉集群参数即可。
cat alertmanager-statefulset-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: alertmanager
namespace: kube-system
labels:
k8s-app: alertmanager
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v0.21.0
spec:
serviceName: "alertmanager-operated"
replicas: 2
selector:
matchLabels:
k8s-app: alertmanager
version: v0.21.0
template:
metadata:
labels:
k8s-app: alertmanager
version: v0.21.0
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
- effect: NoSchedule
key: node-role.kubernetes.io/master
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values:
- alertmanager
topologyKey: "kubernetes.io/hostname"
containers:
- name: prometheus-alertmanager
image: "prom/alertmanager:v0.21.0"
imagePullPolicy: "IfNotPresent"
args:
- "--config.file=/etc/config/alertmanager.yml"
- "--storage.path=/data"
- "--cluster.listen-address=${POD_IP}:9094"
- "--web.listen-address=:9093"
- "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
- "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
ports:
- containerPort: 9093
name: web
protocol: TCP
- containerPort: 9094
name: mesh-tcp
protocol: TCP
- containerPort: 9094
name: mesh-udp
protocol: UDP
readinessProbe:
httpGet:
path: /#/status
port: 9093
initialDelaySeconds: 30
timeoutSeconds: 60
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: storage-volume
mountPath: "/data"
subPath: ""
resources:
limits:
cpu: 1000m
memory: 500Mi
requests:
cpu: 10m
memory: 50Mi
- name: prometheus-alertmanager-configmap-reload
image: "jimmidyson/configmap-reload:v0.4.0"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9093/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
securityContext:
runAsUser: 0
privileged: true
volumes:
- name: config-volume
configMap:
name: alertmanager-config
volumeClaimTemplates:
- metadata:
name: storage-volume
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "alertmanager-lpv"
resources:
requests:
storage: 5Gi
创建Alertmanager的operated-service配置文件
cat alertmanager-operated-service.yaml
apiVersion: v1
kind: Service
metadata:
name: alertmanager-operated
namespace: kube-system
labels:
app.kubernetes.io/name: alertmanager-operated
app.kubernetes.io/component: alertmanager
spec:
type: ClusterIP
clusterIP: None
sessionAffinity: None
selector:
k8s-app: alertmanager
ports:
- name: web
port: 9093
protocol: TCP
targetPort: web
- name: tcp-mesh
port: 9094
protocol: TCP
targetPort: tcp-mesh
- name: udp-mesh
port: 9094
protocol: UDP
targetPort: udp-mesh
部署
cd /data/manual-deploy/alertmanager/
ls
alertmanager-configmap.yaml
alertmanager-data-pv.yaml
alertmanager-data-storageclass.yaml
alertmanager-operated-service.yaml
alertmanager-service-statefulset.yaml
alertmanager-statefulset-cluster.yaml
kubectl apply -f .
OK ,到此我们已经手动在k8s中的kube-system中以statefulset方式部署了Prometheus与Alertmanager,下一篇我们部署grafana与ingress-nginx的相关部署。