Custom HPA
What is a custom HPA?
In day-to-day use, CPU or memory metrics are usually the basis for HPA scaling, but sometimes CPU and memory alone cannot cover the business scenario, for example autoscaling on the QPS handled by a single replica. This is where the custom HPA comes in. The HPA API has two versions: v1 scales only on CPU and memory, while v2 also supports scaling on custom metrics. The API server does not serve those custom metrics natively; a dedicated component such as PrometheusAdapter must be installed.
What PrometheusAdapter does: the metrics Prometheus scrapes cannot be consumed by Kubernetes directly, because the two data formats are incompatible, so another component (Prometheus Adapter) converts Prometheus metric data into a format the Kubernetes metrics APIs can understand. Since Prometheus-Adapter serves a custom APIService, it must also be registered with the main API server through the Kubernetes aggregator so that it can be reached under /apis/.
This document focuses on the custom HPA.
Custom HPA generally covers two scenarios:
- The workload itself exposes metrics. Prometheus scrapes them, the Adapter converts them into a Kubernetes-readable format, and a Pods-type HPA binds the metric to the Deployment that needs to be scaled.
- The metrics are not exposed by the pods being scaled, for example scaling pods on a node's TCP connection count. The Pods type cannot be used here; the External type is needed: the exposed metric is converted and registered as a Kubernetes APIService, then bound to the HPA and the target resource to drive scaling.
How the adapter converts metrics:
The adapter converts Prometheus metrics into a form Kubernetes can recognize. In the adapter's ConfigMap you generally map labels such as namespace and pod onto the corresponding Kubernetes resources. The adapter then serves the converted metrics through an aggregated API for the HPA to scale on: the Deployment bound to the HPA is matched against the labels defined in the adapter rules, and the resulting metrics are registered into the APIService.
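The conversion logic above can be sketched as a minimal adapter rule with its four moving parts. This is an illustrative sketch only; the metric name http_requests_total is a placeholder, not a metric from this setup:

```yaml
rules:
# 1. Discovery: which Prometheus series this rule applies to
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  # 2. Association: map Prometheus labels onto real Kubernetes resources
  resources:
    overrides:
      namespace: { resource: "namespace" }
      pod: { resource: "pod" }
  # 3. Naming: the metric name the HPA will reference
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  # 4. Querying: how the value returned to the HPA is computed
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```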
Use the API to verify that the conversion works:
# query the external-type metrics exposed by the adapter
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/" | jq .
# query the custom (resource-scoped) metrics exposed by the adapter
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .
# query a specific metric, scoped by the labels defined in the adapter rules:
# here, namespace monitor, all pods, metric start_time_seconds.
# Note: pods exposing the metric must exist in the monitor namespace; otherwise nothing is returned.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds" | jq .
When the metric exporter is not the workload being scaled, its labels cannot be matched to the target directly through the adapter, which is why the External type described above is needed.
HPA API categories
# The Kubernetes API server exposes three monitoring-related APIs
resource metrics API: designed to provide monitoring metrics to Kubernetes core components, e.g. kubectl top; requires a component such as metrics-server
custom metrics API: designed to provide metrics to the HPA controller
external metrics API: designed for scaling on metrics originating outside the cluster
# Prometheus-adapter can serve all three of these APIs
1. resource metrics API
2. custom metrics API
3. external metrics API
- kubectl top node/pod reads resource metrics, so prometheus-adapter can replace metrics-server.
API Aggregation
Kubernetes 1.7 introduced the aggregation layer, which allows third-party applications to register themselves with kube-apiserver while still being accessed through the API server's HTTP URLs. To implement this, Kubernetes added an API Aggregation Layer to kube-apiserver that forwards requests for extension APIs to the user's backing service.
When you access /apis/metrics.k8s.io/v1beta1, you are really talking to the kube-aggregator proxy; kube-apiserver itself is one backend of this proxy, and metrics-server is another. This makes it straightforward to extend the Kubernetes API.
1> Resource metrics flow: hpa -> apiserver -> kube aggregation -> metrics-server -> kubelet (cadvisor)
2> Custom metrics flow: hpa -> apiserver -> kube aggregation -> prometheus-adapter -> prometheus -> pods
How the different HPA types are used
Based on its type, the HPA pulls metrics from the resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io).
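Whichever API the metric comes from, the HPA controller computes the desired replica count as ceil(currentReplicas * currentMetricValue / desiredMetricValue). A small sketch of that arithmetic (the helper name desired_replicas is ours, not a Kubernetes API):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
desired_replicas() {
  awk -v cur="$1" -v val="$2" -v tgt="$3" 'BEGIN {
    d = cur * val / tgt
    r = (d == int(d)) ? d : int(d) + 1   # ceil()
    print r
  }'
}

desired_replicas 2 90 60    # 2 * 90 / 60 = 3
desired_replicas 1 84.5 60  # ceil(1.408...) = 2
```

The result is then clamped to the minReplicas/maxReplicas bounds set in the HPA spec.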
I. Scaling on CPU/memory
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: app1
  labels:
    app: test-app1
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app1
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
II. Scaling on custom metrics
0. Create the custom metrics APIService bound to the adapter
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
    port: 443
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 998
  versionPriority: 10
1. Adapter metric conversion
Prerequisite: Prometheus is already scraping the metrics.
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    # rules convert what Prometheus scraped into metrics Kubernetes can recognize
    rules:
    # select Prometheus series whose name starts with container_, where the
    # container label is not POD and namespace/pod are not empty
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          # map the Prometheus namespace and pod labels onto the Kubernetes
          # namespace and pod resources; each label here must correspond to a
          # real Kubernetes resource, e.g. the pod name maps to the pod resource
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        # rename the matched metric via "as"; the HPA binds to the alias
        matches: ^container_(.*)_seconds_total$
        as: ""
      # how the value returned through the custom metrics API is computed;
      # this value ultimately drives the HPA scaling decision
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[1m])) by (<<.GroupBy>>)
    # expose CPU and memory metrics through the adapter (resource metrics API)
    "resourceRules":
      "cpu":
        "containerLabel": "container"
        "containerQuery": |
          sum by (<<.GroupBy>>) (
            irate (
              container_cpu_usage_seconds_total{<<.LabelMatchers>>,pod!=""}[120s]
            )
          )
      "memory":
        "containerLabel": "container"
        "containerQuery": |
          sum by (<<.GroupBy>>) (
            container_memory_working_set_bytes{<<.LabelMatchers>>,pod!=""}
          )
        "nodeQuery": |
          sum by (<<.GroupBy>>) (
            node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>}
            -
            node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}
          )
          or sum by (<<.GroupBy>>) (
            windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>}
            -
            windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>}
          )
        "resources":
          "overrides":
            "node":
              "resource": "node"
            "namespace":
              "resource": "namespace"
            "pod":
              "resource": "pod"
      "window": "5m"
2. HPA scaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-hpa-test
  namespace: monitoring
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-ingress-monitoring-ingress-nginx-controller
  metrics:
  - type: Pods
    pods:
      metric:
        name: container_cpu_system_seconds_total
      target:
        averageValue: '10'
        type: AverageValue
III. Scaling on external metrics
Metrics such as node statistics are collected by node-exporter and have no direct relation to the business pods being scaled, so the External type is used to register the metric under api-resources.
This allows, for example, scaling MySQL based on an nginx metric, or scaling the nginx-ingress-controller based on node TCP connection counts.
0. Create the external metrics APIService, from which External-type HPAs fetch metrics
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
1. Adapter metric conversion
Expose node_sockstat_TCP_alloc through the adapter.
apiVersion: v1
data:
  config.yaml: |-
    externalRules:
    # select the Prometheus series node_sockstat_TCP_alloc with label ingress="node-hpa"
    - seriesQuery: '{__name__="node_sockstat_TCP_alloc",ingress="node-hpa"}'
      # the query used to compute the metric value
      metricsQuery: node_sockstat_TCP_alloc{ingress="node-hpa",instance=~"10.+"}
      resources:
        overrides:
          # bind the namespace label to the namespace resource
          namespace: { resource: "namespace" }
      name:
        matches: "node_sockstat_TCP_alloc"
        as: "node_sockstat_tcp_alloc"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: metrics-adapter
    app.kubernetes.io/name: prometheus-adapter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.8.4
  name: adapter-config
  namespace: monitoring
2. HPA autoscaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-hpa-test
  namespace: monitoring
spec:
  minReplicas: 1
  maxReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-ingress-monitoring-ingress-nginx-controller
  metrics:
  - type: External
    external:
      metric:
        name: node_sockstat_tcp_alloc # the metric name exposed by the adapter
        selector:
          matchLabels:
            job: "node-exporter-hpa"
            ingress: "node-hpa"
      target:
        type: AverageValue
        averageValue: 60
3. Verify the adapter metrics
root@management:/opt/kubernetes/prometheus/k8s-dev/manifests/configuration-files# kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/monitoring/node_sockstat_tcp_alloc"|jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "node_sockstat_tcp_alloc",
      "metricLabels": {
        "__name__": "node_sockstat_TCP_alloc",
        "beta_kubernetes_io_arch": "amd64",
        "beta_kubernetes_io_instance_type": "S4.LARGE8",
        "beta_kubernetes_io_os": "linux",
        "cloud_tencent_com_auto_scaling_group_id": "asg-9j80v1i5",
        "cloud_tencent_com_node_instance_id": "ins-7dznkner",
        "failure_domain_beta_kubernetes_io_region": "sh",
        "failure_domain_beta_kubernetes_io_zone": "200004",
        "ingress": "node-hpa",
        "instance": "10.4.64.15",
        "job": "node-exporter",
        "kubernetes_io_arch": "amd64",
        "kubernetes_io_hostname": "10.4.64.15",
        "kubernetes_io_os": "linux",
        "node": "10.4.64.15",
        "node_kubernetes_io_instance_type": "S4.LARGE8",
        "tke_cloud_tencent_com_nodepool_id": "np-p03bj711",
        "topology_kubernetes_io_region": "sh",
        "topology_kubernetes_io_zone": "200004"
      },
      "timestamp": "2022-11-18T10:09:38Z",
      "value": "96"
    },
    {
      "metricName": "node_sockstat_tcp_alloc",
      "metricLabels": {
        "__name__": "node_sockstat_TCP_alloc",
        "beta_kubernetes_io_arch": "amd64",
        "beta_kubernetes_io_instance_type": "S4.LARGE8",
        "beta_kubernetes_io_os": "linux",
        "cloud_tencent_com_auto_scaling_group_id": "asg-9j80v1i5",
        "cloud_tencent_com_node_instance_id": "ins-eqd7lizl",
        "failure_domain_beta_kubernetes_io_region": "sh",
        "failure_domain_beta_kubernetes_io_zone": "200005",
        "ingress": "node-hpa",
        "instance": "10.4.80.19",
        "job": "node-exporter",
        "kubernetes_io_arch": "amd64",
        "kubernetes_io_hostname": "10.4.80.19",
        "kubernetes_io_os": "linux",
        "node": "10.4.80.19",
        "node_kubernetes_io_instance_type": "S4.LARGE8",
        "tke_cloud_tencent_com_nodepool_id": "np-p03bj711",
        "topology_kubernetes_io_region": "sh",
        "topology_kubernetes_io_zone": "200005"
      },
      "timestamp": "2022-11-18T10:09:38Z",
      "value": "71"
    }
  ]
}
Pitfalls
- When using an External-type HPA with a custom metric, the HPA value looked abnormal. The adapter was suspected first, but querying the API directly returned normal values; it turned out to be purely how the HPA displays the number. 84500m is actually 84.5: Kubernetes prints fractional values in milli-units, so they look different from plain integers.
root@management:~# k8s-v6 get hpa -n monitoring
ingress-hpa-test Deployment/test-ingress-monitoring-ingress-nginx-controller 84500m/60 (avg) 1 2 2 22h
ingress-hpa-test Deployment/test-ingress-monitoring-ingress-nginx-controller 83/60 (avg) 1 2 2 22h
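The milli-unit notation is easy to convert by hand: 1000m equals 1. A small sketch of the conversion (the helper name quantity_to_decimal is ours):

```shell
# Convert an HPA-reported quantity to a plain decimal, e.g. "84500m" -> 84.5
quantity_to_decimal() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%g\n", v / 1000 }' ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

quantity_to_decimal 84500m  # 84.5
quantity_to_decimal 83      # 83
```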
- I had been trying the adapter's rules the whole time and had never read anything about externalRules, which cost a lot of trial and error. Along the way I did learn a label-joining technique: find a label that two metrics share, then copy useful labels from one metric onto the other:
# on() lists the shared label; group_left() copies labels from the right-hand metric onto the left
kube_pod_info{} * on(pod) group_left(app,component) go_memstats_stack_sys_bytes{app!='',pod!='',component!=''}