k8s HPA configuration error: failed to get cpu utilization: missing request for cpu
# https://github.com/kubernetes/kubernetes/issues/79365
# The discussion in that issue gives the answer to this failure.
1. The key point:
If a Pod defines two or more containers, then every container must specify its own resources section; if only one container specifies it, those resource constraints apply only to that container and the others are left completely unconstrained!!!
For HPA to work, every container in the Pod must declare resource requests, otherwise it fails with errors like:
failed to get cpu utilization: missing request for cpu
invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: missing request for memory
failed to get memory utilization: missing request for memory
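As an illustration, a two-container Deployment with requests declared on both containers might look roughly like the sketch below (all names and images here are hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-two-containers        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: app                  # every container declares requests for the metrics the HPA uses
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
      - name: sidecar              # without requests here, the HPA would report "missing request for cpu"
        image: busybox:1.36
        command: ["sleep", "infinity"]
        resources:
          requests:
            cpu: 50m
            memory: 64Mi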
2. The API aggregation layer (aggregator)
The aggregator is the mechanism K8s provides for registering custom APIs with the K8s APIServer, so that developers can have requests to the cluster routed through the aggregator into their own Service and forwarded on to their own program.
If the aggregator is not enabled it affects metrics-server; I have not fully understood the details yet:
https://blog.csdn.net/fly910905/article/details/105375822/
That article explains the roles of the various certificates in k8s, as well as how to enable the aggregator.
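For reference, what metrics-server registers through the aggregation layer is an APIService object, roughly like the one below (in the spirit of the standard metrics-server manifests; field values may differ between versions):

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io                # requests under /apis/metrics.k8s.io/v1beta1 ...
  version: v1beta1
  service:
    name: metrics-server               # ... are proxied by the apiserver to this Service
    namespace: kube-system
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100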
# Steps to enable it
1. First generate a set of certificates for the aggregator
# sudo apt install golang-cfssl

# aggregator-ca-config
cat > aggregator-ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "438000h"
    },
    "profiles": {
      "aggregator": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "438000h"
      }
    }
  }
}
EOF

# aggregator-ca-csr
cat > aggregator-ca-csr.json <<EOF
{
  "CN": "aggregator",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "CP",
      "O": "k8s",
      "OU": "System"
    }
  ],
  "ca": {
    "expiry": "87600h"
  }
}
EOF

# Generate the self-signed CA certificate
cfssl gencert -initca aggregator-ca-csr.json | cfssljson -bare aggregator-ca

# Create the aggregator certificate signing request config
cat > aggregator-csr.json <<EOF
{
  "CN": "aggregator",
  "hosts": [
    "127.0.0.1",
    "10.120.0.1",
    "192.168.99.106",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster.local",
    "paas-106"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "CP",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

# Generate the aggregator certificate and key
cfssl gencert -ca=aggregator-ca.pem -ca-key=aggregator-ca-key.pem \
  -config=aggregator-ca-config.json -profile=aggregator \
  aggregator-csr.json | cfssljson -bare aggregator
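The kube-apiserver flags added in the next step reference these files under /etc/kubernetes/pki/, so presumably they have to be copied there first; the generated certificate can also be sanity-checked, for example:

# copy the aggregator CA, certificate and key to where the apiserver flags expect them
sudo cp aggregator-ca.pem aggregator.pem aggregator-key.pem /etc/kubernetes/pki/
# optional: inspect the certificate subject and validity period
openssl x509 -in aggregator.pem -noout -subject -dates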
# A default kubeadm deployment has no aggregator configuration, so it has to be added
$ sudo cp /etc/kubernetes/manifests/kube-apiserver.yaml .
$ sudo vim kube-apiserver.yaml
....
    - --enable-aggregator-routing=true   # kube-apiserver and kube-proxy are not always on the same node in a multi-node cluster, so enable this
    - --requestheader-client-ca-file=/etc/kubernetes/pki/aggregator-ca.pem
    - --requestheader-allowed-names=aggregator
    - --proxy-client-cert-file=/etc/kubernetes/pki/aggregator.pem
    - --proxy-client-key-file=/etc/kubernetes/pki/aggregator-key.pem
....
# Finally copy the edited file back; as soon as a file in the manifests directory changes, the related pod is restarted automatically
sudo cp kube-apiserver.yaml /etc/kubernetes/manifests/kube-apiserver.yaml
# Wait a while; once kube-apiserver, kube-controller-manager and kube-scheduler have finished restarting, it is ready to use.
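Once everything has restarted, a quick way to confirm that the aggregation layer and metrics-server are healthy (assuming metrics-server is deployed in the kube-system namespace) is:

# the aggregated metrics API should report AVAILABLE=True
kubectl get apiservice v1beta1.metrics.k8s.io
# node and pod metrics should come back instead of an error
kubectl top nodes
kubectl top pods -A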
3. A third point that is easy to overlook
# Note the following when editing the HPA configuration
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: render-hpa-sharegpu
  namespace: render-hpa
spec:
  maxReplicas: 3
  minReplicas: 1
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 40
        type: Utilization
    type: Resource
  - resource:
      name: cpu
      target:
        averageUtilization: 30
        type: Utilization
    type: Resource
  scaleTargetRef:
    apiVersion: apps/v1   # this must match the API group of the controller being targeted
    kind: Deployment
    name: render-hpa-sharegpu
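After applying the HPA, check that TARGETS shows real percentages rather than <unknown> (the namespace and name below simply follow the example above):

kubectl -n render-hpa get hpa render-hpa-sharegpu
# <unknown> usually means a container is missing requests or the metrics API is not available
kubectl -n render-hpa describe hpa render-hpa-sharegpu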