「Bug」Istio 的 Sidercar 和 IngressGateway 间歇性地报错:Envoy proxy is NOT ready
Bug 描述
IngressGateway 日志如下:
ryan@RYAN-MI-DESKTOP:~$ kubectl -n istio-system logs istio-ingressgateway-7f85d5ffcc-f4g7l
2020-04-22T01:09:24.691767Z info FLAG: --binaryPath="/usr/local/bin/envoy"
2020-04-22T01:09:24.691874Z info FLAG: --concurrency="0"
2020-04-22T01:09:24.691889Z info FLAG: --configPath="/etc/istio/proxy"
2020-04-22T01:09:24.691904Z info FLAG: --connectTimeout="10s"
2020-04-22T01:09:24.692020Z info FLAG: --controlPlaneAuthPolicy="NONE"
2020-04-22T01:09:24.692047Z info FLAG: --controlPlaneBootstrap="true"
2020-04-22T01:09:24.692054Z info FLAG: --customConfigFile=""
2020-04-22T01:09:24.692059Z info FLAG: --datadogAgentAddress=""
2020-04-22T01:09:24.692064Z info FLAG: --disableInternalTelemetry="false"
2020-04-22T01:09:24.692070Z info FLAG: --discoveryAddress="istio-pilot.istio-system.svc:15012"
2020-04-22T01:09:24.692075Z info FLAG: --dnsRefreshRate="300s"
2020-04-22T01:09:24.692081Z info FLAG: --domain="istio-system.svc.cluster.local"
2020-04-22T01:09:24.692087Z info FLAG: --drainDuration="45s"
2020-04-22T01:09:24.692092Z info FLAG: --envoyAccessLogService=""
2020-04-22T01:09:24.692096Z info FLAG: --envoyMetricsService=""
2020-04-22T01:09:24.692101Z info FLAG: --help="false"
2020-04-22T01:09:24.692106Z info FLAG: --id=""
2020-04-22T01:09:24.692111Z info FLAG: --ip=""
2020-04-22T01:09:24.692116Z info FLAG: --lightstepAccessToken=""
2020-04-22T01:09:24.692120Z info FLAG: --lightstepAddress=""
2020-04-22T01:09:24.692125Z info FLAG: --lightstepCacertPath=""
2020-04-22T01:09:24.692130Z info FLAG: --lightstepSecure="false"
2020-04-22T01:09:24.692135Z info FLAG: --log_as_json="false"
2020-04-22T01:09:24.692140Z info FLAG: --log_caller=""
2020-04-22T01:09:24.692145Z info FLAG: --log_output_level="default:info"
2020-04-22T01:09:24.692149Z info FLAG: --log_rotate=""
2020-04-22T01:09:24.692155Z info FLAG: --log_rotate_max_age="30"
2020-04-22T01:09:24.692160Z info FLAG: --log_rotate_max_backups="1000"
2020-04-22T01:09:24.692165Z info FLAG: --log_rotate_max_size="104857600"
2020-04-22T01:09:24.692170Z info FLAG: --log_stacktrace_level="default:none"
2020-04-22T01:09:24.692181Z info FLAG: --log_target="[stdout]"
2020-04-22T01:09:24.692186Z info FLAG: --mixerIdentity=""
2020-04-22T01:09:24.692191Z info FLAG: --outlierLogPath=""
2020-04-22T01:09:24.692197Z info FLAG: --parentShutdownDuration="1m0s"
2020-04-22T01:09:24.692201Z info FLAG: --pilotIdentity=""
2020-04-22T01:09:24.692208Z info FLAG: --proxyAdminPort="15000"
2020-04-22T01:09:24.692213Z info FLAG: --proxyComponentLogLevel="misc:error"
2020-04-22T01:09:24.692218Z info FLAG: --proxyLogLevel="warning"
2020-04-22T01:09:24.692224Z info FLAG: --serviceCluster="istio-ingressgateway"
2020-04-22T01:09:24.692230Z info FLAG: --serviceregistry="Kubernetes"
2020-04-22T01:09:24.692234Z info FLAG: --statsdUdpAddress=""
2020-04-22T01:09:24.692240Z info FLAG: --statusPort="15020"
2020-04-22T01:09:24.692245Z info FLAG: --stsPort="0"
2020-04-22T01:09:24.692250Z info FLAG: --templateFile=""
2020-04-22T01:09:24.692255Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2020-04-22T01:09:24.692260Z info FLAG: --trust-domain="cluster.local"
2020-04-22T01:09:24.692265Z info FLAG: --zipkinAddress="zipkin.istio-system:9411"
2020-04-22T01:09:24.692311Z info Version 1.5.0-c3c353285578eb68b334fc8766746b754b6b3789-Clean
2020-04-22T01:09:24.693194Z info Obtained private IP [100.82.150.181]
2020-04-22T01:09:24.693320Z info Proxy role: &model.Proxy{ClusterID:"", Type:"router", IPAddresses:[]string{"100.82.150.181", "100.82.150.181"}, ID:"istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system", Locality:(*envoy_api_v2_core.Locality)(nil), DNSDomain:"istio-system.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), WorkloadLabels:labels.Collection(nil), IstioVersion:(*model.IstioVersion)(nil)}
2020-04-22T01:09:24.693356Z info PilotSAN []string(nil)
2020-04-22T01:09:24.693370Z info MixerSAN []string(nil)
2020-04-22T01:09:24.695137Z info Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: istio-ingressgateway
statNameLength: 189
tracing:
zipkin:
address: zipkin.istio-system:9411
2020-04-22T01:09:24.695190Z info JWT policy is first-party-jwt
2020-04-22T01:09:24.695693Z info Using user-configured CA istio-pilot.istio-system.svc:15012
2020-04-22T01:09:24.695722Z info istiod uses self-issued certificate
2020-04-22T01:09:24.695907Z info the CA cert of istiod is: -----BEGIN CERTIFICATE-----
MIIC3TCCAcWgAwIBAgIQJgBrERl3Dow3k42JaJOG4TANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIwMDQxOTA2NDgzN1oXDTMwMDQx
......
s6cBajEjOj2eqwrwoSYYvICrQGNyrLEzjez1E4sjuXaDiKYDPS8rC35J62jwEiOn
nYwCjfi+/xwxukIp8BifwrQ=
-----END CERTIFICATE-----
2020-04-22T01:09:24.696850Z info parsed scheme: ""
2020-04-22T01:09:24.696895Z info scheme "" not registered, fallback to default scheme
2020-04-22T01:09:24.696956Z info ccResolverWrapper: sending update to cc: {[{istio-pilot.istio-system.svc:15012 <nil> 0 <nil>}] <nil> <nil>}
2020-04-22T01:09:24.696997Z info ClientConn switching balancer to "pick_first"
2020-04-22T01:09:24.697302Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {CONNECTING <nil>}
2020-04-22T01:09:24.736769Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {READY <nil>}
2020-04-22T01:09:24.872608Z info Starting gateway SDS
2020-04-22T01:09:24.997481Z info sds SDS gRPC server for workload UDS starts, listening on "/etc/istio/proxy/SDS"
2020-04-22T01:09:24.997920Z info sds SDS gRPC server for ingress gateway controller starts, listening on "/var/run/ingress_gateway/sds"
2020-04-22T01:09:24.998010Z info PilotSAN []string{"istio-pilot.istio-system.svc"}
2020-04-22T01:09:24.998190Z info Starting proxy agent
2020-04-22T01:09:24.998494Z info sds Start SDS grpc server
2020-04-22T01:09:24.998681Z info Opening status port 15020
2020-04-22T01:09:24.998806Z info sds Start SDS grpc server for ingress gateway proxy
2020-04-22T01:09:24.998897Z info Received new config, creating new Envoy epoch 0
2020-04-22T01:09:24.998981Z info Epoch 0 starting
2020-04-22T01:09:25.044273Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l warning --component-log-level misc:error]
[Envoy (Epoch 0)] [2020-04-22 01:09:25.199][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2020-04-22 01:09:25.200][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-04-22T01:09:25.231736Z info sds node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-1 resource:default new connection
2020-04-22T01:09:25.784256Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[Envoy (Epoch 0)] [2020-04-22 01:09:26.010][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2020-04-22 01:09:26.011][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-04-22T01:09:26.060376Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-04-22T01:09:26.060838Z info sds node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-1 resource:default pushed key/cert pair to proxy
2020-04-22T01:09:26.060881Z info sds node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-1 resource:default pushed secret
2020-04-22T01:09:27.765512Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-22T01:09:29.766995Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-22T01:09:32.764881Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:34.769088Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:36.765021Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:38.767440Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:40.766358Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:42.769452Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:44.765521Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:46.764808Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:48.772710Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:50.768864Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:52.766023Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:54.774756Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:56.773031Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:58.765617Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:00.768655Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:02.441087Z info sds node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-2 resource:ROOTCA new connection
2020-04-22T01:10:02.441485Z info sds node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-2 resource:ROOTCA pushed root cert to proxy
2020-04-22T01:10:02.441513Z info sds node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-2 resource:ROOTCA pushed secret
2020-04-22T01:10:02.765271Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:04.765566Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:06.773683Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:08.770177Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:10.766355Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:12.764962Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:14.775662Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:16.774207Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:18.768873Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:20.776176Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:22.769696Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:24.765587Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:26.776204Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:28.767088Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:30.785644Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:32.765494Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:34.764348Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:36.765735Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:38.764585Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:39.925673Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 3 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-22T01:10:42.770012Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:44.768852Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:46.767273Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:48.765688Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:50.783984Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:52.916101Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:54.254978Z info Envoy proxy is ready
2020-04-22T01:10:58.773283Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:00.780969Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:02.769031Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:04.767144Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:06.770441Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:08.771764Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:10.764956Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:12.766110Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:13.864978Z info Envoy proxy is ready
2020-04-22T01:11:32.773066Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:34.766067Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:36.766066Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:38.775108Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:40.766046Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:42.770071Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:44.764804Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:45.893578Z info Envoy proxy is ready
2020-04-22T01:12:12.765846Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:14.769021Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:16.765763Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:18.780407Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:20.766747Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:22.766172Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:24.770132Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:26.773160Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:28.766725Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:30.768249Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:32.097014Z info Envoy proxy is ready
2020-04-22T01:12:48.769664Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:50.765486Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:52.767476Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:54.767704Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:56.767033Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:58.766685Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:00.768763Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:02.764860Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:04.766701Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:06.781599Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:08.769040Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:10.046523Z info Envoy proxy is ready
2020-04-22T01:36:29.588987Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-04-22T01:36:29.589216Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {CONNECTING <nil>}
2020-04-22T01:36:29.611300Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {READY <nil>}
[Envoy (Epoch 0)] [2020-04-22 01:37:10.305][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-04-22T01:37:14.765362Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:16.764682Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:18.765650Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:20.765993Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:22.764939Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:24.765619Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:26.765554Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:28.766080Z info Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:30.031610Z info Envoy proxy is ready
IngressGateway 间歇性报错:Envoy proxy is NOT ready,最后因为 Readiness 探针多次失败,被 Kill 掉。
排查流程
首先怀疑是节点性能问题,因为我们通过 nodeSelector 将 Istio 限制到了两个 8c/16G 的网络专用节点上,HPA 的扩容可能导致节点性能问题。
将 HPA 的副本上下限都调成了 2,就是固定两个副本。调整后情况有所改善,不再被 OOMKilled,但是隔几十分钟还是会报一次 Envoy proxy is NOT ready
查看 Pod 信息:
ryan@RYAN-MI-DESKTOP:~$ kubectl -n istio-system describe pod istio-ingressgateway-7f85d5ffcc-f4g7l
Name: istio-ingressgateway-7f85d5ffcc-f4g7l
Namespace: istio-system
Priority: 0
# ...... 省略若干信息
State: Running
Started: Wed, 22 Apr 2020 09:09:24 +0800
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Wed, 22 Apr 2020 08:27:03 +0800
Finished: Wed, 22 Apr 2020 09:09:22 +0800
Ready: True
Restart Count: 3
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
# ...... 省略
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 25m (x663 over 119m) kubelet, 192-168-1-93 Readiness probe failed: Get http://100.82.150.181:15020/healthz/ready: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
从上面的日志上看,Pod 因为长时间 Not Ready 而被重启了三次,此外这个 Pod 还触发过 OOMKilled。
这个问题连续出了一周的时间,一直没找到问题。还好是用于测试开发的集群,勉强也能用。
在 istio 上新提了 issue:IngressGateway's stats api is intermittently abnormal: failed to get server info: Get http://127.0.0.1:15000/stats
根据官方的回应,很可能是因为我们的 virtualservice 等配置过多(2000+),导致 envoy 主线程繁忙。
解决方法是每个名字空间都需要配一个专用 envoy 入口网关,让 envoy 只监听一个命名空间的配置,减小压力。
可这样的话,在最外层还得加一层路由(比如 ingress),将流量路由到各名字空间的 envoy 入口网关去。
开发集群是否有必要做这么多层路由?待观察。
=============
更新:我们升级到 istio 1.5 后,这个问题得到了缓解,基本没再出过 IngressGateway NodeReady 的问题。
并不需要为每个名字空间配一个专用入口网关。