Envoy超时重试

环境信息

### 环境说明
##### 四个Service:

- envoy:Front Proxy,地址为172.31.65.10
- 3个后端服务
  - service_blue:对应于Envoy中的blue_abort集群,带有abort故障注入配置,地址为172.31.65.5;
  - service_red:对应于Envoy中的red_delay集群,带有delay故障注入配置,地址为172.31.65.7;
  - service_green:对应于Envoy中的green集群,地址为172.31.65.6;

##### 使用的abort配置

```
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
              max_active_faults: 100
              abort:
                http_status: 503
                percentage:
                  numerator: 50   # 为一半的请求注入中断故障,以便于在路由侧模拟重试的效果;
                  denominator: HUNDRED
```

##### 使用的delay配置

```
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
              max_active_faults: 100
              delay:
                fixed_delay: 10s
                percentage:
                  numerator: 50     # 为一半的请求注入延迟故障,以便于在路由侧模拟超时的效果;
                  denominator: HUNDRED
```

##### 超时和重试相关的配置

```
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/service/blue"
                route:
                  cluster: blue_abort
                  retry_policy:
                    retry_on: "5xx"   # 响应码为5xx时,则进行重试,重试最大次数为3次;
                    num_retries: 3
              - match:
                  prefix: "/service/red"
                route:
                  cluster: red_delay
                  timeout: 1s      # 超时时长为1秒,长于1秒,则执行超时操作;
              - match:
                  prefix: "/service/green"
                route:
                  cluster: green
              - match:
                  prefix: "/service/colors"
                route:
                  cluster: mycluster
                  retry_policy:    # 超时和重试策略同时使用; 
                    retry_on: "5xx"
                    num_retries: 3
                  timeout: 1s

启动

docker-compose up
# cat docker-compose.yaml 
version: '3.3'

services:
  envoy:
    image: envoyproxy/envoy-alpine:v1.21-latest
    environment:
      - ENVOY_UID=0
      - ENVOY_GID=0
    volumes:
    - ./front-envoy.yaml:/etc/envoy/envoy.yaml
    networks:
      envoymesh:
        ipv4_address: 172.31.65.10
        aliases:
        - front-proxy
    expose:
      # Expose ports 80 (for general traffic) and 9901 (for the admin server)
      - "80"
      - "9901"

  service_blue:
    image: ikubernetes/servicemesh-app:latest
    volumes:
      - ./service-envoy-fault-injection-abort.yaml:/etc/envoy/envoy.yaml
    networks:
      envoymesh:
        ipv4_address: 172.31.65.5
        aliases:
          - service_blue
          - colored
    environment:
      - SERVICE_NAME=blue
    expose:
      - "80"

  service_green:
    image: ikubernetes/servicemesh-app:latest
    networks:
      envoymesh:
        ipv4_address: 172.31.65.6
        aliases:
          - service_green
          - colored
    environment:
      - SERVICE_NAME=green
    expose:
      - "80"

  service_red:
    image: ikubernetes/servicemesh-app:latest
    volumes:
      - ./service-envoy-fault-injection-delay.yaml:/etc/envoy/envoy.yaml
    networks:
      envoymesh:
        ipv4_address: 172.31.65.7
        aliases:
          - service_red
          - colored
    environment:
      - SERVICE_NAME=red
    expose:
      - "80"
    
networks:
  envoymesh:
    driver: bridge
    ipam:
      config:
        - subnet: 172.31.65.0/24
# cat front-envoy.yaml 
admin:
  profile_path: /tmp/envoy.prof
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
       address: 0.0.0.0
       port_value: 9901

layered_runtime:
  layers:
  - name: admin
    admin_layer: {}
       
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/service/blue"
                route:
                  cluster: blue_abort
                  retry_policy:
                    retry_on: "5xx"
                    num_retries: 3                  
              - match:
                  prefix: "/service/red"
                route:
                  cluster: red_delay
                  timeout: 1s
              - match:
                  prefix: "/service/green"
                route:
                  cluster: green
              - match:
                  prefix: "/service/colors"
                route:
                  cluster: mycluster
                  retry_policy:
                    retry_on: "5xx"
                    num_retries: 3                  
                  timeout: 1s
          http_filters:
          - name: envoy.filters.http.router

  clusters:
  - name: red_delay
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: red_delay
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: service_red
                port_value: 80

  - name: blue_abort
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: blue_abort
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: service_blue
                port_value: 80

  - name: green
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: green
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: service_green
                port_value: 80

  - name: mycluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: mycluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: colored
                port_value: 80
# cat service-envoy-fault-injection-delay.yaml 
admin:
  profile_path: /tmp/envoy.prof
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
       address: 0.0.0.0
       port_value: 9901

layered_runtime:
  layers:
  - name: admin
    admin_layer: {}
       
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: local_service
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
              max_active_faults: 100
              delay:
                fixed_delay: 10s
                percentage:
                  numerator: 50
                  denominator: HUNDRED
          - name: envoy.filters.http.router
            typed_config: {}

  clusters:
  - name: local_service
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: local_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080

# cat service-envoy-fault-injection-abort.yaml 
admin:
  profile_path: /tmp/envoy.prof
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
       address: 0.0.0.0
       port_value: 9901

layered_runtime:
  layers:
  - name: admin
    admin_layer: {}
       
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: local_service
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
              max_active_faults: 100
              abort:
                http_status: 503
                percentage:
                  numerator: 50
                  denominator: HUNDRED
          - name: envoy.filters.http.router
            typed_config: {}

  clusters:
  - name: local_service
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: local_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
                
# cat service-envoy.yaml 
admin:
  profile_path: /tmp/envoy.prof
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
       address: 0.0.0.0
       port_value: 9901

layered_runtime:
  layers:
  - name: admin
    admin_layer: {}
       
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: local_service
          http_filters:
          - name: envoy.filters.http.router

  clusters:
  - name: local_service
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: local_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
# cat send-requests.sh 
#!/bin/bash
#
if [ $# -ne 2 ]
then
    echo "USAGE: $0 <URL> <COUNT>"
    exit 1;
fi

URL=$1
COUNT=$2
c=1
#interval="0.2"

while [[ ${c} -le ${COUNT} ]];
do
  #echo "Sending GET request: ${URL}"
  curl -o /dev/null -w '%{http_code}\n' -s ${URL}
  (( c++ ))
#  sleep $interval
done


# cat curl_format.txt 
    time_namelookup:  %{time_namelookup}\n
       time_connect:  %{time_connect}\n
    time_appconnect:  %{time_appconnect}\n
   time_pretransfer:  %{time_pretransfer}\n
      time_redirect:  %{time_redirect}\n
 time_starttransfer:  %{time_starttransfer}\n
                    ----------\n
         time_total:  %{time_total}\n

测试注入的delay故障

反复向/service/red发起多次请求,被注入延迟的请求,会有较长的响应时长

测试注入的abort故障

 ./send-requests.sh 172.31.65.10/service/blue 100

反复向/service/blue发起多次请求,后端被Envoy注入中断的请求,会因为响应的503响应码而触发自定义的
重试操作;最大3次的重试,仍有可能在连续多次的错误响应后,仍然响应以错误信息,但其比例会大大降低。

发往/service/green的请求,因后端无故障注入而几乎全部得到正常响应

发往/service/colors的请求,会被调度至red_delay、blue_abort和green三个集群,它们有的可能被延迟、有的可能被中断;

# 504响应码是由于上游请求超时所致

./send-requests.sh 172.31.65.10/service/colors 100 

posted @ 2022-08-08 17:16  Maniana  阅读(251)  评论(0编辑  收藏  举报