** This article was tested on OpenShift 3.11 and Kubernetes 1.11 **

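1. Why does OpenShift need Router and Route?

As the names suggest, a Router is a router and a Route is a route configured in that router. OpenShift uses these two concepts to meet the need to access services from outside the cluster (that is, from anywhere other than the cluster nodes). I do not know why OpenShift uses the name Router where Kubernetes uses Ingress; I find Ingress the more fitting name.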

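The following simplified diagram shows the two paths for reaching an application in a pod: from outside through the router, and from inside through the service:

In the diagram above, the three pods of an application run on node1, node2 and node3. OpenShift has three layers of IP addresses:

The pod's own IP address, comparable to a virtual machine's fixed IP in OpenStack. It is meaningful only inside the cluster.
The service's IP address. A Service usually has a ClusterIP, which is also a cluster-internal IP address.
The application's external IP address, comparable to a floating IP in OpenStack, or an IDC IP (which is mapped to the floating IP through NAT).

So there are essentially two ways to reach an application in a pod from outside the cluster:

One is to use a proxy that translates the external IP address into the backend pod IP addresses. This is the idea behind the OpenShift router/route. The router service in OpenShift is a cluster infrastructure service that runs on specific nodes (usually the infrastructure nodes) and is created and managed by the cluster administrator. It can have multiple replicas (pods). A router can hold multiple routes; each route uses the domain name of an incoming HTTP request to find its list of backend pods and forwards the traffic to them. In other words, it exposes the application in the pods under an external domain name so that users outside the cluster can reach the application by that name. This is in effect a layer-7 load balancer. OpenShift implements it with HAProxy by default, and other implementations such as F5 are also supported.
The other is to expose the service directly outside the cluster. This approach is explained in detail in the article on Services.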

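2. How does OpenShift use HAProxy to implement router and route?

2.1 Router deployment

When an OpenShift cluster is deployed with ansible using the default configuration, an HAProxy pod runs in host networking mode on the cluster's infra nodes and listens on ports 80 and 443 on all network interfaces.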

[root@infra-node3 cloud-user]# netstat -lntp | grep haproxy
tcp        0      0 127.0.0.1:10443         0.0.0.0:*               LISTEN      583/haproxy         
tcp        0      0 127.0.0.1:10444         0.0.0.0:*               LISTEN      583/haproxy         
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      583/haproxy         
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      583/haproxy

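Here, ports 10443 and 10444 on 127.0.0.1 are used by HAProxy itself, as explained below.

Consequently, each infra node can run only one HAProxy pod, because these ports can be bound only once. If the scheduler cannot find a node that meets the requirements, scheduling of the router service fails: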

0/7 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 5 node(s) didn't match node selector
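
The scheduling failure above can be reproduced with a toy model. The sketch below is not the Kubernetes scheduler: the node names, the label key region=infra and the helper explain_scheduling are all hypothetical. It only illustrates how a node selector combined with host ports 80 and 443 narrows down the candidate nodes.

```python
from collections import Counter

ROUTER_PORTS = {80, 443}             # host ports the router pod asks for
NODE_SELECTOR = {"region": "infra"}  # hypothetical label; real clusters may differ


def explain_scheduling(nodes):
    """Return (schedulable node names, summary message) for one router pod."""
    reasons = Counter()
    fits = []
    for node in nodes:
        labels_ok = all(node["labels"].get(k) == v for k, v in NODE_SELECTOR.items())
        if not labels_ok:
            reasons["didn't match node selector"] += 1
        elif ROUTER_PORTS & node["used_ports"]:
            # Another host-network pod already bound 80 or 443 on this node.
            reasons["didn't have free ports for the requested pod ports"] += 1
        else:
            fits.append(node["name"])
    parts = [
        f"{count} node(s) {reason}"
        for reason, count in sorted(reasons.items(), key=lambda kv: kv[1])
    ]
    message = f"{len(fits)}/{len(nodes)} nodes are available: " + ", ".join(parts)
    return fits, message


# Two infra nodes already run a router; five other nodes lack the label.
nodes = (
    [{"name": f"infra{i}", "labels": {"region": "infra"}, "used_ports": {80, 443}} for i in range(2)]
    + [{"name": f"node{i}", "labels": {"region": "primary"}, "used_ports": set()} for i in range(5)]
)
fits, message = explain_scheduling(nodes)
print(message)
```

With this sample data no node fits, and the printed summary has the same shape as the scheduler message shown above.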

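The OpenShift HAProxy Router supports two deployment modes:

One is the common single-router-service deployment. The service has one or more instances (pods) spread across several nodes and handles external access for every service deployed on the cluster.
The other is a sharded deployment. Here there are multiple router services, each responsible for a specified set of projects, with labels used to map between the two. This was introduced to address the limited performance of a single router.

OpenShift provides the oc adm router command for creating a router service.

Creating a router: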

[root@master1 cloud-user]# oc adm router router2 --replicas=1 --service-account=router
info: password for stats user admin has been set to J3YyPjlbqf
--> Creating router router2 ...
    warning: serviceaccounts "router" already exists
    clusterrolebinding.authorization.openshift.io "router-router2-role" created
    deploymentconfig.apps.openshift.io "router2" created
    service "router2" created
--> Success

For the detailed deployment procedure, see the official documentation: https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html

2.2 The HAProxy process in the router pod

In each pod of the router service, the openshift-router process starts an haproxy process:

UID        PID  PPID  C STIME TTY          TIME CMD
1000000+     1     0  0 Nov21 ?        00:14:27 /usr/bin/openshift-router
1000000+ 16011     1  0 12:42 ?        00:00:00 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf 16004

The configuration file used by haproxy (excerpt):

global
  maxconn 20000
  daemon
  ca-base /etc/ssl
  crt-base /etc/ssl
 。。。。  

defaults
  maxconn 20000

  # Add x-forwarded-for header.

  # server openshift_backend 127.0.0.1:8080
  errorfile 503 /var/lib/haproxy/conf/error-page-503.http

。。。
  timeout http-request 10s
  timeout http-keep-alive 300s

  # Long timeout for WebSocket connections.
  timeout tunnel 1h

frontend public
    
  bind :80
  mode http
  tcp-request inspect-delay 5s
  tcp-request content accept if HTTP
  monitor-uri /_______internal_router_healthz

  # Strip off Proxy headers to prevent HTTpoxy (https://httpoxy.org/)
  http-request del-header Proxy

  # DNS labels are case insensitive (RFC 4343), we need to convert the hostname into lowercase
  # before matching, or any requests containing uppercase characters will never match.
  http-request set-header Host %[req.hdr(Host),lower]

  # check if we need to redirect/force using https.
  acl secure_redirect base,map_reg(/var/lib/haproxy/conf/os_route_http_redirect.map) -m found
  redirect scheme https if secure_redirect

  use_backend %[base,map_reg(/var/lib/haproxy/conf/os_http_be.map)]

  default_backend openshift_default

# public ssl accepts all connections and isn't checking certificates yet certificates to use will be
# determined by the next backend in the chain which may be an app backend (passthrough termination) or a backend
# that terminates encryption in this router (edge)
frontend public_ssl
    
  bind :443
  tcp-request  inspect-delay 5s
  tcp-request content accept if { req_ssl_hello_type 1 }

  # if the connection is SNI and the route is a passthrough don't use the termination backend, just use the tcp backend
  # for the SNI case, we also need to compare it in case-insensitive mode (by converting it to lowercase) as RFC 4343 says
  acl sni req.ssl_sni -m found
  acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found
  use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough

  # if the route is SNI and NOT passthrough enter the termination flow
  use_backend be_sni if sni

  # non SNI requests should enter a default termination backend rather than the custom cert SNI backend since it
  # will not be able to match a cert to an SNI host
  default_backend be_no_sni

。。。

backend be_edge_http:demoprojectone:jenkins
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  4m

  timeout check 5000ms
  http-request set-header X-Forwarded-Host %[req.hdr(host)]
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)];proto-version=%[req.hdr(X-Forwarded-Proto-Version)]
  cookie 4376ea64d7d0abf11209cfe5f7cca1e7 insert indirect nocache httponly secure
  server pod:jenkins-1-84nrt:jenkins:10.128.2.13:8080 10.128.2.13:8080 cookie 8669a19afc9f0fed6824feb9fb1cf4ac weight 256

。。。

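For simplicity, only part of the configuration file is shown above. It consists of three main types of content:

Global configuration, such as the maximum number of connections (maxconn) and the timeouts (timeout).
Frontend configuration (the frontend sections). By default HAProxy listens on ports 443 and 80 for external HTTPS and HTTP requests respectively.
Backend configuration for each service. It holds many key settings, such as the backend protocol (mode), the load-balancing method (balance), the backend list (server entries, which here are pods with their IP addresses and ports), certificates, and so on.

The OpenShift router function therefore has to manage and control all three parts.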
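
To make the three parts concrete, here is a minimal Python sketch that groups the lines of an HAProxy configuration by section header. It is only an illustration: split_sections is a hypothetical helper, and it assumes (as in the excerpt above) that section headers start at column 0 while their settings are indented.

```python
SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen")


def split_sections(config_text):
    """Group HAProxy config lines under the section header they belong to."""
    sections = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        is_header = words[0] in SECTION_KEYWORDS and raw[:1] not in (" ", "\t")
        if is_header:
            # "frontend public" / "backend be_tcp:..." keep their name; "global" has none.
            current = " ".join(words[:2])
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """\
global
  maxconn 20000
defaults
  timeout http-request 10s
frontend public
  bind :80
  mode http
backend be_tcp:sit:sitjenkins.com.cn
  balance source
"""
sections = split_sections(sample)
print(list(sections))
```

Running it on the router's real haproxy.config would list one entry per frontend and one per route backend, which is a quick way to see how many backends the router currently manages.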

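For a detailed introduction to load balancers and HAProxy, see the article "Understanding Neutron (7): How Neutron Virtualizes Load Balancers".

2.3 Managing the global configuration

OpenShift offers two ways to set or change HAProxy's global configuration:

(1) The first is to pass parameters to the oc adm router command when creating the router, for example --max-connections to set the maximum number of connections: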

oc adm router --max-connections=200000 --ports='81:80,444:443' router3

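The resulting HAProxy has maxconn set to 200000. The router3 service exposes ports 81 and 444 externally, but the HAProxy pod still listens on ports 80 and 443.

(2) The second is to set environment variables on dc/<router dc name> to change the router's global configuration.

The official documentation at https://docs.openshift.com/container-platform/3.4/architecture/core_concepts/routes.html#haproxy-template-router contains the full list of environment variables. For example, after running the following command,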

 oc set env dc/router3 ROUTER_SERVICE_HTTPS_PORT=444 ROUTER_SERVICE_HTTP_PORT=81 STATS_PORT=1937

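router3 is redeployed, and the newly deployed HAProxy listens for HTTPS on port 444 and for HTTP on port 81, with the statistics port set to 1937.

2.4 OpenShift passthrough routes and the HAProxy backend

(1) Create a route through the OpenShift Console or the oc command that exposes the jenkins service of the sit project under the domain name sitjenkins.com.cn:

Creating the route in the web console:

Result: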

Name:                   sitjenkins.com.cn
Namespace:              sit
Labels:                 app=jenkins-ephemeral
                        template=jenkins-ephemeral-template
Annotations:            <none>
Requested Host:         sitjenkins.com.cn
Path:                   <none>
TLS Termination:        passthrough
Endpoint Port:          web

Service:        jenkins
Weight:         100 (100%)
Endpoints:      10.128.2.15:8080, 10.131.0.10:8080

Here the service name acts as the intermediary that links the route to the service's endpoints (that is, the pods).

(2) The HAProxy configuration file in each of the two router pods gains a new backend:

# Secure backend, pass through
backend be_tcp:sit:sitjenkins.com.cn
  balance source

  hash-type consistent
  timeout check 5000ms
  server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 weight 256 check inter 5000ms
  server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 weight 256 check inter 5000ms

The backend servers here are simply the pods, which OpenShift finds through the service name from step (1). balance is the load-balancing strategy, explained later.
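
As a rough illustration of how such a backend stanza follows from the route and the endpoints of its service, the Python sketch below renders the same text from plain data. The function render_passthrough_backend is hypothetical; the real router generates its configuration from its own template, so treat this only as a model of the mapping.

```python
def render_passthrough_backend(namespace, route_name, service, endpoints):
    """Build a be_tcp backend stanza shaped like the one shown above.

    endpoints: list of (pod_name, ip, port) tuples resolved from the service.
    """
    lines = [
        "# Secure backend, pass through",
        f"backend be_tcp:{namespace}:{route_name}",
        "  balance source",
        "  hash-type consistent",
        "  timeout check 5000ms",
    ]
    for pod, ip, port in endpoints:
        # One server line per pod endpoint behind the service.
        lines.append(
            f"  server pod:{pod}:{service}:{ip}:{port} {ip}:{port} weight 256 check inter 5000ms"
        )
    return "\n".join(lines)


stanza = render_passthrough_backend(
    "sit",
    "sitjenkins.com.cn",
    "jenkins",
    [("jenkins-1-bqhfj", "10.128.2.15", 8080), ("jenkins-1-h2fff", "10.131.0.10", 8080)],
)
print(stanza)
```

When a pod is added to or removed from the service, only the list of server lines changes, which is why the router has to rewrite the file whenever the endpoints change.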

(3) The file /var/lib/haproxy/conf/os_sni_passthrough.map gains one record:

sh-4.2$ cat /var/lib/haproxy/conf/os_sni_passthrough.map
^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ 1

(4) The file /var/lib/haproxy/conf/os_tcp_be.map gains one record:

sh-4.2$ cat /var/lib/haproxy/conf/os_tcp_be.map
^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_tcp:sit:sitjenkins.com.cn

(5) Using the map files above, HAProxy selects the backend added in step (2) for this route with the following logic:

frontend public_ssl  # note: this frontend handles HTTPS traffic

  bind :443  ## frontend port 443
  tcp-request  inspect-delay 5s
  tcp-request content accept if { req_ssl_hello_type 1 }

  # if the connection is SNI and the route is a passthrough don't use the termination backend, just use the tcp backend
  # for the SNI case, we also need to compare it in case-insensitive mode (by converting it to lowercase) as RFC 4343 says
  acl sni req.ssl_sni -m found ## check that the HTTPS request carries SNI
  acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found ## check that the hostname passed via SNI is in os_sni_passthrough.map
  use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough ## look up the backend name in os_tcp_be.map by the SNI hostname

  # if the route is SNI and NOT passthrough enter the termination flow
  use_backend be_sni if sni

  # non SNI requests should enter a default termination backend rather than the custom cert SNI backend since it
  # will not be able to match a cert to an SNI host
  default_backend be_no_sni

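(6) The HAProxy process is restarted so that the modified configuration file takes effect.

Some background needed to understand the configuration in (5):

SNI: TLS Server Name Indication (SNI) is an extension of the TLS protocol. At the start of the TLS handshake (in the ClientHello message), the client tells the server which hostname it is connecting to, so that the server can return the certificate for that hostname and thereby serve multiple hostnames that need different certificates. For details see https://en.wikipedia.org/wiki/Server_Name_Indication
OpenShift passthrough route: for this kind of route the SSL connection is not terminated on the router; instead the router passes the TLS connection through to the backend. This is explained below.
HAProxy support for SNI: HAProxy selects a specific backend based on the hostname carried in the SNI information. For details see https://www.haproxy.com/blog/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/
HAProxy ACL: for details see https://www.haproxy.com/documentation/aloha/10-0/traffic-management/lb-layer7/acls/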

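From the ## comments above, we can see that the HAProxy process takes the domain name sitjenkins.com.cn passed in through SNI in the HTTPS request and looks it up in os_tcp_be.map to obtain the backend name be_tcp:sit:sitjenkins.com.cn, which matches the backend from step (2).

The HAProxy used by the OpenShift router routes and load-balances by domain name; see the official documentation for details.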
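
The decision order of frontend public_ssl can be sketched in a few lines of Python. This is a simplified model, not HAProxy code: map_reg and choose_backend are illustrative helpers, and the two map strings are copied from the map files shown in steps (3) and (4).

```python
import re

# Map file contents from steps (3) and (4): "<regex> <value>" per line.
OS_SNI_PASSTHROUGH_MAP = r"^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ 1"
OS_TCP_BE_MAP = r"^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_tcp:sit:sitjenkins.com.cn"


def map_reg(map_text, key):
    """Return the value of the first entry whose regex matches key, else None."""
    for line in map_text.splitlines():
        if not line.strip():
            continue
        pattern, value = line.split(None, 1)
        if re.search(pattern, key):
            return value.strip()
    return None


def choose_backend(sni):
    """Follow the use_backend / default_backend order of frontend public_ssl."""
    if sni:
        host = sni.lower()  # SNI is compared case-insensitively
        if map_reg(OS_SNI_PASSTHROUGH_MAP, host) is not None:
            backend = map_reg(OS_TCP_BE_MAP, host)
            if backend is not None:
                return backend      # passthrough route: plain TCP backend
        return "be_sni"             # SNI present but not passthrough
    return "be_no_sni"              # no SNI at all


print(choose_backend("SitJenkins.com.cn"))
print(choose_backend("edgejenkins.com.cn"))
print(choose_backend(None))
```

The three calls exercise the three branches: a passthrough hostname, a hostname that is not in the passthrough map, and a client that sends no SNI.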

2.5 OpenShift edge and re-encrypt routes and HAProxy

HAProxy frontend: the frontend still listens on port 443 for external HTTPS requests.

frontend public_ssl
  bind :443
.....
  # if the route is SNI and NOT passthrough enter the termination flow
  use_backend be_sni if sni

However, when the TLS termination type is not passthrough (that is, edge or re-encrypt), backend be_sni is used.

backend be_sni
  server fe_sni 127.0.0.1:10444 weight 1 send-prox

This backend is served by 127.0.0.1:10444 on the local host, so the traffic is handed over to the frontend fe_sni:

frontend fe_sni
  # terminate ssl on edge
  bind 127.0.0.1:10444 ssl no-sslv3 crt /var/lib/haproxy/router/certs/default.pem crt-list /var/lib/haproxy/conf/cert_config.map accept-proxy
  mode http
。。。。。。

  # map to backend
  # Search from most specific to general path (host case).
  # Note: If no match, haproxy uses the default_backend, no other
  #       use_backend directives below this will be processed.
  use_backend %[base,map_reg(/var/lib/haproxy/conf/os_edge_reencrypt_be.map)]

  default_backend openshift_default

The map file:

sh-4.2$ cat /var/lib/haproxy/conf/os_edge_reencrypt_be.map
^edgejenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_edge_http:sit:jenkins-edge
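
The lookup in fe_sni differs from the SNI case in that it matches on base, which in HAProxy is the Host header concatenated with the request path. The Python sketch below models that lookup with the map entry shown above. backend_for is a hypothetical helper, and lowercasing the host is an assumption carried over from the public frontend shown earlier.

```python
import re

# Entry copied from os_edge_reencrypt_be.map above.
OS_EDGE_REENCRYPT_BE_MAP = [
    (r"^edgejenkins\.com\.cn(:[0-9]+)?(/.*)?$", "be_edge_http:sit:jenkins-edge"),
]


def backend_for(host_header, path):
    """Pick the backend for a decrypted request, falling back to the default."""
    base = host_header.lower() + path  # e.g. "edgejenkins.com.cn/login"
    for pattern, backend in OS_EDGE_REENCRYPT_BE_MAP:
        if re.search(pattern, base):
            return backend
    return "openshift_default"  # no route matched


print(backend_for("edgejenkins.com.cn", "/login"))
print(backend_for("EdgeJenkins.com.cn:443", "/job/demo/"))
print(backend_for("unknown.example.com", "/"))
```

The optional (:[0-9]+)? and (/.*)? groups in the map regex are what let the same entry match requests with or without an explicit port and with any path.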

HAProxy backend for an edge route:

backend be_edge_http:sit:jenkins-edge
  mode http
  option redispatch
  option forwardfor
  balance leastconn

  timeout check 5000ms
  .....
  server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 cookie 71c6bd03732fa7da2f1b497b1e4c7993 weight 256 check inter 5000ms
  server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 cookie fa8d7fb72a46958a7add1406e6d26cc8 weight 256 check inter 5000ms

HAProxy backend for a re-encrypt route:

# Plain http backend or backend with TLS terminated at the edge or a
# secure backend with re-encryption.
backend be_secure:sit:reencryptjenkins.com.cn
  mode http
。。。。

http-request set-header X-Forwarded-Host %[req.hdr(host)]
http-request set-header X-Forwarded-Port %[dst_port]
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }

  server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 cookie ... weight 256 ssl verifyhost jenkins.sit.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt check inter 5000ms # the link to the backend is encrypted with ssl, and the hostname is verified
  server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 cookie ... weight 256 ssl verifyhost jenkins.sit.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt check inter 5000ms

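This shows that the connection to the backend is encrypted again. Note that mode is still http rather than https: HAProxy has no https mode. mode http only selects layer-7 (HTTP) processing, and TLS towards the backend is enabled by the ssl keyword on each server line.

2.6 Setting and modifying route configuration

The most important route settings are the following:

(1) SSL termination type. There are three: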

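edge: TLS is terminated on the router, and unencrypted traffic is then forwarded to the backend pods. A TLS certificate therefore has to be installed on the router; if none is installed, the router's default certificate is used.
passthrough: encrypted traffic is sent straight to the pod and the router does not terminate TLS, so no certificate or key needs to be configured on the router.
Re-encryption: a variant of edge. The router first terminates TLS with one certificate, then encrypts the traffic again with another certificate before sending it to the backend pods, so the entire network path is encrypted.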
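
The differences between the three termination types can be summarized as data. The Python sketch below is only a restatement of the list above; the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Termination:
    router_terminates_tls: bool  # does the router decrypt client traffic?
    backend_encrypted: bool      # is traffic between router and pod encrypted?


TERMINATIONS = {
    "edge": Termination(router_terminates_tls=True, backend_encrypted=False),
    "passthrough": Termination(router_terminates_tls=False, backend_encrypted=True),
    "reencrypt": Termination(router_terminates_tls=True, backend_encrypted=True),
}


def needs_router_certificate(kind):
    """A certificate on the router is needed only where the router ends TLS."""
    return TERMINATIONS[kind].router_terminates_tls


def encrypted_end_to_end(kind):
    """True when traffic is never sent in clear text between client and pod."""
    return TERMINATIONS[kind].backend_encrypted


for kind in TERMINATIONS:
    print(kind, needs_router_certificate(kind), encrypted_end_to_end(kind))
```

Only edge leaves the hop between router and pod unencrypted, and only passthrough lets the router run without any certificate for the route.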

Setting it:

The termination type can be set when the route is created, or changed later by editing the route's termination field.
For details see the official documentation: https://docs.okd.io/latest/architecture/networking/routes.html#edge-termination

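(2) Load-balancing strategy. There are also three:

roundrobin: use all backends in turn, according to their weights.
leastconn: send the request to the backend with the fewest connections.
source: hash the source IP so that requests from the same source IP go to the same backend.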
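
To show what the three strategies mean in practice, here is a minimal Python sketch. It is not HAProxy's implementation: the hash used for source is an arbitrary choice made for this example, weights are ignored, and the backend addresses are simply the pod endpoints used earlier in this article.

```python
import hashlib
from itertools import cycle

BACKENDS = ["10.128.2.15:8080", "10.131.0.10:8080"]


def make_roundrobin(backends):
    """Return a picker that walks through the backends in turn."""
    it = cycle(backends)
    return lambda: next(it)


def leastconn(active_connections):
    """active_connections: dict of backend -> current connection count."""
    return min(active_connections, key=active_connections.get)


def source(client_ip, backends):
    """Hash the client IP so the same client always lands on the same backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


pick = make_roundrobin(BACKENDS)
print([pick() for _ in range(3)])
print(leastconn({"10.128.2.15:8080": 3, "10.131.0.10:8080": 1}))
print(source("192.0.2.10", BACKENDS) == source("192.0.2.10", BACKENDS))
```

source gives session stickiness without cookies, which is why it suits passthrough routes where the router cannot see inside the encrypted traffic.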

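Setting it:

To change the load-balancing strategy for a whole router, use the ROUTER_TCP_BALANCE_SCHEME environment variable to set the strategy for all passthrough routes of that router, and ROUTER_LOAD_BALANCE_ALGORITHM to set it for the other route types.
To set the strategy for a single route, use the haproxy.router.openshift.io/balance annotation on that route.

Examples:

Setting the environment variable for the whole router: oc set env dc/router ROUTER_TCP_BALANCE_SCHEME=roundrobin

After the change, the router instances are redeployed and all passthrough routes use roundrobin. The default is source.

Changing the load-balancing strategy of a single route: oc edit route aaaa.svc.cluster.local

Once the route has been edited to request leastconn, the balance value in the HAProxy backend for that route changes to leastconn.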

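2.7 One route splitting traffic across multiple backend services

This feature is often used in development and test workflows, for example A/B testing.

In the example configuration, three versions of one application are deployed behind a single route, and each service is given a different weight.

The corresponding backend in the HAProxy configuration file uses the roundrobin load-balancing mode.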
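
As a rough illustration of how weights divide traffic between the services behind one route, the Python sketch below computes the expected share per service and produces a weighted round-robin sequence. The service names and weights are invented for this example, and the smoothing scheme is one common way to do weighted round-robin, not necessarily what HAProxy does internally.

```python
from collections import Counter


def traffic_shares(weights):
    """Convert per-service weights into percentage shares of requests."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one service needs a non-zero weight")
    return {name: 100 * w / total for name, w in weights.items()}


def weighted_sequence(weights, n):
    """Smooth weighted round-robin: always pick the service with the most credit."""
    credit = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for name, w in weights.items():
            credit[name] += w          # every service earns its weight each round
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total        # the chosen one pays for the request
        picks.append(chosen)
    return picks


# Hypothetical A/B/C split across three versions of one application.
weights = {"app-v1": 50, "app-v2": 30, "app-v3": 20}
print(traffic_shares(weights))
print(Counter(weighted_sequence(weights, 10)))
```

Shifting more traffic to a new version is then just a matter of raising its weight relative to the others.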

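3. How does the OpenShift router service achieve high availability?

The OpenShift router service supports two high-availability modes.

3.1 A single router service with multiple replicas, made highly available with DNS or a load balancer

In this mode only one router service is deployed, and it serves all externally exposed services of the cluster. To achieve HA, set the number of replicas to more than 1 so that pods are created on more than one server, and then distribute traffic across them with DNS round-robin or a layer-4 load balancer.

Because the HAProxy in each router pod relies on a local configuration file, these pods are in effect stateful containers. OpenShift uses etcd as the unified store for configuration. The openshift-router process presumably uses some mechanism (notification or periodic polling) to fetch the router and route configuration stored there, then updates the local configuration file and restarts the HAProxy process to apply it. To understand exactly how this works, read the source code.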

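The services on the masters need a load balancer (port 8443), and the router service needs one as well (ports 80 and 443). You can therefore either use two load balancers, or use a single load balancer that serves both the master services and the router service.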

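3.2 Multiple router services made highly available through sharding

In this mode the administrator creates and deploys multiple router services, each serving one or several projects/namespaces. The mapping between routers and projects/namespaces is done with labels. For the configuration details see the official documentation: https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html. In practice, as with the sharding features of products such as mysql and memcached, this is more a solution to performance problems and cannot fully solve high availability.

4. How to troubleshoot common problems?

The analysis above shows that for router and route to work properly, at least the following must all be in order:

The client accesses the service using the domain name and port configured in the route.
DNS resolves the domain name to a server that runs the target router (this is more complicated with sharding and needs particular attention).
If a separate layer-4 load balancer is used, it is configured correctly and working.
HAProxy matches the domain name to the correct backend.
The router and route configuration is correctly reflected in the HAProxy configuration file.
The HAProxy process has been restarted and has read the newly modified configuration file.
The backend pod list is correct and at least one pod is working.

If you see the router's error page, at least one of points 3 to 7 above is not working properly. Troubleshoot those points specifically.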


Posted on 2018-11-26 15:19 by SammyLiu

Understanding OpenShift (1): Networking with Router and Route