How Octavia Uses HAProxy and Keepalived: Implementation and Analysis

2018-11-17 16:30  云物互联

Amphora

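Creating a loadbalancer takes one or two Amphora instances, which serve as the runtime carrier of the "load balancer". The high-availability load balancing underneath is actually provided by HAProxy and Keepalived.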

  • HAProxy: an L4-L7 load balancer
  • Keepalived: a high-availability solution for Linux

However, an Amphora does not run the haproxy and keepalived service processes from the start. amphora-agent starts them only when they are needed.

Starting keepalived

[Figure: UML diagram of the keepalived startup flow]

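The keepalived service process is started after the Amphora has been taken over by a loadbalancer. TASK:AmphoraVRRPStart implements the startup logic, and the UML diagram shows that keepalived is loaded to provide high availability only when loadbalancer_topology = ACTIVE_STANDBY.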

# file: /opt/rocky/octavia/octavia/controller/worker/tasks/amphora_driver_tasks.py

class AmphoraVRRPStart(BaseAmphoraTask):
    """Task to start keepalived of all amphorae of a LB."""

    def execute(self, loadbalancer):
        self.amphora_driver.start_vrrp_service(loadbalancer)
        LOG.debug("Started VRRP of loadbalancer %s amphorae",
                  loadbalancer.id)

After a series of calls, AmphoraAPIClient finally sends a PUT vrrp/start request to amphora-agent, where view_func:manage_service_vrrp receives it and hands the action to manager_keepalived_service.
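To make the shape of that request concrete, here is a minimal sketch that only builds the PUT request object and never sends it. The port 9443 and the 0.5 version prefix are assumptions about a typical Rocky-era deployment, and build_vrrp_request is a hypothetical helper rather than part of Octavia. Authentication and retries, which the real client has to deal with, are left out.

```python
import urllib.request

# Assumptions for illustration only: agent port and API version prefix.
AGENT_PORT = 9443
API_VERSION = '0.5'

# The three actions the agent accepts, as seen in the article's agent code.
VALID_ACTIONS = ('start', 'stop', 'reload')


def build_vrrp_request(amphora_ip, action):
    """Build (but do not send) a PUT vrrp/<action> request (hypothetical helper)."""
    action = action.lower()
    if action not in VALID_ACTIONS:
        raise ValueError('Unknown action: {0}'.format(action))
    url = 'https://{ip}:{port}/{version}/vrrp/{action}'.format(
        ip=amphora_ip, port=AGENT_PORT, version=API_VERSION, action=action)
    return urllib.request.Request(url, method='PUT')
```

Calling build_vrrp_request('192.0.2.10', 'start') yields a request whose method is PUT and whose path ends in vrrp/start, which is the part the agent routes on.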

# file: /opt/rocky/octavia/octavia/amphorae/backends/agent/api_server/keepalived.py

    def manager_keepalived_service(self, action):
        action = action.lower()
        if action not in [consts.AMP_ACTION_START,
                          consts.AMP_ACTION_STOP,
                          consts.AMP_ACTION_RELOAD]:
            return webob.Response(json=dict(
                message='Invalid Request',
                details="Unknown action: {0}".format(action)), status=400)

        if action == consts.AMP_ACTION_START:
            keepalived_pid_path = util.keepalived_pid_path()
            try:
                # Is there a pid file for keepalived?
                with open(keepalived_pid_path, 'r') as pid_file:
                    pid = int(pid_file.readline())
                os.kill(pid, 0)

                # If we got here, it means the keepalived process is running.
                # We should reload it instead of trying to start it again.
                action = consts.AMP_ACTION_RELOAD
            except (IOError, OSError):
                pass

        cmd = ("/usr/sbin/service octavia-keepalived {action}".format(
            action=action))

        try:
            subprocess.check_output(cmd.split(), stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as e:
            LOG.debug('Failed to %s octavia-keepalived service: %s %s',
                      action, e, e.output)
            return webob.Response(json=dict(
                message="Failed to {0} octavia-keepalived service".format(
                    action), details=e.output), status=500)

        return webob.Response(
            json=dict(message='OK',
                      details='keepalived {action}ed'.format(action=action)),
            status=202)

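The code above shows that amphora-agent inside the Amphora starts keepalived by running the CLI command /usr/sbin/service octavia-keepalived start, or reload if the PID file shows that keepalived is already running.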
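The start-or-reload decision in that handler rests on a common Unix idiom: read the PID file and send signal 0 to the process. The following is a minimal standalone sketch of that idiom. The function names are hypothetical, and catching ValueError for a malformed PID file is an addition that the original handler does not make.

```python
import os


def pid_file_process_alive(pid_path):
    """Return True if pid_path holds the PID of a process we can signal."""
    try:
        with open(pid_path, 'r') as pid_file:
            pid = int(pid_file.readline())
        # Signal 0 performs error checking only; no signal is delivered.
        os.kill(pid, 0)
    except (IOError, OSError, ValueError):
        # Missing file, no such process, no permission, or garbage content.
        return False
    return True


def effective_action(requested, pid_path):
    """Turn 'start' into 'reload' when the service is already running."""
    if requested == 'start' and pid_file_process_alive(pid_path):
        return 'reload'
    return requested
```

Like the original handler, this treats a permission error from os.kill as "not running", which is acceptable when the agent runs as root.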

# file: /usr/lib/systemd/system/octavia-keepalived.service

[Unit]
Description=Keepalive Daemon (LVS and VRRP)
After=network-online.target .service
Wants=network-online.target
Requires=.service

[Service]
# Force context as we start keepalived under "ip netns exec"
SELinuxContext=system_u:system_r:keepalived_t:s0
Type=forking
KillMode=process

ExecStart=/sbin/ip netns exec amphora-haproxy /usr/sbin/keepalived  -D -d -f /var/lib/octavia/vrrp/octavia-keepalived.conf -p /var/lib/octavia/vrrp/octavia-keepalived.pid

ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/lib/octavia/vrrp/octavia-keepalived.pid

[Install]
WantedBy=multi-user.target

octavia-keepalived.service defines the start command for keepalived and the paths of its configuration file and PID file. The configuration file reads as follows:

# file: /var/lib/octavia/vrrp/octavia-keepalived.conf

vrrp_script check_script {
  script /var/lib/octavia/vrrp/check_script.sh
  interval 5
  fall 2
  rise 2
}

vrrp_instance 01197be798d5440da846cd70f52dc503 { # VRRP instance name is loadbalancer UUID
  state MASTER                                   # Master router
  interface eth1                                 # VRRP IP device
  virtual_router_id 1                            # VRID
  priority 100
  nopreempt
  garp_master_refresh 5
  garp_master_refresh_repeat 2
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass b76d77e
  }

  unicast_src_ip 172.16.1.3                      # VRRP IP
  unicast_peer {
    172.16.1.7                                   # Backup router VRRP IP
  }

  virtual_ipaddress {
    172.16.1.10                                  # VIP address
  }
  track_script {
    check_script
  }
}

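The configuration file shows that keepalived uses NIC eth1 as the interface for the VRRP IP and the VIP, yet running ifconfig directly on the Amphora does not show eth1. That is because the Amphora places the VIP interface in the namespace amphora-haproxy: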

root@amphora-cd444019-ce8f-4f89-be6b-0edf76f41b77:~# ip netns
amphora-haproxy
root@amphora-cd444019-ce8f-4f89-be6b-0edf76f41b77:~# ip netns exec amphora-haproxy bash
root@amphora-cd444019-ce8f-4f89-be6b-0edf76f41b77:~# ifconfig
eth1      Link encap:Ethernet  HWaddr fa:16:3e:f4:69:4b
          inet addr:172.16.1.3  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fef4:694b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:648 (648.0 B)

eth1:0    Link encap:Ethernet  HWaddr fa:16:3e:f4:69:4b
          inet addr:172.16.1.10  Bcast:172.16.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1

Besides start, view_func:manage_service_vrrp also supports the stop and reload actions. Updating the keepalived configuration file is handled by view_func:upload_keepalived_config.

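On the octavia-worker side, the logic that updates the keepalived configuration file lives in Task:AmphoraVRRPUpdate. AmphoraVRRPUpdate runs before AmphoraVRRPStart, and the configuration file is generated by rendering a Jinja template.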
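The rendering step can be illustrated with a small sketch. Octavia uses Jinja for this. To stay self-contained, the sketch below swaps in string.Template from the Python standard library, and the template text is a trimmed stand-in modeled on the configuration file shown earlier, not Octavia's real template. render_keepalived_conf and its parameters are hypothetical names.

```python
from string import Template

# Trimmed stand-in for the real template; only fields seen in the article.
KEEPALIVED_TEMPLATE = Template("""\
vrrp_instance $name {
  state $state
  interface $interface
  virtual_router_id $vrid
  priority $priority
  unicast_src_ip $src_ip
  unicast_peer {
    $peer_ip
  }
  virtual_ipaddress {
    $vip
  }
}
""")


def render_keepalived_conf(lb_id, state, interface, vrid, priority,
                           src_ip, peer_ip, vip):
    """Render a keepalived vrrp_instance section for one amphora."""
    return KEEPALIVED_TEMPLATE.substitute(
        # The instance name is the loadbalancer UUID without dashes.
        name=lb_id.replace('-', ''),
        state=state, interface=interface, vrid=vrid, priority=priority,
        src_ip=src_ip, peer_ip=peer_ip, vip=vip)
```

Feeding it the values from the configuration file above (loadbalancer 01197be7-98d5-440d-a846-cd70f52dc503, MASTER, eth1, VRID 1, priority 100, 172.16.1.3, peer 172.16.1.7, VIP 172.16.1.10) reproduces the core of that vrrp_instance section.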

Starting haproxy

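As we know, HAProxy listens for requests on a frontend and then dispatches them to a backend according to various conditions and ACL rules, which is exactly what the Octavia Listener object defines. The haproxy service process is therefore started only when a listener is created for the loadbalancer.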

[Figure: UML diagram of the listener creation flow]

The UML diagram shows that running openstack loadbalancer listener create --protocol HTTP --protocol-port 8080 lb-1 to create a Listener reaches Task:ListenersUpdate, which uploads the haproxy configuration file and reloads the haproxy service process.

The configuration file is /var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557/haproxy.cfg, where 1385d3c4-615e-4a92-aea1-c4fa51a75557 is the Listener UUID:

# Configuration for loadbalancer 01197be7-98d5-440d-a846-cd70f52dc503
global
    daemon
    user nobody
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557.sock mode 0666 level user
    maxconn 1000000

defaults
    log global
    retries 3
    option redispatch

peers 1385d3c4615e4a92aea1c4fa51a75557_peers
    peer l_Ustq0qE-h-_Q1dlXLXBAiWR8U 172.16.1.7:1025
    peer O08zAgUhIv9TEXhyYZf2iHdxOkA 172.16.1.3:1025


frontend 1385d3c4-615e-4a92-aea1-c4fa51a75557
    option httplog
    maxconn 1000000
    bind 172.16.1.10:8080
    mode http
    timeout client 50000

Because the Listener at this point specifies only the protocol and port to listen on, the frontend section is set accordingly with bind 172.16.1.10:8080 and mode http.
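As a rough illustration of how a listener's protocol and port map onto that frontend section, here is a hedged sketch. render_frontend is a hypothetical function, not Octavia's template code. The HTTP branch mirrors the configuration shown above, while the TCP mapping is an assumption added only to show that the mode follows the protocol.

```python
# Listener protocol -> haproxy mode. Only the HTTP entry comes from the
# article; the TCP entry is an assumption for illustration.
PROTOCOL_TO_MODE = {'HTTP': 'http', 'TCP': 'tcp'}


def render_frontend(listener_id, vip, protocol, port,
                    maxconn=1000000, timeout_client=50000):
    """Render a frontend section for a listener that has no pool yet."""
    mode = PROTOCOL_TO_MODE[protocol.upper()]
    lines = ['frontend {0}'.format(listener_id)]
    if mode == 'http':
        lines.append('    option httplog')
    lines.extend([
        '    maxconn {0}'.format(maxconn),
        '    bind {0}:{1}'.format(vip, port),
        '    mode {0}'.format(mode),
        '    timeout client {0}'.format(timeout_client),
    ])
    return '\n'.join(lines) + '\n'
```

With listener 1385d3c4-615e-4a92-aea1-c4fa51a75557, VIP 172.16.1.10, protocol HTTP and port 8080, the output matches the frontend section above line for line.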

Service process: systemctl status haproxy-1385d3c4-615e-4a92-aea1-c4fa51a75557.service, where 1385d3c4-615e-4a92-aea1-c4fa51a75557 is the Listener UUID:

# file: /usr/lib/systemd/system/haproxy-1385d3c4-615e-4a92-aea1-c4fa51a75557.service

[Unit]
Description=HAProxy Load Balancer
After=network.target syslog.service amphora-netns.service
Before=octavia-keepalived.service
Wants=syslog.service
Requires=amphora-netns.service

[Service]
# Force context as we start haproxy under "ip netns exec"
SELinuxContext=system_u:system_r:haproxy_t:s0

Environment="CONFIG=/var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557/haproxy.cfg" "USERCONFIG=/var/lib/octavia/haproxy-default-user-group.conf" "PIDFILE=/var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557/1385d3c4-615e-4a92-aea1-c4fa51a75557.pid"

ExecStartPre=/usr/sbin/haproxy -f $CONFIG -f $USERCONFIG -c -q -L O08zAgUhIv9TEXhyYZf2iHdxOkA

ExecReload=/usr/sbin/haproxy -c -f $CONFIG -f $USERCONFIG -L O08zAgUhIv9TEXhyYZf2iHdxOkA
ExecReload=/bin/kill -USR2 $MAINPID

ExecStart=/sbin/ip netns exec amphora-haproxy /usr/sbin/haproxy-systemd-wrapper -f $CONFIG -f $USERCONFIG -p $PIDFILE -L O08zAgUhIv9TEXhyYZf2iHdxOkA

KillMode=mixed
Restart=always
LimitNOFILE=2097152

[Install]
WantedBy=multi-user.target

The unit file shows that the service actually started is /usr/sbin/haproxy-systemd-wrapper, running in the namespace amphora-haproxy. What this wrapper does can be seen in the log:

Nov 15 10:12:01 amphora-cd444019-ce8f-4f89-be6b-0edf76f41b77 ip[13206]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557/1385d3c4-615e-4a92-aea1-c4fa51a75557.pid -L O08zAgUhIv9TEXhyYZf2iHdxOkA -Ds

It simply invokes the /usr/sbin/haproxy command.
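Everything haproxy-related on the Amphora is keyed by the Listener UUID: the configuration file, the PID file, the stats socket, the systemd unit and the peers section. The sketch below collects those naming rules as they appear in the listings above. listener_artifacts is a hypothetical helper, and the rules are inferred from this one example rather than taken from Octavia's source.

```python
import posixpath

BASE_DIR = '/var/lib/octavia'


def listener_artifacts(listener_id):
    """Derive the per-listener names seen in the article's listings."""
    return {
        'config': posixpath.join(BASE_DIR, listener_id, 'haproxy.cfg'),
        'pid': posixpath.join(BASE_DIR, listener_id, listener_id + '.pid'),
        'stats_socket': posixpath.join(BASE_DIR, listener_id + '.sock'),
        'unit': 'haproxy-{0}.service'.format(listener_id),
        # The peers section uses the UUID with the dashes removed.
        'peers': listener_id.replace('-', '') + '_peers',
    }
```

This is handy when debugging inside an Amphora: given a Listener UUID, it tells you which unit to pass to systemctl status and which file to inspect.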

Closing notes

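This article described how Octavia packages common load balancing solutions such as HAProxy and Keepalived into the Amphora instance, and what creating a Listener involves. Note that the HAProxy configuration file changes whenever Listener, Pool, Member, L7policy, L7rule, health-monitor and similar objects change; we will discuss those later. One more point: creating a Listener also executes Task:UpdateVIP, because the protocol and port information carried by the Listener must be written into the security group rules of the VIP. Otherwise, how could the Listener ever receive transport-layer packets?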
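The security group update can be thought of as reconciling two sets of ports: those the listeners need and those the VIP security group already allows. The sketch below shows only that set arithmetic. plan_vip_rule_changes is a hypothetical function, it ignores protocols and any ports Octavia may open for its own use, and it is not how Task:UpdateVIP is actually written.

```python
def plan_vip_rule_changes(listener_ports, existing_ports):
    """Return (ports_to_open, ports_to_close) for the VIP security group."""
    wanted = set(listener_ports)
    existing = set(existing_ports)
    # Open what listeners need but the group lacks; close what is stale.
    return sorted(wanted - existing), sorted(existing - wanted)
```

For example, with listeners on 8080 and 443 and a security group that currently allows 443 and 9000, the plan is to open 8080 and close 9000.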