K8S Flannel Container Cluster Network Deployment

 

I. Docker Networking Solutions

Tools for implementing cross-host container networking with Docker include Pipework, Flannel, Weave, Open vSwitch (a virtual switch), and Calico. The differences between Pipework, Weave, and Flannel are as follows:

1. Weave's approach

Weave places a special router container on each host and connects the router containers on different hosts to each other. The router intercepts IP traffic from all ordinary containers and forwards it inside UDP packets to ordinary containers on other hosts, so containers across hosts see a single flat network. Weave solves the network problem, but deployment is still performed host by host.

2. Flannel's approach

Flannel is a network fabric designed by the CoreOS team for Kubernetes. In short, it gives the Docker containers created on different nodes of a cluster virtual IP addresses that are unique across the whole cluster. With the default Docker configuration, the Docker daemon on each node allocates container IPs independently, so containers on different nodes may end up with the same IP address and cannot locate (ping) each other by IP. Flannel's purpose is to re-plan IP address usage for every node in the cluster so that containers on different nodes receive non-overlapping addresses that belong to "the same internal network", and so that containers on different nodes can communicate directly over those internal IPs.

Flannel is essentially an "overlay network", i.e. a network that runs on top of another network (an application-layer network). Instead of relying on the underlying IP addresses alone to deliver messages, it uses a mapping between IP addresses and identifiers to locate resources; in other words, it wraps the original TCP/IP packets inside another network packet for routing, forwarding, and communication. It currently supports forwarding via UDP, VXLAN, AWS VPC, and GCE routes.

Flannel uses etcd to store its configuration data and subnet allocations. When flanneld starts, it first retrieves the configuration and the list of subnets already in use, picks an available subnet, and registers it. etcd also stores the host IP that owns each subnet. flanneld uses etcd's watch mechanism to monitor changes under /coreos.com/network/subnets and maintains a routing table from that data. For performance, flannel optimizes the universal TAP/TUN device, proxying IP fragmentation between the TUN device and UDP.
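For illustration, once nodes have registered, the data in etcd looks roughly like the sketch below (the subnet values shown are illustrative; add the TLS and endpoint flags used in the deployment section below if your etcd requires them):

# List the per-node subnet leases registered by flanneld (output is illustrative)
/opt/etcd/bin/etcdctl ls /coreos.com/network/subnets
# /coreos.com/network/subnets/172.17.43.0-24
# /coreos.com/network/subnets/172.17.46.0-24

# Each lease stores the public IP of the owning node plus backend data (the VTEP MAC for VXLAN)
/opt/etcd/bin/etcdctl get /coreos.com/network/subnets/172.17.43.0-24
# {"PublicIP":"10.192.27.115","BackendType":"vxlan","BackendData":{"VtepMAC":"..."}}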

How Flannel works
Each host is assigned an IP segment and a number of subnets. For example, an overlay network can be configured to use the 10.1.0.0/16 range with a /24 subnet per host, so host A might receive 10.1.15.0/24 and host B 10.1.20.0/24. Flannel uses etcd to maintain the mapping between the allocated subnets and the actual host IP addresses. On the data path, flannel encapsulates IP datagrams in UDP and forwards them to the remote host. UDP was chosen as the forwarding protocol because it can traverse firewalls; AWS Classic, for example, cannot forward IPoIP or GRE packets because its security groups only support TCP/UDP/ICMP. The Flannel workflow is illustrated by the diagram below. (The default inter-node forwarding mode is UDP; flannel uses port 8285 by default for UDP-encapsulated packets and port 8472 for VXLAN.)

A brief explanation of the diagram (how Flannel works):
-> After a packet leaves the source container, it is forwarded by the host's docker0 virtual bridge to the flannel0 virtual interface. This is a point-to-point virtual device, with the flanneld service listening on the other end.
-> Flannel maintains a routing table between nodes via etcd; the table records the subnet segment of every node host.
-> The flanneld service on the source host encapsulates the original data in UDP and, based on its routing table, delivers it to the flanneld service on the destination node. There the data is unpacked, enters the destination node's flannel0 virtual interface, is forwarded to the destination host's docker0 bridge, and is finally routed by docker0 to the target container just like local container-to-container traffic.

That completes the delivery of the packet. Three points deserve explanation:
1) What exactly is the UDP encapsulation?
The payload of the UDP packet is simply another packet, for example an ICMP (ping) packet. The original data is UDP-encapsulated by the Flannel service on the source node and, once delivered to the destination node, is restored to the original packet by the Flannel service on the other side; the Docker daemons on both ends are unaware that this happens.

2) Why does Docker on each node use a different IP address range?
This looks mysterious, but the explanation is simple: after Flannel allocates the usable IP range for each node via etcd, it quietly modifies Docker's startup parameters.
On a node running the Flannel service you can inspect the Docker daemon's arguments (ps aux|grep docker|grep "bip") and see a parameter such as "--bip=182.48.25.1/24", which restricts the IP range that containers on that node can receive. The range is allocated automatically by Flannel, which uses the records stored in etcd to ensure the ranges never overlap.

3) Why is traffic on the sending node routed from docker0 to the flannel0 interface, and on the destination node from flannel0 back to docker0?
Suppose a packet is sent from a container with IP 172.17.18.2 to a container with IP 172.17.46.2. On the sending node the destination only matches the 172.17.0.0/16 route, so after the packet leaves docker0 it is delivered to flannel0. On the destination node, because the destination address belongs to a local container, it falls within the 172.17.46.0/24 route that corresponds to docker0, so the packet is naturally delivered to the docker0 interface.
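For reference, the routing table that produces this behavior looks roughly like the following on a node whose local subnet is 172.17.43.0/24 (illustrative output; with the VXLAN backend used later in this deployment, remote subnets are routed via flannel.1, while the UDP backend uses a single 172.17.0.0/16 route via flannel0):

ip route
# 172.17.43.0/24 dev docker0  proto kernel  scope link  src 172.17.43.1    <- local container subnet -> docker0
# 172.17.46.0/24 via 172.17.46.0 dev flannel.1 onlink                      <- remote node's subnet -> flannel device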

 

3. pipework's approach

pipework is a single-host tool that combines utilities such as brctl. It essentially takes care of setting up container virtual NICs, bridges, and IPs on the host, and can be combined with other networking solutions.
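A typical pipework invocation looks like the sketch below (the bridge name, container name, and addresses are hypothetical examples):

# Attach the container "web1" to host bridge br0 with a static IP and default gateway
pipework br0 web1 192.168.10.20/24@192.168.10.1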

If you only have a few containers and just want a simple, large flat layer-3 network, consider Weave.
If you have many containers and a complex environment that needs multiple subnets, consider Open vSwitch or Flannel.
Weave's overall network performance is relatively poor, while Flannel with VXLAN meets most requirements, so Flannel is generally recommended.

 

 

II. Flannel Deployment (node01 10.192.27.115, node02 10.192.27.116)

1. Write the allocated network segment to etcd for flanneld to use

# On any etcd node (here the master node), write the network configuration into etcd: key /coreos.com/network/config, value { "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}

[root@master01 ~]# cd /root/k8s/etcd-cert/
[root@master01 etcd-cert]# ls
ca-config.json  ca.csr  ca-csr.json  ca-key.pem  ca.pem  etcd-cert.sh  server.csr  server-csr.json  server-key.pem  server.pem
# Set the key/value pair
[root@master01 etcd-cert]# /opt/etcd/bin/etcdctl --ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem --endpoints="https://10.192.27.100:2379,https://10.192.27.115:2379,https://10.192.27.116:2379" set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}'
# Successfully set: the overlay network will use the 172.17.0.0/16 range with VXLAN forwarding: { "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}
# (The default inter-node forwarding mode is UDP; flannel uses port 8285 by default for UDP-encapsulated packets and port 8472 for VXLAN.)

# Read the key back
[root@master01 etcd-cert]# /opt/etcd/bin/etcdctl --ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem --endpoints="https://10.192.27.100:2379,https://10.192.27.115:2379,https://10.192.27.116:2379" get /coreos.com/network/config 
{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}
{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}} #获取  key: /coreos.com/network/config    value:{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}

 

2. Download the binary package from https://github.com/coreos/flannel/releases

[root@master01 etcd-cert]# cd ..
[root@master01 k8s]# wget https://github.com/coreos/flannel/releases/download/v0.10.0/flannel-v0.10.0-linux-amd64.tar.gz
[root@master01 k8s]# scp flannel-v0.10.0-linux-amd64.tar.gz root@10.192.27.115:~  # copy to node01
[root@master01 k8s]# scp flannel-v0.10.0-linux-amd64.tar.gz root@10.192.27.116:~  # copy to node02

 

3. Install Docker on both nodes

# Install dependencies (official docs: https://docs.docker.com)
yum install -y yum-utils device-mapper-persistent-data lvm2

# Add the Docker package repository
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker CE
yum install -y docker-ce

# Start the Docker service and enable it at boot
systemctl start docker
systemctl enable docker

Where do images come from?
Docker Hub is the public registry maintained by Docker Inc. It contains a large number of container images, and the Docker tooling pulls from this public registry by default.
Address: https://hub.docker.com/explore

Configure an image registry mirror: https://www.daocloud.io/mirror  # speeds up image pulls
curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://f1361db2.m.daocloud.io
The YUM-based installation above can be used when the hosts have internet access.
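The set_mirror.sh script above roughly amounts to adding a registry mirror to Docker's daemon configuration; an equivalent manual configuration would be the following sketch (using the same mirror URL):

# /etc/docker/daemon.json  (create the file or merge the key, then restart docker)
{
  "registry-mirrors": ["http://f1361db2.m.daocloud.io"]
}
# systemctl restart docker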

Binary package installation   # the binary package method is used here

[root@node01 ~]# wget https://download.docker.com/linux/static/stable/x86_64/docker-18.09.4.tgz
[root@node01 ~]# tar -xf docker-18.09.4.tgz 
[root@node01 ~]# ls docker
containerd containerd-shim ctr docker dockerd docker-init docker-proxy runc
[root@node01 ~]# cp docker/* /usr/bin/
[root@node01 ~]#
[root@node02 ~]# tar -xf docker-18.09.4.tgz 
[root@node02 ~]# cp docker/* /usr/bin/
[root@node02 ~]# 
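A quick sanity check that the binaries were copied onto the PATH (the daemon itself is started later by the dockerd.service unit generated in step 4):

[root@node01 ~]# docker --version     # client binary
[root@node01 ~]# dockerd --version    # daemon binary
[root@node01 ~]# runc --version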

 

4. Deploy and configure Flannel

Perform the following on both nodes (node01 10.192.27.115, node02 10.192.27.116).
# With more nodes you can set up one node and then copy /opt/kubernetes together with the flanneld and dockerd service unit files to the rest.

[root@node01 ~]# tar -xf flannel-v0.10.0-linux-amd64.tar.gz 
[root@node01 ~]# ls
anaconda-ks.cfg flanneld flannel-v0.10.0-linux-amd64.tar.gz mk-docker-opts.sh README.md
[root@node01 ~]# mkdir -p /opt/kubernetes/{cfg,bin,ssl}
[root@node01 ~]# mv flanneld mk-docker-opts.sh /opt/kubernetes/bin

Create the flannel.sh script below (vim flannel.sh); it generates the flannel configuration file, the flanneld service unit file, and the dockerd service unit file:

#!/bin/bash

ETCD_ENDPOINTS=${1:-"http://127.0.0.1:2379"}

cat <<EOF >/opt/kubernetes/cfg/flanneld
FLANNEL_OPTIONS="--etcd-endpoints=${ETCD_ENDPOINTS} \
-etcd-cafile=/opt/etcd/ssl/ca.pem \
-etcd-certfile=/opt/etcd/ssl/server.pem \
-etcd-keyfile=/opt/etcd/ssl/server-key.pem"

EOF

cat <<EOF >/usr/lib/systemd/system/flanneld.service
[Unit]
Description=Flanneld overlay address etcd agent
After=network-online.target network.target
Before=docker.service

[Service]
Type=notify
EnvironmentFile=/opt/kubernetes/cfg/flanneld
ExecStart=/opt/kubernetes/bin/flanneld --ip-masq \$FLANNEL_OPTIONS
ExecStartPost=/opt/kubernetes/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/subnet.env
Restart=on-failure

[Install]
WantedBy=multi-user.target

EOF

cat <<EOF >/usr/lib/systemd/system/dockerd.service

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/bin/dockerd \$DOCKER_NETWORK_OPTIONS
ExecReload=/bin/kill -s HUP \$MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

EOF

systemctl daemon-reload
systemctl enable flanneld
systemctl restart flanneld
systemctl enable dockerd
systemctl restart dockerd

 

Run the script, passing the etcd endpoints:
[root@node01 ~]# bash flannel.sh https://10.192.27.100:2379,https://10.192.27.115:2379,https://10.192.27.116:2379

 

Check the processes

[root@node01 ~]# ps -ef | grep flannel   # flannel uses mutual TLS authentication: the flanneld client on node01 against the etcd servers
root      28574      1  0 11:15 ?        00:00:07 /opt/kubernetes/bin/flanneld --ip-masq --etcd-endpoints=https://10.192.27.100:2379,https://10.192.27.115:2379,https://10.192.27.116:2379 -etcd-cafile=/opt/etcd/ssl/ca.pem -etcd-certfile=/opt/etcd/ssl/server.pem -etcd-keyfile=/opt/etcd/ssl/server-key.pem
root      39802  18416  0 13:44 pts/0    00:00:00 grep --color=auto flannel
[root@node01 ~]# 
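Since the nodes will exchange VXLAN-encapsulated traffic on UDP port 8472 (8285 for the UDP backend), make sure that port is reachable between the nodes if a host firewall is running. A minimal sketch assuming firewalld:

firewall-cmd --permanent --add-port=8472/udp    # VXLAN backend; use 8285/udp for the UDP backend
firewall-cmd --reload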

 

5. Copy the files to the other node(s)

[root@node01 ~]# scp -r /opt/kubernetes root@10.192.27.116:/opt
[root@node01 ~]# scp /usr/lib/systemd/system/dockerd.service root@10.192.27.116:/usr/lib/systemd/system
[root@node01 ~]# scp /usr/lib/systemd/system/flanneld.service root@10.192.27.116:/usr/lib/systemd/system
[root@node02 ~]# systemctl daemon-reload
[root@node02 ~]# systemctl enable flanneld
Created symlink from /etc/systemd/system/multi-user.target.wants/flanneld.service to /usr/lib/systemd/system/flanneld.service.
[root@node02 ~]# systemctl restart flanneld
[root@node02 ~]# systemctl enable dockerd
Created symlink from /etc/systemd/system/multi-user.target.wants/dockerd.service to /usr/lib/systemd/system/dockerd.service.
[root@node02 ~]# systemctl restart dockerd

 

[root@node02 ~]# ps -ef | grep docker
root     22818     1  0 14:52 ?        00:00:00 /usr/bin/dockerd --bip=172.17.46.1/24 --ip-masq=false --mtu=1450
root     22826 22818  1 14:52 ?        00:00:00 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
root     23012 13951  0 14:52 pts/0    00:00:00 grep --color=auto docker
[root@node02 ~]# ps -ef | grep flannel
root     22672     1  0 14:52 ?        00:00:00 /opt/kubernetes/bin/flanneld --ip-masq --etcd-endpoints=https://10.192.27.100:2379,https://10.192.27.115:2379,https://10.192.27.116:2379 -etcd-cafile=/opt/etcd/ssl/ca.pem -etcd-certfile=/opt/etcd/ssl/server.pem -etcd-keyfile=/opt/etcd/ssl/server-key.pem
root     23038 13951  0 14:53 pts/0    00:00:00 grep --color=auto flannel
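At this point both flanneld daemons have registered a subnet lease in etcd. This can be verified from the master using the same TLS flags as in step 1 (the lease keys shown are illustrative; they match the subnets seen in the next step):

[root@master01 etcd-cert]# /opt/etcd/bin/etcdctl --ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem --endpoints="https://10.192.27.100:2379,https://10.192.27.115:2379,https://10.192.27.116:2379" ls /coreos.com/network/subnets
# /coreos.com/network/subnets/172.17.43.0-24    <- node01
# /coreos.com/network/subnets/172.17.46.0-24    <- node02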

 

6. Configuration file walkthrough

Starting the flanneld service generates the environment variables for Docker in /run/flannel/subnet.env:

[root@node01 ~]# cat /run/flannel/subnet.env 
DOCKER_OPT_BIP="--bip=172.17.43.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.17.43.1/24 --ip-masq=false --mtu=1450"
[root@node02 ~]# cat /run/flannel/subnet.env
DOCKER_OPT_BIP="--bip=172.17.46.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.17.46.1/24 --ip-masq=false --mtu=1450"

 

 

# The modified dockerd service unit: Docker is configured to use the Flannel-generated subnet by referencing the environment variables above

[root@node01 ~]# grep -v '^#' /usr/lib/systemd/system/dockerd.service 

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/run/flannel/subnet.env             # extra line: the environment file generated by flannel
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS    # start dockerd with the flannel-provided network options
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

[root@node01 ~]# 
[root@localhost system]# cat docker.bak 
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target
(docker.bak above is the stock docker.service unit file installed by yum, shown for comparison.)

 

7. Check the network status

[root@node01 ~]# ifconfig 
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.43.1  netmask 255.255.255.0  broadcast 172.17.43.255
        ether 02:42:96:a2:41:c6  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.192.27.115  netmask 255.255.255.128  broadcast 10.192.27.127
        inet6 fe80::444d:ef36:fd70:9a89  prefixlen 64  scopeid 0x20<link>
        ether 80:18:44:e6:eb:dc  txqueuelen 1000  (Ethernet)
        RX packets 3905052  bytes 633862527 (604.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3275346  bytes 515290623 (491.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 81  

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450   # flannel.1 is the VXLAN device (named flannel.<VNI>, VNI 1 by default); with the UDP backend a TUN device named flannel0 would appear instead
        inet 172.17.43.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::342a:5aff:feb1:ec27  prefixlen 64  scopeid 0x20<link>
        ether 36:2a:5a:b1:ec:27  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 12096  bytes 689540 (673.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12096  bytes 689540 (673.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

[root@node02 ~]# ifconfig 
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.46.1  netmask 255.255.255.0  broadcast 172.17.46.255
        ether 02:42:8f:3e:f5:65  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.192.27.116  netmask 255.255.255.128  broadcast 10.192.27.127
        inet6 fe80::fde1:f746:6309:54a2  prefixlen 64  scopeid 0x20<link>
        ether 50:9a:4c:77:36:c5  txqueuelen 1000  (Ethernet)
        RX packets 5753325  bytes 888281290 (847.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5123425  bytes 644662134 (614.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.17.46.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::fc48:3dff:fe42:ab6a  prefixlen 64  scopeid 0x20<link>
        ether fe:48:3d:42:ab:6a  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 11668  bytes 614833 (600.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11668  bytes 614833 (600.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

[root@node01 ~]# ping 172.17.46.1
PING 172.17.46.1 (172.17.46.1) 56(84) bytes of data.
64 bytes from 172.17.46.1: icmp_seq=1 ttl=64 time=0.272 ms
64 bytes from 172.17.46.1: icmp_seq=2 ttl=64 time=0.182 ms
64 bytes from 172.17.46.1: icmp_seq=3 ttl=64 time=0.182 ms
[root@node02 ~]# ping 172.17.43.1
PING 172.17.43.1 (172.17.43.1) 56(84) bytes of data.
64 bytes from 172.17.43.1: icmp_seq=1 ttl=64 time=0.264 ms
64 bytes from 172.17.43.1: icmp_seq=2 ttl=64 time=0.213 ms
64 bytes from 172.17.43.1: icmp_seq=3 ttl=64 time=0.216 ms
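The pings above only exercise the docker0 bridge addresses between the hosts. To verify end-to-end container connectivity across nodes, start a throwaway container on each node and ping across (busybox is used purely as an illustrative image; the container IPs are assigned from each node's flannel subnet):

# On node02: start a test container and note its IP (it will be in 172.17.46.0/24)
[root@node02 ~]# docker run -it --rm busybox sh
/ # ip addr show eth0

# On node01: start a container and ping the node02 container's IP
[root@node01 ~]# docker run -it --rm busybox sh
/ # ping <IP of the container on node02>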
