Designing Scalable, Portable Docker Container Networks
- Docker Host Network Driver
- Docker Bridge Network Driver
- User-Defined Bridge Networks
- External Access for Standalone Containers
- Overlay Driver Network Architecture
- Overlay Driver Internal Architecture
- External Access for Docker Services
- MACVLAN
- VLAN Trunking with MACVLAN
- Swarm Native Service Discovery
- Docker Native Load Balancing
- UCP Internal Load Balancing
- UCP External L4 Load Balancing (Docker Routing Mesh)
- Bridge Driver on a Single Host
Docker Host Network Driver
#Create containers on the host network
$ docker run -itd --net host --name C1 alpine sh
$ docker run -itd --net host --name nginx nginx
#Show host eth0
$ ip add | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
inet 172.31.21.213/20 brd 172.31.31.255 scope global eth0
#Show eth0 from C1
$ docker exec C1 ip add | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP qlen 1000
inet 172.31.21.213/20 brd 172.31.31.255 scope global eth0
#Contact the nginx container through localhost on C1
$ docker exec -it C1 wget -qO- http://localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
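Because host-networked containers share the host's port namespace, only one process can bind a given port. A minimal sketch of this behavior, assuming the nginx container above is still listening on port 80 (the name nginx2 is only illustrative):

#A second host-networked nginx should exit shortly after starting, because port 80 is already bound on the host
$ docker run -itd --net host --name nginx2 nginx
$ docker ps -a | grep nginx2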
Docker Bridge Network Driver
- bridge is the name of the Docker network
- bridge is the network driver, or template, from which this network is created
- docker0 is the name of the Linux bridge that is the kernel building block used to implement this network
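A quick sketch for confirming this mapping on a host:

#The default network named "bridge" is created from the bridge driver
$ docker network inspect bridge --format '{{ .Driver }}'
bridge
#The docker0 Linux bridge that backs it is visible in the host's interface list
$ ip link show docker0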
User-Defined Bridge Networks
$ docker network create -d bridge --subnet 10.0.0.0/24 my_bridge
$ docker run -itd --name c2 --net my_bridge busybox sh
$ docker run -itd --name c3 --net my_bridge --ip 10.0.0.254 busybox sh
$ brctl show
bridge name         bridge id           STP enabled     interfaces
br-b5db4578d8c9     8000.02428d936bb1   no              vethc9b3282
                                                        vethf3ba8b5
docker0             8000.0242504b5200   no              vethb64e8b8
$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
b5db4578d8c9        my_bridge           bridge              local
e1cac9da3116        bridge              bridge              local
...
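One useful property of a user-defined bridge (covered later under Swarm Native Service Discovery) is that the embedded DNS server resolves container names on that network. A quick sketch using the containers created above:

#c2 can reach c3 by name because both are attached to my_bridge
$ docker exec -it c2 ping -c 2 c3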
External Access for Standalone Containers
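A minimal sketch of external access for a standalone container (the name web1 is only illustrative): publish a container port onto a host port with -p, then reach it through the host's IP:

#Publish container port 80 on host port 8080; traffic to the host on 8080 is forwarded to the container
$ docker run -d --name web1 -p 8080:80 nginx
$ curl localhost:8080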
Overlay Driver Network Architecture
The native Docker overlay network driver radically simplifies many of the challenges in multi-host networking. With the overlay driver, multi-host networks are first-class citizens inside Docker without external provisioning or components. overlay uses the Swarm-distributed control plane to provide centralized management, stability, and security across very large scale clusters.
In this diagram, the packet flow on an overlay network is shown. Here are the steps that take place when c1 sends c2 packets across their shared overlay network:
- c1 does a DNS lookup for c2. Since both containers are on the same overlay network, the Docker Engine local DNS server resolves c2 to its overlay IP address 10.0.0.3.
- An overlay network is an L2 segment, so c1 generates an L2 frame destined for the MAC address of c2.
- The frame is encapsulated with a VXLAN header by the overlay network driver. The distributed overlay control plane manages the locations and state of each VXLAN tunnel endpoint, so it knows that c2 resides on host-B at the physical address of 192.168.0.3. That address becomes the destination address of the underlay IP header.
- Once encapsulated, the packet is sent. The physical network is responsible for routing or bridging the VXLAN packet to the correct host.
- The packet arrives at the eth0 interface of host-B and is decapsulated by the overlay network driver. The original L2 frame from c1 is passed to c2's eth0 interface and up to the listening application.
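If you want to watch this encapsulation happen, one hedged option is to capture the overlay data plane on the underlay interface; the Docker overlay driver carries VXLAN traffic over UDP port 4789 (eth0 as the underlay interface is an assumption):

#Capture VXLAN-encapsulated overlay traffic leaving the host on the underlay interface
$ sudo tcpdump -nn -i eth0 udp port 4789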
Overlay Driver Internal Architecture
#Create an overlay named "ovnet" with the overlay driver
$ docker network create -d overlay --subnet 10.1.0.0/24 ovnet
#Create a service from an nginx image and connect it to the "ovnet" overlay network
$ docker service create --network ovnet nginx
When the overlay network is created, notice that several interfaces and bridges are created inside the host as well as two interfaces inside this container.
# Peek into the container of this service to see its internal interfaces
container$ ip address
#docker_gwbridge network interface
52: eth0@if53: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 02:42:ac:14:00:06 brd ff:ff:ff:ff:ff:ff
inet 172.20.0.6/16 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe14:6/64 scope link
valid_lft forever preferred_lft forever
#overlay network interface
54: eth1@if55: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
link/ether 02:42:0a:01:00:03 brd ff:ff:ff:ff:ff:ff
inet 10.1.0.3/24 scope global eth1
valid_lft forever preferred_lft forever
inet 10.1.0.2/32 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe01:3/64 scope link
valid_lft forever preferred_lft forever
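To relate the container view above to the host, list the networks and Linux bridges that were created alongside the overlay (a sketch; bridge and interface names vary per host):

#docker_gwbridge is created automatically to give overlay-attached containers external connectivity
$ docker network ls
#The corresponding Linux bridges on the host
$ brctl show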
External Access for Docker Services
- ingress mode service publish
$ docker service create --replicas 2 --publish mode=ingress,target=80,published=8080 nginx
mode=ingress is the default mode for services. This command can also be accomplished with the shorthand version -p 8080:80. Port 8080 is exposed on every host in the cluster and load balanced to the two containers in this service.
- host mode service publish
$ docker service create --replicas 2 --publish mode=host,target=80,published=8080 nginx
host mode requires the mode=host flag. It publishes port 8080 locally on the hosts where these two containers are running. It does not apply load balancing, so traffic to those nodes is directed only to the local container. This can cause port collisions if there are not enough available ports for the number of replicas. See the inspect sketch after this list for one way to verify how a service's ports were published.
- ingress design
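To double-check how a given service's ports were published, one hedged option is to inspect its endpoint (SERVICE is a placeholder for the service ID or name):

#The Endpoint section lists the publish mode, target port, and published port of the service
$ docker service inspect --format '{{ json .Endpoint }}' SERVICE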
MACVLAN
#Creation of MACVLAN network "mvnet" bound to eth0 on the host
$ docker network create -d macvlan --subnet 192.168.0.0/24 --gateway 192.168.0.1 -o parent=eth0 mvnet
#Creation of containers on the "mvnet" network
$ docker run -itd --name c1 --net mvnet --ip 192.168.0.3 busybox sh
$ docker run -it --name c2 --net mvnet --ip 192.168.0.4 busybox sh
/ # ping 192.168.0.3
PING 192.168.0.3 (192.168.0.3): 56 data bytes
64 bytes from 192.168.0.3: icmp_seq=0 ttl=64 time=0.052 ms
As you can see in this diagram, c1 and c2 are attached via the MACVLAN network called mvnet, which is attached to eth0 on the host.
VLAN Trunking with MACVLAN
#Creation of macvlan10 network in VLAN 10
$ docker network create -d macvlan --subnet 192.168.10.0/24 --gateway 192.168.10.1 -o parent=eth0.10 macvlan10
#Creation of macvlan20 network in VLAN 20
$ docker network create -d macvlan --subnet 192.168.20.0/24 --gateway 192.168.20.1 -o parent=eth0.20 macvlan20
#Creation of containers on separate MACVLAN networks
$ docker run -itd --name c1 --net macvlan10 --ip 192.168.10.2 busybox sh
$ docker run -it --name c2 --net macvlan20 --ip 192.168.20.2 busybox sh
In the preceding configuration we've created two separate networks using the macvlan driver that are configured to use a sub-interface as their parent interface. The macvlan driver creates the sub-interfaces and connects them between the host's eth0 and the container interfaces. The host interface and upstream switch must be set to switchport mode trunk so that VLANs are tagged going across the interface. One or more containers can be connected to a given MACVLAN network to create complex network policies that are segmented via L2.
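To verify the tagged sub-interfaces that the macvlan driver created on the host, you can look at their 802.1Q details (a sketch, assuming eth0 is the parent interface as above):

#Show the VLAN sub-interfaces and their 802.1Q tags
$ ip -d link show eth0.10
$ ip -d link show eth0.20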
Swarm Native Service Discovery
Docker uses embedded DNS to provide service discovery for containers running on a single Docker Engine and tasks running in a Docker Swarm. Docker Engine has an internal DNS server that provides name resolution to all of the containers on the host in user-defined bridge, overlay, and MACVLAN networks. Each Docker container (or task in Swarm mode) has a DNS resolver that forwards DNS queries to Docker Engine, which acts as a DNS server. Docker Engine then checks if the DNS query belongs to a container or service on the network(s) that the requesting container belongs to. If it does, then Docker Engine looks up the IP address that matches a container, task, or service's name in its key-value store and returns that IP or service Virtual IP (VIP) back to the requester.
Service discovery is network-scoped, meaning only containers or tasks that are on the same network can use the embedded DNS functionality. Containers not on the same network cannot resolve each other's addresses. Additionally, only the nodes that have containers or tasks on a particular network store that network's DNS entries. This promotes security and performance.
If the destination container or service does not belong on the same network(s) as the source container, then Docker Engine forwards the DNS query to the configured default DNS server.
In this example there is a service called myservice consisting of two containers. A second service (client) exists on the same network. The client executes two curl operations, for docker.com and myservice. These are the resulting actions:
- DNS queries are initiated by client for docker.com and myservice.
- The container's built-in resolver intercepts the DNS queries on 127.0.0.11:53 and sends them to Docker Engine's DNS server.
- myservice resolves to the Virtual IP (VIP) of that service which is internally load balanced to the individual task IP addresses. Container names resolve as well, albeit directly to their IP addresses.
- docker.com does not exist as a service name in the mynet network and so the request is forwarded to the configured default DNS server.
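Both halves of this behavior can be observed from inside the client container (a sketch; <client-container> is a placeholder for the client's container name or ID):

#The container's resolver points at the embedded DNS server described above
$ docker exec -it <client-container> cat /etc/resolv.conf
nameserver 127.0.0.11
#Resolving the service name returns its VIP; unknown names are forwarded to the default DNS server
$ docker exec -it <client-container> nslookup myservice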
Docker Native Load Balancing
Internal load balancing is instantiated automatically when Docker services are created. When services are created in a Docker Swarm cluster, they are automatically assigned a Virtual IP (VIP) that is part of the service's network. The VIP is returned when resolving the service's name. Traffic to that VIP is automatically sent to all healthy tasks of that service across the overlay network. This approach avoids any client-side load balancing because only a single IP is returned to the client. Docker takes care of routing and equally distributing the traffic across the healthy service tasks.
UCP Internal Load Balancing
To see the VIP, run docker service inspect myservice as follows:
# Create an overlay network called mynet
$ docker network create -d overlay mynet
a59umzkdj2r0ua7x8jxd84dhr
# Create myservice with 2 replicas as part of that network
$ docker service create --network mynet --name myservice --replicas 2 busybox ping localhost
8t5r8cr0f0h6k2c3k7ih4l6f5
# See the VIP that was created for that service
$ docker service inspect myservice
...
"VirtualIPs": [
{
"NetworkID": "a59umzkdj2r0ua7x8jxd84dhr",
"Addr": "10.0.0.3/24"
},
]
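To confirm that resolving the service name returns this VIP rather than an individual task IP, resolve it from a container attached to mynet (a sketch; <task-container> is a placeholder for one of the myservice task containers listed by docker ps):

#The service name resolves to the VIP (10.0.0.3 above), which then load balances across the task IPs
$ docker exec -it <task-container> nslookup myservice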
UCP External L4 Load Balancing (Docker Routing Mesh)
This diagram illustrates how the Routing Mesh works.
- A service is created with two replicas, and it is port mapped externally to port 8000.
- The routing mesh exposes port 8000 on each host in the cluster.
- Traffic destined for the app can enter on any host. In this case the external LB sends the traffic to a host without a service replica.
- The kernel's IPVS load balancer redirects traffic on the ingress overlay network to a healthy service replica.
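A quick, hedged way to see the routing mesh at work: request the published port on any node, including one that runs no replica of the service (node address and port 8000 as in the diagram above):

#Any cluster node answers on the published port and IPVS forwards the request to a healthy replica
$ curl http://<any-node-ip>:8000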
Bridge Driver on a Single Host
$ docker network create -d bridge petsBridge
$ docker run -d --net petsBridge --name db consul
$ docker run -it --env "DB=db" --net petsBridge --name web -p 8000:5000 chrch/docker-pets:1.0
Starting web container e750c649a6b5
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
When an IP address is not specified, the port mapping is exposed on all interfaces of the host. In this case the container's application is exposed on 0.0.0.0:8000. To advertise on a specific IP address, use the flag -p IP:host_port:container_port.
The application is exposed locally on this host on port 8000 on all of its interfaces. Also supplied is DB=db, providing the name of the backend container. The Docker Engine's built-in DNS resolves this container name to the IP address of db. Since bridge is a local driver, the scope of DNS resolution is only on a single host.
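A quick sketch of that name resolution, using a throwaway busybox container on the same bridge:

#Any container attached to petsBridge can resolve the db container by name through the embedded DNS
$ docker run -it --rm --net petsBridge busybox ping -c 2 db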
The output below shows us that our containers have been assigned private IPs from the 172.19.0.0/24 IP space of the petsBridge network. Docker uses the built-in IPAM driver to provide an IP from the appropriate subnet if no other IPAM driver is specified.
$ docker inspect --format {{.NetworkSettings.Networks.petsBridge.IPAddress}} web
172.19.0.3
$ docker inspect --format {{.NetworkSettings.Networks.petsBridge.IPAddress}} db
172.19.0.2