Linux Container From Scratch

See more container information

Minimal Implementation of a Container

Note: the environment used here is Ubuntu 16.
Install some useful tools:

sudo apt-get install vim screen lftp busybox-static systemd systemd-container yum qemu-utils aufs-tools pbzip2 htop

Create a directory named nimi_container and, in it, a file named yum.conf. Then, inside nimi_container, run:

mkdir -p {minimal,minimal/usr}/{bin,sbin,etc}

This creates six directories: bin, sbin and etc under minimal/, and the same three under minimal/usr/.
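The {a,b}/{c,d} pattern is Bash brace expansion: the shell expands it into every combination before mkdir ever runs. A quick way to preview and check it, assuming Bash (plain sh does not expand braces):

```shell
#!/usr/bin/env bash
# Bash expands the braces before mkdir sees its arguments;
# the first brace varies slowest.
echo {minimal,minimal/usr}/{bin,sbin,etc}
# prints: minimal/bin minimal/sbin minimal/etc minimal/usr/bin minimal/usr/sbin minimal/usr/etc

# -p also creates the parents (minimal, minimal/usr) and ignores existing dirs
mkdir -p {minimal,minimal/usr}/{bin,sbin,etc}
find minimal -type d | sort
```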

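Then we need BusyBox, a single binary that combines over 100 of the most commonly used Linux commands and tools, such as ls, cat and echo. To see which tools it contains, run busybox --list-full. BusyBox decides which tool to run from the name it was invoked by, so we create a symbolic link in the container for each tool, all pointing to /bin/sh, where the BusyBox binary will live:

for x in $(busybox --list-full); do ln -s /bin/sh minimal/$x; done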

Check what is now in minimal/bin/ (for example with ls -l minimal/bin): every entry is a symlink to /bin/sh, so we still have to copy the real BusyBox binary there:

cp -f /bin/busybox minimal/bin/sh

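Here -f means "if an existing destination file cannot be opened, remove it and try again". It matters because the loop may already have made minimal/bin/sh a symlink to /bin/sh, and cp writes through an existing destination symlink. Run this step as a normal user, not as root: writing to the host's /bin/sh then fails, and cp replaces the link instead of overwriting the host's shell. Then: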

touch minimal/etc/os-release

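Together with the minimal/usr directory, this empty os-release file makes the tree look like an OS root to systemd-nspawn (used below), which refuses to start a directory that doesn't. Now let's enter BusyBox: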

sudo chroot minimal /bin/sh

The BusyBox prompt appears, which means we are inside.

There we can run commands much as in a normal terminal.

Is this a Linux container? There is no objective standard to judge that, so you could argue "sure, it is a container!". But if we run something like ps ax inside it, we see no processes: there is no /proc in the chroot, so no process information is visible at all, while the same command in a host terminal lists every process. So it is only sort of a container.

Why is that? The article "From chroot to systemd-nspawn" explains why systemd-nspawn is better than chroot. The point that matters here: chroot does not mount /proc and /sys, but systemd-nspawn does. Run:

sudo systemd-nspawn -D minimal /bin/sh
# -D minimal: the directory to use as the container's root
# /bin/sh: the process to launch inside it

We are in BusyBox again, and this time ps ax works: it shows /bin/sh running as PID 1.

If we check the network with ip a, in either the chroot or systemd-nspawn, we see the host's network interfaces.

But a container needs its own private network, so add the --private-network option:

sudo systemd-nspawn --private-network -D minimal /bin/sh

Now when we check the network, we see only the loopback interface.

That is the minimal implementation of a container we set out to make!

To sum up, here are the steps:

sudo apt-get install vim screen lftp busybox-static systemd systemd-container yum qemu-utils aufs-tools pbzip2 htop

mkdir -p {minimal,minimal/usr}/{bin,sbin,etc}

for x in $(busybox --list-full); do
    ln -s /bin/sh minimal/$x
done

cp -f /bin/busybox minimal/bin/sh

touch minimal/etc/os-release
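The same steps can be wrapped in one reusable function. This is a sketch, not the author's script: build_rootfs is a hypothetical name, and it deliberately copies the BusyBox binary before creating the links (and skips links that already exist), so cp can never follow a minimal/bin/sh symlink and overwrite the host's shell, even when run as root.

```shell
# Sketch: build_rootfs is a hypothetical helper, not from the original post.
build_rootfs() {  # usage: build_rootfs <busybox-binary> <rootfs-dir>
    local bb="$1" root="$2" d applet
    for d in bin sbin etc usr/bin usr/sbin usr/etc; do
        mkdir -p "$root/$d"
    done
    # copy the real binary first, so no symlink exists yet at bin/sh
    cp "$bb" "$root/bin/sh"
    # one symlink per applet; BusyBox picks the applet from argv[0]
    "$bb" --list-full | while IFS= read -r applet; do
        if [ -e "$root/$applet" ] || [ -L "$root/$applet" ]; then
            continue                    # keep bin/sh and any existing entry
        fi
        mkdir -p "$(dirname "$root/$applet")"
        ln -s /bin/sh "$root/$applet"
    done
    touch "$root/etc/os-release"        # empty os-release, as in the text
}
```

For example, build_rootfs /bin/busybox minimal, followed by sudo systemd-nspawn -D minimal /bin/sh.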

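And this is how we run the container, first with chroot and then with systemd-nspawn:

sudo chroot minimal /bin/sh

sudo systemd-nspawn --private-network -D minimal /bin/sh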

Clearly, systemd-nspawn is the better choice. Finally, if we run ps ax in a host terminal (not in BusyBox), the container's processes show up there too.

On the host, /bin/sh has an ordinary PID, while inside the container the exact same process has PID 1. To see the process tree, use

pstree
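To confirm that the host PID and PID 1 really are the same process, the kernel's /proc/<pid>/status file has an NSpid line listing the process's PID in each nested PID namespace (Linux 4.1 and later). A small sketch, where pid_views is a hypothetical helper and the PID is taken from ps ax on the host; run it as root for processes you don't own, since reading /proc/<pid>/ns needs that privilege:

```shell
# Sketch: pid_views is a hypothetical helper, not from the original post.
pid_views() {  # usage: pid_views <host-pid>
    # Name: the command; NSpid: the PID in each nested PID namespace,
    # starting with the view from the host (needs Linux 4.1 or later)
    grep -E '^(Name|NSpid):' "/proc/$1/status"
    # identifier of the PID namespace the process belongs to, e.g. pid:[...]
    readlink "/proc/$1/ns/pid"
}
```

For the container's /bin/sh, the NSpid line should show two numbers, the host PID followed by 1, and its pid:[...] identifier should differ from that of a host shell.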

Build a Container Image with cpio

What we built above is just a directory on one machine, which is not portable, so let's package it. We use cpio here, though similar tools work too. cpio is a general file archiver that copies files into or out of a cpio or tar archive. Run:

find minimal -print | cpio -o | pbzip2 -c > minimal.cpio.bz2
# cpio reads the list of files to archive from find's output
# -o: copy-out mode, i.e. create an archive on stdout
# pbzip2: parallel bzip2; -c: write the compressed data to stdout

We pipe the archive through pbzip2 only to compress it and save space. The result is a single file, minimal.cpio.bz2.

This file is a container image: you can ship it around or put an app in it. That is about the most minimal container you can make.
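An image is only useful if it can be unpacked elsewhere. Since other archivers work too, this sketch swaps in tar and gzip for cpio and pbzip2; pack_image and unpack_image are hypothetical names. What matters for a root filesystem is that the archiver stores the BusyBox symlinks as symlinks, which tar does by default.

```shell
# Sketch with tar + gzip in place of cpio + pbzip2 (hypothetical helpers).
pack_image() {    # usage: pack_image <rootfs-dir> <image-file>
    # -C: archive from the parent dir, so stored paths start with the dir name
    tar -C "$(dirname "$1")" -cf - "$(basename "$1")" | gzip -c > "$2"
}

unpack_image() {  # usage: unpack_image <image-file> <dest-dir>
    mkdir -p "$2"
    # symlinks such as bin/ls -> /bin/sh are restored as symlinks
    gzip -dc "$1" | tar -C "$2" -xf -
}
```

For the cpio image itself, the matching unpack command is pbzip2 -dc minimal.cpio.bz2 | cpio -id, run in the target directory (-i extracts, -d creates directories as needed).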

Limiting CPU Access with cgroups

Cgroups (control groups) are the basis of resource management in IaaS virtualization (such as KVM, LXC) and PaaS containers (such as Docker). See also the comparison of popular resource isolation technologies.

Before creating a cgroup, let's time a pbzip2 run as a baseline. First, create a data file for pbzip2 to compress:

dd if=/dev/urandom of=datafile bs=1M count=100
# copy random bytes from /dev/urandom into datafile: 100 blocks of 1M each, i.e. 100 MiB
# if: input file; of: output file
# bs: block size, the number of bytes read and written at a time

Then time the compression:

time pbzip2 -k -9 datafile
# time: report how long the command takes
# -k: keep the original file
# -9: BWT block size in units of 100k (-1 to -9); -1 is fastest but compresses least; the default is 900k

Now let's create a cgroup:

sudo mkdir /sys/fs/cgroup/cpuset/my_cpuset

Interestingly, as soon as the my_cpuset directory is created, the kernel fills it with control files such as cpuset.cpus, cpuset.mems and tasks.

Next, set the cgroup's limits. A cpuset cgroup needs at least two settings before it can hold any task: the CPU cores it may use (cpuset.cpus) and the memory nodes it may use (cpuset.mems).

Now we have a cgroup, but no task in it. We can add the current shell to the cgroup like this:

echo $$ | sudo tee /sys/fs/cgroup/cpuset/my_cpuset/tasks
# $$ expands to the process ID of the current shell
# tee reads from the pipe and writes both to the screen and to the file; sudo tee lets the write to tasks run as root

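Then delete datafile.bz2 (pbzip2 does not overwrite an existing output file unless given -f) and re-run time pbzip2 -k -9 datafile from the same shell. The time nearly doubles, because pbzip2, started from a shell in the cgroup, can only use the CPU cores allowed in cpuset.cpus.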
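Setting the two required values and attaching a task can be sketched as one function, for cgroup v1 as used on Ubuntu 16. Assumptions: it runs in a root shell (for example after sudo -s), the chosen cores and memory nodes exist, and join_cpuset and CPUSET_ROOT are hypothetical names; CPUSET_ROOT only exists so the function can be tried on an ordinary directory.

```shell
# Sketch for cgroup v1 (as on Ubuntu 16); assumes a root shell, e.g. sudo -s.
# CPUSET_ROOT and join_cpuset are hypothetical names; CPUSET_ROOT can point
# at an ordinary directory to try the function without touching the kernel.
CPUSET_ROOT=${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}

join_cpuset() {  # usage: join_cpuset <name> <cpus> <mems> <pid>
    local dir="$CPUSET_ROOT/$1"
    mkdir -p "$dir"                   # creating the dir creates the cgroup
    # cpus and mems must be set before the cgroup accepts any task
    echo "$2" > "$dir/cpuset.cpus"    # e.g. 0 or 0-1
    echo "$3" > "$dir/cpuset.mems"    # e.g. 0 on a single-node machine
    echo "$4" > "$dir/tasks"          # move the process into the cgroup
}
```

For example, join_cpuset my_cpuset 0 0 $$. Afterwards nproc in that shell should report 1, and every command started from it, including pbzip2, inherits the cgroup.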

Connect a Container to the Network

As shown above, when we run a container with systemd-nspawn --private-network -D minimal /bin/sh, its network has only a loopback interface, so there is no way to connect to the outside world. Let's manage network namespaces ourselves. First run:

ip netns list
#netns means network namespace

It prints nothing, but that doesn't mean there are no network namespaces: they exist, they are just unnamed, and ip netns list only shows named ones. So let's create a named one:

sudo ip netns add minimal #create a network namespace named minimal

Now add a virtual Ethernet cable: a veth pair with two ends, veth1 for the host and eth1 for the container.

Then move the eth1 end into the namespace:

eth1 disappears from the host's interface list because it has moved into the network namespace minimal: a network device belongs to exactly one network namespace at a time. Now bring veth1 up.

The ip command can also run a process inside a network namespace (ip netns exec). Here the process is chroot, which starts BusyBox's /bin/sh.

Now we are in BusyBox, inside the chroot and inside the network namespace minimal. Give eth1 an address and bring it up:

On the host side, veth1 is the other end of eth1. To reach eth1 in the container, give veth1 an address on the same subnet (there is no need to bring veth1 up again; it is already up, see above). Then ping eth1's address from the host: it works!
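The whole network setup can be sketched as a script. The interface names follow the text (veth1 on the host, eth1 in the namespace), but the 10.0.0.1/10.0.0.2 addresses and the helper names run and setup_veth_netns are assumptions. Unlike the text, which configures eth1 from BusyBox inside the chroot, this sketch configures it from the host with ip netns exec, which has the same effect. Setting DRY_RUN=1 prints the commands instead of running them with sudo.

```shell
# Sketch: run/setup_veth_netns and the 10.0.0.0/24 addresses are assumptions.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"              # dry run: just show the command
    else
        sudo "$@"
    fi
}

setup_veth_netns() {
    run ip netns add minimal                          # named network namespace
    run ip link add veth1 type veth peer name eth1    # the virtual cable
    run ip link set eth1 netns minimal                # container end
    run ip link set veth1 up                          # host end
    # configure the container end from inside the namespace
    run ip netns exec minimal ip addr add 10.0.0.2/24 dev eth1
    run ip netns exec minimal ip link set eth1 up
    run ip addr add 10.0.0.1/24 dev veth1             # host end address
}
```

After setup_veth_netns, ping -c 3 10.0.0.2 from the host should get replies, and sudo ip netns exec minimal chroot minimal /bin/sh gives the BusyBox shell inside that namespace.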

Demo: Splitting a Container Image into Layers with aufs

 

posted on 2016-11-09 23:45 by chaseblack
