OverlayFS Storage driver
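Introduction
OverlayFS is a modern union filesystem in the Linux kernel. It is similar to AUFS but is, in principle, faster and simpler to implement. Docker provides two storage drivers for OverlayFS:
-
overlay
-
overlay2
The second-generation overlay2 driver is more stable than the original overlay driver and performs better in practice, mainly because it uses inodes more efficiently. Using overlay2 requires Linux kernel 4.0 or later.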
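Prerequisites
-
Linux kernel 4.0 or later; on RHEL and CentOS, kernel 3.10.0-514 or later. On older kernels, use the overlay driver instead.
-
Both overlay and overlay2 support XFS as the backing filesystem.
XFS is recommended on RHEL 7.2 and later, but the filesystem must be created with d_type support enabled, that is, formatted with mkfs.xfs -n ftype=1. Note: if XFS was formatted without the ftype option, overlay and overlay2 may still appear to work, but this is a serious misconfiguration; when data is migrated, the new node may report errors or end up with corrupted data.
Use xfs_info to verify that the ftype option is set to 1. To format an XFS filesystem correctly, use the flag -n ftype=1.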
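The check above can be scripted. The following is a minimal sketch, assuming the xfs_info output has already been captured as text; the function name xfs_ftype and the sample line are illustrative and were not taken from the author's host.

```python
import re

def xfs_ftype(xfs_info_output: str):
    """Return the ftype value (0 or 1) found in xfs_info output, or None if absent."""
    match = re.search(r"\bftype=(\d)", xfs_info_output)
    return int(match.group(1)) if match else None

# Illustrative sample of the "naming" line that xfs_info prints (an assumption,
# not output captured from the author's host).
SAMPLE = "naming   =version 2              bsize=4096   ascii-ci=0 ftype=1"

if xfs_ftype(SAMPLE) == 1:
    print("ftype=1: d_type is supported, overlay2 can be used")
else:
    print("ftype is missing or 0: reformat with mkfs.xfs -n ftype=1")
```

On a real host the text could come from something like subprocess.run(["xfs_info", mount_point], capture_output=True, text=True).stdout, where mount_point is the filesystem that holds the Docker data directory.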
Configuring the overlay2 driver
-
Stop Docker
systemctl stop docker
-
Back up the Docker root directory (/var/lib/docker)
-
Edit /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "data-root": "/data/docker",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
-
Start Docker and verify the result with docker info
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
-
Docker's storage driver is now overlay2, and Docker automatically creates the overlay mounts (lowerdir, upperdir, merged, workdir).
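Because a syntax error in daemon.json can keep the daemon from starting, it can help to parse the file before restarting Docker. The following is a minimal sketch; the helper name storage_driver and the trimmed-down EXAMPLE text are assumptions made for illustration.

```python
import json

def storage_driver(daemon_json_text: str) -> str:
    """Return the configured storage driver, or "" if the key is absent."""
    # json.loads raises json.JSONDecodeError if the file is not valid JSON.
    config = json.loads(daemon_json_text)
    return config.get("storage-driver", "")

# Shortened example configuration (hypothetical, based on the file above).
EXAMPLE = """
{
  "data-root": "/data/docker",
  "storage-driver": "overlay2"
}
"""

print(storage_driver(EXAMPLE))
```

In practice the text would be read from /etc/docker/daemon.json, and the value compared with the Storage Driver line that docker info reports after the restart.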
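How the overlay2 driver works
docker info reports Docker's storage directory. By default it is /var/lib/docker/overlay2, and every subdirectory in it is a layer.
OverlayFS directory references
-
The lower directory is referred to as lowerdir.
-
The upper directory is referred to as upperdir (in practice, this is the container layer).
-
If a layer directory contains a merged directory, that directory is the union mount point (union mount).
-
The work directory is used internally by OverlayFS.
Details follow.
The l (lowercase L) directory
-
Change to the /var/lib/docker/overlay2 directory (on this host the path is /data/docker/overlay2, because data-root is set to /data/docker)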
<root@SIT-K8S-WN6 /data/docker/overlay2># ls
(the host runs many containers, so part of the listing has been removed)
74dd890cbb4f6d8ef246d6a5412683d1badd7baf08c6480ffb00f76ccb6736ef f6d6b3abe78f3f423b9e9daca0f690976204765cbdf56b913d40ffb9c7c2f54c
74dd890cbb4f6d8ef246d6a5412683d1badd7baf08c6480ffb00f76ccb6736ef-init f6d6b3abe78f3f423b9e9daca0f690976204765cbdf56b913d40ffb9c7c2f54c-init
75a104cb36ca4b2ec6b0e20387f6e9b2efb51b4adfd4b8d8483d646570cadaca f84296e9ad97ead192edcf85a534eddbdf64a8f741b748339be139e552e2324b
76a8ef47902b202eb56211bdc79eec37dd39fec0210d5133f56b6cfeaa7ecf3a f9c126e10546d040520d0041c785198c2781a0312d2f8d31d71221c3fb1f2a6e
772e9c2962b2730e7de5f1cafd10f9b653d5549508e0b2a6c0a393a42e5754cc f9c4ff2152ff5c0580acc0433a2205c6dd46e1d727696c2ade70483e958e3542
779bafc49297a369419982d3acc91b978fe599aecc328929e8c3f03e79d1eef5 fcdcc750ce3e322ebf1eb013e0ea1060ab2ee4d34ddbb4759b99e81ccfbf1dca
77a2beef60b5bfaa203ac9b7061aeea1d5104f1da8652767a62456c82aa3c964 fce9f9bb2f0a0f6649c80698b464b885785a6cdddd1cb13de5ad7d775356c1a4
78b2c508c29804b7c036b434d548a277e1f0f9afe3eae1eae5af8c915283d90b fd60726f922275cd42c453e65100d1ca8b83bc1aceb1dc1dd6f8b59e09a835cd
793d844a1c2adb58fdd2f33c2bf2b4a03ccee4fd7cd84d1f344b5e1c62b95a1f l    <-- note: lowercase L
-
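There is a directory named l (lowercase L), short for link. It holds symbolic links with shortened names for the layers under /var/lib/docker/overlay2, and each symbolic link points to one "<layer>/diff" directory. The short names exist to avoid hitting the length limit on the arguments passed to mount.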
<root@SIT-K8S-WN6 /data/docker/overlay2/l># ls -l
lrwxrwxrwx 1 root root 77 May 6 08:57 3UOYKR5WJV6ZE7ZDY7E6VR6NXS -> ../3ddd2c9467d87bcaccd855f2f7003fa57349caa393d8f0eab3d0c960f17ff8df-init/diff
lrwxrwxrwx 1 root root 72 May 6 14:15 3V3D3T4RHI3RIYDOG5KC4CGQWD -> ../f2875225d8ee6690ae59588f62407d1f184c04bcd34fbb463540555ace7bbfb5/diff
lrwxrwxrwx 1 root root 72 Apr 27 17:03 3XVPBZ27LCSWEIQ2K7SJK7HJJJ -> ../af018bb6c1cf039d91f8ee90c29720c671489cd2bedc09e6721f0adb31c99756/diff
lrwxrwxrwx 1 root root 72 May 6 11:43 42GODJJRZH3ZIMF3TS74GCDVZN -> ../83f9b6609d10187d07052ec21119b5a4584cf1f65a73b85fde1f3fa516bb2b0c/diff
lrwxrwxrwx 1 root root 72 Apr 26 09:48 4A2OZFV3OKB2EPF7IP2RCJ5FX2 -> ../c7adba9472c58dfd1957adeabd0d81adcd6193da9b5e49391c8193c0cf68e5cd/diff
lrwxrwxrwx 1 root root 72 Apr 18 11:08 4BHK6DGCJ532RDYYSP7F6DRH63 -> ../5444dc6988761a68f9da8c19d9a8037b32dcb7dcd3abdc05bf16f47bce7931ee/diff
lrwxrwxrwx 1 root root 72 May 19 19:56 4C5YL4YA6LTHUZ5GM7KSI4HH7X -> ../a206024448e8ab31ef691de53da8ca2696c35ec800f87ab49b895a62f57a321c/diff
lrwxrwxrwx 1 root root 72 Apr 25 19:08 4DQNTTP7G4AIPMLPJXEMWN3ETJ -> ../11b895e0fa767ee0907cbfa8c4829ab9b99bb7a7790c6348485a1f1a6169d0a9/diff
lrwxrwxrwx 1 root root 72 Dec 6 15:41 4HDN4C3GOJWCY3OKQJHO6G6FYC -> ../d8a9a18ed72ef0b459b197c030f38ef7f0fa149285f27893c6b4e33bbb130b99/diff
lrwxrwxrwx 1 root root 72 Aug 12 2021 4JTPAEMJA3YIGPIML5G33NWSLC -> ../90d8ae07edb928602971311327fc74a75ae9ef3d5f3133816ebc45ddb4f9ce92/diff
lrwxrwxrwx 1 root root 72 Mar 8 11:34 4LZSRRXKSIMJE23K3UJXD2JUKC -> ../f6d6b3abe78f3f423b9e9daca0f690976204765cbdf56b913d40ffb9c7c2f54c/diff
lrwxrwxrwx 1 root root 72 Jul 12 2021 4ONQGUHN7ZYTXEQUUJIS5A6LAU -> ../f31115fdaf5846157952898bc420494db3d3c8dfafaad6d56131242bb328ca6e/diff
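To see why the short link names matter, the sketch below compares the length of a colon-separated lowerdir list built from full layer paths with one built from l/ links. The 4096-byte page size, the function name lowerdir_length, and the layer counts are assumptions chosen for illustration: 11 is the number of LowerDir entries in the docker inspect output later in this article, and 128 is the lower-layer limit that Docker documents for overlay2.

```python
ROOT = "/data/docker/overlay2"   # data-root used on the author's host
PAGE_SIZE = 4096                 # assumed page size in bytes

def lowerdir_length(layer_count: int, short_links: bool) -> int:
    """Length of a colon-separated lowerdir list for `layer_count` layers."""
    if short_links:
        entry = len("l/") + 26                        # l/<26-character link name>
    else:
        entry = len(f"{ROOT}/") + 64 + len("/diff")   # <root>/<64-hex id>/diff
    return layer_count * entry + (layer_count - 1)    # entries plus ':' separators

for count in (11, 128):
    full = lowerdir_length(count, short_links=False)
    short = lowerdir_length(count, short_links=True)
    print(f"{count} layers: full paths {full} bytes, short links {short} bytes")
```

With 128 layers the full-path form is well over one page, while the short-link form still fits. The real mount call carries other options as well, so this only illustrates the order of magnitude.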
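Lowest layer & second-lowest layer
-
The lowest layer is the image base layer.
-
A second-lowest layer is an intermediate image layer.
The key difference between the two is that an intermediate image layer contains a lower file, which records the layers beneath it (its parent layers), down to the lowest layer. Details follow.
-
Run docker inspect container_id, as shown below.
The inspect output states explicitly which layer directories the container uses (image layers and the container layer).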
docker inspect b4904d66e772
[
  {
    "Id": "b4904d66e772a06fc899d29caee680a6fea89d01efa6ff058e760867bfdd6fa5",
    "Created": "2022-05-19T07:41:36.998952739Z",
    "Path": "start.sh",
    "Args": [],
    "State": {
      "Status": "running",
      "Running": true,
      "Paused": false,
      "Restarting": false,
      "OOMKilled": false,
      "Dead": false,
      "Pid": 18577,
      "ExitCode": 0,
      "Error": "",
      "StartedAt": "2022-05-19T07:41:38.207824094Z",
      "FinishedAt": "0001-01-01T00:00:00Z"
    },
    "Image": "sha256:a09ade240dd3469873c80e3f65e933e1e55e8520b1401608df11fc3b1cd44e52",
    "ResolvConfPath": "/data/docker/containers/63b616968918e3cc68959b5294deb3be9bdd59d76e6fd802ea939d58230cdfc2/resolv.conf",
    "HostnamePath": "/data/docker/containers/63b616968918e3cc68959b5294deb3be9bdd59d76e6fd802ea939d58230cdfc2/hostname",
    "HostsPath": "/var/lib/kubelet/pods/f7fad1ae-526b-4183-9729-7ec40f1ca7b2/etc-hosts",
    "LogPath": "/data/docker/containers/b4904d66e772a06fc899d29caee680a6fea89d01efa6ff058e760867bfdd6fa5/b4904d66e772a06fc899d29caee680a6fea89d01efa6ff058e760867bfdd6fa5-json.log",
    "Name": "/k8s_sit01-ytx-service_sit01-ytx-service-8cf48ff8f-swknx_sit_f7fad1ae-526b-4183-9729-7ec40f1ca7b2_0",
    "RestartCount": 0,
    "Driver": "overlay2",
    "Platform": "linux",
    "MountLabel": "",
    "ProcessLabel": "",
    "AppArmorProfile": "",
    "ExecIDs": null,
    "HostConfig": {
      "Binds": [
        "/home/nflow/logs/sit01:/home/nflow/logs",
        "/var/lib/kubelet/pods/f7fad1ae-526b-4183-9729-7ec40f1ca7b2/volumes/kubernetes.io~configmap/apolloproperties:/opt/settings:ro",
        "/var/lib/kubelet/pods/f7fad1ae-526b-4183-9729-7ec40f1ca7b2/volumes/kubernetes.io~secret/default-token-rpn94:/var/run/secrets/kubernetes.io/serviceaccount:ro",
        "/var/lib/kubelet/pods/f7fad1ae-526b-4183-9729-7ec40f1ca7b2/etc-hosts:/etc/hosts",
        "/var/lib/kubelet/pods/f7fad1ae-526b-4183-9729-7ec40f1ca7b2/containers/sit01-ytx-service/7cd4e2ab:/dev/termination-log"
      ],
      "ContainerIDFile": "",
      "LogConfig": {
        "Type": "json-file",
        "Config": {
          "max-file": "5",
          "max-size": "10m"
        }
      },
      "NetworkMode": "container:63b616968918e3cc68959b5294deb3be9bdd59d76e6fd802ea939d58230cdfc2",
      "PortBindings": null,
      "RestartPolicy": {
        "Name": "no",
        "MaximumRetryCount": 0
      },
      "AutoRemove": false,
      "VolumeDriver": "",
      "VolumesFrom": null,
      "CapAdd": null,
      "CapDrop": null,
      "Capabilities": null,
      "Dns": null,
      "DnsOptions": null,
      "DnsSearch": null,
      "ExtraHosts": null,
      "GroupAdd": null,
      "IpcMode": "container:63b616968918e3cc68959b5294deb3be9bdd59d76e6fd802ea939d58230cdfc2",
      "Cgroup": "",
      "Links": null,
      "OomScoreAdj": 984,
      "PidMode": "",
      "Privileged": false,
      "PublishAllPorts": false,
      "ReadonlyRootfs": false,
      "SecurityOpt": [
        "seccomp=unconfined"
      ],
      "UTSMode": "",
      "UsernsMode": "",
      "ShmSize": 67108864,
      "Runtime": "runc",
      "ConsoleSize": [
        0,
        0
      ],
      "Isolation": "",
      "CpuShares": 102,
      "Memory": 1073741824,
      "NanoCpus": 0,
      "CgroupParent": "kubepods-burstable-podf7fad1ae_526b_4183_9729_7ec40f1ca7b2.slice",
      "BlkioWeight": 0,
      "BlkioWeightDevice": null,
      "BlkioDeviceReadBps": null,
      "BlkioDeviceWriteBps": null,
      "BlkioDeviceReadIOps": null,
      "BlkioDeviceWriteIOps": null,
      "CpuPeriod": 100000,
      "CpuQuota": 100000,
      "CpuRealtimePeriod": 0,
      "CpuRealtimeRuntime": 0,
      "CpusetCpus": "",
      "CpusetMems": "",
      "Devices": [],
      "DeviceCgroupRules": null,
      "DeviceRequests": null,
      "KernelMemory": 0,
      "KernelMemoryTCP": 0,
      "MemoryReservation": 0,
      "MemorySwap": 1073741824,
      "MemorySwappiness": null,
      "OomKillDisable": false,
      "PidsLimit": null,
      "Ulimits": null,
      "CpuCount": 0,
      "CpuPercent": 0,
      "IOMaximumIOps": 0,
      "IOMaximumBandwidth": 0,
      "MaskedPaths": [
        "/proc/acpi",
        "/proc/kcore",
        "/proc/keys",
        "/proc/latency_stats",
        "/proc/timer_list",
        "/proc/timer_stats",
        "/proc/sched_debug",
        "/proc/scsi",
        "/sys/firmware"
      ],
      "ReadonlyPaths": [
        "/proc/asound",
        "/proc/bus",
        "/proc/fs",
        "/proc/irq",
        "/proc/sys",
        "/proc/sysrq-trigger"
      ]
    },
    "GraphDriver": {
      "Data": {
        "LowerDir": "/data/docker/overlay2/27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280-init/diff:/data/docker/overlay2/8c1f6a8da780f946f2aadf7e85531cecb7fef0ac86c377583ae03718cd97c42b/diff:/data/docker/overlay2/7c8ae2229cf858ae635387c80c6bff50f5bcd2a4d0ec81ec1bf212b7a17e43ac/diff:/data/docker/overlay2/af018bb6c1cf039d91f8ee90c29720c671489cd2bedc09e6721f0adb31c99756/diff:/data/docker/overlay2/80b4a8ebbb6ed78bf16196a743b9d91cd4c34232f63434998306cdbe791cea8f/diff:/data/docker/overlay2/9e07099c52d0e3b06c03e7be02aeb671479cdbdf92778dccaa6efcede68bbb04/diff:/data/docker/overlay2/f32c2fbf80a2e5f3c8cb1268f8b7b7ceb4424be816b4555a32a1073ece397c63/diff:/data/docker/overlay2/a09ddfa75b58de4f7fe12f1263867e59f86a34d0d931f28237900f1da82b0ce0/diff:/data/docker/overlay2/646a21692ed0c268582958278f153ad6e040d5a01d3605036c2fe5adbf779411/diff:/data/docker/overlay2/72f081be18bfea6e18b520a7ffdd23742a041f6ddd302f46257f15bced7991fa/diff:/data/docker/overlay2/21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573/diff",
        "MergedDir": "/data/docker/overlay2/27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280/merged",
        "UpperDir": "/data/docker/overlay2/27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280/diff",
        "WorkDir": "/data/docker/overlay2/27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280/work"
      },
      "Name": "overlay2"
    },
... (remaining output omitted)
-
LowerDir lists the image layers that the running container b4904d66e772 references, including the lowest image layer. The last entry, 21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573, is the lowest layer.
Change to the directory of the lowest layer, 21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573:
<root@SIT-K8S-WN6 /data/docker/overlay2># cd 21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573
<root@SIT-K8S-WN6 /data/docker/overlay2/21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573># ls
committed diff link
Change to the diff directory and list its contents. This is the first layer built for the image that the container runs, and it holds all of that layer's files; a diff directory always holds the contents of its own layer.
<root@SIT-K8S-WN6 /data/docker/overlay2/21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573># cd diff/
<root@SIT-K8S-WN6 /data/docker/overlay2/21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573/diff># ls
anaconda-post.log bin dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
Change to the intermediate (second-lowest) layer 646a21692ed0c268582958278f153ad6e040d5a01d3605036c2fe5adbf779411. As noted above, every intermediate image layer has a lower file that records the layers beneath it.
<root@SIT-K8S-WN6 /data/docker/overlay2/21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573/diff># cd ..
<root@SIT-K8S-WN6 /data/docker/overlay2># cd 646a21692ed0c268582958278f153ad6e040d5a01d3605036c2fe5adbf779411/
<root@SIT-K8S-WN6 /data/docker/overlay2/646a21692ed0c268582958278f153ad6e040d5a01d3605036c2fe5adbf779411># ls
committed diff link lower work
Check which parent layers this intermediate layer references. The lower file lists two entries, so two layers sit beneath this one:
<root@SIT-K8S-WN6 /data/docker/overlay2/646a21692ed0c268582958278f153ad6e040d5a01d3605036c2fe5adbf779411># cat lower
l/74W62ZSPYYIUJN3XXAXVKVGCGK:l/DNYAFVCQFOCDFBHLJXFPH6HB2H
Looking up the link in the l directory shows that DNYAFVCQFOCDFBHLJXFPH6HB2H points to the base layer of the image, the lowest layer:
<root@SIT-K8S-WN6 /data/docker/overlay2/646a21692ed0c268582958278f153ad6e040d5a01d3605036c2fe5adbf779411># ls -l ../l | grep DNYAFVCQFOCDFBHLJXFPH6HB2H
lrwxrwxrwx 1 root root 72 Aug 21 2020 DNYAFVCQFOCDFBHLJXFPH6HB2H -> ../21e89a52bec93439b7b6c423b11a00fa2df0cd3aaaade821faef80c0b9071573/diff
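The manual lookup above (read lower, then follow each link under l) can be sketched in a few lines. This example builds a throwaway directory tree that mimics the layout; the layer names base, middle, top and the link names are made up, and resolve_lower is a hypothetical helper rather than anything Docker provides.

```python
import os
import tempfile

def resolve_lower(overlay_root: str, layer_id: str) -> list:
    """Return the diff directories named in <layer>/lower, topmost parent first."""
    lower_path = os.path.join(overlay_root, layer_id, "lower")
    with open(lower_path) as handle:
        entries = handle.read().strip().split(":")   # e.g. ["l/AAA", "l/BBB"]
    # Each entry is a symlink path relative to the overlay2 root; follow it.
    return [os.path.realpath(os.path.join(overlay_root, entry)) for entry in entries]

# Build a small tree that imitates /var/lib/docker/overlay2 (all names invented).
root = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(root, "l"))
for layer, link in (("base", "BASELINK"), ("middle", "MIDLINK"), ("top", "TOPLINK")):
    os.makedirs(os.path.join(root, layer, "diff"))
    os.symlink(os.path.join("..", layer, "diff"), os.path.join(root, "l", link))
with open(os.path.join(root, "top", "lower"), "w") as handle:
    handle.write("l/MIDLINK:l/BASELINK")

for path in resolve_lower(root, "top"):
    print(os.path.relpath(path, root))
```

Pointed at a real overlay2 directory (as root), the same function would list the parent diff directories of any layer that has a lower file.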
Container layer & merged
The docker inspect output for container b4904d66e772 contains two keys, UpperDir and MergedDir.
-
Enter that directory and look at its contents:
<root@SIT-K8S-WN6 /data/docker/overlay2># cd 27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280
<root@SIT-K8S-WN6 /data/docker/overlay2/27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280># ls -l
total 20
drwxr-xr-x 5 root root 4096 May 19 15:41 diff
-rw-r--r-- 1 root root 26 May 19 15:41 link
-rw-r--r-- 1 root root 318 May 19 15:41 lower
drwxr-xr-x 1 root root 4096 May 19 15:41 merged
drwx------ 3 root root 4096 May 19 15:41 work
-
There is a merged directory here; normally only a container layer has one. Enter merged and look at its contents.
The merged directory combines the contents of every image layer listed in the parent directory's lower file with the contents of the parent directory's diff directory into one union mount. It is the mounted view that container b4904d66e772 uses as its filesystem at runtime.
<root@SIT-K8S-WN6 /data/docker/overlay2/27932af0f234e90a565c5709ee9b3d3b6f79c1586622a6cf10d5e82422a0e280/merged># ls
anaconda-post.log bin dev etc home lib lib64 media mnt opt proc project root run sbin srv sys tmp usr var
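As a rough illustration of how the merged listing comes about, the sketch below unions the top-level names of several layers. The base list is the lowest layer's diff listing shown earlier; which layer actually contributes project is not shown in this article, so the other list is an assumption. Whiteouts and opaque directories are ignored here.

```python
def merged_listing(lower_layers, upper):
    """Union of top-level names across layers, as seen at the merged mount point."""
    names = set(upper)
    for layer in lower_layers:
        names |= set(layer)
    return sorted(names)

# Top-level names from the lowest layer's diff directory (from the listing above).
base = ["anaconda-post.log", "bin", "dev", "etc", "home", "lib", "lib64", "media",
        "mnt", "opt", "proc", "root", "run", "sbin", "srv", "sys", "tmp", "usr", "var"]
# Assumption: some higher layer adds "project"; the article does not show which one.
other = ["project"]

print(" ".join(merged_listing([base], other)))
```

The printed names match the ls output of the merged directory above: the base layer's entries plus project.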
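overlay2 summary
-
lower records the parent layers that this layer references, down to the lowest image layer.
-
diff holds the contents of this layer.
-
link records the name of this layer's symbolic link in the l directory.
-
merged is the unified mount point that the container uses at runtime, built from the contents of LowerDir and UpperDir.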
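How the overlay driver lays out directories
It works much like overlay2; see the figure below.
overlay summary
The difference is that the overlay driver explicitly separates image layers from container layers.
Its data lives under /var/lib/docker/overlay.
Run docker ps -q to get the container IDs; the directories whose names start with a container ID are the container layers.
Each of these directories is laid out as follows:
Type | Name | Purpose |
File | lower-id | Records the parent layer that this layer references |
Directory | merged | Unified mount point used by the running container |
Directory | upper | Holds the files written by this layer |
Directory | work | Used internally by OverlayFS; think of it as scratch space used while files are changed (created, modified, deleted) at runtime |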
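Overlay read and write operations
Reading files
There are three scenarios for how a container reads a file that it opens under Overlay.
-
The file is not in the container layer
When a running container reads a file that does not exist in the container layer (upperdir), the file is read from the image layers that the container references (lowerdir). The performance overhead is negligible. The read path is simply:
open file → can't find in the container (upperdir) → read from lowerdir
-
The file exists only in the container layer
When a running container reads a file that exists in the container layer (upperdir) but not in the image (lowerdir), the file is read directly from the container layer.
-
The file exists in both the container layer and the image layer
When a container reads a file that exists in both the container layer and the image layer, the version in the container layer is used: the file in the container layer (upperdir) obscures the file of the same name in the image layer (lowerdir).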
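The three read scenarios reduce to one lookup order: upperdir first, then the lower layers from top to bottom. The sketch below models each layer as a plain dict; read_file and the sample paths are hypothetical names used only for illustration.

```python
def read_file(path, upper, lowers):
    """Return (content, source) for `path`, checking upperdir first, then each lowerdir.

    `upper` and each element of `lowers` are dicts mapping path -> content;
    `lowers` is ordered topmost layer first, like the LowerDir list.
    """
    if path in upper:
        return upper[path], "upperdir"
    for index, layer in enumerate(lowers):
        if path in layer:
            return layer[path], f"lowerdir[{index}]"
    raise FileNotFoundError(path)

upper = {"/etc/app.conf": "edited in container", "/tmp/new.txt": "container only"}
lowers = [{"/etc/app.conf": "from image"}, {"/bin/sh": "shell binary"}]

print(read_file("/bin/sh", upper, lowers))        # only in an image layer
print(read_file("/tmp/new.txt", upper, lowers))   # only in the container layer
print(read_file("/etc/app.conf", upper, lowers))  # in both: upperdir wins
```

The last call shows the obscuring behavior: the image copy of /etc/app.conf still exists in the lower layer, but the container never sees it.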
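Modifying files and directories
The following scenarios describe how a container modifies files or directories under Overlay.
-
Writing to a file for the first time
The first time a container writes to a file that is not yet in the container layer (upperdir), the overlay/overlay2 driver performs a copy_up operation: it copies the file from the image layer into the container layer, and the container then writes to that copy. Note the following:
-
The copy happens only on the first modification. Later writes go to the existing copy instead of copying the file again.
-
OverlayFS outperforms AUFS here because overlay works with only two layers, an upper and a lower one, whereas AUFS has many layers and searches all the relevant layers on a write, which lowers write performance.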
-
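Deleting files and directories
-
When a running container deletes a file, a whiteout file is created in the container layer (upperdir). The version in the image layer is not deleted, because image layers are read-only. The whiteout masks the corresponding file in the lower image layer so that the container no longer sees it. Because the image itself is unchanged, a new container created from the same image sees the file again.
-
When a container deletes a directory, an opaque directory is created in the container layer (upperdir), which blocks access to the original directory in the image layer. It works the same way as a whiteout file.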
-
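Renaming directories
When a running container renames a directory, the rename succeeds only if both the source and the destination paths are in the top layer. Otherwise it returns an EXDEV error ("cross-device link not permitted").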
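The write, delete, and rename rules above can be tied together in a toy in-memory model. This is a sketch under simplifying assumptions, not how the kernel implements OverlayFS: there is a single read-only lower dict, directories are modeled only as path prefixes, and the class name OverlayModel and its methods are invented for illustration.

```python
import errno

WHITEOUT = object()  # marker standing in for a whiteout entry in upperdir

class OverlayModel:
    """Toy model: `lower` is read-only, `upper` receives every change."""

    def __init__(self, lower):
        self.lower = dict(lower)   # path -> bytes, never modified
        self.upper = {}            # path -> bytes or WHITEOUT
        self.copy_ups = 0          # how many whole-file copies were made

    def read(self, path):
        if path in self.upper:
            if self.upper[path] is WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        current = self.upper.get(path)
        if current is None and path in self.lower:
            current = self.lower[path]   # copy_up: whole file copied on first write
            self.copy_ups += 1
        elif current is None or current is WHITEOUT:
            current = b""                # new file, created directly in upperdir
        self.upper[path] = current + data

    def delete(self, path):
        self.read(path)                  # raises FileNotFoundError if not visible
        if path in self.lower:
            self.upper[path] = WHITEOUT  # hide the read-only lower copy
        else:
            del self.upper[path]         # file existed only in upperdir

    def rename_dir(self, src, dst):
        prefix = src.rstrip("/") + "/"
        if any(p.startswith(prefix) for p in self.lower):
            raise OSError(errno.EXDEV, "cross-device link not permitted")
        for p in [p for p in self.upper if p.startswith(prefix)]:
            self.upper[dst.rstrip("/") + "/" + p[len(prefix):]] = self.upper.pop(p)

fs = OverlayModel({"/app/big.log": b"x" * 1000, "/etc/motd": b"hello"})
fs.append("/app/big.log", b"1")      # first write: copy_up of all 1000 bytes
fs.append("/app/big.log", b"2")      # second write: no further copy
fs.delete("/etc/motd")               # whiteout in upper, lower copy untouched
print(fs.copy_ups, len(fs.upper["/app/big.log"]), "/etc/motd" in fs.lower)
```

In this model, fs.rename_dir("/app", "/srv") raises OSError with errno.EXDEV because /app comes from the lower layer. Docker's documentation advises applications to handle EXDEV by falling back to a copy-and-unlink strategy.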
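OverlayFS and Docker Performance
OverlayFS works at the file level rather than the block level. This means that a copy_up operation copies the entire file: even when only one field of a very large file changes, the whole file is copied into the container layer (upperdir). This adds some overhead to container writes, such as write latency. Even so, among Docker's storage drivers, overlay2 and overlay generally perform better than AUFS, devicemapper, and btrfs, for the following reasons.
-
Page Caching
OverlayFS supports page cache sharing: multiple containers that access the same file share a single page cache entry for it, which lets the overlay drivers use memory efficiently.
-
copy_up
As with AUFS, the first write to a file in a container triggers a copy_up, which copies the file from the image layer into the container layer. For large files this adds some write latency, but the copy happens only once; all later writes go to the copy and do not trigger another copy_up.
OverlayFS differs from AUFS in that AUFS searches more layers during copy_up, which increases write latency. OverlayFS also supports searching multiple layers, but it mitigates the performance hit with caching.
-
inode limits
Docker hosts tend to accumulate large numbers of images and containers, which consumes many inodes on the Docker filesystem. Docker recommends the overlay2 driver to reduce excessive inode usage.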
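Performance best practices
-
Use fast storage
For example, high-speed SSDs.
-
Use volumes for write-heavy workloads
For write-heavy applications, use volumes instead of the OverlayFS-backed container filesystem to reduce disk read and write latency. Volumes also let multiple containers share data and provide persistent storage.
References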