【docker基础知识】docker坑问题汇总

1. Got starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown. from time to time
   解决问题: https://github.com/opencontainers/runc/issues/1740

   most of the memories are consumed by page cache, echo 1 > /proc/sys/vm/drop_caches

 

2. Rpmdb checksum is invalid: dCDPT(pkg checksums):
     描述:

           rpm数据库损坏需要重建。需要在 yum install … 前使用 rpm –rebuilddb重建数据库

      解决方法:

           RUN rpm --rebuilddb && yum install -y ...

 

3. Docker宿主机agetty进程cpu占用率100%
    描述:

         使用"docker run"运行容器时使用了 /sbin/init和--privileged参数

    解决方法:

          在宿主机以及Container中运行下述命令              

systemctl stop getty@tty1.service
systemctl mask getty@tty1.service
4. Failed to get D-Bus connection: Operation not permitted
     描述:

           centos 7.2容器内使用systemctl命令

     解决方法:

           docker run --privileged -d  centos:7.2.1511 /usr/sbin/init

                     

5. 解决ssh登录慢慢的问题
     使用了dns反查,当ssh某个IP时,通过DNS反查相对应的域名,如果DNS中没有这个IP的域名解析,等待超时

     解决方法:/etc/ssh/sshd_config

         设置 UseDNS no                     

 

6.  /etc/hosts, /etc/resolv.conf和/etc/hostname都是易失
     问题描述:

          /etc/hosts, /etc/resolv.conf和/etc/hostname,容器中的这三个文件不存在于镜像,而是存在于/var/lib/docker/containers/<container_id>,在启动容器的时候,通过mount的形式将这些文件挂载到容器内部。因此,如果在容器中修改这些文件的话,修改部分不会存在于容器的top layer,而是直接写入这三个物理文件中。

     解决方法:

          通过docker run命令的--add-host参数来为容器添加host与ip的映射关系

          通过echo -e "aaa.com 10.10.10.10\n" >> /etc/hosts

 

7. docker容器centos 7.2镜像支持中文
sudo localedef -c -f UTF-8 -i zh_CN zh_CN.utf8

export LC_ALL="zh_CN.UTF-8"

 

8. docker容器时间为UTC时间,与宿主机相差8小时
     cp -a /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

 

9. overlayfs: Can't delete file moved from base layer to newly created dir even on ext4
    Centos 提供的新文件系统 XFS 和 Overlay 兼容问题导致, 这个问题的修复在内核 4.4.6以上( https://github.com/moby/moby/issues/9572)

     Fixed in linux 4.5 going to be backported into next 4.4.y and other stable brances. Simple test sequence in commit message (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?)id=45d11738969633ec07ca35d75d486bf2d8918df6 

     解决方法:

1. 停止各个中间服务,
stop容器(docker stop $(docker ps -qa))
systemctl stop docker
备份数据/srv lsof | grep srv

 

2. 查看磁盘分区fdisk -l , mount | grep srv
umount /dev/mapper/centos-srv
格式化: mkfs.xfs -fn ftype=1 /dev/mapper/centos-srv 
查看ftype是否设置为1: xfs-info /srv |grep ftype

mount /dev/mapper/centos-srv /srv/

3. 恢复数据/srv
systemctl start docker
docker start $(docker ps -qa)

     

10. /var/lib/docker/overlay2 占用很大,占用几百G空间
    描述:这个问题应该是容器内应用产生的数据或者日志造成

    解决方法:

         进入/var/lib/docker/overlay2,du -h --max-depth=1查看哪个容器占用的比较大,我擦一看占用450G,

         一查看发现日志占用的多,这个啥吗应用这么刷日志,是调试遗留的一个容器,一直在刷错误日志,docker kill and docker rm,一下释放了450多G

 

11. Error starting daemon: error initializing graphdriver: driver not supported
    描述: 使用overlay2存储驱动启动docker daemon报错

    解决方法: 添加配置如下: 

cat /etc/docker/daemon.json

{
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}

    或者添加启动参数:

/usr/bin/dockerd --storage-driver=overlay2 --storage-opt overlay2.override_kernel_check=1 

 

11. 修改docker容器最大文件数(open files)
     直接修改docker container的 /etc/security/limits.conf无效 

宿主机上执行如下操作:

[lin@node1 ~]$ cat /etc/sysconfig/docker

     ulimit -HSn 999999

重起docker daemon进程,systemctl restart docker

 

12. /var/lib/docker/containers 占用过大
    描述: 容器日志一般存放在/var/lib/docker/containers/container_id/下面, 以json.log结尾的文件(业务日志)很大。

 如果docker容器正在运行,那么使用rm -rf方式删除日志后,通过df -h会发现磁盘空间并没有释放。原因是在Linux或者Unix系统中,通过rm -rf或者文件管理器删除文件,将会从文件系统的目录结构上解除链接(unlink)。如果文件是被打开的(有一个进程正在使用),那么进程将仍然可以读取该文件,磁盘空间也一直被占用

 

     解决方法1:cat /dev/null > *-json.log

#!/bin/sh


logs=$(find /var/lib/docker/containers/ -name *-json.log)

for log in $logs
do
echo "clean logs : $log"
cat /dev/null > $log
done

     解决方法2:增加dockerd启动参数,/etc/docker/daemon.json

{
"registry-mirrors": ["https://registry.docker-cn.com"],
"max-concurrent-downloads": 6,
"insecure-registries":["harbor.master.online.local", "harbor.local.com"],
"log-driver":"json-file",
"log-opts": {"max-size":"2G", "max-file":"10"}
}
 

13. umount.nfs device is busy
     描述:

[root@localhost /]# umount /data/share/gaocheng

umount.nfs: /data/share/gaocheng: device is busy

    解决方法:

 

-v 表示 verbose 模式。进程以 ps 的方式显示,包括 PID、USER、COMMAND、ACCESS 字段
-m 表示指定文件所在的文件系统或者块设备(处于 mount 状态)。所有访问该文件系统的进程都被列出。
   如上所示,有进程占用了,将其kill掉,再重新取消挂载

   kill -9 25869 45636 131466 
   umount /data/share/gaocheng
 

14. docker hang
    调试: curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2

    only found in docker/container/state.go发现想要获取lock,真特么得不到,导致所有都等待,暂未发现哪里给lock了

func (s *State) IsRunning() bool {
s.Lock()
res := s.Running
s.Unlock()
return res
}
goroutine 68266990 [semacquire, 1350 minutes]:
sync.runtime_SemacquireMutex(0xc421004e04, 0x11f0900)
    /usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc421004e00)
    /usr/local/go/src/sync/mutex.go:134 +0xee
github.com/docker/docker/container.(*State).IsRunning(0xc421004e00, 0xc42995c2b7)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.go:250 +0x2d
github.com/docker/docker/daemon.(*Daemon).ContainerStop(0xc420260000, 0xc42995c2b7, 0x40, 0xc421c3cac0, 0xc428235298, 0xc42e5fac01)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stop.go:23 +0x84
github.com/docker/docker/api/server/router/container.(*containerRouter).postContainersStop(0xc420b4bfc0, 0x7f85ac36f560, 0xc42844d3e0, 0x38b7d00, 0xc42b8aaa80, 0xc42d85be00, 0xc42844d290, 0x2a4e3dd, 0x5)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.go:186 +0xf0
github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainersStop)-fm(0x7f85ac36f560, 0xc42844d3e0, 0x38b7d00, 0xc42b8aaa80, 0xc42d85be00, 0xc42844d290, 0x7f85ac36f560, 0xc42844d3e0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:67 +0x69
github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x7f85ac36f560, 0xc42844d3e0, 0x38b7d00, 0xc42b8aaa80, 0xc42d85be00, 0xc42844d290, 0x7f85ac36f560, 0xc42844d3e0)

 

15 Too many open files的四种解决办法
一  单个进程打开文件句柄数过多

     通过“cat /proc/<pid>/limits”查看

二 操作系统打开的文件句柄数过多

     /proc/sys/fs/file-max”命令来动态修改该值,也可以通过修改"/etc/sysctl.conf"文件来永久修改该值

三 systemd对该进程进行了限制

      LimitNOFILE=

四 inotify达到上限

      监控机制,可以监控文件系统的变化。该机制受到2个内核参数的影响:“fs.inotify.max_user_instances”和“fs.inotify.max_user_watches”,其中“fs.inotify.max_user_instances”表示每个用户最多可以创建的inotify instances数量上限,“fs.inotify.max_user_watches”表示么个用户同时可以添加的watch数目

     参考:Too many open files的四种解决办法

            https://bbs.huaweicloud.com/blogs/7c4c0324a5da11e89fc57ca23e93a89f

     参考:Linux下设置最大文件打开数nofile及nr_open、file-max

            https://www.cnblogs.com/zengkefu/p/5635153.html

 

16 docker harbor上镜像硬删除
 docker-compose stop

docker run -it --name gc --rm --volumes-from registry vmware/registry:2.6.2-photon garbage-collect --dry-run /etc/registry/config.yml
docker run -it --name gc --rm --volumes-from registry vmware/registry:2.6.2-photon garbage-collect /etc/registry/config.yml

docker-compose start

 

17 failed to disable IPv6 forwarding for container's interface all' in Docker daemon logs 
Issue
If IPv6 networking has been disabled in the kernel, log entries of the following format may be seen in the Docker daemon logs:

Aug 21 11:05:39 ucpworker-0 dockerd[16657]: time="2018-08-21T11:05:39Z" level=error msg="failed to disable IPv6 forwarding for container's interface all: open /proc/sys/net/ipv6/conf/all/disable_ipv6: no such file or directory"
Aug 21 11:05:39 ucpworker-0 dockerd[16657]: time="2018-08-21T11:05:39.599992614Z" level=warning msg="Failed to disable IPv6 on all interfaces on network namespace \"/var/run/docker/netns/de1d0ed4fae9\": reexec to set IPv6 failed: exit status 4"
Root Cause
These log entries are displayed when the Docker daemon fails to disable IPv6 forwarding on a container's interfaces, because IPv6 networking has been disabled at the kernel level (i.e. the ipv6.disable=1 parameter has been added to your kernel boot configuration).

In this situation, the log entries are expected and can be ignored.

Resolution
In a future release of the Docker engine the log level will be updated, so that these entries are only warning messages, as this does not indicate an actionable error case.

  参考:https://success.docker.com/article/failed-to-disable-ipv6-forwarding-for-containers-interface-all-error-in-docker-daemon-logs

 

18. iptables -I FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    在宿主机上发往docker0网桥的网络数据报文,如果是该数据报文所处的连接已经建立的话,则无条件接受,并由Linux内核将其发送到原来的连接上,即回到Docker Container内部

 

 

19. malformed module missing dot in first path element
go1.13 mod 要求import 后面的path 第一个元素,符合域名规范,比如code.be.mingbai.com/tools/soa

    go mod 要求所有依赖的 import path 的path 以域名开头,如果现有项目转1.13的go mod 模式,且不是以域名开头则需要修改。

go mod init csi.storage.com/storage-csi

posted on 2020-03-28 20:52  adolfmc  阅读(4158)  评论(0编辑  收藏  举报