Diagnosing a curl failure after entering a container

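With one container, curl behaves differently depending on whether the GPU is enabled when you enter it. Specifically:

After entering the container started with --gpus all, running curl fails with "curl: symbol lookup error: curl: undefined symbol: curl_mime_free".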

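To diagnose it, I first checked whether the curl --version output matched in the two cases.
One line of the version information differed: the GPU-enabled run additionally lists ldap, ldaps and rtmp.
Without --gpus all: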

Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp

With --gpus all:

Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
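Spotting a difference in a long Protocols line by eye is error-prone. Below is a minimal Python sketch that extracts the Protocols line and prints the set difference. It assumes each session's output is already available as a string, for example from subprocess.run(["curl", "--version"], capture_output=True, text=True).stdout. The helper name protocols is hypothetical, and the two strings are the lines quoted above.

```python
from __future__ import annotations


def protocols(version_output: str) -> set[str]:
    """Return the protocol names on the 'Protocols:' line of `curl --version` output."""
    for line in version_output.splitlines():
        if line.startswith("Protocols:"):
            return set(line.split(":", 1)[1].split())
    return set()


# The two Protocols lines quoted in this post.
without_gpu = "Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp"
with_gpu = "Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp"

print("only with --gpus all:", sorted(protocols(with_gpu) - protocols(without_gpu)))
# only with --gpus all: ['ldap', 'ldaps', 'rtmp']
```

A different protocol list usually means a different libcurl was loaded, because curl --version reports the features of the library it is running against.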

My first guess was that the environment variables differed. Comparing them, the only difference between the two sessions was:

NVIDIA_VISIBLE_DEVICES=all

Running unset NVIDIA_VISIBLE_DEVICES did not fix the problem, so the environment variables did not appear to be the cause.
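The environment comparison can be sketched in Python as well. This is a minimal sketch that assumes the `env` output of each session was saved as text (for example with env > before.txt). The helper names and the HOME/LANG values are hypothetical illustrations; only NVIDIA_VISIBLE_DEVICES=all comes from this post.

```python
def parse_env(dump):
    """Parse `env` output (one KEY=VALUE per line) into a dict.

    Values that span several lines are not handled by this simple parser.
    """
    env = {}
    for line in dump.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def diff_env(before, after):
    """Return {key: (before_value, after_value)} for every key that differs."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k)) for k in sorted(keys) if before.get(k) != after.get(k)}


# Hypothetical dumps from the two sessions.
before = "HOME=/root\nLANG=C.UTF-8"
after = "HOME=/root\nLANG=C.UTF-8\nNVIDIA_VISIBLE_DEVICES=all"
print(diff_env(parse_env(before), parse_env(after)))
# {'NVIDIA_VISIBLE_DEVICES': (None, 'all')}
```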

Next, I compared the output of which curl and found that the two sessions resolve curl to different locations:

Without --gpus all: /usr/bin/curl
With --gpus all: /usr/local/bin/curl
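which curl reports only the first match, so it hides any other curl further down PATH. The bash builtin type -a curl lists all of them. For use inside a script, here is a minimal Python sketch of the same lookup. which_all is a hypothetical helper name, and it checks only for executable regular files, ignoring shell aliases, functions and the command hash table.

```python
import os


def which_all(cmd, path=None):
    """Return every executable named `cmd` found on PATH, in lookup order.

    The shell runs the first entry; any later entries are shadowed by it.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Run once in each session; the first element is what `curl` resolves to.
    print(which_all("curl"))
```

Running this in both sessions shows whether both binaries are present and which one PATH order puts first.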

The two curl binaries also link against very different libraries (checked with ldd /usr/bin/curl and ldd /usr/local/bin/curl).

The former (/usr/bin/curl):

linux-vdso.so.1 =>  (0x00007ffe2c56e000)
libcurl.so.4 => /usr/local/lib/libcurl.so.4 (0x00007fc68b1b7000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc68af9a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc68abd0000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fc68a968000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fc68a523000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc68a309000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc68b42e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc68a105000)

The latter (/usr/local/bin/curl):

linux-vdso.so.1 =>  (0x00007ffceeb66000)
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007ffad7e52000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffad7c35000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffad786b000)
libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007ffad7638000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007ffad741c000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007ffad71b4000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007ffad6d6f000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007ffad6b25000)
liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007ffad6916000)
libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007ffad66c5000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffad64ab000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffad80c4000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007ffad617a000)
libhogweed.so.4 => /usr/lib/x86_64-linux-gnu/libhogweed.so.4 (0x00007ffad5f47000)
libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007ffad5d11000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007ffad5a91000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffad588d000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007ffad55bb000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007ffad538c000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007ffad5188000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007ffad4f7d000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007ffad4d62000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007ffad4b47000)
libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007ffad4906000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007ffad46a2000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007ffad448f000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007ffad428b000)
libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007ffad4082000)
libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007ffad3df8000)
libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007ffad3b56000)
libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007ffad3923000)
libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007ffad370d000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007ffad3505000)
libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007ffad32dc000)
libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007ffad30cd000)
libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007ffad2e82000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007ffad2bad000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007ffad2975000)
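The library that matters in two long ldd listings is easy to miss. A minimal Python sketch that parses ldd output and reports only the libraries resolved to different paths follows. It assumes the listings are available as text (for example via subprocess.run(["ldd", path], capture_output=True, text=True).stdout). The helper names are hypothetical, and the sample strings are excerpts of the listings above.

```python
import re

# Matches "name => /path (0x...)", "name =>  (0x...)" (vDSO) and "name => not found".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)\s*(?:\(0x[0-9a-f]+\))?\s*$")


def parse_ldd(output):
    """Map each library name to the path the loader resolved it to ('' for the vDSO)."""
    libs = {}
    for line in output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            libs[m.group(1)] = m.group(2)
    return libs


def diff_ldd(a, b):
    """Return {lib: (path_in_a, path_in_b)} for libraries resolved differently."""
    return {
        lib: (a.get(lib), b.get(lib))
        for lib in sorted(a.keys() | b.keys())
        if a.get(lib) != b.get(lib)
    }


# Excerpts of the two ldd listings in this post.
usr_bin = """\
libcurl.so.4 => /usr/local/lib/libcurl.so.4 (0x00007fc68b1b7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc68abd0000)"""
usr_local_bin = """\
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007ffad7e52000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffad786b000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007ffad741c000)"""

for lib, (a, b) in diff_ldd(parse_ldd(usr_bin), parse_ldd(usr_local_bin)).items():
    print(f"{lib}: {a} -> {b}")
# libcurl.so.4: /usr/local/lib/libcurl.so.4 -> /usr/lib/x86_64-linux-gnu/libcurl.so.4
# librtmp.so.1: None -> /usr/lib/x86_64-linux-gnu/librtmp.so.1
```

Lines without "=>", such as the dynamic loader itself, are skipped. A "not found" entry is kept as a value because it is the most important thing to notice.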

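Note that the latter binary, /usr/local/bin/curl, loads the system library /usr/lib/x86_64-linux-gnu/libcurl.so.4. curl_mime_free belongs to the MIME API that libcurl added in version 7.56.0. The symbol lookup error therefore indicates that /usr/local/bin/curl was built against a newer libcurl than the one it loads at runtime.

I then looked at the container's PATH. It specifies /usr/local/bin, and nothing in it gives /usr/bin priority over /usr/local/bin.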
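You can check directly whether a given libcurl.so.4 provides curl_mime_free. A minimal Python sketch using ctypes follows. exports_symbol is a hypothetical helper name, and the two paths are the libcurl.so.4 locations from the ldd output above. Loading a library with ctypes runs its initialization code, so only point this at trusted files. From a shell, nm -D --defined-only <library> | grep curl_mime_free is an alternative that does not load the library.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path exports `symbol`.

    ctypes.CDLL dlopen()s the library, which runs its initializers,
    so only use this on libraries you trust.
    """
    lib = ctypes.CDLL(lib_path)
    # Attribute lookup on a CDLL raises AttributeError when dlsym() cannot find the name.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # The two libcurl.so.4 paths reported by ldd in this post.
    for path in ("/usr/local/lib/libcurl.so.4", "/usr/lib/x86_64-linux-gnu/libcurl.so.4"):
        try:
            print(path, exports_symbol(path, "curl_mime_free"))
        except OSError as err:
            print(path, "could not be loaded:", err)
```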

A temporary workaround is simple: patch it from within the Python script:

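import logging
import os

logger = logging.getLogger(__name__)

def curl_patch():
    # Skip if already patched: rerunning mv -f would overwrite the backup
    # of the original /usr/local/bin/curl with our symlink.
    if os.path.islink('/usr/local/bin/curl'):
        return

    curl_patch_cmds = [
        # Move the /usr/local/bin/curl that shadows /usr/bin/curl out of the way
        'mv -f /usr/local/bin/curl /usr/local/bin/curl.bak',
        # Make /usr/local/bin/curl point to the system curl
        'ln -s /usr/bin/curl /usr/local/bin/curl',
    ]

    for curl_patch_cmd in curl_patch_cmds:
        ret = os.system(curl_patch_cmd)
        logger.info('curl_patch_cmd: %s ret: %s', curl_patch_cmd, ret)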

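At its core, this is a path-management problem in the infrastructure: two curl binaries and two libcurl.so.4 libraries coexist in the same image. Once containers are mixed in, it becomes a hard problem to track down.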

--end--

posted @ 2024-02-06 20:44  ffl