Docker container down, GPU driver error: nvidia-container-cli: initialization error: nvml error: driver not loaded...
Docker containers fail to start: an NVIDIA driver problem
1. The error
    Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
    nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown
    Error: failed to start containers: xxxxxxxxx

    xxx@xxx:~$ nvidia-smi
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
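2. Analysis
The containers depend on the host's NVIDIA GPU driver, and that driver was no longer loaded.
A side note: this customer's server has had internet access for years, and over the past few months its GPU driver has dropped out more than once. The cause turned out to be automatic kernel updates on Linux/Ubuntu: the driver's kernel module is built for a specific kernel version, so after the system boots into a newly installed kernel the module can no longer be loaded. Reinstalling the driver usually fixes it, but the next kernel update may break it again.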
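To confirm that a kernel update is the culprit, you can check whether an NVIDIA module exists for the running kernel and whether it is currently loaded. The functions below are a minimal sketch, not the author's procedure: the function names are hypothetical, and they assume the module file is named nvidia.ko (possibly compressed, e.g. nvidia.ko.zst) somewhere under /lib/modules/<kernel>/. Paths are passed as arguments so the functions can be tried against test directories.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers (names are illustrative, not from any tool).
# Assumption: the NVIDIA kernel module is installed as nvidia.ko* somewhere
# under MODULES_ROOT/KERNEL_VERSION (e.g. updates/dkms/ or kernel/drivers/video/).

# nvidia_module_built MODULES_ROOT KERNEL_VERSION
# Succeeds if at least one nvidia.ko* file exists for that kernel version.
nvidia_module_built() {
    local root="$1" kver="$2"
    [ -d "$root/$kver" ] || return 1
    # -print -quit stops at the first match (GNU find); grep -q . fails on empty output
    find "$root/$kver" -name 'nvidia.ko*' -print -quit | grep -q .
}

# nvidia_module_loaded PROC_MODULES_FILE
# Succeeds if a module named exactly "nvidia" is listed in a /proc/modules-style file
# (the first whitespace-separated field of each line is the module name).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }' "$1"
}
```

On the affected host this would be used roughly as `nvidia_module_built /lib/modules "$(uname -r)" || echo "no nvidia module for the running kernel"` and `nvidia_module_loaded /proc/modules || echo "nvidia module not loaded"`. If the first check fails right after a kernel upgrade, that matches the failure pattern described above.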
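3. Solution
Disable automatic kernel updates
The settings below do this by turning off APT's periodic updates altogether (not only for kernel packages). Set every value in the two config files below to "0", save, and reboot: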
    xxxx@xxxx:/xxxxxx/xxxxxxxxxxx/xxxxx$ uname -r
    5.15.0-58-generic
    xxxx@xxxx:/xxxxxx/xxxxxxxxxxx/xxxxx$ cat /etc/apt/apt.conf.d/10periodic
    # change all of the values below to "0"
    APT::Periodic::Update-Package-Lists "0";
    APT::Periodic::Download-Upgradeable-Packages "0";
    APT::Periodic::AutocleanInterval "0";
    xxxx@xxxx:/xxxxxx/xxxxxxxxxxx/xxxxx$ cat /etc/apt/apt.conf.d/20auto-upgrades
    # change all of the values below to "0"
    APT::Periodic::Update-Package-Lists "0";
    APT::Periodic::Unattended-Upgrade "0";
    xxxx@xxxx:/xxxxxx/xxxxxxxxxxx/xxxxx$ sudo vim /etc/apt/apt.conf.d/10periodic
    xxxx@xxxx:/xxxxxx/xxxxxxxxxxx/xxxxx$ sudo vim /etc/apt/apt.conf.d/20auto-upgrades
    xxxx@xxxx:/xxxxxx/xxxxxxxxxxx/xxxxx$ sudo reboot -i
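The same edit can be scripted instead of done by hand in vim. This is a minimal sketch with a hypothetical function name, assuming the files only contain lines of the form APT::Periodic::<Name> "<value>"; as in the listing above. It uses GNU sed's in-place mode and only rewrites keys that are already present; it does not add missing ones.

```shell
#!/usr/bin/env bash
# disable_apt_periodic FILE...   (hypothetical helper, not part of APT)
# Rewrites every line of the form
#   APT::Periodic::<Name> "<value>";
# so that its value becomes "0". Files are edited in place with GNU sed.
disable_apt_periodic() {
    local f
    for f in "$@"; do
        # \1 keeps the key and the whitespace after it; only the quoted value changes
        sed -i -E 's/^([[:space:]]*APT::Periodic::[A-Za-z-]+[[:space:]]+)"[^"]*";/\1"0";/' "$f" || return 1
    done
}
```

In a root shell where the function has been defined, it would be run as `disable_apt_periodic /etc/apt/apt.conf.d/10periodic /etc/apt/apt.conf.d/20auto-upgrades`, followed by the reboot shown above. Copy the two files somewhere outside /etc/apt/apt.conf.d/ first if you want a backup.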
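Then reinstall the driver. After reinstalling, the containers started and nvidia-smi worked on the host, but not inside the container, and the programs still could not run: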
    RuntimeError: No CUDA GPUs are available
    (xxxxai) root@xxxxxx:/workspace/projects/xxxxx/xxxxai/xxxxx# nvidia-smi
    No devices were found
Restart the Docker service:
systemctl restart docker
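After that, everything ran normally.
One commenter suggested installing the driver with the DKMS option (DKMS rebuilds the kernel module automatically whenever a new kernel is installed). I have not tested this, but it may be worth a look: https://blog.csdn.net/wtlll/article/details/126541686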