docker-compose: configuring containers to use the GPU
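Preparing for GPU use
Before using the GPU through docker-compose, your Docker installation must already be able to start a container with the docker run command, using the --gpus flag to select devices.
If you see docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. then resolve that first. It usually means the NVIDIA Container Toolkit is not installed, or Docker has not been restarted since it was installed.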
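Writing the docker-compose.yaml file
In docker-compose.yaml, pay attention to three top-level keys: version, services and networks. version specifies which Compose file format version the file follows; services configures the services (containers); networks configures the networks.
Here is a test file: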
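To make the three top-level keys concrete, here is a minimal skeleton. It is only a sketch: the service name app is hypothetical, and the image is the same placeholder used elsewhere in this post.

```yaml
version: "3.8"            # Compose file format version

services:                 # one entry per container
  app:                    # hypothetical service name, for illustration only
    image: "xxxx:xxxxx"   # placeholder image
    networks:
      - "ana"             # attach the service to the network defined below

networks:                 # networks that services can refer to by name
  ana:
    driver: bridge
```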
version: "3.8"
services:
  pdf:
    image: "xxxx:xxxxx"
    user: "root"
    restart: "on-failure"
    expose:
      - "22"
      - "51002-51003"
    ports:
      - "51001:22"
      - "51002-51003:51002-51003"
    shm_size: "4g"
    networks:
      - "ana"
    container_name: "literature_pdf"
    tty: true
  fig:
    image: "xxxxx:xxxxx"
    user: "root"
    restart: "on-failure"
    expose:
      - "22"
      - "51009-51020"
    ports:
      - "51008:22"
      - "51009-51020:51009-51020"
    shm_size: "8g"
    volumes:
      - "/data/elfin/utils/detectron2-master:/home/appuser/detectron2-master"
    environment:
      - "NVIDIA_VISIBLE_DEVICES=all"
    deploy:
      resources:
        reservations:
          devices:
            - driver: "nvidia"
              count: "all"
              capabilities: ["gpu"]
    networks:
      - "ana"
    container_name: "fig"
    tty: true
  ocr:
    image: "xxxxx:xxxxx"
    user: "root"
    restart: "on-failure"
    expose:
      - "22"
      - "51005-51007"
    ports:
      - "51004:22"
      - "51005-51007:51005-51007"
    shm_size: "6g"
    deploy:
      resources:
        reservations:
          devices:
            - device_ids: ["1"]
              capabilities: ["gpu"]
              driver: "nvidia"
    networks:
      - "ana"
    container_name: "ocr"
    tty: true
    entrypoint: ["supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
networks:
  ana:
    driver: bridge
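Note: the file above is only a test. Many parts of it could be improved, and it is not a particularly good template! In it, image specifies the image each service runs.
The file covers volume mounts, GPU access, a custom network and port mapping. I find the GPU configuration the hardest part: it is easy to make a small mistake that leaves the application unable to start once the container is up. Here is the part that declares a container's GPU requirement: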
deploy:
  resources:
    reservations:
      devices:
        - driver: "nvidia"
          count: "all"
          capabilities: ["gpu"]
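capabilities is required here. Also, count, driver and capabilities belong to one group, that is, a single list item: do not put a "-" in front of each of them, or you will get an error. For the other GPU options, see the official documentation: https://docs.docker.com/compose/gpu-support/ .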
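The grouping rule is easiest to see side by side. The sketch below uses a hypothetical service name, gpu-check, and the placeholder image from this post, and it assumes nvidia-smi is available inside the container. The commented-out lines show the mistake described above: each "-" starts a separate device entry, so two of the three entries would lack the required capabilities key.

```yaml
services:
  gpu-check:                      # hypothetical service name
    image: "xxxx:xxxxx"           # placeholder image
    command: ["nvidia-smi"]       # assumes nvidia-smi exists in the container
    deploy:
      resources:
        reservations:
          devices:
            # Correct: one list item ("-") that holds all three keys.
            - driver: "nvidia"
              count: 1            # or "all"
              capabilities: ["gpu"]
            # Incorrect: three separate items, two of them without capabilities.
            # - driver: "nvidia"
            # - count: 1
            # - capabilities: ["gpu"]
```

To pin a service to specific GPUs, replace count with device_ids, as the ocr service in the test file does. According to the official documentation the two fields are mutually exclusive, so set only one of them per entry.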