docker-compose: configuring containers to use the GPU

Preparing to use the GPU

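Before using the GPU through docker-compose, your docker installation must already be able to start a container with the run command and the --gpus flag to select devices!
If you get docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. then fix that first. It usually means the NVIDIA Container Toolkit is not installed, or the Docker daemon has not been restarted since it was installed.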

Writing the docker-compose.yaml file

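In a docker-compose.yaml file, pay attention to three top-level keys: version, services, and networks. version specifies which version of the file format the file is written against; services configures the services (containers); networks configures the networks.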
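As a rough illustration of how the three keys fit together, here is a minimal skeleton. The service name "app", the image "example/image:tag", and the network name "backend" are hypothetical placeholders, not taken from the author's setup. Note that newer Compose releases treat the version key as obsolete and ignore it, so it mainly matters for older docker-compose binaries.

```yaml
# Minimal skeleton (sketch): all names below are placeholders.
version: "3.8"            # file format version; ignored by newer Compose releases
services:                 # one entry per container to run
    app:
        image: "example/image:tag"
        networks:
          - "backend"     # attach this service to the network defined below
networks:                 # user-defined networks shared by the services
    backend:
        driver: bridge
```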
Here is a test file:

version: "3.8"
services:
    pdf:
        image: "xxxx:xxxxx"
        user: "root"
        restart: "on-failure"
        expose:
          - "22"
          - "51002-51003"
        ports:
          - "51001:22"
          - "51002-51003:51002-51003"
        shm_size: "4g"
        networks:
          - "ana"
        container_name: "literature_pdf"
        tty: true
    fig:
        image: "xxxxx:xxxxx"
        user: "root"
        restart: "on-failure"
        expose:
          - "22"
          - "51009-51020"
        ports:
          - "51008:22"
          - "51009-51020:51009-51020"
        shm_size: "8g"
        volumes:
          - "/data/elfin/utils/detectron2-master:/home/appuser/detectron2-master"
        environment:
          - "NVIDIA_VISIBLE_DEVICES=all"
        deploy:
            resources:
                reservations:
                    devices:
                      - driver: "nvidia"
                        count: "all"
                        capabilities: ["gpu"]
        networks:
          - "ana"
        container_name: "fig"
        tty: true
    ocr:
        image: "xxxxx:xxxxx"
        user: "root"
        restart: "on-failure"
        expose:
          - "22"
          - "51005-51007"
        ports:
          - "51004:22"
          - "51005-51007:51005-51007"
        shm_size: "6g"
        deploy:
            resources:
                reservations:
                    devices:
                      - device_ids: ["1"]
                        capabilities: ["gpu"]
                        driver: "nvidia"
        networks:
          - "ana"
        container_name: "ocr"
        tty: true
        entrypoint: ["supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
networks:
    ana:
        driver: bridge

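Note: the file above is only a test. Many parts of it could be improved, and it is not a particularly good template! In it, image specifies the image each service runs.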

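Note that the file above covers volume mounts, GPU use, a custom network, and port mapping. I find the GPU configuration the hardest part: it is easy to make small mistakes, and then the application fails to start after the containers come up. Here is the GPU dependency configuration for a container: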

deploy:
    resources:
        reservations:
            devices:
                - driver: "nvidia"
                  count: "all"
                  capabilities: ["gpu"]

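capabilities must be specified here. Also, count, driver, and capabilities form one group, that is, a single list item: do not put a "-" in front of each of them, or you will get an error. For other GPU options, see the official documentation: https://docs.docker.com/compose/gpu-support/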
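To make the grouping rule concrete, here is a small smoke-test sketch that contrasts the correct and incorrect list layouts. The service name "gpu-test" and the image tag are assumptions; substitute any CUDA base image you can actually pull. According to the official documentation, count and device_ids are mutually exclusive, so set only one of them per device entry.

```yaml
# GPU smoke test (sketch). Service name and image tag are assumptions.
services:
    gpu-test:
        image: "nvidia/cuda:12.3.1-base-ubuntu20.04"
        command: ["nvidia-smi"]
        deploy:
            resources:
                reservations:
                    devices:
                      # Correct: a single "-" starts ONE list item that holds
                      # driver, count, and capabilities together.
                      - driver: "nvidia"
                        count: 1          # or "all"; to pin specific GPUs, replace
                                          # count with device_ids: ["0"]
                        capabilities: ["gpu"]
                      # Wrong: a "-" before each key creates three separate
                      # device entries, two of which lack capabilities.
                      # - driver: "nvidia"
                      # - count: 1
                      # - capabilities: ["gpu"]
```

If the host is set up correctly, bringing this service up should run nvidia-smi inside the container and list the reserved GPU in the logs.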

Posted 2021-11-03 14:53 by 巴蜀秀才