Viewing GPU information and selecting a GPU in PyTorch (repost)

Reposted from: https://blog.csdn.net/pearl8899/article/details/109503803


Prerequisites: Python 3.6+ and a GPU build of torch are installed, and you are logged in to a development machine.

I. Basic GPU information

1. Check whether CUDA is available: torch.cuda.is_available()

>>> import torch
>>> torch.cuda.is_available()
True

2. Get the number of GPUs: torch.cuda.device_count()

>>> torch.cuda.device_count()
3

3. Get a GPU's name (device indices start at 0): torch.cuda.get_device_name(0)

>>> torch.cuda.get_device_name(0)
'Tesla P40'

4. Get the index of the current device: torch.cuda.current_device()

>>> torch.cuda.current_device()
0
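The four calls above can be collected into one small helper. This is only a minimal sketch: the function name describe_gpus and the dictionary keys are made up for illustration, and the guards are there so that it also runs on a machine without a usable GPU.

```python
import torch


def describe_gpus():
    """Collect the results of the four torch.cuda calls into one dict.

    The name and the dict keys are hypothetical, chosen for this sketch.
    """
    available = torch.cuda.is_available()
    # Only enumerate devices when CUDA is actually usable.
    count = torch.cuda.device_count() if available else 0
    return {
        "available": available,
        "count": count,
        "names": [torch.cuda.get_device_name(i) for i in range(count)],
        # current_device() raises without a usable GPU, so guard it.
        "current": torch.cuda.current_device() if available else None,
    }


info = describe_gpus()
print(info)
```

On the three-card machine used in this post, the dict would be expected to list three 'Tesla P40' entries; on a CPU-only machine the count is 0 and the name list is empty.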

5. Check GPU memory usage: nvidia-smi

To refresh the GPU usage display every 1 s: watch -n 1 nvidia-smi

Exit Python and run these commands directly in the shell of the development machine:

(bert) [op@algo src]$ nvidia-smi
Thu Nov  5 21:52:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:03:00.0 Off |                    0 |
| N/A   25C    P8    11W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           Off  | 00000000:04:00.0 Off |                    0 |
| N/A   26C    P8    10W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P40           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   24C    P8     9W / 250W |      0MiB / 22919MiB |      0%      Default |
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(bert) [op@algo src]$ watch -n 1 nvidia-smi
### The same screen as above appears, except that it refreshes every 1 s.
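If the memory figures are needed inside a script rather than on screen, nvidia-smi can also print them as CSV through its --query-gpu and --format options. The sketch below is one possible way to read that output from Python; the helper names parse_gpu_csv and query_gpus are hypothetical, and it assumes every queried field comes back as a plain number (some devices report "[N/A]" for memory, which this sketch does not handle).

```python
import csv
import subprocess

QUERY = "index,name,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = csv.reader(text.strip().splitlines(), skipinitialspace=True)
    return [
        {"index": int(i), "name": name,
         "used_mib": int(used), "total_mib": int(total)}
        for i, name, used, total in rows
    ]


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Sample text in the CSV format, using values from the table above.
sample = "0, Tesla P40, 0, 22919\n1, Tesla P40, 0, 22919\n"
gpus = parse_gpu_csv(sample)
print(gpus)
```

On a machine with the driver installed, query_gpus() would return the live values; the parser is kept separate so that it can be tried on sample text without a GPU.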

II. Selecting which GPU to use in code

1. With a single card there is nothing to choose: only one GPU is available.

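2. With multiple cards there are two cases. One is data parallelism, where several cards work together; the other is running on just one card. For example, given 4 cards [0, 1, 2, 3], I may want to run the task on card 1.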

Case 1: data parallelism

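# Configure device_ids with the indices of the cards you want to use.
# `model` is assumed to be an existing torch.nn.Module.
import torch

device_ids = [0, 1, 2]
if torch.cuda.device_count() > 1:
    print("Let's use", len(device_ids), "GPUs!")
    # The model's parameters must be on device_ids[0] before wrapping.
    model = model.to(f"cuda:{device_ids[0]}")
    model = torch.nn.DataParallel(model, device_ids=device_ids)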

GPU usage at this point:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     19272      C   python                                      9009MiB |
|    1     19272      C   python                                      5753MiB |
|    2     19272      C   python                                      5753MiB |
|    3     19272      C   python                                      5755MiB |
+-----------------------------------------------------------------------------+

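By default the model itself lives on device_ids[0], the first card, which explains why the first card uses more memory than the others. More precisely, nn.DataParallel only parallelizes the input data: each batch is split across the cards, but the outputs are gathered back onto the first GPU and the loss is computed there on every step. As a result, the first GPU carries a much heavier load than the remaining cards.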
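The behaviour described above can be seen in a short end-to-end sketch. The model, tensor shapes and loss function here are hypothetical stand-ins rather than anything from the original post, and the code falls back to the CPU when fewer than two cards are visible.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; any nn.Module would do.
model = nn.Linear(16, 4)

# Use every visible card; edit this list to pick a subset, e.g. [0, 1, 2].
device_ids = list(range(torch.cuda.device_count()))
device = torch.device(f"cuda:{device_ids[0]}" if device_ids else "cpu")

# The parameters must live on device_ids[0] before wrapping.
model = model.to(device)
if len(device_ids) > 1:
    model = nn.DataParallel(model, device_ids=device_ids)

inputs = torch.randn(32, 16, device=device)
targets = torch.randn(32, 4, device=device)

# forward() scatters the batch across the cards, then gathers the outputs
# back onto device_ids[0], so the loss below is computed on that one card.
outputs = model(inputs)
loss = nn.functional.mse_loss(outputs, targets)
loss.backward()
print(outputs.device, loss.item())
```

The printed device is the first card (or cpu in the fallback case), which matches the extra memory seen on GPU 0 in the table above.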

Case 2: running on a single card

# At the top of your code, specify the CUDA device index you want to use, this time as a string.
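The snippet that belongs to this comment is missing from the source, so the following is only a guess at one common way to do it: setting the CUDA_VISIBLE_DEVICES environment variable to the card index as a string before CUDA is initialized. The index "1" and the small Linear model are assumptions made for illustration.

```python
import os

# Assumption: we want physical card 1. This must be set before CUDA is
# initialized, so it is placed ahead of the torch import.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# Only card 1 is visible to this process, and it is renumbered as cuda:0.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # hypothetical stand-in model
x = torch.randn(3, 4, device=device)
y = model(x)
print(device, y.shape)
```

An alternative that keeps all cards visible is to address the card directly, for example torch.device("cuda:1") or torch.cuda.set_device(1); which of these the original author used is not recoverable from the text.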

III. GPU background


posted @ 2023-01-09 11:46 faf4r
