Using multiple GPUs to train a model in PyTorch

To train a model on several GPUs of the same machine, PyTorch only needs a few small changes to the existing code.

Change 1:

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,3'  # enter your GPU ids here
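A quick way to confirm the setting took effect. This is a minimal sketch, not from the original post; the GPU ids '0,1,3' are only an example, and the variable must be set before PyTorch initializes CUDA:

import os

# Set before torch initializes CUDA (safest: before importing torch).
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,3'  # example GPU ids

import torch

# The selected GPUs are re-indexed from 0, so physical GPUs 0, 1 and 3
# appear inside PyTorch as cuda:0, cuda:1 and cuda:2.
print(torch.cuda.device_count())  # expected: 3 on a machine with at least 4 GPUs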

  

Change 2:

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

model.to(device)

  

 

Training with multiple GPUs gives a noticeable speedup.
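Putting the two changes together, here is a minimal training-loop sketch. The model, loss, optimizer and dummy data below are placeholders for illustration, not part of the original post:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,3'  # example GPU ids

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder model: any nn.Module works the same way.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model and splits each batch across GPUs

model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy data standing in for a real DataLoader.
inputs = torch.randn(30, 10)
targets = torch.randint(0, 2, (30,))

for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(inputs.to(device))            # DataParallel scatters the batch along dim 0
    loss = criterion(outputs, targets.to(device))
    loss.backward()                               # gradients are gathered back onto cuda:0
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")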

Official example code

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'  # enter your GPU ids here

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Dummy DataSet
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)

# Simple Model
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output

# Create Model and DataParallel
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0: [30, xxx] -> [15, ...], [15, ...] on 2 GPUs
    model = nn.DataParallel(model)

model.to(device)


# Run the Model
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
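One practical point to keep in mind: once the model is wrapped in nn.DataParallel, the original module lives under model.module, which matters when saving and loading checkpoints. A minimal sketch building on the example above (the file name is illustrative):

# Save the underlying module's weights so the checkpoint can later be
# loaded with or without DataParallel.
state = model.module.state_dict() if isinstance(model, nn.DataParallel) else model.state_dict()
torch.save(state, "checkpoint.pth")  # illustrative path

# Load into a plain (unwrapped) model.
new_model = Model(input_size, output_size)
new_model.load_state_dict(torch.load("checkpoint.pth"))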

  

