pytorch利用多个GPU并行计算多gpu

版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/Answer3664/article/details/98992409
参考:

https://pytorch.org/docs/stable/nn.html

https://github.com/apachecn/pytorch-doc-zh/blob/master/docs/1.0/blitz_data_parallel_tutorial.md

一、 torch.nn.DataParallel
torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)

在正向传递中,模块在每个设备上复制,每个副本处理一部分输入。在向后传递期间,来自每个副本的渐变被加到原始模块中。

module:需要并行处理的模型
device_ids:并行处理的设备,默认使用所有的cuda
output_device:输出的位置,默认输出到cuda:0
例子:

>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var) # input_var can be on any device, including CPU
torch.nn.DataParallel()返回一个新的模型,能够将输入数据自动分配到所使用的GPU上。所以输入数据的数量应该大于所使用的设备的数量。

二、一个完整例子
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')


# 随机数据集
class RandomDataset(Dataset):

def __init__(self, size, length):
self.len = length
self.data = torch.randn(length, size)

def __getitem__(self, index):
return self.data[index]

def __len__(self):
return self.len


rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
batch_size=batch_size, shuffle=True)


# 以简单模型为例,同样可以用于CNN, RNN 等复杂模型
class Model(nn.Module):
def __init__(self, input_size, output_size):
super(Model, self).__init__()
self.fc = nn.Linear(input_size, output_size)

def forward(self, input):
output = self.fc(input)
print('In model: input size', input.size(), 'output size:', output.size())
return output


# 实例
model = Model(input_size, output_size)

if torch.cuda.device_count() > 1:
print("Use", torch.cuda.device_count(), 'gpus')
model = nn.DataParallel(model)

model.to(device)

for data in rand_loader:
input = data.to(device)
output = model(input)
print('Outside: input size ', input.size(), 'output size: ', output.size())
输出:

In model: input size torch.Size([30, 5]) output size: torch.Size([30, 2])
Outside: input size  torch.Size([30, 5]) output size:  torch.Size([30, 2])
In model: input size torch.Size([30, 5]) output size: torch.Size([30, 2])
Outside: input size  torch.Size([30, 5]) output size:  torch.Size([30, 2])
In model: input size torch.Size([30, 5]) output size: torch.Size([30, 2])
Outside: input size  torch.Size([30, 5]) output size:  torch.Size([30, 2])
In model: input size torch.Size([10, 5]) output size: torch.Size([10, 2])
Outside: input size  torch.Size([10, 5]) output size:  torch.Size([10, 2])

若有2个GPU

Use 2 GPUs!
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
若有3个GPU

Use 3 GPUs!
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
总结:

DataParallel自动的划分数据,并将作业发送到多个GPU上的多个模型。DataParallel会在每个模型完成作业后,收集与合并结果然后返回给你。
————————————————
版权声明:本文为CSDN博主「Answerlzd」的原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/Answer3664/article/details/98992409

posted @ 2019-09-05 16:05  交流_QQ_2240410488  阅读(4137)  评论(0编辑  收藏  举报