PyTorch - DataLoader
Basically, the DataLoader works with a Dataset object, so to use the DataLoader you need to get your data into this Dataset wrapper. To do that you only need to implement two magic methods: __getitem__ and __len__. __getitem__ takes an index and returns an (x, y) pair; __len__ is just the usual length method that returns the size of the dataset. And that's that. [1]
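The two methods above can be sketched with a minimal custom Dataset (PairDataset is a hypothetical name used only for this illustration):

```python
import torch
from torch.utils.data import Dataset

# A minimal Dataset wrapping a pair of tensors.
class PairDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.X)

    def __getitem__(self, idx):
        # Return the (x, y) pair at position idx.
        return self.X[idx], self.y[idx]

X = torch.randn(5, 3)
y = torch.randn(5, 3)
ds = PairDataset(X, y)
print(len(ds))                          # 5
x0, y0 = ds[0]
print(x0.shape, y0.shape)               # torch.Size([3]) torch.Size([3])
```

PyTorch also ships torch.utils.data.TensorDataset, which does exactly this wrapping for tensors, so in practice you rarely need to write it by hand.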
How does the DataLoader read data?
import torch
from torch.utils.data import DataLoader, TensorDataset

# Define some sample data
X = torch.randn(5, 3)  # input
y = torch.randn(5, 3)  # label
dataset = TensorDataset(X, y)  # wrap (X, y) into a Dataset
print(X, y)
Our data looks like this:
tensor([[-0.5138, -1.7766, -0.6183],
[ 0.2235, 0.1974, 0.2892],
[ 1.6249, -0.5768, -1.5081],
[ 0.5972, -0.1788, 0.7579],
[ 1.3844, -0.5480, -1.5612]])
tensor([[-0.5818, 0.1668, 0.5073],
[-1.7707, -0.2907, 1.4918],
[ 1.2157, -2.8250, -0.0247],
[ 0.2748, 0.1086, 1.6052],
[-0.7613, -1.3326, -0.5267]])
Then we read from the DataLoader.
# batch_size = 1: only one sample is read per iteration
# shuffle = True: the dataset is reshuffled at the start of every epoch
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
for i, (batch_x, batch_y) in enumerate(dataloader):
    print(f"Batch {i}: input {batch_x}, \n label {batch_y}")
We get:
Batch 0: input tensor([[-0.5138, -1.7766, -0.6183]]), label tensor([[-0.5818, 0.1668, 0.5073]])
Batch 1: input tensor([[ 0.5972, -0.1788, 0.7579]]), label tensor([[0.2748, 0.1086, 1.6052]])
Batch 2: input tensor([[ 1.6249, -0.5768, -1.5081]]), label tensor([[ 1.2157, -2.8250, -0.0247]])
Batch 3: input tensor([[ 1.3844, -0.5480, -1.5612]]), label tensor([[-0.7613, -1.3326, -0.5267]])
Batch 4: input tensor([[0.2235, 0.1974, 0.2892]]), label tensor([[-1.7707, -0.2907, 1.4918]])
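With batch_size greater than 1, each iteration yields a stacked batch rather than a single sample. A minimal sketch (using the same 5×3 random data as above, which is an assumption for illustration) showing the batch shapes, including the smaller final batch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(5, 3)
y = torch.randn(5, 3)
loader = DataLoader(TensorDataset(X, y), batch_size=2, shuffle=False)

# 5 samples with batch_size=2 gives batches of 2, 2, and 1.
shapes = [tuple(batch_x.shape) for batch_x, _ in loader]
print(shapes)  # [(2, 3), (2, 3), (1, 3)]
```

If you want to drop that incomplete final batch, pass drop_last=True to the DataLoader.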
Are the samples drawn from the DataLoader in the same order they were put in?
Answer: not if shuffle=True, because the data is shuffled randomly. Otherwise the order is the same.
Author: Kane. When reprinting, please credit the original link: https://www.cnblogs.com/hackerk/p/18109127