PyTorch - DataLoader

Basically, the DataLoader works on top of a Dataset object, so to use the DataLoader you first need to wrap your data in a Dataset. For a map-style dataset you only need to implement two magic methods: __getitem__ and __len__. __getitem__ takes an index and returns one sample, typically an (x, y) pair. __len__ is just your usual length and returns the number of samples. And that’s that. [1]
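
To make the two methods concrete, here is a minimal sketch of such a wrapper. The class name PairDataset and the toy tensors are hypothetical, chosen only for illustration; the only PyTorch pieces it relies on are torch.utils.data.Dataset and DataLoader.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PairDataset(Dataset):
    """Hypothetical wrapper holding inputs and labels of equal length."""

    def __init__(self, inputs, labels):
        assert len(inputs) == len(labels), "inputs and labels must align"
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        # Number of samples in the dataset
        return len(self.inputs)

    def __getitem__(self, index):
        # One (x, y) pair for the given index
        return self.inputs[index], self.labels[index]


# Toy data: 5 samples with 3 features each, and one integer label per sample
inputs = torch.arange(15, dtype=torch.float32).reshape(5, 3)
labels = torch.arange(5)

pair_dataset = PairDataset(inputs, labels)
first_x, first_y = pair_dataset[0]

# The DataLoader only needs __len__ and __getitem__ to build batches
batches = list(DataLoader(pair_dataset, batch_size=2))
```

When the data is already a set of tensors, as in the example that follows, the built-in TensorDataset does the same job without a custom class.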

How the DataLoader reads data

import torch

# Define some sample data
X = torch.randn(5, 3)  # inputs
y = torch.randn(5, 3)  # labels

print(X, y)

Our data looks like this:

tensor([[-0.5138, -1.7766, -0.6183],
        [ 0.2235,  0.1974,  0.2892],
        [ 1.6249, -0.5768, -1.5081],
        [ 0.5972, -0.1788,  0.7579],
        [ 1.3844, -0.5480, -1.5612]]) 
tensor([[-0.5818,  0.1668,  0.5073],
        [-1.7707, -0.2907,  1.4918],
        [ 1.2157, -2.8250, -0.0247],
        [ 0.2748,  0.1086,  1.6052],
        [-0.7613, -1.3326, -0.5267]])

Then we read it back through the DataLoader.

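from torch.utils.data import DataLoader, TensorDataset

# Wrap the tensors in a Dataset so the DataLoader can index them
dataset = TensorDataset(X, y)

# batch_size=1: each iteration yields exactly one sample
# shuffle=True: the samples are reshuffled at the start of every epoch
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

for i, (batch_x, batch_y) in enumerate(dataloader):
    print(f"Batch {i}: input shape {batch_x}, \n label shape {batch_y}")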

We get:

Batch 0: input shape tensor([[-0.5138, -1.7766, -0.6183]]), label shape tensor([[-0.5818,  0.1668,  0.5073]])
Batch 1: input shape tensor([[ 0.5972, -0.1788,  0.7579]]), label shape tensor([[0.2748, 0.1086, 1.6052]])
Batch 2: input shape tensor([[ 1.6249, -0.5768, -1.5081]]), label shape tensor([[ 1.2157, -2.8250, -0.0247]])
Batch 3: input shape tensor([[ 1.3844, -0.5480, -1.5612]]), label shape tensor([[-0.7613, -1.3326, -0.5267]])
Batch 4: input shape tensor([[0.2235, 0.1974, 0.2892]]), label shape tensor([[-1.7707, -0.2907,  1.4918]])
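
With batch_size=1 every batch has a leading dimension of 1, which is why each tensor above is wrapped in an extra pair of brackets. The sketch below shows how the batch shapes change for a larger batch size. The names features, targets and toy_dataset are made up for this example, and TensorDataset is assumed as the wrapper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(5, 3)
targets = torch.randn(5, 3)
toy_dataset = TensorDataset(features, targets)

# batch_size=2 over 5 samples gives batches of 2, 2 and 1 samples
batch_shapes = [tuple(bx.shape) for bx, _ in DataLoader(toy_dataset, batch_size=2)]
print(batch_shapes)  # [(2, 3), (2, 3), (1, 3)]

# drop_last=True discards the incomplete final batch
kept_shapes = [
    tuple(bx.shape)
    for bx, _ in DataLoader(toy_dataset, batch_size=2, drop_last=True)
]
print(kept_shapes)  # [(2, 3), (2, 3)]
```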

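Do the samples come out of the DataLoader in the same order they went in?
Answer: It depends on shuffle. With shuffle=True they do not, because the data is randomly reshuffled at the start of each epoch. In the output above, for example, the rows come back in the order 0, 3, 2, 4, 1. With shuffle=False they come out in the original order.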
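
One way to check this yourself is to use each sample's index as its input, so the read order can be read off directly. This is only a sketch: the helper read_order and the variable names are hypothetical, and the seeded torch.Generator is there just to make the shuffled run repeatable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Use the sample index as the "input" so the order is easy to read off
indices = torch.arange(5)
order_dataset = TensorDataset(indices, indices * 10)


def read_order(loader):
    # Collect the index carried by each single-sample batch
    return [int(bx.item()) for bx, _ in loader]


# shuffle=False: samples come back in storage order
sequential = read_order(DataLoader(order_dataset, batch_size=1, shuffle=False))
print(sequential)  # [0, 1, 2, 3, 4]

# shuffle=True: a new permutation is drawn every time the loader is iterated
gen = torch.Generator().manual_seed(0)
shuffled_loader = DataLoader(order_dataset, batch_size=1, shuffle=True, generator=gen)
shuffled_epoch1 = read_order(shuffled_loader)
shuffled_epoch2 = read_order(shuffled_loader)
print(shuffled_epoch1, shuffled_epoch2)
```

Each shuffled epoch still visits every sample exactly once; only the order changes.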

posted @ 2024-04-01 20:07  kingchou007