PyTorch - DataLoader

The DataLoader works with a Dataset object, so to use the DataLoader you first need to wrap your data in a Dataset. To do this you only need to implement two magic methods: __getitem__ and __len__. __getitem__ takes an index and returns an (x, y) pair; __len__ returns the number of samples in the dataset. And that's that. [1]
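
As a minimal sketch of what that looks like (the class name PairDataset is made up for illustration), a custom Dataset only needs those two methods:

import torch
from torch.utils.data import Dataset

class PairDataset(Dataset):
    """Wraps paired input/label tensors so a DataLoader can index them."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # Return one (x, y) pair for the given index
        return self.x[index], self.y[index]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x)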

How the DataLoader reads data

import torch
from torch.utils.data import DataLoader, TensorDataset

# Define some sample data
X = torch.randn(5, 3) # inputs
y = torch.randn(5, 3) # labels
print(X, y)

Our data looks like this:

tensor([[-0.5138, -1.7766, -0.6183],
        [ 0.2235,  0.1974,  0.2892],
        [ 1.6249, -0.5768, -1.5081],
        [ 0.5972, -0.1788,  0.7579],
        [ 1.3844, -0.5480, -1.5612]])
tensor([[-0.5818,  0.1668,  0.5073],
        [-1.7707, -0.2907,  1.4918],
        [ 1.2157, -2.8250, -0.0247],
        [ 0.2748,  0.1086,  1.6052],
        [-0.7613, -1.3326, -0.5267]])

Then we read the data through a DataLoader.

# Wrap the tensors in a Dataset so the DataLoader can index them
dataset = TensorDataset(X, y)

# batch_size = 1 means each batch contains a single sample
# shuffle = True means the dataset is reshuffled at the start of every epoch
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
for i, (batch_x, batch_y) in enumerate(dataloader):
    print(f"Batch {i}: input {batch_x}, label {batch_y}")

We get something like the following (the order varies between runs because shuffle=True):

Batch 0: input tensor([[-0.5138, -1.7766, -0.6183]]), label tensor([[-0.5818,  0.1668,  0.5073]])
Batch 1: input tensor([[ 0.5972, -0.1788,  0.7579]]), label tensor([[0.2748, 0.1086, 1.6052]])
Batch 2: input tensor([[ 1.6249, -0.5768, -1.5081]]), label tensor([[ 1.2157, -2.8250, -0.0247]])
Batch 3: input tensor([[ 1.3844, -0.5480, -1.5612]]), label tensor([[-0.7613, -1.3326, -0.5267]])
Batch 4: input tensor([[0.2235, 0.1974, 0.2892]]), label tensor([[-1.7707, -0.2907,  1.4918]])
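
For contrast, a hypothetical variation (not from the original post): a larger batch_size stacks several samples into one tensor per batch, so the first dimension of each batch is the batch size.

dataloader = DataLoader(dataset, batch_size=2, shuffle=False)
for i, (batch_x, batch_y) in enumerate(dataloader):
    # With 5 samples and batch_size=2, the batch shapes are (2, 3), (2, 3), (1, 3)
    print(f"Batch {i}: input shape {batch_x.shape}, label shape {batch_y.shape}")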

Do the batches come out of the DataLoader in the same order the samples were put in?
Answer: it depends on shuffle. With shuffle=True the data is randomly reshuffled (as the output above shows), so no. With shuffle=False the samples come out in their original order.
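
As a quick check (a sketch reusing the dataset defined above), with shuffle=False batch i is exactly sample i:

dataloader = DataLoader(dataset, batch_size=1, shuffle=False)
for i, (batch_x, batch_y) in enumerate(dataloader):
    # Without shuffling, the i-th batch holds the i-th sample of the dataset
    assert torch.equal(batch_x[0], X[i]) and torch.equal(batch_y[0], y[i])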
