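PyTorch: Loading Images as Single-Channel (Grayscale)
Loading images as single-channel (grayscale)
To load images as a single channel, set the transform argument when loading the dataset as follows: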
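from torchvision import datasets, transforms
# Grayscale(num_output_channels=1) reduces each RGB image to one channel;
# ToTensor() then converts it to a float32 tensor with values in [0, 1]
transform = transforms.Compose(
    [
        transforms.Grayscale(num_output_channels=1),
        transforms.ToTensor()
    ]
)
data = datasets.CIFAR10(root=".", download=True, transform=transform)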
print(type(data[0][0])) # <class 'torch.Tensor'>
print(data[0][0].shape) # torch.Size([1, 32, 32])
print(data[0][0])
# tensor([[[0.2392, 0.1765, 0.1882, ..., 0.5373, 0.5098, 0.5059],
# [0.0745, 0.0000, 0.0392, ..., 0.3725, 0.3529, 0.3686],
# [0.0941, 0.0353, 0.1216, ..., 0.3529, 0.3569, 0.3137],
# ...,
# [0.6784, 0.6039, 0.6157, ..., 0.5255, 0.1412, 0.1490],
# [0.5725, 0.5059, 0.5647, ..., 0.6000, 0.2706, 0.2353],
# [0.5922, 0.5373, 0.5765, ..., 0.7412, 0.4863, 0.3882]]])
As you can see, we get a single-channel torch.Tensor whose values have been normalized to [0, 1].
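For a PIL image, Grayscale(num_output_channels=1) delegates to PIL's Image.convert("L"), which the Pillow documentation describes as the ITU-R 601-2 luma transform L = R * 299/1000 + G * 587/1000 + B * 114/1000. The NumPy sketch below applies that formula to three RGB pixels, (59, 62, 63), (43, 46, 45) and (50, 48, 43), which are the first pixels of the first CIFAR-10 training image as printed in the TensorFlow example at the end of this post. It assumes round-half-up to the nearest integer; Pillow uses its own integer arithmetic, so individual values may occasionally differ by 1. The helper name to_luma is hypothetical.

```python
import numpy as np

# Three RGB pixels (uint8) in HWC layout with H=1, W=3.
rgb = np.array([[[59, 62, 63], [43, 46, 45], [50, 48, 43]]], dtype=np.uint8)

def to_luma(img: np.ndarray) -> np.ndarray:
    """Approximate PIL's convert("L"): ITU-R 601-2 luma, rounded half-up to uint8."""
    weights = np.array([0.299, 0.587, 0.114])
    luma = img.astype(np.float64) @ weights   # (H, W, 3) @ (3,) -> (H, W)
    return np.floor(luma + 0.5).astype(np.uint8)

gray = to_luma(rgb)
print(gray)                        # [[61 45 48]]
print(np.round(gray / 255.0, 4))   # [[0.2392 0.1765 0.1882]]
```

Dividing by 255 reproduces the first values 0.2392, 0.1765 and 0.1882 of the grayscale tensor printed above.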
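PS: A torch.Tensor object can be created with either torch.tensor(...) or torch.Tensor(...). The difference is that torch.Tensor(...) accepts several arguments that give the size of each dimension: for example, torch.Tensor(10) returns an uninitialized Tensor holding 10 values of type torch.float32. torch.tensor(...), in contrast, takes a single data argument, the values to initialize the tensor with: for example, torch.tensor(10) returns a Tensor containing the single value 10 (of type torch.int64):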
import torch
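a = torch.Tensor(10)  # 10 uninitialized float32 values
print(a)
# Contents are whatever was left in memory and differ between runs, e.g.:
# tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 3.6734e-40, 0.0000e+00, 2.0000e+00,
#         0.0000e+00, 2.0000e+00, ...])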
print(type(a)) # <class 'torch.Tensor'>
print(a.dtype) # torch.float32
b = torch.tensor(10)
print(b) # tensor(10)
print(type(b)) # <class 'torch.Tensor'>
print(b.dtype) # torch.int64
a = torch.Tensor(2, 3)  # uninitialized 2x3 float32 tensor; values are arbitrary
print(a)
# tensor([[1.6217e-19, 7.0062e+22, 6.3828e+28],
# [3.8016e-39, 0.0000e+00, 2.0000e+00]])
b = torch.tensor([2, 3])  # the list is the data, not a shape
print(b) # tensor([2, 3])
b = torch.tensor((2, 3))
print(b) # tensor([2, 3])
For details, see the PyTorch forum thread "Difference between torch.tensor() and torch.Tensor()" [1].
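Because torch.Tensor(...) mixes two meanings (sizes versus data), one common style is to use the explicit factory functions instead. The sketch below is a suggestion, not something the forum thread prescribes: it uses torch.empty and torch.zeros for shaped tensors and torch.tensor with an explicit dtype= for data, and it shows the pitfall that torch.Tensor treats a list argument as data rather than as a shape.

```python
import torch

# Uninitialized storage, like torch.Tensor(2, 3): contents are arbitrary.
a = torch.empty(2, 3)
print(a.shape, a.dtype)   # torch.Size([2, 3]) torch.float32

# Zero-filled alternative when defined values are needed.
z = torch.zeros(2, 3)

# torch.tensor infers the dtype from the data; override it with dtype=.
b = torch.tensor([2, 3], dtype=torch.float32)
print(b)                  # tensor([2., 3.])

# Pitfall: torch.Tensor treats a list as data, not as a shape.
c = torch.Tensor([10])
print(c, c.shape)         # tensor([10.]) torch.Size([1])
```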
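A further note: transforms.ToTensor() here accepts a PIL image, or an np.ndarray converted directly from a PIL image. As long as the input holds values in [0, 255] (for an np.ndarray this means dtype uint8; other dtypes are not rescaled), is laid out as HWC (H, W and C being the image height, width and number of channels, the order in which images are usually stored; a 2D HW array is treated as having one channel) and is in RGB pixel order, it does the following for us: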
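- Values are divided by 255.0, mapping [0, 255] to [0, 1.0], and the data type changes from uint8 to torch.float32.
- The shape is converted to CHW, but the channel order is left unchanged, so RGB input stays RGB.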
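The two steps above can be mimicked with plain NumPy. The function to_tensor_like below is a hypothetical illustration of the described behavior, not torchvision's implementation, and it returns an np.ndarray rather than a torch.Tensor. Like ToTensor, it adds a channel axis to a 2D grayscale array and rescales only uint8 input.

```python
import numpy as np

def to_tensor_like(pic: np.ndarray) -> np.ndarray:
    """Mimic ToTensor on an ndarray: HWC -> CHW, uint8 [0, 255] -> float32 [0, 1]."""
    if pic.ndim == 2:                    # grayscale HW array: add a channel axis -> HW1
        pic = pic[:, :, None]
    chw = np.transpose(pic, (2, 0, 1))   # HWC -> CHW; channel order is unchanged
    if pic.dtype == np.uint8:            # only uint8 input is rescaled
        return chw.astype(np.float32) / 255.0
    return chw                           # other dtypes pass through unscaled

# A 2x2 RGB image with values 0..11 laid out as HWC.
rgb = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
out = to_tensor_like(rgb)
print(out.shape, out.dtype)   # (3, 2, 2) float32

# A 2D grayscale array gains a leading channel axis.
print(to_tensor_like(np.zeros((32, 32), dtype=np.uint8)).shape)   # (1, 32, 32)
```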
For example, without transforms.ToTensor() we get the image directly as a PIL image:
from torchvision import datasets, transforms
import numpy as np
transform = transforms.Compose(
[
transforms.Grayscale(num_output_channels=1),
]
)
data = datasets.CIFAR10(root=".", download=True, transform=transform)
img = data[0][0]
print(type(img)) # <class 'PIL.Image.Image'>
We can then try converting the PIL.Image.Image object to an np.ndarray first, and then to a torch.Tensor:
np_img = np.asarray(img)
print(np_img.dtype) # uint8
tensor_from_np = transforms.ToTensor()(np_img)
print(type(tensor_from_np)) # <class 'torch.Tensor'>
print(tensor_from_np.dtype) # torch.float32
print(tensor_from_np.shape) # torch.Size([1, 32, 32])
print(tensor_from_np)
# tensor([[[0.2392, 0.1765, 0.1882, ..., 0.5373, 0.5098, 0.5059],
# [0.0745, 0.0000, 0.0392, ..., 0.3725, 0.3529, 0.3686],
# [0.0941, 0.0353, 0.1216, ..., 0.3529, 0.3569, 0.3137],
# ...,
# [0.6784, 0.6039, 0.6157, ..., 0.5255, 0.1412, 0.1490],
# [0.5725, 0.5059, 0.5647, ..., 0.6000, 0.2706, 0.2353],
# [0.5922, 0.5373, 0.5765, ..., 0.7412, 0.4863, 0.3882]]])
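ToTensor produces shape [1, 32, 32] here because a mode "L" PIL image converts to a 2D (H, W) array, to which ToTensor adds a channel axis. The sketch below checks this without downloading CIFAR-10: it assumes a synthetic solid-color 32x32 image as a stand-in for a dataset image and calls Image.convert("L"), which is what Grayscale(num_output_channels=1) does for PIL input.

```python
import numpy as np
from PIL import Image

# Synthetic 32x32 RGB image filled with (59, 62, 63); a stand-in for a CIFAR-10 image.
rgb_img = Image.new("RGB", (32, 32), color=(59, 62, 63))
gray_img = rgb_img.convert("L")        # single 8-bit luminance channel
print(gray_img.mode, gray_img.size)    # L (32, 32)

arr = np.asarray(gray_img)
print(arr.shape, arr.dtype)            # (32, 32) uint8: no channel axis yet
```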
PS: Finally, a word on TensorFlow. tf.keras.datasets.cifar10.load_data() directly returns data of type numpy.ndarray stored in HWC order (NHWC for the whole batch), but you have to divide by 255 manually to normalize it, as shown below:
import tensorflow as tf
import numpy as np
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(type(x_train)) # <class 'numpy.ndarray'>
print(x_train.shape) # (50000, 32, 32, 3)
print(x_train)
# [[[[ 59 62 63]
# [ 43 46 45]
# [ 50 48 43]
# ...
# [179 177 173]
# [164 164 162]
# [163 163 161]]]]
x_train = x_train.astype(np.float32) / 255.0
print(x_train)
# [[[[0.23137255 0.24313726 0.24705882]
# [0.16862746 0.18039216 0.1764706 ]
# [0.19607843 0.1882353 0.16862746]
# ...
# [0.7019608 0.69411767 0.6784314 ]
# [0.6431373 0.6431373 0.63529414]
# [0.6392157 0.6392157 0.6313726 ]]]]
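If you load CIFAR-10 through TensorFlow but want the layout that PyTorch's ToTensor produces, the NHWC batch can be normalized and transposed to NCHW in NumPy. The sketch below uses random uint8 data as a hypothetical stand-in for x_train so it runs without downloading anything, and it computes grayscale with the same ITU-R 601-2 luma weights PIL uses, but in floating point without rounding to integers, so values may differ slightly from PIL's.

```python
import numpy as np

# Hypothetical stand-in for x_train: 4 random uint8 RGB images in NHWC layout.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Normalize to [0, 1], then reorder NHWC -> NCHW (PyTorch's layout).
x_nchw = np.transpose(x.astype(np.float32) / 255.0, (0, 3, 1, 2))
print(x_nchw.shape, x_nchw.dtype)   # (4, 3, 32, 32) float32

# Single-channel version: weighted sum over the channel axis, then add a C=1 axis.
weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
x_gray = (x.astype(np.float32) @ weights)[:, None, :, :] / 255.0
print(x_gray.shape, x_gray.dtype)   # (4, 1, 32, 32) float32
```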