Upsampling in PyTorch (Upsample, transposed convolution, unpooling, PixelShuffle)

Upsample (upsampling by interpolation)

Upsample

torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)

Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.

The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.

Input formats:

1D data: [N, C, W]

2D data: [N, C, H, W]

3D data: [N, C, D, H, W]

The algorithms available for upsampling are nearest neighbor for any of these inputs, plus linear for 3D input Tensors, bilinear and bicubic for 4D input Tensors, and trilinear for 5D input Tensors.

One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous.)

Parameters:

size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional) – output spatial sizes

scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional) – multiplier for spatial size. Has to match input size if it is a tuple.

The output size can be given directly through size, or indirectly through the scaling factor.

mode (str, optional) – the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest'

align_corners (bool, optional) – if True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic' or 'trilinear'. Default: False

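Input and output shapes:

Input: (N, C, W_in), (N, C, H_in, W_in) or (N, C, D_in, H_in, W_in)

Output: (N, C, W_out), (N, C, H_out, W_out) or (N, C, D_out, H_out, W_out), where each output dimension is either the value given in size or floor(input dimension × scale_factor).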
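As a quick check of the shape rule, here is a minimal sketch that runs nn.Upsample on 1D, 2D and 3D data. The tensor sizes and variable names are arbitrary choices for illustration, not taken from the original post.

```python
import torch
import torch.nn as nn

# 1D data [N, C, W]: width 4 is doubled to 8
x1 = torch.randn(2, 3, 4)
y1 = nn.Upsample(scale_factor=2, mode='nearest')(x1)

# 2D data [N, C, H, W]: the target size is given explicitly
x2 = torch.randn(2, 3, 4, 5)
y2 = nn.Upsample(size=(8, 15), mode='bilinear', align_corners=False)(x2)

# 3D data [N, C, D, H, W]: one scale factor per spatial dimension
x3 = torch.randn(2, 3, 2, 4, 5)
y3 = nn.Upsample(scale_factor=(1.0, 2.0, 3.0), mode='trilinear',
                 align_corners=False)(x3)

print(y1.shape)  # torch.Size([2, 3, 8])
print(y2.shape)  # torch.Size([2, 3, 8, 15])
print(y3.shape)  # torch.Size([2, 3, 2, 8, 15])
```

Note that size and scale_factor are never passed together, in line with the rule quoted earlier.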

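Note:

With align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic and trilinear) don't proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is align_corners = False. See below for concrete examples of how this affects the outputs.

For an explanation of align_corners, see this Zhihu article.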

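The input is:

Upsampling it by a factor of two gives the figure below:

We start with align_corners=False, which is the default for interpolate in PyTorch. Under this setting, each pixel value is taken to sit at the center of its pixel block. Looking first at the pixels inside the green box, we see that they follow the definition of bilinear interpolation exactly. The four corner points keep the values of the original image, and the points along the edges are bilinearly interpolated from the corner values. Seen as a whole, the interior and the border therefore follow noticeably different rules.

In the align_corners=True view of the world, pixel values sit on the grid points, as shown in the figure above:

A careful reader will notice that the 3*3 image becomes 5*5 after being upsampled by two. More generally, an input of size (2x+1) * (2x+1) upsampled this way with align_corners=True becomes (4x+1) * (4x+1). So although the content lines up neatly, the sizes lose the elegance of being powers of 2. (Note: the output size was explicitly set to 5x5 here; with scale_factor=2, PyTorch would produce 6x6.)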

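So what exactly is the difference?
A figure from the PyTorch forum illustrates it:

When **align_corners = True**, pixels are treated as points on a grid. The corner pixels of the input and the output coincide, and the output sample points are evenly spaced between them.
When **align_corners = False**, pixels are treated as unit squares whose value sits at the center. The outer edges of the input and output images are aligned rather than the corner centers, so output points near the border fall outside the outermost input centers and are clamped to the border values. This is why the border behaves differently from the interior.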
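The difference is easy to see on a tiny input. The sketch below upsamples a 2x2 tensor by two in both modes; the input values are an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

# A 2x2 single-channel image, shaped [N, C, H, W]
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]]).view(1, 1, 2, 2)

up_false = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
up_true = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

y_false = up_false(x)
y_true = up_true(x)

# First output row in each mode
print(y_false[0, 0, 0])  # tensor([1.0000, 1.2500, 1.7500, 2.0000])
print(y_true[0, 0, 0])   # tensor([1.0000, 1.3333, 1.6667, 2.0000])
```

With align_corners=True the four samples are evenly spaced between the two corner values. With align_corners=False the steps are 0.25, 0.5, 0.25, because the outermost samples are clamped to the border.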

If you want downsampling or general resizing, use interpolate() instead.
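A minimal sketch of that usage, assuming the functional form torch.nn.functional.interpolate; the input size and target sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)

# Downsample by a factor of two
down = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)

# Resize to an arbitrary target size (shrinks H, enlarges W)
resized = F.interpolate(x, size=(20, 48), mode='nearest')

print(down.shape)     # torch.Size([1, 3, 16, 16])
print(resized.shape)  # torch.Size([1, 3, 20, 48])
```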

UpsamplingNearest2d

torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)

UpsamplingBilinear2d

torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)
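Both classes are thin wrappers around interpolation with a fixed mode. As far as I know, UpsamplingBilinear2d corresponds to bilinear interpolation with align_corners=True, and UpsamplingNearest2d to nearest-neighbor interpolation. The sketch below checks this on a random input, and is an illustration rather than a statement about how the classes are implemented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 2, 3, 3)

# Module versions
bilinear_mod = nn.UpsamplingBilinear2d(scale_factor=2)(x)
nearest_mod = nn.UpsamplingNearest2d(scale_factor=2)(x)

# Functional versions with the mode spelled out
bilinear_fn = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
nearest_fn = F.interpolate(x, scale_factor=2, mode='nearest')

print(torch.allclose(bilinear_mod, bilinear_fn))
print(torch.equal(nearest_mod, nearest_fn))
```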

ConvTranspose (transposed convolution)

`torch.nn.ConvTranspose1d`

torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')

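Applies a 1D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation in the mathematical sense).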
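The "gradient of Conv1d" statement can be checked numerically. The sketch below backpropagates a random upstream gradient through F.conv1d and compares the result with F.conv_transpose1d applied to the same gradient and the same weight. The sizes are chosen so that (L - kernel_size) is divisible by the stride; all names are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Input [N, C_in, L] and a Conv1d-style weight [C_out, C_in, kernel_size]
x = torch.randn(1, 2, 9, dtype=torch.double, requires_grad=True)
w = torch.randn(4, 2, 3, dtype=torch.double)

y = F.conv1d(x, w, stride=2)   # shape [1, 4, 4]
g = torch.randn_like(y)        # an arbitrary upstream gradient

# Gradient of the convolution with respect to its input
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=g)

# The same weight used in a transposed convolution
t = F.conv_transpose1d(g, w, stride=2)   # shape [1, 2, 9]

print(t.shape == x.shape)
print(torch.allclose(grad_x, t))
```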

Parameters:

in_channels (int) – Number of channels in the input image

out_channels (int) – Number of channels produced by the convolution

kernel_size (int or tuple) – Size of the convolving kernel

stride (int or tuple, optional) – Stride of the convolution. Default: 1. Controls the stride of the cross-correlation.

padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0. Controls the amount of implicit zero padding on both sides. See the notes below for details.

output_padding (int or tuple, optional) – Additional size added to one side of the output shape. Default: 0. See the notes below for details.

groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1. Controls the connections between inputs and outputs; in_channels and out_channels must both be divisible by groups.

bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1. Controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

Notes:

Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.

The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.

In some circumstances when using the CUDA backend with cuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.
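The output_padding note is easier to follow with numbers. In the sketch below (layer sizes chosen only for illustration), a strided Conv1d maps inputs of length 7 and 8 to the same length 4, and output_padding on the matching ConvTranspose1d selects which of the two lengths is recovered.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(1, 1, kernel_size=3, stride=2, padding=1)

# Two different input lengths collapse to the same output length
len_from_7 = conv(torch.randn(1, 1, 7)).shape[-1]
len_from_8 = conv(torch.randn(1, 1, 8)).shape[-1]
print(len_from_7, len_from_8)  # 4 4

y = torch.randn(1, 1, 4)

# Same parameters as the Conv1d; only output_padding differs
up0 = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=2, padding=1)
up1 = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=2, padding=1,
                         output_padding=1)

len_up0 = up0(y).shape[-1]
len_up1 = up1(y).shape[-1]
print(len_up0, len_up1)  # 7 8
```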

Input and output shapes:

Input: (N, C_in, L_in)

Output: (N, C_out, L_out) where

L_out = (L_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1
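The formula can be written as a small helper and compared with what the layer actually returns. The function name conv_transpose_out_len and the three configurations below are made up for this check.

```python
import torch
import torch.nn as nn


def conv_transpose_out_len(l_in, kernel_size, stride=1, padding=0,
                           output_padding=0, dilation=1):
    """Output length of ConvTranspose1d according to the formula above."""
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


configs = [
    dict(kernel_size=3, stride=1, padding=0, output_padding=0, dilation=1),
    dict(kernel_size=4, stride=2, padding=1, output_padding=0, dilation=1),
    dict(kernel_size=3, stride=3, padding=0, output_padding=2, dilation=2),
]

l_in = 10
results = []
for cfg in configs:
    layer = nn.ConvTranspose1d(2, 5, **cfg)
    actual = layer(torch.randn(1, 2, l_in)).shape[-1]
    predicted = conv_transpose_out_len(l_in, **cfg)
    results.append((predicted, actual))
    print(cfg, predicted, actual)
```

For L_in = 10 the three configurations give lengths 12, 20 and 34.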

Variables

ConvTranspose1d.weight (Tensor)

ConvTranspose1d.bias (Tensor)

`torch.nn.ConvTranspose2d`

torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')

The parameters kernel_size, stride, padding, output_padding can either be:

a single int – in which case the same value is used for the height and width dimensions
a tuple of two ints – in which case the first int is used for the height dimension, and the second int for the width dimension

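Shape:

Input: (N, C_in, H_in, W_in)
Output: (N, C_out, H_out, W_out) where

H_out = (H_in − 1) × stride[0] − 2 × padding[0] + dilation[0] × (kernel_size[0] − 1) + output_padding[0] + 1

W_out = (W_in − 1) × stride[1] − 2 × padding[1] + dilation[1] × (kernel_size[1] − 1) + output_padding[1] + 1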
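Here is a small sketch with tuple-valued parameters, so that height and width are treated differently. The channel counts and sizes are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 12, 20)   # [N, C_in, H_in, W_in]

layer = nn.ConvTranspose2d(
    16, 8,
    kernel_size=(3, 5),      # (height, width)
    stride=(2, 1),
    padding=(1, 2),
    output_padding=(1, 0))

y = layer(x)

# H_out = (12 - 1) * 2 - 2 * 1 + 1 * (3 - 1) + 1 + 1 = 24
# W_out = (20 - 1) * 1 - 2 * 2 + 1 * (5 - 1) + 0 + 1 = 20
print(y.shape)  # torch.Size([1, 8, 24, 20])
```

Only the height is doubled here, because the stride along the width is 1.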

`torch.nn.ConvTranspose3d`

torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')

Further reading: how ConvTranspose2d works, and how deep networks perform upsampling.

Unpooling

Another way to upsample is max unpooling. Reference code from segnet_pytorch:

        # Stage 5
        x51 = F.relu(self.bn51(self.conv51(x4p)))
        x52 = F.relu(self.bn52(self.conv52(x51)))
        x53 = F.relu(self.bn53(self.conv53(x52)))
        # id5 records the index of the maximum in each pooling window;
        # this requires return_indices=True
        x5p, id5 = F.max_pool2d(x53, kernel_size=2, stride=2, return_indices=True)


        # Stage 5d
        # Max unpooling: each value is written back to the position recorded
        # in id5, and every other position is filled with 0
        x5d = F.max_unpool2d(x5p, id5, kernel_size=2, stride=2)
        x53d = F.relu(self.bn53d(self.conv53d(x5d)))
        x52d = F.relu(self.bn52d(self.conv52d(x53d)))
        x51d = F.relu(self.bn51d(self.conv51d(x52d)))

Test:

import torch
import torch.nn as nn

# Test max pooling followed by max unpooling
m = nn.MaxPool2d((3, 3), stride=(1, 1), return_indices=True)
upm = nn.MaxUnpool2d((3, 3), stride=(1, 1))
data4 = torch.randn(1, 1, 3, 3)
output5, indices = m(data4)
output6 = upm(output5, indices)

print('\ndata4:', data4,
      '\nmaxPool2d', output5,
      '\nindices:', indices,
      '\noutput6:', output6)

Its output is:

data4: tensor([[[[ 2.3151, -1.0391,  0.1074],
          [ 1.9360,  0.2524,  2.3735],
          [-0.1151,  0.4684, -1.8800]]]]) 
maxPool2d tensor([[[[2.3735]]]]) 
indices: tensor([[[[5]]]]) 
output6: tensor([[[[0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 2.3735],
          [0.0000, 0.0000, 0.0000]]]])
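The run above uses a single 3x3 window, so only one value survives. A deterministic sketch with 2x2 windows and stride 2, the setting used in the SegNet excerpt, shows more clearly where the values land. The input is simply 0..15 arranged as a 4x4 image.

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

pooled, idx = pool(x)
restored = unpool(pooled, idx)

print(pooled)    # [[5, 7], [13, 15]]: the maximum of each 2x2 window
print(idx)       # [[5, 7], [13, 15]]: flat positions of those maxima
print(restored)  # zeros, except 5, 7, 13, 15 back at their original positions
```

The restored tensor has the original 4x4 size, but every non-maximum position is zero. This is why SegNet follows each unpooling step with convolutions that fill the sparse map back in.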

Upsampling via transposed convolution, or via upsample + convolution:

import torch.nn as nn


def upconv2x2(in_channels, out_channels, mode='transpose'):
    if mode == 'transpose':
        # Transposed convolution: set the input and output channels. kernel_size
        # and stride should match the corresponding downsampling step so that
        # the original width and height are restored.
        return nn.ConvTranspose2d(
            in_channels,
            out_channels,
            kernel_size=2,
            stride=2)
    else:
        # nn.Upsample does not change the number of channels; scale_factor is
        # the magnification relative to the current input size. The 1x1
        # convolution that follows maps in_channels to out_channels, so this
        # branch plays the same role as ConvTranspose2d, only with a different
        # upsampling method.
        return nn.Sequential(
            nn.Upsample(mode='bilinear', scale_factor=2, align_corners=True),
            conv1x1(in_channels, out_channels))


def conv1x1(in_channels, out_channels, groups=1):
    # 1x1 convolution followed by batch normalization
    return nn.Sequential(
        nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=1,
            groups=groups,
            stride=1),
        nn.BatchNorm2d(out_channels))
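To compare the two routes side by side, the standalone sketch below builds both kinds of 2x upsampling block directly (without the helper functions above, and without batch normalization) and checks their output shapes and parameter counts. The channel counts are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)

# Route 1: learned upsampling with a transposed convolution
up_transpose = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)

# Route 2: fixed bilinear upsampling, then a 1x1 convolution to change channels
up_interp = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
    nn.Conv2d(64, 32, kernel_size=1),
)

y_transpose = up_transpose(x)
y_interp = up_interp(x)
print(y_transpose.shape, y_interp.shape)  # both torch.Size([1, 32, 32, 32])

n_transpose = sum(p.numel() for p in up_transpose.parameters())
n_interp = sum(p.numel() for p in up_interp.parameters())
print(n_transpose, n_interp)  # 8224 2080
```

Both routes double H and W and halve the channels. The interpolation route has fewer learnable parameters, because the upsampling step itself is fixed.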

PixelShuffle

In PyTorch, the upsampling layers are grouped under Vision Layers in torch.nn. There are four of them:

① PixelShuffle
② Upsample
③ UpsamplingNearest2d
④ UpsamplingBilinear2d

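The class is defined as follows:

class torch.nn.PixelShuffle(upscale_factor)

Here upscale_factor is the magnification factor, of type int.

Taking a 4D input (N, C, H, W) as an example, PixelShuffle rearranges a Tensor of shape (∗, r²C, H, W) into a Tensor of shape (∗, C, rH, rW), where r is upscale_factor. Formally, its input and output shapes are:

Input: (N, C × upscale_factor², H, W)

Output: (N, C, H × upscale_factor, W × upscale_factor)

This amounts to turning feature-map channels into image pixels.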

>>> ps = nn.PixelShuffle(3)
>>> input = torch.randn(1, 9, 4, 4)
>>> output = ps(input)
>>> print(output.size())
torch.Size([1, 1, 12, 12])
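To see what the rearrangement does, PixelShuffle can be reproduced with a view, a permute and a reshape. The helper pixel_shuffle_manual below is a hypothetical re-implementation written for illustration, not the library's own code.

```python
import torch
import torch.nn as nn


def pixel_shuffle_manual(x, r):
    """Rearrange (N, C*r*r, H, W) into (N, C, H*r, W*r)."""
    n, c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    # Split the channel axis into (C, r, r)
    x = x.view(n, c, r, r, h, w)
    # Interleave the two r axes with H and W: (N, C, H, r, W, r)
    x = x.permute(0, 1, 4, 2, 5, 3)
    return x.reshape(n, c, h * r, w * r)


x = torch.randn(1, 9, 4, 4)
ref = nn.PixelShuffle(3)(x)
out = pixel_shuffle_manual(x, 3)

print(out.shape)              # torch.Size([1, 1, 12, 12])
print(torch.equal(ref, out))
```

Each group of r² channels at one spatial location becomes an r × r patch of output pixels, so no values are interpolated or learned in this step.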

Posted on 2020-05-19 20:10 by 那抹阳光1994