Transposed Convolution



from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()


1. Basic 2D Transposed Convolution



Fig. 1. Transposed convolution layer with a 2×2 kernel.

Given a kernel matrix K and an input matrix X, we can implement this operation as follows.

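def trans_conv(X, K):
    h, w = K.shape
    # The output grows by (kernel size - 1) along each axis
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Each input element adds a scaled copy of K to the output window at (i, j)
            Y[i: i + h, j: j + w] += X[i, j] * K
    return Y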


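Recall that convolution computes its result as Y[i, j] = (X[i: i + h, j: j + w] * K).sum(), which summarizes input values through the kernel. Transposed convolution instead broadcasts each input value through the kernel, which produces a larger output.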

X = np.array([[0, 1], [2, 3]])
K = np.array([[0, 1], [2, 3]])
trans_conv(X, K)

array([[ 0.,  0.,  1.],
       [ 0.,  4.,  6.],
       [ 4., 12.,  9.]])
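Here is a minimal sketch that makes the contrast between the two operations concrete. It uses plain NumPy in place of mxnet.np so that it runs without MXNet, and it re-implements corr2d locally instead of importing it from d2l. Cross-correlation shrinks a 3×3 array to 2×2 by summing windows. trans_conv grows a 2×2 array to 3×3 by scattering scaled copies of the kernel. Applying one after the other restores the shape but not the values, so a transposed convolution is not an inverse of convolution.

```python
import numpy as np

def corr2d(X, K):
    """2-D cross-correlation: each output element sums a window of X weighted by K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def trans_conv(X, K):
    """Transposed convolution: each input element scatters a scaled copy of K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y

X = np.array([[0., 1.], [2., 3.]])
K = np.array([[0., 1.], [2., 3.]])
up = trans_conv(X, K)   # 2x2 -> 3x3 (broadcasts inputs)
down = corr2d(up, K)    # 3x3 -> 2x2 (reduces inputs)
print(up.shape, down.shape)  # (3, 3) (2, 2)
```

The shapes round-trip, but `down` differs from `X`; for example, its top-left entry is 12 rather than 0.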


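Alternatively, Gluon's nn.Conv2DTranspose gives the same result. As with nn.Conv2D, both the input and the kernel must be 4-D tensors (batch size, channels, height, width).

X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)
tconv = nn.Conv2DTranspose(1, kernel_size=2)
tconv.initialize(init.Constant(K))
tconv(X)

array([[[[ 0.,  0.,  1.],
         [ 0.,  4.,  6.],
         [ 4., 12.,  9.]]]])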
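Gluon handles the batch and channel dimensions for us. The NumPy sketch below shows one way to extend trans_conv to 4-D tensors. The kernel layout (c_in, c_out, k_h, k_w) is an assumption chosen for this illustration, and stride and padding are omitted. Each input channel scatters its own k_h×k_w slice into every output channel, and the contributions from all input channels are summed.

```python
import numpy as np

def trans_conv_nchw(X, K):
    """Transposed convolution for batched, multi-channel input (stride 1, no padding).

    X has shape (batch, c_in, h, w); K has the assumed layout (c_in, c_out, k_h, k_w).
    """
    n, c_in, h, w = X.shape
    k_in, c_out, k_h, k_w = K.shape
    assert k_in == c_in, "K needs one slice per input channel"
    Y = np.zeros((n, c_out, h + k_h - 1, w + k_w - 1))
    for i in range(h):
        for j in range(w):
            # X[:, :, i, j] has shape (n, c_in); contracting c_in against K
            # yields one (c_out, k_h, k_w) block per example, added at (i, j)
            Y[:, :, i:i + k_h, j:j + k_w] += np.einsum(
                "bc,cokl->bokl", X[:, :, i, j], K)
    return Y

X = np.array([[0., 1.], [2., 3.]]).reshape(1, 1, 2, 2)
K = np.array([[0., 1.], [2., 3.]]).reshape(1, 1, 2, 2)
Y = trans_conv_nchw(X, K)
print(Y.shape)  # (1, 1, 3, 3)

# Two identical input channels feeding one output channel double the result
X2 = np.concatenate([X, X], axis=1)  # shape (1, 2, 2, 2)
K2 = np.concatenate([K, K], axis=0)  # shape (2, 1, 2, 2)
Y2 = trans_conv_nchw(X2, K2)
```

With one input and one output channel, the result matches the single-channel output above.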

2. Padding, Strides, and Channels

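In convolution, padding is applied to the input; in transposed convolution, it is applied to the output. A 1×1 padding means that we first compute the output as usual and then remove the first and last rows and columns.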

tconv = nn.Conv2DTranspose(1, kernel_size=2, padding=1)
tconv.initialize(init.Constant(K))
tconv(X)

array([[[[4.]]]])
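The cropping rule can be sketched on top of trans_conv, which is repeated here in NumPy so the snippet runs on its own. The helper trans_conv_padded is a hypothetical name, not part of Gluon. It assumes the same padding p on every side.

```python
import numpy as np

def trans_conv(X, K):
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y

def trans_conv_padded(X, K, p):
    """Hypothetical helper: compute the full transposed convolution,
    then drop p rows and p columns from every border of the output."""
    Y = trans_conv(X, K)
    return Y[p:Y.shape[0] - p, p:Y.shape[1] - p]

X = np.array([[0., 1.], [2., 3.]])
K = np.array([[0., 1.], [2., 3.]])
print(trans_conv_padded(X, K, 1))  # [[4.]], the centre of the 3x3 full output
```

Slicing up to `Y.shape[0] - p` rather than `-p` keeps the helper correct for p = 0, where it returns the full output.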





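Strides are applied to the output as well. With a stride of 2, the scaled copies of K are placed 2 positions apart, so the 2×2 input produces a 4×4 output:

tconv = nn.Conv2DTranspose(1, kernel_size=2, strides=2)
tconv.initialize(init.Constant(K))
tconv(X)

array([[[[0., 0., 0., 1.],
         [0., 0., 2., 3.],
         [0., 2., 0., 3.],
         [4., 6., 6., 9.]]]])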
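The effect of the stride can also be sketched in NumPy. In this hypothetical helper, trans_conv_strided, the scaled copy of K for input element (i, j) is written at offset (i·s, j·s) rather than (i, j). That is why each output side grows to (n - 1)·s + k.

```python
import numpy as np

def trans_conv_strided(X, K, s=1):
    """Transposed convolution with stride s and no padding (illustrative sketch)."""
    h, w = K.shape
    Y = np.zeros(((X.shape[0] - 1) * s + h, (X.shape[1] - 1) * s + w))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # The copy for input element (i, j) starts s positions after its neighbour's
            Y[i * s:i * s + h, j * s:j * s + w] += X[i, j] * K
    return Y

X = np.array([[0., 1.], [2., 3.]])
K = np.array([[0., 1.], [2., 3.]])
print(trans_conv_strided(X, K, s=2))
```

With s=1 the helper reduces to the plain trans_conv and returns the 3×3 result from the start of this section.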


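Channels are handled as in ordinary convolution. Suppose we feed X into a convolutional layer f to compute Y = f(X). Suppose we also create a transposed convolution layer g with the same hyperparameters as f, except that its number of output channels equals the number of channels of X. Then g(Y) should have the same shape as X. Let us verify this:

X = np.random.uniform(size=(1, 10, 16, 16))
conv = nn.Conv2D(20, kernel_size=5, padding=2, strides=3)
tconv = nn.Conv2DTranspose(10, kernel_size=5, padding=2, strides=3)
conv.initialize()
tconv.initialize()
tconv(conv(X)).shape == X.shape

True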
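The shape check above can also be worked out by hand. The sketch below uses the standard per-axis size formulas, with input length n, kernel size k, padding p on each side, and stride s. The function names are our own. A convolution maps n to floor((n + 2p - k) / s) + 1, and a transposed convolution maps n to (n - 1)·s + k - 2p. For n = 16, k = 5, p = 2, s = 3, this gives 16 → 6 → 16. Because of the floor, the round trip is exact only when s divides n + 2p - k. For n = 17, the convolution also yields 6, so the transposed convolution gives back 16.

```python
def conv_out(n, k, p, s):
    """Output length of a convolution along one axis."""
    return (n + 2 * p - k) // s + 1

def trans_conv_out(n, k, p, s):
    """Output length of a transposed convolution along one axis (no extra output padding)."""
    return (n - 1) * s + k - 2 * p

mid = conv_out(16, k=5, p=2, s=3)
print(mid, trans_conv_out(mid, k=5, p=2, s=3))  # 6 16

# 17 + 4 - 5 = 16 is not divisible by 3, so the floor discards information
mid17 = conv_out(17, k=5, p=2, s=3)
print(mid17, trans_conv_out(mid17, k=5, p=2, s=3))  # 6 16
```

The same formula also reproduces the earlier single-channel cases: stride 2 maps 2 to 4, and padding 1 maps 2 to 1.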


3. Analogy to Matrix Transposition

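Transposed convolution takes its name from matrix transposition. In fact, convolution can also be implemented as a matrix multiplication. In the example below, we define a 3×3 input X and a 2×2 kernel K, and then use corr2d to compute the convolution output.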

X = np.arange(9).reshape(3, 3)
K = np.array([[0, 1], [2, 3]])
Y = d2l.corr2d(X, K)
Y

array([[19., 25.],
       [37., 43.]])

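Next, we rewrite the convolution kernel K as a matrix W of shape (4, 9). Each of its 4 rows corresponds to one element of the 2×2 output, and each of its 9 columns corresponds to one element of the flattened 3×3 input. The i-th row represents applying the kernel to the input to generate the i-th output element.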

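def kernel2matrix(K):
    # k is one row of W before shifting: kernel row 0, a zero for the gap
    # to the next row of the flattened 3x3 input, then kernel row 1
    k, W = np.zeros(5), np.zeros((4, 9))
    k[:2], k[3:5] = K[0, :], K[1, :]
    # Shift k to the first input position covered by each output element: 0, 1, 3, 4
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W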


W = kernel2matrix(K)
W

array([[0., 1., 0., 2., 3., 0., 0., 0., 0.],
       [0., 0., 1., 0., 2., 3., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 2., 3., 0.],
       [0., 0., 0., 0., 0., 1., 0., 2., 3.]])
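kernel2matrix hard-codes a 3×3 input and a 2×2 kernel. Below is a generalized NumPy sketch; the name kernel2matrix_general and its stride parameter are our own additions. For each output element (i, j), it fills row i·o_w + j of W with the kernel weights, placed at the flattened input positions that element covers. For the 3×3 input and the kernel used here, it reproduces the matrix printed above.

```python
import numpy as np

def kernel2matrix_general(K, n_h, n_w, s=1):
    """Matrix W such that W @ X.reshape(-1) is the cross-correlation of an
    n_h x n_w input X with kernel K, using stride s and no padding."""
    k_h, k_w = K.shape
    o_h, o_w = (n_h - k_h) // s + 1, (n_w - k_w) // s + 1
    W = np.zeros((o_h * o_w, n_h * n_w))
    for i in range(o_h):
        for j in range(o_w):
            row = i * o_w + j  # flattened index of output element (i, j)
            for a in range(k_h):
                for b in range(k_w):
                    # flattened index of the input element under kernel entry (a, b)
                    W[row, (i * s + a) * n_w + (j * s + b)] = K[a, b]
    return W

K = np.array([[0., 1.], [2., 3.]])
W = kernel2matrix_general(K, 3, 3)
X = np.arange(9.).reshape(3, 3)
Y = np.dot(W, X.reshape(-1)).reshape(2, 2)
print(Y)  # the values 19, 25, 37, 43 computed by corr2d above
```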


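The convolution can then be computed as a matrix-vector product, with the input flattened and the result reshaped:

Y == np.dot(W, X.reshape(-1)).reshape(2, 2)

array([[ True,  True],
       [ True,  True]])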

We can implement transposed convolution as a matrix multiplication as well by reusing kernel2matrix. To reuse the generated W, we construct a 2×2 input, so the corresponding weight matrix has shape (9, 4), which is W⊤. Let us verify the results.

X = np.array([[0, 1], [2, 3]])
Y = trans_conv(X, K)
Y == np.dot(W.T, X.reshape(-1)).reshape(3, 3)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
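The same relationship carries over to strided convolutions. It follows from the matrix identity ⟨W x, y⟩ = ⟨x, W⊤ y⟩ for any x and y: multiplying by W⊤ is the adjoint of convolving with K. The NumPy sketch below defines two small helpers of its own; trans_conv_strided and kernel2matrix_general are hypothetical names, not from d2l. It builds W for a 4×4 input, a 2×2 kernel, and stride 2. It then checks that W⊤ reproduces the strided transposed convolution, and that the inner-product identity holds on random data.

```python
import numpy as np

def trans_conv_strided(X, K, s=1):
    """Transposed convolution with stride s and no padding."""
    h, w = K.shape
    Y = np.zeros(((X.shape[0] - 1) * s + h, (X.shape[1] - 1) * s + w))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i * s:i * s + h, j * s:j * s + w] += X[i, j] * K
    return Y

def kernel2matrix_general(K, n_h, n_w, s=1):
    """W such that W @ X.reshape(-1) is the strided cross-correlation of X with K."""
    k_h, k_w = K.shape
    o_h, o_w = (n_h - k_h) // s + 1, (n_w - k_w) // s + 1
    W = np.zeros((o_h * o_w, n_h * n_w))
    for i in range(o_h):
        for j in range(o_w):
            for a in range(k_h):
                for b in range(k_w):
                    W[i * o_w + j, (i * s + a) * n_w + (j * s + b)] = K[a, b]
    return W

K = np.array([[0., 1.], [2., 3.]])
W = kernel2matrix_general(K, 4, 4, s=2)  # shape (4, 16): 4x4 input -> 2x2 output

# W.T maps a 2x2 input to a 4x4 output, matching the strided transposed convolution
X = np.array([[0., 1.], [2., 3.]])
Z = np.dot(W.T, X.reshape(-1)).reshape(4, 4)
print(np.array_equal(Z, trans_conv_strided(X, K, s=2)))  # True

# Adjoint identity <W x, y> = <x, W^T y> on random data
rng = np.random.default_rng(0)
x, y = rng.standard_normal(16), rng.standard_normal(4)
lhs = np.dot(np.dot(W, x), y)
rhs = np.dot(x, trans_conv_strided(y.reshape(2, 2), K, s=2).reshape(-1))
print(np.isclose(lhs, rhs))  # True
```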

4. Summary

  • Compared to convolutions that reduce inputs through kernels, transposed convolutions broadcast inputs.
  • If a convolution layer reduces the input width and height by factors of n_w and n_h, respectively, then a transposed convolution layer with the same kernel size, padding, and strides increases the input width and height by factors of n_w and n_h, respectively.
  • If a convolution is implemented as multiplication by a matrix W, the corresponding transposed convolution can be implemented as multiplication by the transposed matrix W⊤.


posted @ 2020-06-30 09:25  吴建明 (wujianming)