Transposed Convolution

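The layers we have introduced so far for convolutional neural networks, including convolutional layers and pooling layers, typically reduce the input width and height or keep them unchanged. Applications such as semantic segmentation and generative adversarial networks, however, need to predict a value for every pixel and therefore need to increase the input width and height. Transposed convolution, also called fractionally-strided convolution or deconvolution, serves this purpose.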

from mxnet import np, npx, init
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()

1. Basic 2D Transposed Convolution

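Let us consider a basic case where both the input and output channels are 1, with 0 padding and a stride of 1. Fig. 1 illustrates how transposed convolution with a 2×2 kernel is computed on a 2×2 input matrix.

Fig. 1. Transposed convolution layer with a 2×2 kernel.

We can implement this operation given a kernel matrix K and an input matrix X.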

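def trans_conv(X, K):
    h, w = K.shape
    # The output grows by (kernel size - 1) along each dimension
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Each input element adds a scaled copy of the kernel to the output
            Y[i: i + h, j: j + w] += X[i, j] * K
    return Y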

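Convolution computes its result by Y[i, j] = (X[i: i + h, j: j + w] * K).sum(), which aggregates input values through the kernel. Transposed convolution, in contrast, broadcasts input values through the kernel, which produces a larger output.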

X = np.array([[0, 1], [2, 3]])
K = np.array([[0, 1], [2, 3]])
trans_conv(X, K)

array([[ 0.,  0.,  1.],
       [ 0.,  4.,  6.],
       [ 4., 12.,  9.]])
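The contrast between aggregating and broadcasting can be sketched in plain NumPy. This is a minimal illustration, assuming NumPy instead of MXNet so that it runs standalone; corr2d_np and trans_conv_np are illustrative names, not library functions. It shows that the transposed operation expands a 2×2 input to 3×3, and that cross-correlation with the same kernel brings the shape back to 2×2. Only the shapes round-trip here; the values generally do not.

```python
import numpy as np

def corr2d_np(X, K):
    """Cross-correlation: each output element sums a window of X weighted by K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def trans_conv_np(X, K):
    """Transposed convolution: each input element scatters a scaled copy of K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y

K = np.array([[0.0, 1.0], [2.0, 3.0]])
small = np.array([[0.0, 1.0], [2.0, 3.0]])
large = trans_conv_np(small, K)   # shape (3, 3)
reduced = corr2d_np(large, K)     # shape (2, 2) again
print(large.shape, reduced.shape)
```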

Alternatively, we can use nn.Conv2DTranspose to obtain the same result. As with nn.Conv2D, both the input and the kernel should be 4-D tensors.

X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)
tconv = nn.Conv2DTranspose(1, kernel_size=2)
tconv.initialize(init.Constant(K))
tconv(X)

array([[[[ 0.,  0.,  1.],
         [ 0.,  4.,  6.],
         [ 4., 12.,  9.]]]])

2. Padding, Strides, and Channels

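In convolution, padding elements are applied to the input, whereas in transposed convolution they are applied to the output. A 1×1 padding means we first compute the output as normal and then remove the first and last rows and columns.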

tconv = nn.Conv2DTranspose(1, kernel_size=2, padding=1)
tconv.initialize(init.Constant(K))
tconv(X)

array([[[[4.]]]])
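The effect of padding on the output can be reproduced by hand. The sketch below uses plain NumPy (an assumption, so that it runs without MXNet), and trans_conv_pad is a hypothetical helper name. It computes the full stride-1 output and then crops `padding` rows and columns from every side, which for the 2×2 example leaves only the central element.

```python
import numpy as np

def trans_conv_pad(X, K, padding=0):
    """Stride-1 transposed convolution, then crop `padding` rows/columns per side."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i:i + h, j:j + w] += X[i, j] * K
    p = padding
    # Padding is applied to the output: drop p rows and columns on each side
    return Y[p:Y.shape[0] - p, p:Y.shape[1] - p]

X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
full = trans_conv_pad(X, K, padding=0)     # the uncropped 3x3 output
cropped = trans_conv_pad(X, K, padding=1)  # only the central element remains
print(cropped)
```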

Similarly, strides are applied to the output rather than to the input.

tconv = nn.Conv2DTranspose(1, kernel_size=2, strides=2)
tconv.initialize(init.Constant(K))
tconv(X)

array([[[[0., 0., 0., 1.],
         [0., 0., 2., 3.],
         [0., 2., 0., 3.],
         [4., 6., 6., 9.]]]])
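The strided result above can be reproduced with a small extension of the loop-based implementation. This is a sketch in plain NumPy (an assumption, so that it runs without MXNet), and trans_conv_stride is a hypothetical helper name. The only change from the stride-1 case is that the copy of the kernel for input element (i, j) is placed at output position (i * stride, j * stride).

```python
import numpy as np

def trans_conv_stride(X, K, stride=1):
    """Transposed convolution without padding; the stride spaces out the kernel copies."""
    h, w = K.shape
    out_h = (X.shape[0] - 1) * stride + h
    out_w = (X.shape[1] - 1) * stride + w
    Y = np.zeros((out_h, out_w))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            r, c = i * stride, j * stride  # top-left corner of this kernel copy
            Y[r:r + h, c:c + w] += X[i, j] * K
    return Y

X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
Y2 = trans_conv_stride(X, K, stride=2)  # 4x4 output; kernel copies do not overlap
print(Y2)
```

With a stride of 2 and a 2×2 kernel the copies tile the output without overlapping, which is why each 2×2 block of the result is simply X[i, j] * K.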

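With multiple channels, a transposed convolution layer that uses the same kernel size, padding, and strides as a convolutional layer, but with its output channels set to the channel count of X, maps the convolutional layer's output back to the shape of X in the example below.

X = np.random.uniform(size=(1, 10, 16, 16))
conv = nn.Conv2D(20, kernel_size=5, padding=2, strides=3)
tconv = nn.Conv2DTranspose(10, kernel_size=5, padding=2, strides=3)
conv.initialize()
tconv.initialize()
tconv(conv(X)).shape == X.shape

True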
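The shape check above can be followed with simple arithmetic. The sketch below uses the standard output-size formulas for a single spatial dimension, assuming no dilation and no output padding; conv_out_size and tconv_out_size are hypothetical helper names, not part of MXNet.

```python
def conv_out_size(n, k, p, s):
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def tconv_out_size(n, k, p, s):
    """Output size of a transposed convolution: (n - 1) * s + k - 2p."""
    return (n - 1) * s + k - 2 * p

m = conv_out_size(16, k=5, p=2, s=3)     # 6
back = tconv_out_size(m, k=5, p=2, s=3)  # 16

# The round trip is not always exact: an input of size 17 also maps to 6,
# so the transposed layer returns 16 rather than 17.
m17 = conv_out_size(17, k=5, p=2, s=3)       # 6
back17 = tconv_out_size(m17, k=5, p=2, s=3)  # 16
print(m, back, m17, back17)
```

The second case shows why the shape is recovered exactly only when the stride divides n + 2p - k evenly, as it does for the 16×16 input used here.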

3. Analogy to Matrix Transposition

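The transposed convolution takes its name from matrix transposition. In fact, convolution operations can also be implemented by matrix multiplication. In the example below, we define a 3×3 input X and a 2×2 kernel K, and then use corr2d to compute the convolution output.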

X = np.arange(9).reshape(3, 3)
K = np.array([[0, 1], [2, 3]])
Y = d2l.corr2d(X, K)
Y

array([[19., 25.],
       [37., 43.]])

Next, we rewrite the convolution kernel K as a matrix W. Its shape will be (4, 9), where the i-th row represents applying the kernel to the input to generate the i-th output element.

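def kernel2matrix(K):
    # k holds the two kernel rows as they fall on a flattened 3x3 input,
    # separated by one zero for the input column the kernel does not cover
    k, W = np.zeros(5), np.zeros((4, 9))
    k[:2], k[3:5] = K[0, :], K[1, :]
    # Each row of W shifts k to the position of one output element
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W

W = kernel2matrix(K)
W

array([[0., 1., 0., 2., 3., 0., 0., 0., 0.],
       [0., 0., 1., 0., 2., 3., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 2., 3., 0.],
       [0., 0., 0., 0., 0., 1., 0., 2., 3.]])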

The convolution operator can then be implemented by matrix multiplication with proper reshaping.

Y == np.dot(W, X.reshape(-1)).reshape(2, 2)

array([[ True,  True],
       [ True,  True]])

We can implement transposed convolution as a matrix multiplication as well by reusing kernel2matrix. To reuse the generated W, we construct a 2×2 input, so the corresponding weight matrix will have a shape (9, 4), which is W⊤. Let us verify the results.

X = np.array([[0, 1], [2, 3]])
Y = trans_conv(X, K)
Y == np.dot(W.T, X.reshape(-1)).reshape(3, 3)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
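kernel2matrix is hard-coded for a 2×2 kernel on a 3×3 input. The same construction can be sketched for other sizes. The version below is an illustration in plain NumPy (an assumption, so that it runs without MXNet), covering only stride 1 and no padding; kernel_to_matrix is a hypothetical name. Multiplying by W performs the convolution, and multiplying by W.T performs the transposed convolution.

```python
import numpy as np

def kernel_to_matrix(K, in_shape):
    """Build W so that W @ X.reshape(-1) is the stride-1, unpadded
    cross-correlation of an input of shape in_shape with K."""
    kh, kw = K.shape
    ih, iw = in_shape
    oh, ow = ih - kh + 1, iw - kw + 1
    W = np.zeros((oh * ow, ih * iw))
    for i in range(oh):
        for j in range(ow):
            row = i * ow + j  # flattened index of output element (i, j)
            for a in range(kh):
                for b in range(kw):
                    # Input element (i + a, j + b) contributes with weight K[a, b]
                    W[row, (i + a) * iw + (j + b)] = K[a, b]
    return W

K = np.array([[0.0, 1.0], [2.0, 3.0]])
W = kernel_to_matrix(K, (3, 3))              # shape (4, 9)

X = np.arange(9.0).reshape(3, 3)
Y = (W @ X.reshape(-1)).reshape(2, 2)        # convolution: 9 values -> 4 values

X2 = np.array([[0.0, 1.0], [2.0, 3.0]])
Z = (W.T @ X2.reshape(-1)).reshape(3, 3)     # transposed convolution: 4 values -> 9 values
print(Y)
print(Z)
```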

4. Summary

Compared to convolutions that reduce inputs through kernels, transposed convolutions broadcast inputs.
If a convolution layer reduces the input width and height by n_w and n_h times, respectively, then a transposed convolution layer with the same kernel sizes, padding, and strides will increase the input width and height by n_w and n_h times, respectively.
We can implement convolution operations by matrix multiplication; the corresponding transposed convolutions can be done by multiplication with the transposed matrix.