Hung-yi Lee's "Machine Learning" Notes - 2022 HW6 (GAN, WGAN-GP) Strong Baseline

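This assignment is submitted to NTU's own OJ, so I have no way of seeing the score. Still, I implemented both weight clipping and WGAN-GP, as the strong baseline requires, and the results are clearly better than where I started.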

Simple:
No human-looking faces at all, so not shown.

Medium:

Strong:

(Still noticeably more human-looking faces than medium, lol)

Code: https://www.kaggle.com/skyrainwind/hw6-gan

Problem analysis

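In theory, passing the strong baseline requires implementing WGAN (with weight clipping or gradient penalty). That means adjusting the formulas for loss_G and loss_D and changing the structure of the discriminator. For GP, one more loss function has to be written (copied).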

Code analysis

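Finding similar images
While browsing the random image set I happened to spot a picture of 小叶子 from lb, but I did not know exactly which image it was, so I learned how to write code that looks up similar images.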
DCGAN
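I started with DCGAN. As far as I understand, DCGAN and WGAN do not differ in the architecture of the generator (G) and discriminator (D): both use convolution / transposed-convolution layers to map between images and vectors. Specifically:
G turns a vector into an image (the vector can be sampled at random, since we want to produce different images). A schematic:

As the schematic shows, the spatial size has to keep growing while the number of channels keeps shrinking (down to 3 at the end, for RGB), which is what the ConvTranspose2d (transposed convolution, often called deconvolution) layer is for. Intuitively, Conv2d usually makes a large image smaller (because of the kernel and stride, unless the padding is unusually large, which rarely happens), while ConvTranspose2d acts like the inverse of a convolution in terms of shape: with the same parameters, \((a,a)\rightarrow (b,b) \rightarrow (a,a)\) after one convolution followed by one transposed convolution. Only the shape is restored, not the values, and with stride > 1 an output_padding may be needed to recover \(a\) exactly.
Each transposed convolution usually reduces the number of channels (the input in the schematic can be viewed as channel=100, and the output only needs channel=3).
D is the exact opposite of G: it uses convolution layers to shrink the image. For DCGAN, D outputs a single score in 0~1 per image, which is exactly the range that sigmoid enforces.
As for the arguments of Conv2d: the input is a tensor of shape \(batchsize\times dim\times H\times W\), so the input channel count (in_dim in the code) should be dim, and the output channel count (out_dim) is the dim of the result. The kernel size, stride and padding change the spatial size of the image (\(H\times W\)).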
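To make the size changes concrete, here is a small sketch that only does the shape arithmetic, without running any network. It uses the standard output-size formulas for Conv2d and ConvTranspose2d (assuming dilation=1) together with the layer settings that appear in this post's G and D (kernel 5, stride 2, padding 2, output_padding 1, plus the final 4x4 convolution in D). The helper names `conv2d_out` and `conv_transpose2d_out` are made up for this illustration.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of Conv2d along one dimension (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of ConvTranspose2d along one dimension (dilation=1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# G: start from a 4x4 feature map and apply four transposed convolutions.
g_sizes = [4]
for _ in range(4):
    g_sizes.append(conv_transpose2d_out(g_sizes[-1], kernel=5, stride=2,
                                        padding=2, output_padding=1))

# D: start from a 64x64 image, apply four stride-2 convolutions,
# then a final 4x4 convolution without padding.
d_sizes = [64]
for _ in range(4):
    d_sizes.append(conv2d_out(d_sizes[-1], kernel=5, stride=2, padding=2))
d_sizes.append(conv2d_out(d_sizes[-1], kernel=4))

print(g_sizes)  # [4, 8, 16, 32, 64]
print(d_sizes)  # [64, 32, 16, 8, 4, 1]
```

Each transposed convolution doubles the side length and each stride-2 convolution halves it, which is why G ends at 64x64 and D ends with one number per image.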

The code for G and D:

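import torch.nn as nn


# DCGAN-style init: N(0, 0.02) for conv weights, N(1, 0.02) for batchnorm scales
def weights_init(m):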
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

class Generator(nn.Module):
    """
    Input shape: (N, in_dim)
    Output shape: (N, 3, 64, 64)
    """
    def __init__(self, in_dim, dim=64):
        super(Generator, self).__init__()
        def dconv_bn_relu(in_dim, out_dim):
            return nn.Sequential(
                nn.ConvTranspose2d(in_dim, out_dim, 5, 2,
                                   padding=2, output_padding=1, bias=False),
                nn.BatchNorm2d(out_dim),
                nn.ReLU()
            )
        self.l1 = nn.Sequential(
            nn.Linear(in_dim, dim * 8 * 4 * 4, bias=False),
            nn.BatchNorm1d(dim * 8 * 4 * 4),
            nn.ReLU()
        )
        self.l2_5 = nn.Sequential(
            dconv_bn_relu(dim * 8, dim * 4),
            dconv_bn_relu(dim * 4, dim * 2),
            dconv_bn_relu(dim * 2, dim),
            nn.ConvTranspose2d(dim, 3, 5, 2, padding=2, output_padding=1),
            nn.Tanh()
        )
        self.apply(weights_init)

    def forward(self, x):
        y = self.l1(x)
        y = y.view(y.size(0), -1, 4, 4)
        y = self.l2_5(y)
        return y


class Discriminator(nn.Module):
    """
    Input shape: (N, 3, 64, 64)
    Output shape: (N, )
    """
    def __init__(self, in_dim, dim=64):
        super(Discriminator, self).__init__()

        def conv_bn_lrelu(in_dim, out_dim):
            return nn.Sequential(
                nn.Conv2d(in_dim, out_dim, 5, 2, 2),
                nn.BatchNorm2d(out_dim),
                # nn.InstanceNorm2d(out_dim),
                nn.LeakyReLU(0.2),
            )

        """ Medium: Remove the last sigmoid layer for WGAN. """
        self.ls = nn.Sequential(
            nn.Conv2d(in_dim, dim, 5, 2, 2),
            nn.LeakyReLU(0.2),
            conv_bn_lrelu(dim, dim * 2),
            conv_bn_lrelu(dim * 2, dim * 4),
            conv_bn_lrelu(dim * 4, dim * 8),
            nn.Conv2d(dim * 8, 1, 4),
            nn.Sigmoid(),  # DCGAN only; remove this layer for WGAN / WGAN-GP
        )
        self.apply(weights_init)

    def forward(self, x):
        y = self.ls(x)
        y = y.view(-1)
        return y

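The training procedure is not very different either: D is trained on every iteration (batch), and G is trained once every few iterations. Each time, compute loss_D or loss_G, backpropagate, and let the corresponding optimizer update.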

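For WGAN, D should instead optimize \(V(G,D)=E_{y\sim G}[D(y)]-E_{y\sim data}[D(y)]\), which D minimizes, so the sigmoid at the end of D should be removed (D now outputs an unbounded score rather than a probability). Batchnorm in D should probably go as well (?): for the GP variant the penalty is defined per sample, while batchnorm couples the samples within a batch, and the WGAN-GP paper recommends a per-sample normalization such as layer norm instead (presumably why InstanceNorm2d shows up as a commented-out alternative in the code above). The expectations can be approximated by sampling from G and from the data and averaging D's outputs.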
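The sample-average estimate described above can be sketched as follows. This is a minimal illustration, not the code from the notebook: `critic` is a hypothetical stand-in for D (a single linear layer without sigmoid), `wgan_losses` is a made-up helper name, and the random tensors stand in for a batch of real and generated images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for D: no sigmoid, so the output is an unbounded score.
critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))


def wgan_losses(D, real_imgs, fake_imgs):
    """Estimate the WGAN objectives from one batch by averaging D's scores."""
    real_score = D(real_imgs).view(-1)
    fake_score = D(fake_imgs).view(-1)
    # D minimizes E_{y~G}[D(y)] - E_{y~data}[D(y)]
    loss_D = fake_score.mean() - real_score.mean()
    # G minimizes -E_{y~G}[D(y)], i.e. it tries to raise D's score on fakes
    loss_G = -fake_score.mean()
    return loss_D, loss_G


real = torch.randn(8, 3, 64, 64)  # placeholder for a batch of real images
fake = torch.randn(8, 3, 64, 64)  # placeholder for G's output
loss_D, loss_G = wgan_losses(critic, real, fake)
```

In an actual training step the fake batch would normally be detached from G when computing loss_D, so that updating D does not backpropagate into G.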
In addition, for weight clipping, D's parameters have to be clipped after each update of D:

        for p in D.parameters():
            p.data.clamp_(-clip_value, clip_value)
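To show where the clipping sits relative to the optimizer steps, here is a toy training loop. Everything in it is an assumption made for illustration: the tiny fully connected G and D, the random "real" batch, the RMSprop optimizers, and the values of `clip_value` and `n_critic` are not taken from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, img_dim = 16, 3 * 8 * 8
clip_value, n_critic = 0.01, 2  # illustrative values

# Toy stand-ins for G and D (D has no sigmoid and no batchnorm).
G = nn.Sequential(nn.Linear(z_dim, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_D = torch.optim.RMSprop(D.parameters(), lr=5e-5)

for step in range(4):
    real = torch.rand(8, img_dim) * 2 - 1  # placeholder for a real batch
    z = torch.randn(8, z_dim)

    # Train D on every iteration; detach so G is not updated here.
    loss_D = D(G(z).detach()).mean() - D(real).mean()
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Weight clipping: applied right after D's update.
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-clip_value, clip_value)

    # Train G once every n_critic iterations.
    if (step + 1) % n_critic == 0:
        loss_G = -D(G(z)).mean()
        opt_G.zero_grad()
        loss_G.backward()
        opt_G.step()
```

Clipping immediately after `opt_D.step()` means D's parameters are inside `[-clip_value, clip_value]` whenever D is evaluated, including when it scores fakes for G's update.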

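With GP, a gradient penalty term is added to \(V(G,D)\) instead (adapted from https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/wgan_gp/wgan_gp.py). Concretely, compute the gradient of \(D(y)\) with respect to \(y\), where each \(y\) is a random interpolation between a real image and a generated one. Each sample's gradient has an L2 norm \(N\), and the penalty is the batch mean of \((N-1)^2\) (scaled by a coefficient), which pushes the gradient norm toward 1 and thereby constrains \(V\).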
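A minimal sketch of such a penalty term, written in the spirit of the linked implementation but not copied from it. The function name `gradient_penalty` and the tiny linear `demo_D` are assumptions made for illustration. The demo critic has all weights set to 1 over 4 inputs, so the gradient norm is 2 for every sample and the penalty is \((2-1)^2=1\).

```python
import torch
import torch.nn as nn


def gradient_penalty(D, real, fake):
    """Batch mean of (||grad_y D(y)||_2 - 1)^2 at random real/fake interpolates."""
    # One interpolation coefficient per sample, broadcast over C, H, W.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real.detach() + (1 - alpha) * fake.detach()).requires_grad_(True)
    scores = D(interp)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty can be backpropagated
    )[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()


# Demo with a linear critic whose gradient is known in closed form.
torch.manual_seed(0)
demo_D = nn.Sequential(nn.Flatten(), nn.Linear(4, 1, bias=False))
with torch.no_grad():
    demo_D[1].weight.fill_(1.0)
real = torch.randn(5, 1, 2, 2)
fake = torch.randn(5, 1, 2, 2)
gp = gradient_penalty(demo_D, real, fake)
```

The discriminator loss would then be something like `D(fake).mean() - D(real).mean() + lambda_gp * gp`, where `lambda_gp` is a weighting coefficient (10 is a common choice).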

~~Accidentally hit the wrong key and had to rewrite the whole post, ugh~~

posted @ 2024-02-14 01:34 SkyRainWind
