DCGAN
Deep Convolutional Generative Adversarial Networks
In :numref:sec_basic_gan we introduced the basic ideas behind how GANs work. We showed that they can draw samples from some simple, easy-to-sample distribution, like a uniform or normal distribution, and transform them into samples that appear to match the distribution of some dataset. While our example of matching a 2D Gaussian distribution got the point across, it is not especially exciting.
In this section, we will demonstrate how you can use GANs to generate photorealistic images. We will base our models on the deep convolutional GANs (DCGAN) introduced in :cite:Radford.Metz.Chintala.2015. We will borrow the convolutional architectures that have proven so successful for discriminative computer vision problems and show how, via GANs, they can be leveraged to generate photorealistic images.
The Pokemon Dataset
The dataset we will use is a collection of Pokemon sprites obtained from pokemondb. First download, extract, and load this dataset.
We resize each image into 64 × 64. The ToTensor
transformation will project the pixel values into [0, 1], while our generator will use the tanh function to obtain outputs in (−1, 1). Therefore we normalize the data with 0.5 mean and 0.5 standard deviation to match the value range.
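A quick framework-free check of this rescaling (the helper function below is ours, for illustration only):

```python
# Normalizing with mean = std = 0.5 (the conventional choice for this
# pipeline) maps ToTensor's [0, 1] pixel range onto [-1, 1], the range
# of the generator's tanh output.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

print(normalize(0.0), normalize(1.0))  # -1.0 1.0
```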
Let's visualize the first 20 images.

The Generator
The generator needs to map the noise variable z, a length-d vector, to an RGB image with width and height to be 64 × 64. In :numref:sec_fcn we introduced the fully convolutional network that uses transposed convolution layers (refer to :numref:sec_transposed_conv) to enlarge the input size. The basic block of the generator contains a transposed convolution layer followed by batch normalization and ReLU activation.
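A minimal sketch of this basic block, assuming PyTorch; the class name G_block and its default hyperparameters are illustrative choices consistent with the description above:

```python
import torch
from torch import nn

# Generator basic block: transposed convolution -> batch normalization
# -> ReLU. With the defaults (4x4 kernel, stride 2, padding 1), it
# doubles the spatial size of its input.
class G_block(nn.Module):
    def __init__(self, out_channels, in_channels=3, kernel_size=4,
                 strides=2, padding=1):
        super().__init__()
        self.conv2d_trans = nn.ConvTranspose2d(
            in_channels, out_channels, kernel_size, strides, padding,
            bias=False)
        self.batch_norm = nn.BatchNorm2d(out_channels)
        self.activation = nn.ReLU()

    def forward(self, X):
        return self.activation(self.batch_norm(self.conv2d_trans(X)))

# A 16x16 input is upsampled to 32x32.
x = torch.zeros(2, 3, 16, 16)
print(G_block(20)(x).shape)  # torch.Size([2, 20, 32, 32])
```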
By default, the transposed convolution layer uses a k_h = k_w = 4 kernel, s_h = s_w = 2 strides, and p_h = p_w = 1 padding. With an input shape of n_h × n_w = 16 × 16, the generator block will double the input's width and height:

n'_h × n'_w = [(n_h − 1) × s_h − 2 × p_h + k_h] × [(n_w − 1) × s_w − 2 × p_w + k_w]
            = [(16 − 1) × 2 − 2 × 1 + 4] × [(16 − 1) × 2 − 2 × 1 + 4]
            = 32 × 32.
If we instead change the transposed convolution layer to a 4 × 4 kernel, 1 × 1 strides, and zero padding, then with an input size of 1 × 1 the output will have its width and height increased by 3 respectively.
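The size arithmetic for both cases can be checked with a few lines of plain Python (the helper name is ours):

```python
# Output size of a transposed convolution along one dimension:
# n_out = (n_in - 1) * stride - 2 * padding + kernel.
def trans_conv_out(n, kernel, stride, padding):
    return (n - 1) * stride - 2 * padding + kernel

# Default generator block: 4x4 kernel, stride 2, padding 1 doubles 16 -> 32.
print(trans_conv_out(16, kernel=4, stride=2, padding=1))  # 32
# 4x4 kernel, stride 1, zero padding grows a 1x1 input to 4x4.
print(trans_conv_out(1, kernel=4, stride=1, padding=0))   # 4
```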
The generator consists of four basic blocks that increase the input's width and height from 1 to 32. At the same time, it first projects the latent variable into 64 × 8 = 512 channels, and then halves the channels each time. At last, a transposed convolution layer is used to generate the output. It further doubles the width and height to match the desired 64 × 64 shape, and reduces the channel size to 3. The tanh activation function is applied to project output values into the (−1, 1) range.
Generate a 100-dimensional latent variable to verify the generator's output shape.
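A hedged sketch of the full generator and this shape check, assuming PyTorch; n_G = 64 is an assumed base channel count, and the block helper is an illustrative stand-in for the basic block described above:

```python
import torch
from torch import nn

n_G = 64  # assumed base channel count

def g_block(in_channels, out_channels, kernel_size=4, strides=2, padding=1):
    # transposed convolution -> batch normalization -> ReLU
    return nn.Sequential(
        nn.ConvTranspose2d(in_channels, out_channels, kernel_size,
                           strides, padding, bias=False),
        nn.BatchNorm2d(out_channels), nn.ReLU())

net_G = nn.Sequential(
    g_block(100, n_G * 8, strides=1, padding=0),  # output: (512, 4, 4)
    g_block(n_G * 8, n_G * 4),                    # output: (256, 8, 8)
    g_block(n_G * 4, n_G * 2),                    # output: (128, 16, 16)
    g_block(n_G * 2, n_G),                        # output: (64, 32, 32)
    nn.ConvTranspose2d(n_G, 3, kernel_size=4, stride=2, padding=1,
                       bias=False),
    nn.Tanh())                                    # output: (3, 64, 64)

# A 100-dimensional latent variable, shaped as a 4-D tensor.
Z = torch.zeros(2, 100, 1, 1)
print(net_G(Z).shape)  # torch.Size([2, 3, 64, 64])
```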
Discriminator
The discriminator is a normal convolutional network except that it uses a leaky ReLU as its activation function. Given α ∈ [0, 1], its definition is

leaky ReLU(x) = x if x > 0, else α × x.

As can be seen, it is the normal ReLU if α = 0, and an identity function if α = 1. For α ∈ (0, 1), leaky ReLU is a nonlinear function that gives a non-zero output for a negative input. It aims to fix the "dying ReLU" problem, in which a neuron might always output a negative value and therefore cannot make any progress, since the gradient of ReLU is 0.
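This definition is short enough to write out directly (α = 0.2 below is an assumed slope, not a value from the text):

```python
# Leaky ReLU: identity for positive inputs, a small linear slope for
# negative inputs, so the gradient is never exactly zero.
def leaky_relu(x, alpha=0.2):
    return x if x > 0 else alpha * x

print(leaky_relu(2.0))   # 2.0
print(leaky_relu(-2.0))  # -0.4
```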

The basic block of the discriminator is a convolution layer followed by a batch normalization layer and a leaky ReLU activation. The hyper-parameters of the convolution layer are similar to those of the transposed convolution layer in the generator block.
A basic block with default settings will halve the width and height of the inputs, as we demonstrated in :numref:sec_padding. For example, given an input shape n_h = n_w = 16, with a kernel shape k_h = k_w = 4, a stride shape s_h = s_w = 2, and a padding shape p_h = p_w = 1, the output shape will be:

n'_h × n'_w = ⌊(n_h − k_h + 2 × p_h + s_h)/s_h⌋ × ⌊(n_w − k_w + 2 × p_w + s_w)/s_w⌋
            = ⌊(16 − 4 + 2 × 1 + 2)/2⌋ × ⌊(16 − 4 + 2 × 1 + 2)/2⌋
            = 8 × 8.
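The same arithmetic as plain Python (the helper name is ours):

```python
# Output size of a strided convolution along one dimension:
# n_out = floor((n_in - kernel + 2 * padding + stride) / stride).
def conv_out(n, kernel, stride, padding):
    return (n - kernel + 2 * padding + stride) // stride

# 4x4 kernel, stride 2, padding 1 halves 16 -> 8.
print(conv_out(16, kernel=4, stride=2, padding=1))  # 8
```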
The discriminator is a mirror of the generator.
It uses a convolution layer with output channel 1 as the last layer to obtain a single prediction value.
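A hedged sketch of such a discriminator, assuming PyTorch; n_D = 64 and the leaky ReLU slope α = 0.2 are assumed values, and batch normalization is omitted on the input layer as the summary below notes:

```python
import torch
from torch import nn

n_D, alpha = 64, 0.2  # assumed base channel count and leaky ReLU slope

def d_block(in_channels, out_channels):
    # convolution -> batch normalization -> leaky ReLU; halves width/height
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=4, stride=2,
                  padding=1, bias=False),
        nn.BatchNorm2d(out_channels), nn.LeakyReLU(alpha))

net_D = nn.Sequential(
    nn.Conv2d(3, n_D, kernel_size=4, stride=2, padding=1, bias=False),
    nn.LeakyReLU(alpha),           # no batch norm on the input layer
    d_block(n_D, n_D * 2),         # output: (128, 16, 16)
    d_block(n_D * 2, n_D * 4),     # output: (256, 8, 8)
    d_block(n_D * 4, n_D * 8),     # output: (512, 4, 4)
    nn.Conv2d(n_D * 8, 1, kernel_size=4, bias=False))  # output: (1, 1, 1)

x = torch.zeros(2, 3, 64, 64)
print(net_D(x).shape)  # torch.Size([2, 1, 1, 1])
```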
Training
Compared to the basic GAN in :numref:sec_basic_gan, we use the same learning rate for both the generator and the discriminator since they are similar to each other. In addition, we change β₁ in Adam (:numref:sec_adam) from 0.9 to 0.5. It decreases the smoothness of the momentum, the exponentially weighted moving average of past gradients, to take care of the rapidly changing gradients that arise because the generator and the discriminator fight with each other. Besides, the randomly generated noise Z is a 4-D tensor, and we are using a GPU to accelerate the computation.
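A sketch of this optimizer setup, assuming PyTorch; lr = 0.005 and the stand-in networks are assumptions for illustration, not values from the text:

```python
import torch
from torch import nn

# One Adam optimizer per network, identical learning rates, and
# beta_1 lowered from the default 0.9 to 0.5.
lr, beta1 = 0.005, 0.5
net_G = nn.Sequential(nn.ConvTranspose2d(100, 3, kernel_size=4))  # stand-in
net_D = nn.Sequential(nn.Conv2d(3, 1, kernel_size=4))             # stand-in
trainer_G = torch.optim.Adam(net_G.parameters(), lr=lr, betas=(beta1, 0.999))
trainer_D = torch.optim.Adam(net_D.parameters(), lr=lr, betas=(beta1, 0.999))
print(trainer_G.defaults['betas'])  # (0.5, 0.999)
```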
Now let's train the model.
Summary
- The DCGAN architecture has four convolutional layers for the discriminator and four "fractionally-strided" convolutional layers for the generator.
- The discriminator is a 4-layer strided convolutional network with batch normalization (except on its input layer) and leaky ReLU activations.
- Leaky ReLU is a nonlinear function that gives a non-zero output for a negative input. It aims to fix the "dying ReLU" problem and helps the gradients flow more easily through the architecture.
Exercises
- What will happen if we use standard ReLU activation rather than leaky ReLU?
- Apply DCGAN on Fashion-MNIST and see which category works well and which does not.
Author: Hichens
Source: https://www.cnblogs.com/hichens/p/12355031.html