Implementing Batch Normalization with the tf.nn.batch_normalization function
If you find this useful, feel free to join the discussion so we can learn from each other~




References
Andrew Ng's deeplearning.ai course
Course notes
Udacity course
"""
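Most of the time you will be able to use the higher-level functions, but sometimes you may want to work at a lower level. For example, you might want to implement a new feature that TensorFlow does not yet provide a high-level implementation of, such as batch normalization inside an LSTM. In that case there are a few things you need to know.
This version of the network uses functions from the tf.nn package for almost everything, and it uses tf.nn.batch_normalization for the normalization step.
The 'fully_connected' function is considerably more involved than the version written with the tf.layers package. However, if you have worked through the Batch_Normalization_Lesson notebook, things should look pretty familiar.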
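To add batch normalization, we did the following:
1. Added the 'is_training' parameter to the function signature so we can pass that information to the batch normalization layer.
2. Removed the bias and the activation function from the layer's dense call.
3. Added the gamma, beta, pop_mean, and pop_variance variables.
4. Used tf.cond to handle the difference between training and inference.
5. During training, we use tf.nn.moments to compute the batch mean and variance. We then update the population statistics with a moving average and normalize with tf.nn.batch_normalization.
   Note: be sure to wrap the normalization in a with tf.control_dependencies... block. This forces TensorFlow to update the population statistics before it runs the batch normalization op.
6. During inference (prediction only, with no parameter updates), tf.nn.batch_normalization uses the population mean and variance. These were estimated with the moving average during training.
7. Passed the normalized values through a ReLU activation to get the layer output.
8. If anything is unclear, see https://github.com/udacity/deep-learning/blob/master/batch-norm/Batch_Normalization_Lesson.ipynb. It contains the implementation of 'fully_connected' with tf.nn.batch_normalization.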
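The steps above can be mirrored outside TensorFlow. The following is a minimal pure-Python sketch, not the TensorFlow implementation. The class name BatchNormLayer and its helper functions are hypothetical, and the input is assumed to be a list of rows shaped [batch, num_units].

The sketch uses the same decay (0.99) and epsilon (1e-3) as fully_connected. It computes the biased batch variance, dividing by the batch size as tf.nn.moments does. During training it updates the population statistics before normalizing; at inference it leaves them untouched.

```python
import math


def moments(batch):
    # Per-unit mean and biased variance over the batch axis,
    # like tf.nn.moments(layer, [0]) for a [batch, num_units] input.
    n = len(batch)
    num_units = len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(num_units)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n
           for j in range(num_units)]
    return mean, var


def normalize(batch, mean, var, beta, gamma, epsilon):
    # Same formula as tf.nn.batch_normalization:
    # gamma * (x - mean) / sqrt(var + epsilon) + beta
    return [[gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + epsilon) + beta[j]
             for j in range(len(row))]
            for row in batch]


class BatchNormLayer:
    # Hypothetical stand-in for the gamma/beta/pop_mean/pop_variance
    # variables created inside fully_connected.
    def __init__(self, num_units, decay=0.99, epsilon=1e-3):
        self.gamma = [1.0] * num_units
        self.beta = [0.0] * num_units
        self.pop_mean = [0.0] * num_units
        self.pop_variance = [1.0] * num_units
        self.decay = decay
        self.epsilon = epsilon

    def __call__(self, batch, is_training):
        if is_training:
            batch_mean, batch_var = moments(batch)
            # Update the population statistics first (the ordering that
            # tf.control_dependencies enforces in the TensorFlow version) ...
            d = self.decay
            self.pop_mean = [p * d + b * (1 - d)
                             for p, b in zip(self.pop_mean, batch_mean)]
            self.pop_variance = [p * d + b * (1 - d)
                                 for p, b in zip(self.pop_variance, batch_var)]
            # ... then normalize with the statistics of the current batch.
            return normalize(batch, batch_mean, batch_var,
                             self.beta, self.gamma, self.epsilon)
        # Inference: read the moving-average estimates, update nothing.
        return normalize(batch, self.pop_mean, self.pop_variance,
                         self.beta, self.gamma, self.epsilon)


bn = BatchNormLayer(num_units=2)
train_out = bn([[1.0, 10.0], [3.0, 30.0]], is_training=True)  # updates pop stats
infer_out = bn([[2.0, 20.0]], is_training=False)              # only reads them
```

Each training call moves pop_mean 1% of the way from its current value toward the batch mean, because decay is 0.99. Inference calls never change the stored estimates.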
"""
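import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)


def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with num_units neurons, using prev_layer as its input.

    :param prev_layer: Tensor
        The input to this layer.
    :param num_units: int
        The number of units (neurons) in this layer.
    :param is_training: scalar bool Tensor
        Whether the network is currently training. It tells the batch normalization
        layer whether to update the population statistics or to use them.
        tf.cond needs a Tensor here (e.g. a tf.placeholder(tf.bool)), not a Python bool.
    :returns Tensor
        A new fully connected layer.
    """
    # No bias and no activation: beta takes over the role of the bias,
    # and ReLU is applied after normalization.
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    # Learnable scale (gamma) and shift (beta), one per unit.
    gamma = tf.Variable(tf.ones([num_units]))
    beta = tf.Variable(tf.zeros([num_units]))

    # Population statistics for inference; updated manually, not by the optimizer.
    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():
        # Per-unit mean and variance of the current batch (axis 0 is the batch axis).
        batch_mean, batch_variance = tf.nn.moments(layer, [0])

        # Exponential moving average of the batch statistics.
        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        # Force the population updates to run before the normalization op.
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        # Normalize with the population estimates gathered during training.
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)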
"""
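The changes to the conv_layer convolutional layer are almost the same as the changes to fully_connected.
There is an important difference, though: a convolutional layer has multiple feature maps, and each feature map shares its weights across the whole input.
So we need to make sure batch normalization is done per feature map rather than per node in the layer.
To do this, we did the same things as in fully_connected, with two exceptions:
1. gamma, beta, pop_mean and pop_variance are sized to the number of feature maps (output channels) instead of the number of output nodes.
2. We changed the axes passed to tf.nn.moments so that it computes the mean and variance over the correct dimensions.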
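Below is a minimal pure-Python sketch of exception 2. It assumes the conv output is a nested list in NHWC order ([batch][height][width][channels]), which is the default layout for tf.nn.conv2d. The function names are hypothetical.

Averaging over the batch, height and width axes plays the role of tf.nn.moments(layer, [0, 1, 2]) and produces one mean and one variance per feature map. This is why gamma and beta need exactly one entry per channel.

```python
import math


def channel_moments(feature_maps):
    # feature_maps: nested list shaped [batch][height][width][channels] (NHWC).
    # Collecting every value of channel c across batch, height and width mirrors
    # tf.nn.moments(layer, [0, 1, 2]): one mean and one variance per feature map.
    channels = len(feature_maps[0][0][0])
    values = [[] for _ in range(channels)]
    for image in feature_maps:
        for row in image:
            for pixel in row:
                for c, v in enumerate(pixel):
                    values[c].append(v)
    mean = [sum(vs) / len(vs) for vs in values]
    var = [sum((v - m) ** 2 for v in vs) / len(vs) for vs, m in zip(values, mean)]
    return mean, var


def normalize_channels(feature_maps, mean, var, beta, gamma, epsilon=1e-3):
    # Every position in channel c shares mean[c], var[c], gamma[c] and beta[c],
    # so these vectors have length out_channels.
    return [[[[gamma[c] * (v - mean[c]) / math.sqrt(var[c] + epsilon) + beta[c]
               for c, v in enumerate(pixel)]
              for pixel in row]
             for row in image]
            for image in feature_maps]


# One 2x2 image with 2 channels: channel 0 varies, channel 1 is constant.
fm = [[[[1.0, 5.0], [2.0, 5.0]],
       [[3.0, 5.0], [4.0, 5.0]]]]
mean, var = channel_moments(fm)  # mean == [2.5, 5.0], var == [1.25, 0.0]
out = normalize_channels(fm, mean, var, beta=[0.0, 0.0], gamma=[1.0, 1.0])
```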
"""
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as its input.

    :param prev_layer: Tensor
        The input to this layer.
    :param layer_depth: int
        The depth of this layer in the network, used to set the stride and the
        number of feature maps. This is not a good way to build a CNN, but it
        lets us create this example with very little code.
    :param is_training: scalar bool Tensor
        Whether the network is currently training. It tells the batch normalization
        layer whether to update the population statistics or to use them.
    :returns Tensor
        A new convolutional layer.
    """
    # Every third layer downsamples with stride 2; the number of feature maps grows with depth.
    strides = 2 if layer_depth % 3 == 0 else 1
    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth * 4

    # No bias is added: beta takes over that role after normalization.
    weights = tf.Variable(
        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
    layer = tf.nn.conv2d(prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')

    # One gamma/beta/pop_mean/pop_variance entry per feature map (output channel).
    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))
    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():
        # Average over batch, height and width (NHWC axes 0, 1, 2) to get per-channel statistics.
        batch_mean, batch_variance = tf.nn.moments(layer, [0, 1, 2], keep_dims=False)

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        # Update the population statistics before normalizing.
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
"""
To modify the training function, we did the following:
1. Added is_training, a placeholder that stores a boolean value indicating whether or not the network is training.
2. Each time we call run on the session, we add the appropriate value for is_training to feed_dict. This tells the network whether it is training or predicting.
3. We did not need to wrap train_opt in the with tf.control_dependencies... statement that we added in the network that used tf.layers.batch_normalization. That is because we handled updating the population statistics ourselves in conv_layer and fully_connected.
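The population statistics are only updated as a side effect of training steps fed with is_training: True. How well they match the data therefore depends on how many steps have run.

Here is a small sketch of the moving-average update from batch_norm_training. It makes the simplifying assumption, not true in practice, that every batch has the same mean. The function names are hypothetical.

```python
def population_mean_after(steps, batch_mean, decay=0.99, init=0.0):
    # Repeats the update used in batch_norm_training:
    # pop_mean = pop_mean * decay + batch_mean * (1 - decay)
    # assuming (for illustration only) that every batch has the same mean.
    pop_mean = init
    for _ in range(steps):
        pop_mean = pop_mean * decay + batch_mean * (1 - decay)
    return pop_mean


def closed_form(steps, batch_mean, decay=0.99, init=0.0):
    # The loop has a closed form: the gap to batch_mean shrinks by decay each step.
    return batch_mean + (init - batch_mean) * decay ** steps


for steps in (10, 100, 500):
    print(steps, population_mean_after(steps, 2.0), closed_form(steps, 2.0))
```

With decay 0.99, the initial value still contributes a factor of 0.99**steps. After 100 steps roughly 37% of the gap to the batch mean remains (0.99**100 is about 0.366).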
"""
def train(num_batches, batch_size, learning_rate):
    # Placeholders for the input images and one-hot labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Fed on every sess.run call: True for training steps, False for evaluation
    is_training = tf.placeholder(tf.bool)

    # 19 batch-normalized convolutional layers
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the feature maps for the fully connected layer
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    layer = fully_connected(layer, 100, is_training)

    # Output layer: one logit per digit class
    logits = tf.layers.dense(layer, 10)

    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))

    # No tf.control_dependencies block is needed here: the population statistics
    # are updated inside conv_layer and fully_connected.
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # Training step: batch statistics are used and population statistics updated
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically report validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print(
                    'Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # Final accuracy on the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images one at a time (a batch of one each);
        # this relies on inference using the population statistics.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct / 100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
"""
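Once again, the batch-normalized model quickly reaches high accuracy.
But in our run, notice that it did not seem to learn anything for the first 250 batches, and then the accuracy started to climb.
This just goes to show that even with batch normalization, it is important to give your network some time to learn.
PS: The model also reaches high accuracy when predicting the 100 test samples one at a time, and this is what the BN implementation really has to get right!! With a batch of a single sample, normalizing with batch statistics would map every input to beta. Correct predictions therefore depend entirely on the population statistics.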
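To see why single-sample prediction must use the population statistics, here is a minimal pure-Python sketch for one unit. The helper names are hypothetical, and the population values 0.2 and 0.25 are made up for illustration.

With a batch of one, the batch mean equals the sample and the batch variance is 0. Normalizing with batch statistics therefore returns beta (0 here) no matter what the input was.

```python
import math


def batch_norm(values, mean, var, beta=0.0, gamma=1.0, epsilon=1e-3):
    # Same formula as tf.nn.batch_normalization, for a single unit
    return [gamma * (v - mean) / math.sqrt(var + epsilon) + beta for v in values]


def batch_stats(values):
    # Batch mean and biased batch variance, as tf.nn.moments computes them
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


for sample in ([0.7], [3.0]):
    # Training-style normalization on a batch of one: prints [0.0] for both samples
    print(batch_norm(sample, *batch_stats(sample)))

# Inference-style normalization with (hypothetical) population estimates
# keeps the two inputs distinguishable
print(batch_norm([0.7], 0.2, 0.25), batch_norm([3.0], 0.2, 0.25))
```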
"""