[Converge] Weight Initialiser

From: http://www.cnblogs.com/denny402/p/6932956.html

 

[784, 10] fully connected layer, trained with different weight initializations:

w = tf.Variable(tf.truncated_normal([img_pixel_input, layersize], mean=0.0, stddev=1.0, dtype=tf.float32))
b = tf.Variable(tf.truncated_normal([layersize                 ], mean=0.0, stddev=1.0, dtype=tf.float32))

Epoch 0, Training Loss: 2.10244838786, Test accuracy: 0.514423076923, time: 1.95s, total time: 2.8s
Epoch 1, Training Loss: 1.86659669154, Test accuracy: 0.640424679487, time: 1.54s, total time: 5.12s
Epoch 2, Training Loss: 1.80024383674, Test accuracy: 0.680989583333, time: 1.49s, total time: 7.47s
Epoch 3, Training Loss: 1.77303568244, Test accuracy: 0.699318910256, time: 1.53s, total time: 9.63s
Epoch 4, Training Loss: 1.75938568276, Test accuracy: 0.712740384615, time: 1.4s, total time: 11.81s
Epoch 5, Training Loss: 1.74897368638, Test accuracy: 0.718449519231, time: 1.57s, total time: 14.27s
Epoch 6, Training Loss: 1.7434025914, Test accuracy: 0.722355769231, time: 1.37s, total time: 16.52s
Epoch 7, Training Loss: 1.71330288407, Test accuracy: 0.792668269231, time: 1.37s, total time: 18.75s
Epoch 8, Training Loss: 1.66116618999, Test accuracy: 0.850560897436, time: 1.51s, total time: 21.08s
Epoch 9, Training Loss: 1.600759656, Test accuracy: 0.88030849359, time: 1.46s, total time: 23.32s
Epoch 10, Training Loss: 1.58312522976, Test accuracy: 0.892327724359, time: 1.52s, total time: 25.63s
Epoch 11, Training Loss: 1.5736670608, Test accuracy: 0.896534455128, time: 1.54s, total time: 28.01s
Epoch 12, Training Loss: 1.56778478539, Test accuracy: 0.905749198718, time: 1.47s, total time: 30.37s
Epoch 13, Training Loss: 1.56342586715, Test accuracy: 0.905548878205, time: 1.52s, total time: 32.71s
Epoch 14, Training Loss: 1.55950221926, Test accuracy: 0.906049679487, time: 1.54s, total time: 35.06s
Epoch 15, Training Loss: 1.55725609423, Test accuracy: 0.910356570513, time: 1.49s, total time: 37.45s
Epoch 16, Training Loss: 1.55490833146, Test accuracy: 0.911959134615, time: 1.58s, total time: 39.89s
Epoch 17, Training Loss: 1.55294992346, Test accuracy: 0.913561698718, time: 1.56s, total time: 42.26s
Epoch 18, Training Loss: 1.55085181106, Test accuracy: 0.916967147436, time: 1.51s, total time: 44.63s
Epoch 19, Training Loss: 1.54926108397, Test accuracy: 0.911858974359, time: 1.52s, total time: 47.0s
Total training time: 47.0s


w = tf.Variable(tf.truncated_normal([img_pixel_input, layersize], mean=0.01, stddev=1.0, dtype=tf.float32))
b = tf.Variable(tf.truncated_normal([layersize                 ], mean=0.01, stddev=1.0, dtype=tf.float32))

Epoch 0, Training Loss: 2.1900485101, Test accuracy: 0.443008814103, time: 1.84s, total time: 2.75s
Epoch 1, Training Loss: 1.93756918807, Test accuracy: 0.599659455128, time: 1.48s, total time: 5.1s
Epoch 2, Training Loss: 1.84595911986, Test accuracy: 0.653145032051, time: 1.47s, total time: 7.35s
Epoch 3, Training Loss: 1.8073041603, Test accuracy: 0.682291666667, time: 1.49s, total time: 9.63s
Epoch 4, Training Loss: 1.78734811036, Test accuracy: 0.688601762821, time: 1.43s, total time: 11.86s
Epoch 5, Training Loss: 1.7739427098, Test accuracy: 0.700520833333, time: 1.43s, total time: 14.02s
Epoch 6, Training Loss: 1.76551306776, Test accuracy: 0.711738782051, time: 1.34s, total time: 16.08s
Epoch 7, Training Loss: 1.74105782025, Test accuracy: 0.794771634615, time: 1.47s, total time: 18.33s
Epoch 8, Training Loss: 1.67201814229, Test accuracy: 0.808894230769, time: 1.53s, total time: 20.68s
Epoch 9, Training Loss: 1.66241001194, Test accuracy: 0.811698717949, time: 1.49s, total time: 23.0s
Epoch 10, Training Loss: 1.65713534489, Test accuracy: 0.814202724359, time: 1.54s, total time: 25.35s
Epoch 11, Training Loss: 1.65359901187, Test accuracy: 0.820713141026, time: 1.58s, total time: 27.73s
Epoch 12, Training Loss: 1.6501801603, Test accuracy: 0.820012019231, time: 1.49s, total time: 30.08s
Epoch 13, Training Loss: 1.64807084891, Test accuracy: 0.821915064103, time: 1.5s, total time: 32.41s
Epoch 14, Training Loss: 1.64611155364, Test accuracy: 0.821314102564, time: 1.54s, total time: 34.79s
Epoch 15, Training Loss: 1.62634825317, Test accuracy: 0.899539262821, time: 1.51s, total time: 37.05s
Epoch 16, Training Loss: 1.56398414065, Test accuracy: 0.909755608974, time: 1.41s, total time: 39.26s
Epoch 17, Training Loss: 1.55725724714, Test accuracy: 0.912459935897, time: 1.51s, total time: 41.57s
Epoch 18, Training Loss: 1.55478919553, Test accuracy: 0.91796875, time: 1.55s, total time: 43.95s
Epoch 19, Training Loss: 1.55242318568, Test accuracy: 0.917367788462, time: 1.5s, total time: 46.25s
Total training time: 46.25s


w = tf.Variable(tf.truncated_normal([img_pixel_input, layersize], mean=0.01, stddev=5.0, dtype=tf.float32))
b = tf.Variable(tf.truncated_normal([layersize                 ], mean=0.01, stddev=1.0, dtype=tf.float32))

Epoch 0, Training Loss: 2.39008372369, Test accuracy: 0.0950520833333, time: 1.94s, total time: 2.65s
Epoch 1, Training Loss: 2.33227054167, Test accuracy: 0.153245192308, time: 1.54s, total time: 4.96s
Epoch 2, Training Loss: 2.28677356104, Test accuracy: 0.186498397436, time: 1.42s, total time: 7.25s
Epoch 3, Training Loss: 2.23217486891, Test accuracy: 0.269831730769, time: 1.38s, total time: 9.4s
Epoch 4, Training Loss: 2.13864973875, Test accuracy: 0.351061698718, time: 1.47s, total time: 11.65s
Epoch 5, Training Loss: 2.07637035874, Test accuracy: 0.401041666667, time: 1.58s, total time: 14.06s
Epoch 6, Training Loss: 2.04344919623, Test accuracy: 0.426582532051, time: 1.46s, total time: 16.29s
Epoch 7, Training Loss: 2.02300423842, Test accuracy: 0.44140625, time: 1.52s, total time: 18.58s
Epoch 8, Training Loss: 2.00804452852, Test accuracy: 0.455428685897, time: 1.45s, total time: 20.83s
Epoch 9, Training Loss: 1.99567352781, Test accuracy: 0.468549679487, time: 1.51s, total time: 23.19s
Epoch 10, Training Loss: 1.98683612969, Test accuracy: 0.476462339744, time: 1.59s, total time: 25.59s
Epoch 11, Training Loss: 1.980189987, Test accuracy: 0.485677083333, time: 1.57s, total time: 27.98s
Epoch 12, Training Loss: 1.97373542863, Test accuracy: 0.491185897436, time: 1.52s, total time: 30.29s
Epoch 13, Training Loss: 1.967556376, Test accuracy: 0.491887019231, time: 1.61s, total time: 32.7s
Epoch 14, Training Loss: 1.96045698958, Test accuracy: 0.497395833333, time: 1.49s, total time: 34.97s
Epoch 15, Training Loss: 1.95221617978, Test accuracy: 0.517528044872, time: 1.49s, total time: 37.18s
Epoch 16, Training Loss: 1.93845896371, Test accuracy: 0.521534455128, time: 1.46s, total time: 39.35s
Epoch 17, Training Loss: 1.92538965999, Test accuracy: 0.539963942308, time: 1.43s, total time: 41.5s
Epoch 18, Training Loss: 1.91551751801, Test accuracy: 0.546173878205, time: 1.43s, total time: 43.77s
Epoch 19, Training Loss: 1.90569505908, Test accuracy: 0.555989583333, time: 1.47s, total time: 46.05s
Total training time: 46.05s

 

A small standard deviation is the first idea to try! The comparison shows the results improve considerably.

w = tf.Variable(tf.truncated_normal([img_pixel_input, layersize], mean=0.0, stddev=0.01, dtype=tf.float32))
b = tf.Variable(tf.truncated_normal([layersize                 ], mean=0.0, stddev=1.00, dtype=tf.float32))
Epoch 0, Training Loss: 1.75807115331, Test accuracy: 0.895833333333, time: 1.84s, total time: 2.66s
Epoch 1, Training Loss: 1.60653405506, Test accuracy: 0.909855769231, time: 1.56s, total time: 5.02s
Epoch 2, Training Loss: 1.58358776375, Test accuracy: 0.913661858974, time: 1.55s, total time: 7.37s
Epoch 3, Training Loss: 1.57199550759, Test accuracy: 0.918669871795, time: 1.53s, total time: 9.69s
Epoch 4, Training Loss: 1.56478386464, Test accuracy: 0.921173878205, time: 1.49s, total time: 12.04s
Epoch 5, Training Loss: 1.55968606111, Test accuracy: 0.920773237179, time: 1.47s, total time: 14.36s
Epoch 6, Training Loss: 1.55553424692, Test accuracy: 0.923177083333, time: 1.47s, total time: 16.72s
Epoch 7, Training Loss: 1.55266008566, Test accuracy: 0.926181891026, time: 1.5s, total time: 18.94s
Epoch 8, Training Loss: 1.54992543289, Test accuracy: 0.92578125, time: 1.62s, total time: 21.36s
Epoch 9, Training Loss: 1.54779823315, Test accuracy: 0.928886217949, time: 1.59s, total time: 23.78s
Epoch 10, Training Loss: 1.5458223992, Test accuracy: 0.928084935897, time: 1.59s, total time: 26.26s
Epoch 11, Training Loss: 1.5444233951, Test accuracy: 0.925681089744, time: 1.61s, total time: 28.69s
Epoch 12, Training Loss: 1.54265678151, Test accuracy: 0.928886217949, time: 1.28s, total time: 30.67s
Epoch 13, Training Loss: 1.54120515999, Test accuracy: 0.929186698718, time: 1.33s, total time: 32.76s
Epoch 14, Training Loss: 1.54076098256, Test accuracy: 0.930889423077, time: 1.4s, total time: 34.99s
Epoch 15, Training Loss: 1.53926875958, Test accuracy: 0.928685897436, time: 1.54s, total time: 37.34s
Epoch 16, Training Loss: 1.53855588386, Test accuracy: 0.931490384615, time: 1.35s, total time: 39.48s
Epoch 17, Training Loss: 1.53713625878, Test accuracy: 0.932091346154, time: 1.5s, total time: 41.76s
Epoch 18, Training Loss: 1.53662548226, Test accuracy: 0.932892628205, time: 1.53s, total time: 44.14s
Epoch 19, Training Loss: 1.53609782221, Test accuracy: 0.930989583333, time: 1.49s, total time: 46.45s
Total training time: 46.45s
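A back-of-the-envelope check on why the stddev=5.0 run above lags so badly: each logit is a sum over 784 weighted inputs, so (assuming roughly unit-scale inputs and i.i.d. weights) its standard deviation grows like stddev × sqrt(784) = 28 × stddev. A plain-Python sketch:

```python
import math

def logit_std(weight_stddev, fan_in=784):
    # Std-dev of one pre-softmax logit, assuming i.i.d. weights and
    # roughly unit-scale inputs: it scales as stddev * sqrt(fan_in).
    return weight_stddev * math.sqrt(fan_in)

for s in (0.01, 1.0, 5.0):
    print(s, logit_std(s))   # ~0.28, 28.0, 140.0
```

Logits with a standard deviation around 140 drive the softmax into near one-hot outputs, so gradients are tiny almost everywhere; with stddev=0.01 the logits start near zero and training moves immediately, which matches the accuracy curves above.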

 

 

The most important things in a CNN are its parameters, W and b. The whole point of training a CNN is to find the parameters that minimize the objective function, so their initialization matters just as much, which is why fine-tuning gets so much attention. So which initializers does TensorFlow provide, and can we define our own?

All the initializers are defined in tensorflow/python/ops/init_ops.py.

 

1. tf.constant_initializer()

Also written tf.Constant().

Initializes to a constant. This is very useful; bias terms are typically initialized with it.

Two initializers derived from it:

  • tf.zeros_initializer(), also written tf.Zeros()
  • tf.ones_initializer(), also written tf.Ones()

Example: initializing the bias b of a convolutional layer to 0 can be written in several ways:

conv1 = tf.layers.conv2d(batch_images,
                         filters=64,
                         kernel_size=7,
                         strides=2,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.TruncatedNormal(stddev=0.01),
                         bias_initializer=tf.Constant(0),
                        )

Or:

bias_initializer=tf.constant_initializer(0)

Or:

bias_initializer=tf.zeros_initializer()

Or:

bias_initializer=tf.Zeros()

 

Example: how do we initialize W as a Laplacian kernel?

value = [1, 1, 1, 1, -8, 1, 1, 1, 1]
init = tf.constant_initializer(value)
W = tf.get_variable('W', shape=[3, 3], initializer=init)
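To see what this kernel does (independent of TensorFlow), here is a tiny pure-Python convolution applying it at one position: the Laplacian responds with zero on flat regions and strongly near edges.

```python
kernel = [[1, 1, 1],
          [1, -8, 1],
          [1, 1, 1]]

def conv_at(img, r, c):
    # Apply the 3x3 kernel centred at pixel (r, c).
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))

flat = [[5] * 5 for _ in range(5)]                                # constant image
edge = [[0] * 5 for _ in range(2)] + [[9] * 5 for _ in range(3)]  # horizontal edge

print(conv_at(flat, 2, 2))  # 0: the Laplacian suppresses flat regions
print(conv_at(edge, 2, 2))  # -27: strong response on the edge
```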

 

2. tf.truncated_normal_initializer()

Also written tf.TruncatedNormal().

Generates random numbers from a truncated normal distribution; this seems to be the most commonly used initializer in TensorFlow.

It takes four parameters (mean=0.0, stddev=1.0, seed=None, dtype=dtypes.float32): the mean, the standard deviation, the random seed, and the data type. Usually only stddev needs to be set.

Example:

conv1 = tf.layers.conv2d(batch_images,
                         filters=64,
                         kernel_size=7,
                         strides=2,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.TruncatedNormal(stddev=0.01),
                         bias_initializer=tf.Constant(0),
                        )

Or:

conv1 = tf.layers.conv2d(batch_images,
                         filters=64,
                         kernel_size=7,
                         strides=2,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
                         bias_initializer=tf.zeros_initializer(),
                        )
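"Truncated" here means that samples falling more than two standard deviations from the mean are discarded and re-drawn. A pure-Python sketch of that behaviour (rejection sampling; the function name is illustrative, not TensorFlow's API):

```python
import random

def truncated_normal_sample(mean=0.0, stddev=1.0):
    # Re-draw until the sample lies within two stddevs of the mean,
    # which is how truncated-normal initialization bounds its values.
    while True:
        x = random.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            return x

random.seed(0)
samples = [truncated_normal_sample(0.0, 0.01) for _ in range(1000)]
print(max(abs(s) for s in samples))  # always <= 0.02 by construction
```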

 

3. tf.random_normal_initializer()

Also written tf.RandomNormal().

Generates normally distributed random numbers; the parameters are the same as those of truncated_normal_initializer.

 

4. tf.random_uniform_initializer()

Also written tf.RandomUniform().

Generates uniformly distributed random numbers. It takes four parameters (minval=0, maxval=None, seed=None, dtype=dtypes.float32): the minimum, the maximum, the random seed, and the data type.

 

5. tf.uniform_unit_scaling_initializer()

Also written tf.UniformUnitScaling().

Much like the uniform initializer, except that the bounds are not specified directly but computed. Its parameters are (factor=1.0, seed=None, dtype=dtypes.float32), with

max_val = math.sqrt(3 / input_size) * factor

Here input_size is the input dimension: if the input is x and the operation is x * W, then input_size = W.shape[0].

The distribution's range is [-max_val, max_val].
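Plugging in the [784, 10] layer from the experiments above, the bound works out to about 0.062, the same ballpark as the hand-picked stddev=0.01 that converged well:

```python
import math

def uniform_unit_scaling_bound(input_size, factor=1.0):
    # max_val = sqrt(3 / input_size) * factor;
    # values are drawn uniformly from [-max_val, max_val].
    return math.sqrt(3.0 / input_size) * factor

print(uniform_unit_scaling_bound(784))  # ~0.0619
```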

 

6. tf.variance_scaling_initializer()

Also written tf.VarianceScaling().

Its parameters are (scale=1.0, mode="fan_in", distribution="normal", seed=None, dtype=dtypes.float32):

scale: a positive float scaling factor.
mode: one of "fan_in", "fan_out", "fan_avg"; determines how the standard deviation stddev is computed.
distribution: the distribution type, "normal" or "uniform".
With distribution="normal", it generates truncated-normal random numbers with stddev = sqrt(scale / n), where n depends on mode:
      if mode = "fan_in", n is the number of input units;
      if mode = "fan_out", n is the number of output units;
      if mode = "fan_avg", n is the average of the input and output unit counts.
With distribution="uniform", it generates uniformly distributed random numbers in [-limit, limit], where
      limit = sqrt(3 * scale / n)
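The scheme can be evaluated directly from the formulas above. Note that scale=2.0 with mode="fan_in" and a normal distribution reproduces He initialization, while scale=1.0 with mode="fan_avg" reproduces the Glorot/Xavier scheme. (This is a plain-Python sketch of the stated formulas, not TensorFlow's implementation.)

```python
import math

def variance_scaling(scale=1.0, mode="fan_in", distribution="normal",
                     fan_in=784, fan_out=10):
    # n is chosen according to mode, as described above.
    n = {"fan_in": fan_in,
         "fan_out": fan_out,
         "fan_avg": (fan_in + fan_out) / 2.0}[mode]
    if distribution == "normal":
        return math.sqrt(scale / n)        # stddev of the truncated normal
    return math.sqrt(3.0 * scale / n)      # limit of the uniform range

print(variance_scaling(2.0, "fan_in", "normal"))    # He init stddev
print(variance_scaling(1.0, "fan_avg", "uniform"))  # Glorot uniform limit
```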

 

7. tf.orthogonal_initializer()

Also written tf.Orthogonal().

Generates a random orthogonal matrix.

When the parameter to generate is 2-D, the orthogonal matrix is obtained via SVD from a matrix of uniformly distributed random numbers.
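The defining property of an orthogonal matrix Q is that Q^T Q = I, which is what makes it attractive as an initializer: multiplying by Q preserves the norm of the signal. A tiny check, using a 2x2 rotation matrix as a stand-in for the generated weights:

```python
import math

theta = math.pi / 6
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]  # rotations are orthogonal

def gram(A):
    # Compute A^T @ A; for an orthogonal matrix this is the identity.
    n = len(A)
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

print(gram(Q))  # ~[[1, 0], [0, 1]]
```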

 

8. tf.glorot_uniform_initializer()

Also known as the Xavier uniform initializer; it initializes from a uniform distribution.

If the uniform distribution's range is [-limit, limit], then

limit = sqrt(6 / (fan_in + fan_out))

where fan_in and fan_out are the numbers of input and output units, respectively.

 

9. tf.glorot_normal_initializer()

Also known as the Xavier normal initializer; it initializes from a truncated normal distribution with

stddev = sqrt(2 / (fan_in + fan_out))

where fan_in and fan_out are the numbers of input and output units, respectively.
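For the [784, 10] layer used in the experiments at the top, the two Glorot formulas give concrete values (plain Python, just evaluating the formulas above):

```python
import math

fan_in, fan_out = 784, 10   # the [784, 10] layer from the experiments

glorot_uniform_limit = math.sqrt(6.0 / (fan_in + fan_out))
glorot_normal_stddev = math.sqrt(2.0 / (fan_in + fan_out))

print(glorot_uniform_limit)  # ~0.0869
print(glorot_normal_stddev)  # ~0.0502
```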

 


 

Visualizing weights for debugging (worth reproducing the experimental results)

[Converge] Training Neural Networks

 

Reproduction of the results:

in progress...

 

posted @ 2017-08-23 19:23 by 郝壹贰叁