6. A Simple Implementation of Forward Propagation
1. Forward propagation can be implemented directly with tensor matrix operations, or through layer objects such as a relu layer. The idea is to chain nonlinear layers of the form relu(w@x + b): the output of one layer becomes the input of the next, which increases the model's capacity. We then compute a classification loss and use it to find suitable parameters.
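Before the step-by-step walkthrough, here is a minimal runnable sketch of that chain using dummy random tensors (the shapes match the real code below, but the data here is illustrative only):

import tensorflow as tf

# illustrative chain 784 -> 256 -> 128 -> 10 on a dummy batch of 2
x = tf.random.normal([2, 784])
w1, b1 = tf.random.normal([784, 256]), tf.zeros([256])
w2, b2 = tf.random.normal([256, 128]), tf.zeros([128])
w3, b3 = tf.random.normal([128, 10]), tf.zeros([10])

h1 = tf.nn.relu(x @ w1 + b1)    # relu(wx + b), first layer
h2 = tf.nn.relu(h1 @ w2 + b2)   # the output of one layer feeds the next
out = h2 @ w3 + b3              # final logits, one score per class
print(out.shape)                # (2, 10)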
(1) Import the modules
import tensorflow as tf
from tensorflow import keras            # keras is the high-level package integrated into TensorFlow
from tensorflow.keras import datasets   # a keras utility that manages datasets, so you do not have to download them by hand
(2) Load the MNIST data with the datasets utility
datasets is a data-management utility: if it does not find the mnist dataset on the system, it downloads it automatically.
(x, y), _ = datasets.mnist.load_data()   # keep the training split, discard the test split
# x.shape: [60k, 28, 28]   60k = 60,000 images
# y.shape: [60k]           one integer label per image
Output:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
(3) Convert the data to tensors and check the value range (minimum and maximum)
x = tf.convert_to_tensor(x, dtype=tf.float32)
y = tf.convert_to_tensor(y, dtype=tf.int32)
print(tf.reduce_min(x), tf.reduce_max(x))
print(tf.reduce_min(y), tf.reduce_max(y))
x = x / 255.   # having seen that the maximum is 255, divide by 255 to normalize x to [0, 1]
Output:
tf.Tensor(0.0, shape=(), dtype=float32) tf.Tensor(255.0, shape=(), dtype=float32)  # min 0 and max 255 for x
tf.Tensor(0, shape=(), dtype=int32) tf.Tensor(9, shape=(), dtype=int32)  # min 0 and max 9 for y: ten classes
(4) Build a batched training dataset from the data
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)   # slice into batches of 128 (x was already normalized above)
train_iter = iter(train_db)   # iterator over the dataset
sample = next(train_iter)     # each next() returns one batch of 128
print("batch: ", sample[0].shape, sample[1].shape)   # sample[0] is x, sample[1] is y -> batch:  (128, 28, 28) (128,)
(5) Randomly initialize w and b from a truncated normal distribution
# dimension-reduction chain, b = batch size (128)
# [b, 784] => [b, 256] => [b, 128] => [b, 10]; the intermediate sizes are arbitrary,
# the final size is fixed by the number of classes, here 10
# w has shape [dim_in, dim_out], b has shape [dim_out]

# variables must be wrapped in tf.Variable, otherwise they will not be updated
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))   # 28*28 = 784, random init
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))   # three pairs of (w, b) created; next comes the forward pass
lr = 1e-3   # learning rate, 0.001
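A small aside on why a truncated normal is used here: it re-draws any sample that falls more than two standard deviations from the mean, which avoids extreme initial weights. A quick illustrative check:

t = tf.random.truncated_normal([10000], stddev=0.1)
print(float(tf.reduce_max(tf.abs(t))))   # at most 0.2, i.e. within 2 standard deviations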
(6) Run the forward pass to get the output out, i.e. the predicted y
# x.shape: [128, 28, 28]
# y.shape: [128]

# we want x.shape to be [b, 28*28], so reshape first
x = tf.reshape(x, [-1, 28*28])   # -1 lets TensorFlow work out the batch dimension
# x.shape: [b, 28*28]

# h1 = x@w1 + b1 broadcasts automatically; we can also broadcast explicitly with broadcast_to
# i.e. h1 = [b,784] @ [784,256] + [256] => [b,256] + [256] => [b,256] + [b,256] = [b,256]
h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
h1 = tf.nn.relu(h1)   # nonlinearity

# [b, 256] => [b, 128]
h2 = h1 @ w2 + b2
h2 = tf.nn.relu(h2)   # nonlinearity

# [b, 128] => [b, 10]
out = h2 @ w3 + b3    # no nonlinearity on the last layer
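To see the broadcasting in h1 = x@w1 + b1 in isolation, here is a tiny sketch with zero-filled dummy tensors (shapes are illustrative):

x_demo = tf.zeros([4, 784])
w_demo = tf.zeros([784, 256])
b_demo = tf.zeros([256])
print((x_demo @ w_demo + b_demo).shape)   # (4, 256): the [256] bias is broadcast across the batch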
(7) Compute the loss
# compute the loss
y_onehot = tf.one_hot(y, depth=10)
# mean squared error = mean(sum((y_onehot - out)**2))
loss = tf.square(y_onehot - out)   # element-wise squared error
loss = tf.reduce_mean(loss)        # average, giving a scalar
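If one_hot is unfamiliar, this small sketch shows what it produces for two illustrative labels:

y_demo = tf.constant([0, 2])
print(tf.one_hot(y_demo, depth=10))
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]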
(8) Compute the gradients and update w and b
During training, the outer loop walks over every batch of the dataset once per epoch, while the parts relevant to gradient computation (the code involving w and b, i.e. the forward pass and the loss) go inside a tf.GradientTape context so the tape can record them.
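A minimal GradientTape sketch, independent of the model above, showing the record-then-differentiate pattern (names here are illustrative):

x_demo = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y_demo = x_demo ** 2          # recorded inside the tape context
print(tape.gradient(y_demo, x_demo).numpy())   # dy/dx = 2x = 6.0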
# gradient computation
grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])   # returns a list
# print(grads)
w1 = tf.Variable(w1 - lr * grads[0])   # the order of grads matches the variable list above
b1 = tf.Variable(b1 - lr * grads[1])   # plain arithmetic returns a tf.Tensor, so re-wrap in tf.Variable
w2 = tf.Variable(w2 - lr * grads[2])
b2 = tf.Variable(b2 - lr * grads[3])
w3 = tf.Variable(w3 - lr * grads[4])
b3 = tf.Variable(b3 - lr * grads[5])
# # second way: w1 = w1 - lr * w1_grad, updating in place
# w1.assign_sub(lr * grads[0])
# b1.assign_sub(lr * grads[1])
# w2.assign_sub(lr * grads[2])
# b2.assign_sub(lr * grads[3])
# w3.assign_sub(lr * grads[4])
# b3.assign_sub(lr * grads[5])
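To see why the re-wrapping (or assign_sub) is needed, this small sketch contrasts the two update styles on an illustrative variable v:

v = tf.Variable(1.0)
t = v - 0.1                   # plain arithmetic returns a tf.Tensor, which the tape would stop watching
print(isinstance(t, tf.Variable))   # False
v.assign_sub(0.1)             # in-place update; v remains a tf.Variable
print(isinstance(v, tf.Variable), float(v))   # True 0.9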
(9) Using enumerate()
- If you need both the index and the element while looping over a list, you could first write it like this:
list1 = ["零", "一", "二", "三"]
for i in range(len(list1)):
    print(i, list1[i])
- The above is a bit clumsy; enumerate() is more direct and elegant:
list1 = ["零", "一", "二", "三"]
for index, item in enumerate(list1):
    print(index, item)
Output:
0 零
1 一
2 二
3 三
- enumerate also accepts a second argument that sets the starting index, e.g.:
list1 = ["零", "一", "二", "三"]
for index, item in enumerate(list1, 1):
    print(index, item)
Output:
1 零
2 一
3 二
4 三
(10) Complete code
import tensorflow as tf
from tensorflow import keras            # keras is the high-level package integrated into TensorFlow
from tensorflow.keras import datasets   # keras utility that manages datasets, no manual download needed
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # raise the log level to suppress warnings

# x.shape: [60k, 28, 28]   60k = 60,000 images
# y.shape: [60k]
(x, y), test = datasets.mnist.load_data()   # returns a training split and a test split

x = tf.convert_to_tensor(x, dtype=tf.float32) / 255   # normalize to [0, 1]
y = tf.convert_to_tensor(y, dtype=tf.int32)
# print(x.shape, y.shape)   # (60000, 28, 28) (60000,)
#
# print(tf.reduce_min(x), tf.reduce_max(x))
# print(tf.reduce_min(y), tf.reduce_max(y))

train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)   # slice into batches of 128
train_iter = iter(train_db)   # iterator
sample = next(train_iter)     # each next() returns one batch of 128
print("batch: ", sample[0].shape, sample[1].shape)   # sample[0] is x, sample[1] is y -> batch:  (128, 28, 28) (128,)

# [[x@w1+b1]@w2+b2]@w3+b3   the previous output is the current input, the current output is the next input

# dimension-reduction chain, b = batch size (128)
# [b, 784] => [b, 256] => [b, 128] => [b, 10]; intermediate sizes are arbitrary,
# the final size is fixed by the number of classes, here 10
# w has shape [dim_in, dim_out], b has shape [dim_out]

# variables must be wrapped in tf.Variable, otherwise they will not be updated
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))   # 28*28 = 784, random init
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))   # three pairs of (w, b) created; next comes the forward pass
lr = 1e-3   # learning rate
print(lr)

# forward pass; @ is standard matrix multiplication
for epoch in range(10):   # iterate over the dataset 10 times (10 epochs)
    for step, (x, y) in enumerate(train_db):
        # x.shape: [128, 28, 28]
        # y.shape: [128]

        # we want x.shape to be [b, 28*28], so reshape first
        x = tf.reshape(x, [-1, 28*28])   # -1 lets TensorFlow work out the batch dimension
        # x.shape: [b, 28*28]

        with tf.GradientTape() as tape:   # record operations for differentiation
            # h1 = x@w1 + b1 broadcasts automatically; we can also broadcast explicitly with broadcast_to
            # i.e. h1 = [b,784] @ [784,256] + [256] => [b,256] + [256] => [b,256] + [b,256] = [b,256]
            h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)   # nonlinearity

            # [b, 256] => [b, 128]
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)   # nonlinearity

            # [b, 128] => [b, 10]
            out = h2 @ w3 + b3    # no nonlinearity on the last layer

            # compute the loss
            y_onehot = tf.one_hot(y, depth=10)
            # mean squared error = mean(sum((y_onehot - out)**2))
            loss = tf.square(y_onehot - out)   # element-wise squared error
            loss = tf.reduce_mean(loss)        # average, giving a scalar
            # print(loss.numpy())

        # gradient computation
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])   # returns a list
        # print(grads)
        w1 = tf.Variable(w1 - lr * grads[0])   # the order of grads matches the variable list above
        b1 = tf.Variable(b1 - lr * grads[1])   # plain arithmetic returns a tf.Tensor, so re-wrap in tf.Variable
        w2 = tf.Variable(w2 - lr * grads[2])
        b2 = tf.Variable(b2 - lr * grads[3])
        w3 = tf.Variable(w3 - lr * grads[4])
        b3 = tf.Variable(b3 - lr * grads[5])
        # # second way: w1 = w1 - lr * w1_grad, updating in place
        # w1.assign_sub(lr * grads[0])
        # b1.assign_sub(lr * grads[1])
        # w2.assign_sub(lr * grads[2])
        # b2.assign_sub(lr * grads[3])
        # w3.assign_sub(lr * grads[4])
        # b3.assign_sub(lr * grads[5])

        if step % 100 == 0:
            print(step, "loss: ", float(loss))
Output:
batch:  (128, 28, 28) (128,)
0.001
0 loss:  0.33713358640670776
100 loss:  0.17869338393211365
200 loss:  0.18110425770282745
300 loss:  0.16071689128875732
400 loss:  0.15914148092269897
0 loss:  0.15825815498828888
100 loss:  0.1371093988418579
200 loss:  0.14709815382957458
300 loss:  0.1325276792049408
400 loss:  0.13544504344463348
0 loss:  0.13280758261680603
100 loss:  0.12042884528636932
200 loss:  0.1284169852733612
300 loss:  0.1170944944024086
400 loss:  0.1209978237748146
0 loss:  0.1169787272810936
100 loss:  0.10971925407648087
200 loss:  0.1162431463599205
300 loss:  0.10678793489933014
400 loss:  0.11113865673542023
0 loss:  0.10621031373739243
100 loss:  0.10210990905761719
200 loss:  0.10767173767089844
300 loss:  0.09934328496456146
400 loss:  0.10391005128622055
0 loss:  0.09836703538894653
100 loss:  0.09642156213521957
200 loss:  0.10119980573654175
300 loss:  0.0937625914812088
400 loss:  0.09827694296836853
0 loss:  0.09233240783214569
100 loss:  0.09194888174533844
200 loss:  0.09611302614212036
300 loss:  0.08942069113254547
400 loss:  0.09367832541465759
0 loss:  0.08752914518117905
100 loss:  0.08836179226636887
200 loss:  0.0919758751988411
300 loss:  0.08589000999927521
400 loss:  0.08980031311511993
0 loss:  0.08361881971359253
100 loss:  0.08537296950817108
200 loss:  0.08853016793727875
300 loss:  0.08296453952789307
400 loss:  0.0865161120891571
0 loss:  0.08035525679588318
100 loss:  0.08282444626092911
200 loss:  0.08557784557342529
300 loss:  0.0804993063211441
400 loss:  0.08369861543178558