TensorFlow Notes 8: RNN and LSTM source code, training-code inputs and outputs, and dimension analysis
TensorFlow documentation: https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell
TensorFlow version: 1.10
If you spot any mistakes, corrections and discussion are welcome.
Meaning of each parameter in the current layer:
Computation flow of a single RNN time step in TensorFlow:
Computation flow of a single LSTM time step in TensorFlow:
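Since the original figures are not reproduced here, the following is a rough NumPy sketch (not the TensorFlow implementation itself) of what one RNN step and one LSTM step compute; the sizes are arbitrary assumptions, chosen only to make the shapes visible:

```python
# A minimal NumPy sketch of a single RNN / LSTM time step (illustrative sizes only).
import numpy as np

batch_size, input_size, hidden_size = 2, 3, 4
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x_t = np.random.randn(batch_size, input_size)    # X_t:     [batch_size, input_size]
h_prev = np.zeros((batch_size, hidden_size))     # H_{t-1}: [batch_size, hidden_size]
c_prev = np.zeros((batch_size, hidden_size))     # C_{t-1}: [batch_size, hidden_size]

# --- RNN step: Ht = tanh([Xt, Ht-1] * W + B) ---
W_rnn = np.random.randn(input_size + hidden_size, hidden_size)
b_rnn = np.zeros(hidden_size)
h_t = np.tanh(np.concatenate([x_t, h_prev], axis=1) @ W_rnn + b_rnn)
print(h_t.shape)                                 # (batch_size, hidden_size)

# --- LSTM step: one matmul produces i, j, f, o at once ---
W = np.random.randn(input_size + hidden_size, 4 * hidden_size)
b = np.zeros(4 * hidden_size)
gates = np.concatenate([x_t, h_prev], axis=1) @ W + b       # [batch_size, 4 * hidden_size]
i, j, f, o = np.split(gates, 4, axis=1)                     # each [batch_size, hidden_size]
c_t = c_prev * sigmoid(f + 1.0) + sigmoid(i) * np.tanh(j)   # forget_bias = 1.0
h_t = sigmoid(o) * np.tanh(c_t)
print(c_t.shape, h_t.shape)                                 # both (batch_size, hidden_size)
```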
Note: in the computation above, [H, X] * W and B have different shapes, so how can they be added? Explanation:
- The TensorFlow code uses nn_ops.bias_add(gate_inputs, self._bias), which adds B to the output of every example in the batch (broadcasting along the batch dimension).
- That is why shapes [batch_size, hidden_size] and [hidden_size] can be added; see the bias_add demonstration below.
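A quick sketch of that broadcasting behaviour, using the public tf.nn.bias_add wrapper with made-up values:

```python
# Demonstration: the bias is added to every row (every example in the batch).
import tensorflow as tf

gate_inputs = tf.constant([[1.0, 2.0, 3.0],
                           [4.0, 5.0, 6.0]])   # [batch_size=2, hidden_size=3]
bias = tf.constant([0.1, 0.2, 0.3])            # [hidden_size=3]

result = tf.nn.bias_add(gate_inputs, bias)

with tf.Session() as sess:
    print(sess.run(result))
    # [[1.1 2.2 3.3]
    #  [4.1 5.2 6.3]]
```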
TensorFlow source code analysis follows.
TensorFlow version: 1.9
Note: the code below computes one batch at a single time step. To process all time steps, this code is executed in a loop num_step (sentence length) times; TensorFlow has already wrapped this loop for us, so we do not need to write it ourselves (see the sketch below).
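For reference, the following TF 1.x sketch shows roughly what that wrapper does; the sizes and placeholder shapes are assumptions made only for illustration:

```python
# A rough sketch of the per-time-step loop that TensorFlow wraps for us.
import tensorflow as tf

batch_size, num_step, emb_size, hidden_size = 32, 20, 128, 256
inputs = tf.placeholder(tf.float32, [batch_size, num_step, emb_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
state = cell.zero_state(batch_size, tf.float32)    # c and h of time step 0 are all zeros

# Conceptually, the cell is called once per time step, num_step times:
outputs = []
for t in range(num_step):
    output, state = cell(inputs[:, t, :], state)   # output: [batch_size, hidden_size]
    outputs.append(output)
```

In practice we simply call tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32) (or tf.nn.static_rnn) and let TensorFlow run this loop internally.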
RNN key code:

```python
@tf_export("nn.rnn_cell.BasicRNNCell")
class BasicRNNCell(LayerRNNCell):
  """The most basic RNN cell.
  Args:
    num_units: int, The number of units in the RNN cell.
    activation: Nonlinearity to use.  Default: `tanh`.
    reuse: (optional) Python boolean describing whether to reuse variables
      in an existing scope.  If not `True`, and the existing scope already has
      the given variables, an error is raised.
    name: String, the name of the layer. Layers with the same name will
      share weights, but to avoid mistakes we require reuse=True in such
      cases.
    dtype: Default dtype of the layer (default of `None` means use the type
      of the first input). Required when `build` is called before `call`.
  """

  def __init__(self,
               num_units,
               activation=None,
               reuse=None,
               name=None,
               dtype=None):
    super(BasicRNNCell, self).__init__(_reuse=reuse, name=name, dtype=dtype)

    # Inputs must be 2-dimensional.
    self.input_spec = base_layer.InputSpec(ndim=2)

    self._num_units = num_units
    self._activation = activation or math_ops.tanh

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def build(self, inputs_shape):
    if inputs_shape[1].value is None:
      raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                       % inputs_shape)

    input_depth = inputs_shape[1].value

    # Create W and B; their shapes are:
    #   W: [input_size + hidden_size, hidden_size]
    #   B: [hidden_size]
    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + self._num_units, self._num_units])
    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))

    self.built = True

  # This function is run num_step (sentence length) times; then the layer is done.
  def call(self, inputs, state):
    """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
    # output = Ht = tanh([Xt, Ht-1] * W + B)
    # At time step 0, the current state (i.e. the previous output H0) is all zeros.
    # inputs shape: [batch_size, emb_size]
    # state  shape: [batch_size, hidden_size]
    # math_ops.matmul: matrix multiplication.
    # array_ops.concat: concatenates the two matrices; the result has shape
    #   [batch_size, input_size + hidden_size], i.e. [Xt, Ht-1].

    # Compute [inputs, state] * [W, U] == [Xt, Ht-1] * W;
    # the result has shape [batch_size, hidden_size].
    gate_inputs = math_ops.matmul(
        array_ops.concat([inputs, state], 1), self._kernel)
    # B has shape [hidden_size]; [Xt, Ht-1] * W has shape [batch_size, hidden_size].
    # nn_ops.bias_add adds B to every example in the batch (broadcasting).
    # After adding B: Ht = tanh([Xt, Ht-1] * W + B), shape still [batch_size, hidden_size].
    # This Ht is the input to the next time step and to the next layer.
    gate_inputs = nn_ops.bias_add(gate_inputs, self._bias)
    output = self._activation(gate_inputs)
    # The returned tensors have shape [batch_size, hidden_size].
    # One output is Ht for the next time step; the other is the input to the next layer.
    return output, output
```

LSTM key code:

```python
@tf_export("nn.rnn_cell.BasicLSTMCell")
class BasicLSTMCell(LayerRNNCell):
  """Basic LSTM recurrent network cell.
  The implementation is based on: http://arxiv.org/abs/1409.2329.
  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.
  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.
  For advanced models, please use the full @{tf.nn.rnn_cell.LSTMCell}
  that follows.
  """

  def __init__(self,
               num_units,
               forget_bias=1.0,
               state_is_tuple=True,
               activation=None,
               reuse=None,
               name=None,
               dtype=None):
    """Initialize the basic LSTM cell.
    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
        Must set to `0.0` manually when restoring from CudnnLSTM-trained
        checkpoints.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`.  If False, they are concatenated
        along the column axis.  The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.  Default: `tanh`.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope.  If not `True`, and the existing scope already has
        the given variables, an error is raised.
      name: String, the name of the layer. Layers with the same name will
        share weights, but to avoid mistakes we require reuse=True in such
        cases.
      dtype: Default dtype of the layer (default of `None` means use the type
        of the first input). Required when `build` is called before `call`.
        When restoring from CudnnLSTM-trained checkpoints, must use
        `CudnnCompatibleLSTMCell` instead.
    """
    super(BasicLSTMCell, self).__init__(_reuse=reuse, name=name, dtype=dtype)
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated.  Use state_is_tuple=True.", self)

    # Inputs must be 2-dimensional.
    self.input_spec = base_layer.InputSpec(ndim=2)

    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation or math_ops.tanh

  @property
  def state_size(self):
    # Size of the hidden state: a (c, h) tuple or their concatenation.
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    # Size of the output: hidden_size.
    return self._num_units

  def build(self, inputs_shape):
    if inputs_shape[1].value is None:
      raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                       % inputs_shape)

    # inputs has shape [batch_size, input_size].
    # For the first layer (per-time-step word inputs), input_size is
    # embedding_size, the dimension of the word vectors.
    # So input_depth here is input_size.
    input_depth = inputs_shape[1].value
    # h_depth is hidden_size, the dimension of the hidden state.
    h_depth = self._num_units

    # self._kernel == W, with shape [input_size + hidden_size, 4 * hidden_size].
    # Four W and B blocks are defined at once so that i, j, f, o
    # (i.e. ft, it, ct', ot in the diagram) are computed in a single matmul.
    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + h_depth, 4 * self._num_units])
    # B has shape [4 * hidden_size].
    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[4 * self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))

    self.built = True

  def call(self, inputs, state):
    """Long short-term memory cell (LSTM).
    Args:
      inputs: `2-D` tensor with shape `[batch_size, input_size]`.
      state: An `LSTMStateTuple` of state tensors, each shaped
        `[batch_size, num_units]`, if `state_is_tuple` has been set to
        `True`.  Otherwise, a `Tensor` shaped
        `[batch_size, 2 * num_units]`.
    Returns:
      A pair containing the new hidden state, and the new state (either a
        `LSTMStateTuple` or a concatenated state, depending on
        `state_is_tuple`).
    """
    sigmoid = math_ops.sigmoid
    one = constant_op.constant(1, dtype=dtypes.int32)
    # Parameters of gates are concatenated into one multiply for efficiency.
    # At time step 0 of each layer, c and h are initialized to all zeros.
    if self._state_is_tuple:
      c, h = state
    else:
      c, h = array_ops.split(value=state, num_or_size_splits=2, axis=one)

    # Concatenate this step's input Xt with the previous output Ht-1.
    # inputs shape: [batch_size, input_size]; for the first layer,
    # input_size is just embedding_size.
    # The concatenation has shape [batch_size, input_size + hidden_size];
    # W has shape [input_size + hidden_size, 4 * hidden_size];
    # their matmul has shape [batch_size, 4 * hidden_size].
    gate_inputs = math_ops.matmul(
        array_ops.concat([inputs, h], 1), self._kernel)
    # B has shape [4 * hidden_size]; [Xt, Ht-1] * W has shape [batch_size, 4 * hidden_size].
    # nn_ops.bias_add adds B to every example in the batch (broadcasting).
    # After adding B we have the concatenation of i, j, f, o:
    # [Xt, Ht-1] * W + B, with shape [batch_size, 4 * hidden_size].
    gate_inputs = nn_ops.bias_add(gate_inputs, self._bias)

    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    # Split the matmul result into four parts: the values of i, j, f, o,
    # each with shape [batch_size, hidden_size].
    i, j, f, o = array_ops.split(
        value=gate_inputs, num_or_size_splits=4, axis=one)

    forget_bias_tensor = constant_op.constant(self._forget_bias, dtype=f.dtype)

    # Note that using `add` and `multiply` instead of `+` and `*` gives a
    # performance improvement. So using those at the cost of readability.
    add = math_ops.add
    multiply = math_ops.multiply
    # The forget bias is added here; it controls how much of c is forgotten.
    # The operations below are element-wise; all operands have shape
    # [batch_size, hidden_size], so the shape is unchanged:
    #   new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * tanh(j)
    new_c = add(multiply(c, sigmoid(add(f, forget_bias_tensor))),
                multiply(sigmoid(i), self._activation(j)))
    # Element-wise again; both operands have shape [batch_size, hidden_size]:
    #   new_h = sigmoid(o) * tanh(new_c)
    new_h = multiply(self._activation(new_c), sigmoid(o))

    # Both new_c and new_h have shape [batch_size, hidden_size] (their values differ).

    if self._state_is_tuple:
      new_state = LSTMStateTuple(new_c, new_h)
    else:
      new_state = array_ops.concat([new_c, new_h], 1)
    # new_h is this step's H; new_state holds this step's H and C.  Running this
    # function num_step times (the maximum sequence length) finishes the layer.
    # new_c and new_h feed the next time step: new_h is concatenated with the next
    # input Xt+1, giving shape [batch_size, input_size + hidden_size].
    # If there is a next layer, this new_h also becomes that layer's Xt.
    return new_h, new_state
```
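To tie the dimension analysis together, here is a small shape check built on BasicLSTMCell; the concrete sizes (batch_size=32, input_size=128, hidden_size=256) are arbitrary assumptions:

```python
# Shape check for one LSTM step (illustrative sizes only).
import tensorflow as tf

batch_size, input_size, hidden_size = 32, 128, 256

x_t = tf.placeholder(tf.float32, [batch_size, input_size])   # Xt
cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
state = cell.zero_state(batch_size, tf.float32)               # (c, h), each [batch_size, hidden_size]

new_h, new_state = cell(x_t, state)

print(new_h.shape)          # (32, 256)  -> [batch_size, hidden_size]
print(new_state.c.shape)    # (32, 256)
print(new_state.h.shape)    # (32, 256)
for v in cell.trainable_variables:
    print(v.name, v.shape)  # kernel: (384, 1024) = [input_size + hidden_size, 4 * hidden_size]
                            # bias:   (1024,)     = [4 * hidden_size]
```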