Machine Learning--week4 Basic Concepts of Neural Networks
The techniques learned so far cannot handle complex non-linear problems well
Neural Networks
Sigmoid (logistic) activation function: "activation function" is another term for \(g(z) = \frac{1}{1+e^{-z}}\)
activation: the value that is computed by, and output from, a specific neuron (unit)
weights = parameters = \(\theta\)
input units: \(x_1,x_2, x_3,\dots, x_n\)
bias unit / bias neuron: \(x_0\) and \(a_0^{(j)}\)
The layers between the input units and the hypothesis are made up of activations
input wire / output wire: an input wire is an arrow pointing into the target neuron; an output wire is an arrow pointing out of the target neuron
\(a_i^{(j)}\): "activation" of neuron \(i\) or of unit \(i\) in layer \(j\)
\(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j+1\)
(Note that \(\Theta\) is uppercase because it now takes the form of a matrix)
layer 1 == input layer
layer n == output layer (the last layer)
layer 2 ~ layer n-1 == hidden layers
For example, if layer \(j\) has 2 units and layer \(j+1\) has 4 units, then \(\Theta^{(j)}\) is a \(4 \times 3\) matrix.
Generally, \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\), if the network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\). (The \(+1\) in \(s_j+1\) comes from the addition in \(\Theta^{(j)}\) of the "bias nodes," \(x_0\) and \(\Theta_0^{(j)}\). In other words, the output nodes will not include the bias node while the inputs will.)
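A minimal sketch (assuming NumPy; the layer sizes in `layer_sizes` are purely illustrative) of how this dimension rule fixes the shape of each \(\Theta^{(j)}\):

```python
import numpy as np

# Illustrative layer sizes: s_1 = 3 input units, s_2 = 5 hidden units, s_3 = 1 output unit.
layer_sizes = [3, 5, 1]

# Theta^(j) maps layer j to layer j+1 and has shape s_{j+1} x (s_j + 1);
# the extra column multiplies the bias unit a_0^(j) = 1.
Thetas = [np.random.randn(layer_sizes[j + 1], layer_sizes[j] + 1)
          for j in range(len(layer_sizes) - 1)]

for j, Theta in enumerate(Thetas, start=1):
    print(f"Theta^({j}) has shape {Theta.shape}")   # (5, 4), then (1, 6)
```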
Define \(a^{(1)} = x\)
\(z^{(j+1)} = \Theta^{(j)}a^{(j)}\)
\(z_k^{(j+1)} = \Theta_{k,0}^{(j)}a_0^{(j)} + \Theta_{k,1}^{(j)}a_1^{(j)} + \dots + \Theta_{k,n^{(j)}}^{(j)}a_{n^{(j)}}^{(j)}\quad ,(n^{(j)} \text{ means layer } j \text{ has } n^{(j)} \text{ activations})\)
\(a^{(j)} = g(z^{(j)}) = g(\Theta^{(j-1)}a^{(j-1)})\quad(j\ge2)\)
Suppose the network has \(n+1\) layers (layer \(n+1\) is the output layer); then the last weight matrix \(\Theta^{(n)}\) has only one row, which is multiplied by the column vector \(a^{(n)}\), so that the result is a single number:
\(h_\Theta(x) = a^{(n+1)}=g(z^{(n+1)})\)
Add the bias unit \(a_0^{(j)}=1\) to each layer before computing the next one
Forward Propagation: computing the activations layer by layer, forward from the input layer to the output layer
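A minimal sketch of forward propagation following the formulas above (NumPy; the weight matrices and the input vector `x` are illustrative assumptions, not values from the course):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Thetas):
    # Compute h_Theta(x) given the list of weight matrices Theta^(1), ..., Theta^(n).
    a = x                                  # a^(1) = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))     # add the bias unit a_0^(j) = 1
        z = Theta @ a                      # z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)                     # a^(j+1) = g(z^(j+1))
    return a                               # h_Theta(x)

# Illustrative network: 3 input units -> 5 hidden units -> 1 output unit, random weights.
Thetas = [np.random.randn(5, 4), np.random.randn(1, 6)]
x = np.array([0.5, -1.2, 3.0])
print(forward_propagate(x, Thetas))        # a single number in (0, 1)
```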
In effect, a neural network trains a logistic regression whose features are the activations of layer \(a^{(n-1)}\) (the last hidden layer) rather than the raw input layer; by choosing different parameters in \(\Theta^{(1)}\) it can learn complex intermediate features and thus a better hypothesis than training directly on \(x_1,x_2,\dots ,x_n\)
architecture: the way that the neurons in a neural network are connected
The weights \(\Theta\) corresponding to logical operators:
- \({\rm AND} = (x_1 \bigwedge x_2)\):
- \(\Theta = \begin{bmatrix}-30 &20& 20 \end{bmatrix}\)
- \({\rm NOR} = (\lnot x_1 \bigwedge \lnot x_2)\):
- \(\Theta = \begin{bmatrix}10 & -20& -20 \end{bmatrix}\)
- \({\rm OR} = (x_1 \bigvee x_2)\):
- \(\Theta = \begin{bmatrix}-10 &20& 20 \end{bmatrix}\)
- \({\rm NOT} = (\lnot x)\):
- \(\Theta = \begin{bmatrix}10 & -20\end{bmatrix}\)
- \({\rm XNOR} = (\lnot x_1 \bigwedge \lnot x_2) \bigvee ( x_1 \bigwedge x_2)\)
- requires a hidden layer: \(a_1^{(2)} == (\lnot x_1 \bigwedge \lnot x_2),\quad a_2^{(2)} == (x_1 \bigwedge x_2)\)
- output layer: \(a^{(3)} == (a_1^{(2)} \bigvee a_2^{(2)})\)
Implementing the logical expressions:
Let \(x=\begin{bmatrix}1 \\ x_1\\x_2 \end{bmatrix}\); then \(a_i = g(\Theta_i x)\) gives the result of applying the logical operator corresponding to \(\Theta_i\) to \(x_1, x_2\).
For example, if \(\Theta_i = \begin{bmatrix}-10 &20& 20 \end{bmatrix}\), then \(a_i == x_1 \bigvee x_2\).
A more complex logical expression such as \({\rm XNOR}\) can only be computed with the help of a hidden layer.
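A minimal sketch (NumPy) checking that the weight vectors listed above act like the corresponding logical operators, and that combining NOR and AND in a hidden layer followed by OR yields XNOR; the helper names `unit` and `xnor` are my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

AND = np.array([-30.0, 20.0, 20.0])
NOR = np.array([10.0, -20.0, -20.0])
OR  = np.array([-10.0, 20.0, 20.0])

def unit(theta, *inputs):
    # a = g(theta . x) with x = [1, x_1, x_2, ...]
    x = np.concatenate(([1.0], inputs))
    return sigmoid(theta @ x)

def xnor(x1, x2):
    a1 = unit(NOR, x1, x2)   # a_1^(2) == (NOT x1) AND (NOT x2)
    a2 = unit(AND, x1, x2)   # a_2^(2) == x1 AND x2
    return unit(OR, a1, a2)  # a^(3)   == a_1^(2) OR a_2^(2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))   # prints 1 when x1 == x2, else 0
```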
For multiclass classification:
Use \(y = \begin{bmatrix}1\\0\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\1\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\1\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\0\\1 \end{bmatrix}\) to represent the different classes (one output unit per class).
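A minimal sketch (NumPy; the function name `to_one_hot` and the sample output vector are illustrative) of this label encoding and of reading the predicted class off a 4-unit output layer:

```python
import numpy as np

num_classes = 4

def to_one_hot(label):
    # Map the class index k (0-based) to the unit vector used for y.
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

print(to_one_hot(2))                    # [0. 0. 1. 0.]

# With one output unit per class, the predicted class is the unit
# with the largest activation in h_Theta(x).
h = np.array([0.1, 0.7, 0.15, 0.05])    # hypothetical output of the network
print(np.argmax(h))                     # 1
```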