Several Basic Activation Functions and Their Implementations

Notes:

First published: 2024-10-31
References:
- https://insidelearningmachines.com/neural_network_activation_functions
- https://stackoverflow.com/questions/44230635/avoid-overflow-with-softplus-function-in-python

Neuron

A single neuron with three inputs computes:

\[z = w_1x_1 + w_2x_2 + w_3x_3 + b \]

This can be written in vector form:

\[z = \vec{w}^T\vec{x} + b \]

\(\vec{x}=\left[\begin{array}{l}x_1 \\ x_2 \\ x_3\end{array}\right]\): input
\(\vec{w}=\left[\begin{array}{l}w_1 \\ w_2 \\ w_3\end{array}\right]\): weights
\(b\): bias

If the activation function is \(f\), the output \(y\) is obtained by applying it to the pre-activation \(z\):

\[y = f(z) \]
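As a small worked example of the two formulas above, the sketch below computes \(z\) and \(y\) for one neuron. The input, weight, and bias values are made up for illustration, and sigmoid (introduced later in this post) is assumed as the activation \(f\).

```python
import numpy as np

def sigmoid(z):
    # Assumed activation f for this example
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration
x = np.array([1.0, 2.0, 3.0])      # input
w = np.array([0.5, -0.25, 0.1])    # weights
b = 0.2                            # bias

z = w @ x + b    # w^T x + b = 0.5 - 0.5 + 0.3 + 0.2 = 0.5
y = sigmoid(z)   # output of the neuron
```

Any of the activation functions below could be substituted for `sigmoid` here; only the last line changes.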

Activation Functions

Binary Threshold

\[y= \begin{cases} 1 & z \geq k\\ 0 & z<k \\ \end{cases} \]

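import numpy as np

def binary(z: np.ndarray, k: float) -> np.ndarray:
    """
    Function to execute the binary threshold activation
    
    Inputs:
        z : input dot product w*x + b
        k : threshold
    Output:
        y : determined activation
    """
    # 1.0 where z >= k, otherwise 0.0
    return (z >= k).astype(float)

z = np.linspace(-4,4,num=100)

# threshold k = 0 is chosen here only for illustration
y = binary(z, k=0.0)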

Sigmoid

\[y=1 /\left(1+e^{-z}\right) \]

If \(z\) is a large positive number, \(e^{-z}\) approaches 0, so \(y\) approaches 1.
If \(z\) is a large negative number, \(e^{-z}\) approaches infinity, so \(y\) approaches 0.
If \(z=0\), then \(e^{-z}=1\), so \(y = \frac{1}{2}\).

def sigmoid(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the sigmoid activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return 1/(1+np.exp(-z))
    
z = np.linspace(-5,5,num=100)
y = sigmoid(z)

The sigmoid activation is commonly used for binary classification problems.
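The direct formula evaluates \(e^{-z}\), which overflows in floating point when \(z\) is a large negative number (NumPy emits a warning, although the result still rounds to 0). One common workaround, sketched below, is to evaluate only \(e^{-|z|}\), which always lies in \((0, 1]\), and to use the equivalent form \(e^{z}/(1+e^{z})\) for negative \(z\). The name `sigmoid_stable` is introduced here for illustration and is not part of the original post.

```python
import numpy as np

def sigmoid_stable(z: np.ndarray) -> np.ndarray:
    """Sigmoid evaluated without ever exponentiating a large positive number."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    # z >= 0: 1 / (1 + e^{-z});  z < 0: e^{z} / (1 + e^{z})
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])
y = sigmoid_stable(z)
```

Both branches are algebraically the same function; the split only controls which exponential is computed.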

Softmax

The softmax activation is suited to multiclass classification problems.

With \(k\) output classes:

\[y_i=\frac{e^{z_i}}{\sum_{j=1}^k e^{z_j}} \]

Softmax is a smooth approximation of the (one-hot encoded) argmax function.

def softmax(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the softmax activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return np.exp(z)/np.sum(np.exp(z))
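The direct formula above overflows when any \(z_i\) is large, producing `inf/inf = nan`. Because softmax is unchanged when the same constant is subtracted from every \(z_i\) (the factor \(e^{-c}\) cancels between numerator and denominator), a common remedy is to subtract \(\max_j z_j\) first. The sketch below assumes a 1-D input vector; `softmax_stable` is a name introduced here for illustration.

```python
import numpy as np

def softmax_stable(z: np.ndarray) -> np.ndarray:
    """Softmax of a 1-D vector, shifted so the largest exponent is 0."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)   # every entry is <= 0, so exp() cannot overflow
    e = np.exp(shifted)
    return e / np.sum(e)

z = np.array([1000.0, 1001.0, 1002.0])   # np.exp(z) alone would overflow
y = softmax_stable(z)
```

The outputs are positive, sum to 1, and are identical to the softmax of `[0, 1, 2]`, since the two inputs differ only by a constant shift.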

ReLU

\[y= \begin{cases}z & z \geq 0 \\ 0 & z<0\end{cases} \]

def relu(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the ReLU activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return np.where(z>=0,z,0)

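When \(z\) is negative, both the output \(y\) and the gradient are 0 (at \(z=0\) the gradient is undefined and is set by convention). A neuron whose \(z\) stays negative for every input receives no gradient, so its weights stop updating; this is often called the "dying ReLU" problem.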
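To make the gradient behaviour concrete, here is a minimal sketch of the ReLU derivative with respect to \(z\). The helper name `relu_grad` is hypothetical, and the value at exactly \(z=0\) is set to 0 by convention (an assumption; some implementations choose differently).

```python
import numpy as np

def relu_grad(z: np.ndarray) -> np.ndarray:
    """Derivative of ReLU w.r.t. z: 1 for z > 0, 0 otherwise (0 at z == 0 by convention)."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, 1.0, 0.0)

z = np.array([-2.0, 0.0, 3.0])
grad = relu_grad(z)   # zero for the non-positive entries, one for the positive entry
```

During backpropagation the upstream gradient is multiplied by this mask, so entries with non-positive \(z\) pass nothing back.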

PReLU

\[y=\max (0, z)+ a * \min (0, z) \]

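Here \(a\) is a learnable parameter, that is, it is learned during training.

Unlike ReLU, the gradient is not 0 when \(z\) is negative: it equals \(a\), provided \(a \neq 0\).

When \(a = -1\), \(y=|z|\), and the activation is called absolute value ReLU.
When \(a\) is a small fixed positive number, typically around 0.01, the activation is called leaky ReLU.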
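The post gives no implementation for PReLU, so the following is a minimal sketch that follows the formula above directly. Here `a` is passed in as a plain number; in a real network it would be a trained parameter, which this sketch does not attempt to model.

```python
import numpy as np

def prelu(z: np.ndarray, a: float) -> np.ndarray:
    """PReLU forward pass: max(0, z) + a * min(0, z)."""
    z = np.asarray(z, dtype=float)
    return np.maximum(0.0, z) + a * np.minimum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
leaky = prelu(z, a=0.01)      # leaky ReLU: negative inputs are scaled by 0.01
absolute = prelu(z, a=-1.0)   # absolute value ReLU: equals |z|
plain = prelu(z, a=0.0)       # a = 0 recovers ordinary ReLU
```

The three calls show how the special cases described above all come from the same expression with different values of `a`.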

Tanh

\[y=\frac{e^z-e^{-z}}{e^z+e^{-z}} \]

def tanh(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the tanh activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return (np.exp(z) - np.exp(-z))/(np.exp(z)+np.exp(-z))

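When the input \(z\) is a large positive number, \(e^{-z}\) approaches 0 and \(y \approx \frac{e^z}{e^z}\), so \(y\) approaches \(1\).
When the input \(z\) is a large negative number, \(e^z\) approaches 0 and \(y \approx \frac{-e^{-z}}{e^{-z}}\), so \(y\) approaches \(-1\).
When the input \(z\) is 0, \(e^z = e^{-z}=1\), so \(y=0\).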
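The same limits show why the direct formula is fragile in floating point: for large \(|z|\) one of the exponentials overflows and the quotient becomes `inf/inf = nan`. Dividing numerator and denominator by \(e^{|z|}\) gives \(\tanh(z)=\operatorname{sign}(z)\,\frac{1-e^{-2|z|}}{1+e^{-2|z|}}\), which only needs an exponential of a non-positive number. The sketch below uses that form and compares it with NumPy's built-in `np.tanh`; `tanh_stable` is a name introduced here for illustration.

```python
import numpy as np

def tanh_stable(z: np.ndarray) -> np.ndarray:
    """tanh evaluated using only exp of non-positive numbers."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-2.0 * np.abs(z))   # in (0, 1], cannot overflow
    return np.sign(z) * (1.0 - e) / (1.0 + e)

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
y = tanh_stable(z)
reference = np.tanh(z)   # NumPy's own implementation, for comparison
```

In practice `np.tanh` is the simpler choice; the hand-written version is shown only to connect the limits above to a numerically safe formula.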

SoftPlus

\[f(z)=\log _e\left(1+e^z\right) \]

SoftPlus can be viewed as a smooth approximation of ReLU. Its derivative is the sigmoid function:

\[\begin{aligned} & \frac{d y}{d z}=f^{\prime}(z)=\frac{d\left(\log_e \left(1+e^z\right)\right)}{d z} \\ & \Longrightarrow f^{\prime}(z)=\frac{e^z}{1+e^z} \\ & \Longrightarrow f^{\prime}(z)=\frac{\frac{e^z}{e^z}}{\frac{1}{e^z}+\frac{e^z}{e^z}} \\ & \Longrightarrow f^{\prime}(z)=\frac{1}{1+e^{-z}} \\ & \Longrightarrow f^{\prime}(z)=\operatorname{sigmoid}(z) \end{aligned} \]

The derivation uses the derivative of the natural logarithm:

\[\frac{d}{d x}(\ln x)=\frac{1}{x} \]

and the chain rule:

\[\frac{d y}{d x}=\frac{d y}{d u} \frac{d u}{d x} \]

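In addition, softplus can be rewritten so that no large positive number is exponentiated when \(z\) is large and positive: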

\[\begin{aligned} f(z)&=\log _e\left(1+e^z\right) \\ &= \log(1 + e^z) - \log(e^z) + z \\ &= \log\left(\frac{1 + e^z}{e^z}\right)+ z \\ &= \log(1 + e^{-z}) + z \end{aligned} \]

For \(z \le 0\) the original form \(\log(1+e^{z})\) is already safe, since \(e^{z} \le 1\). Both cases combine into \(\log(1+e^{-|z|})+\max(z, 0)\), which is the expression used in the implementation.

def softplus(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the softplus activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return np.log(1 + np.exp(-np.abs(z))) + np.maximum(z, 0)
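As a quick sanity check of the two results derived above, the sketch below compares the overflow-safe expression with the direct formula on a moderate range, and compares a central finite-difference estimate of the derivative with sigmoid. The step size `h` and the tolerance used when comparing are arbitrary choices made for this illustration.

```python
import numpy as np

def softplus(z):
    # Same expression as the implementation above
    z = np.asarray(z, dtype=float)
    return np.log(1 + np.exp(-np.abs(z))) + np.maximum(z, 0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)

# 1) On a moderate range the safe form matches log(1 + e^z)
naive = np.log(1 + np.exp(z))

# 2) Central finite difference approximates f'(z), which should equal sigmoid(z)
h = 1e-5
numeric_grad = (softplus(z + h) - softplus(z - h)) / (2 * h)

# 3) The safe form stays finite where the direct formula would overflow
large = softplus(1000.0)
```

For very large \(z\) the result is simply \(z\), matching the fact that softplus approaches ReLU far from the origin.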

Swish

\[f(z)=z * \operatorname{sigmoid}(z)=\frac{z}{\left(1+e^{-z}\right)} \]

def swish(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the swish activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return z/(1+np.exp(-z))

Posted 2024-10-31 22:08 by shizidushu
