Activation Function

Posted on 2022-05-30 Edited on 2022-11-18 In Summary

A brief summary of activation functions in deep learning.

Activation Functions

1 Introduction

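Activation functions introduce nonlinearity into a neural network; with them, the network can fit a wide variety of curves. Without a nonlinear activation, any stack of linear layers is equivalent to a single linear layer. Activation functions fall mainly into saturating activation functions (Saturated Neurons), such as Sigmoid and tanh, whose gradient approaches $0$ as $|x|$ grows, and non-saturating functions (One-sided Saturations), such as ReLU, which saturate on at most one side.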
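To make the nonlinearity argument concrete, the sketch below (plain Python, with hypothetical helper names `matvec`, `matmul`, `vadd` and made-up weights) checks that two stacked linear layers with no activation in between compute the same map as one linear layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$, up to floating-point rounding. This is why depth alone adds no expressive power without a nonlinear activation.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A and B (both given as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical weights and biases for two stacked linear layers (no activation).
W1, b1 = [[1.0, 2.0], [3.0, -1.0]], [0.5, -0.5]
W2, b2 = [[2.0, 0.0], [1.0, 1.0]], [1.0, 0.0]
x = [0.3, -1.2]

# Two layers applied one after another: W2 (W1 x + b1) + b2.
two_layers = vadd(matvec(W2, vadd(matvec(W1, x), b1)), b2)

# The same map collapsed into ONE linear layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)
one_layer = vadd(matvec(W, x), b)

print(two_layers, one_layer)
print(all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(two_layers, one_layer)))
```

Inserting any nonlinear function between the two layers breaks this identity, which is exactly the role the activation function plays.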

2 Sigmoid Function

Mathematical form:

$\mathrm{sigmoid}(x) = \frac{1}{1 + \mathrm{e}^{-x}}$

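Properties:

Maps any continuous real-valued input to an output between $0$ and $1$. In particular, a very large negative input gives an output close to $0$, and a very large positive input gives an output close to $1$.
Drawback 1: during backpropagation in deep networks it tends to cause vanishing gradients. Its derivative $\mathrm{sigmoid}(x)(1 - \mathrm{sigmoid}(x))$ is at most $0.25$ and approaches $0$ when $|x|$ is large, so the gradient shrinks as it passes back through many layers; exploding gradients are also possible when the weights are large.
Drawback 2: the output of Sigmoid is not zero-centered (it is always positive), so the gradients with respect to the weights of the next layer all share the same sign, which can make gradient-descent updates zig-zag.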
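A minimal sketch of Sigmoid and its derivative in plain Python (the function names are illustrative, not from any library). It branches on the sign of $x$ so that `math.exp` never overflows, and the loop illustrates, under the simplifying assumption that the weights are ignored, how multiplying by a derivative of at most $0.25$ at each layer makes the gradient vanish.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid: exp is only ever called on a non-positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    """Derivative sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Even at the best case x = 0, backpropagating through n sigmoid layers
# multiplies the gradient by 0.25 per layer (weights ignored here).
for n in (1, 5, 10):
    print(n, sigmoid_grad(0.0) ** n)

# Saturation: for large |x| the derivative is already tiny.
print(sigmoid_grad(10.0))
```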

3 tanh Function

Mathematical form:

$\mathrm{tanh} (x) = \frac{\mathrm{e}^x - \mathrm{e}^{-x}}{\mathrm{e}^x + \mathrm{e}^{-x}}$

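Properties:

Solves the problem of Sigmoid's output not being zero-centered: its output lies in $(-1, 1)$ and is centered at $0$. It still saturates for large $|x|$, however, so the vanishing-gradient problem remains.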
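The sketch below, assuming plain Python floats, implements tanh directly from the formula, checks it against `math.tanh` and against the identity $\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$, and compares the mean output of tanh and Sigmoid on inputs placed symmetrically around $0$: tanh averages to about $0$, Sigmoid to about $0.5$.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_formula(x: float) -> float:
    """(e^x - e^-x) / (e^x + e^-x); fine for moderate |x| (exp overflows past ~709)."""
    ex, emx = math.exp(x), math.exp(-x)
    return (ex - emx) / (ex + emx)

# Inputs spaced evenly and symmetrically around 0, from -3.0 to 3.0.
xs = [i / 10 for i in range(-30, 31)]

for x in xs:
    # Matches the standard library implementation.
    assert math.isclose(tanh_from_formula(x), math.tanh(x), abs_tol=1e-12)
    # tanh is a shifted, rescaled sigmoid.
    assert math.isclose(tanh_from_formula(x), 2 * sigmoid(2 * x) - 1, abs_tol=1e-12)

mean_tanh = sum(math.tanh(x) for x in xs) / len(xs)
mean_sigmoid = sum(sigmoid(x) for x in xs) / len(xs)
print(f"mean tanh = {mean_tanh:.3f}, mean sigmoid = {mean_sigmoid:.3f}")
```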

4 ReLU Function

Mathematical form:

$\mathrm{ReLU} (x) = \max (0, x)$

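Properties:

Advantages:
- Solves the vanishing-gradient problem in the positive region, where the gradient is always $1$
- Very fast to compute, requiring only a comparison with $0$
- Converges much faster than Sigmoid and tanh
Disadvantages:
- The output of ReLU is not zero-centered
- The Dead ReLU problem: some neurons may never be activated, so their parameters are never updated. This happens when a neuron's input is negative for every training sample, because the gradient of ReLU is $0$ there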
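A small illustration of the Dead ReLU problem, using a hypothetical single neuron $z = wx + b$ whose bias has become strongly negative (the values of `w`, `b` and `inputs` are made up). Because $z$ is negative for every input, the ReLU gradient is $0$ everywhere and gradient descent never updates `w` or `b`.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Gradient of ReLU; the value at exactly x = 0 is a convention (0 here)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical neuron z = w * x + b with a very negative bias.
w, b = 0.5, -10.0
inputs = [0.0, 1.0, 2.0, 5.0, 8.0]  # assumed training inputs

# dLoss/dw and dLoss/db both carry the factor relu_grad(z), which is 0 for every input.
grads = [relu_grad(w * x + b) for x in inputs]
print(grads)  # all zero: the neuron is "dead"

# In the positive region the gradient is exactly 1, so it does not shrink.
print(relu_grad(3.0))
```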

5 Leaky ReLU Function

Mathematical form:

$\mathrm{LeakyReLU} (x) = \max (\alpha x, x)$

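Properties:

To address the Dead ReLU problem, the negative half of ReLU is replaced by $\alpha x$, typically with $\alpha = 0.01$, so negative inputs still receive a small non-zero gradient. (The form $\max(\alpha x, x)$ assumes $0 < \alpha < 1$.)
In theory, Leaky ReLU has all the advantages of ReLU without the Dead ReLU problem, but in practice it has not been shown that Leaky ReLU is always better than ReLU.
The PReLU (Parametric ReLU) function is similar to Leaky ReLU, except that $\alpha$ is learned during training instead of being fixed.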
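A minimal sketch of Leaky ReLU with the typical $\alpha = 0.01$, plus the extra gradient that PReLU would use to learn $\alpha$ (all function names are illustrative). Note that $\max(\alpha x, x)$ only matches the piecewise definition when $0 < \alpha < 1$.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """max(alpha * x, x): x for x >= 0, alpha * x for x < 0 (assumes 0 < alpha < 1)."""
    return max(alpha * x, x)

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Gradient w.r.t. x: never exactly 0 for x < 0, so the neuron cannot die."""
    return 1.0 if x > 0 else alpha

def prelu_grad_alpha(x: float) -> float:
    """PReLU treats alpha as a parameter: d(output)/d(alpha) = x for x < 0, else 0."""
    return x if x < 0 else 0.0

print(leaky_relu(-3.0), leaky_relu(2.0), leaky_relu_grad(-3.0), prelu_grad_alpha(-3.0))
```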

6 ELU (Exponential Linear Units) Function

Mathematical form:

$\mathrm{ELU} (x) = \begin{cases} x, & \mathrm{if} \ x > 0\\ \alpha (\mathrm{e}^x - 1), & \mathrm{otherwise} \end{cases}$

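Properties:

ELU was also proposed to address the problems of ReLU. It has essentially all of ReLU's advantages, does not suffer from the Dead ReLU problem, and its mean output is close to 0.
Its drawback is a higher computational cost, since the negative branch requires evaluating an exponential.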
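A sketch of ELU and its derivative, assuming $\alpha = 1.0$ (a common default; the text above does not fix a value). It uses `math.expm1`, which computes $\mathrm{e}^x - 1$ accurately for $x$ near $0$. The exponential in the negative branch is the extra cost, and the non-zero gradient there is what avoids dead neurons.

```python
import math

def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: x for x > 0, alpha * (e^x - 1) otherwise (alpha = 1.0 is an assumed default)."""
    return x if x > 0 else alpha * math.expm1(x)  # expm1(x) = e^x - 1, accurate near 0

def elu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative: 1 for x > 0, alpha * e^x (which equals elu(x) + alpha) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

# Negative inputs give negative outputs that saturate at -alpha, and
# their gradient is small but never exactly zero.
for x in (-10.0, -1.0, 0.5):
    print(x, elu(x), elu_grad(x))
```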

7 Softmax Function

Mathematical form:

$\mathrm{softmax}: \mathbf{x} \in \mathbb{R}^n \rightarrow \mathbf{x}' \in \mathbb{R}^n, \quad x'_i = \frac{\mathrm{e}^{x_i}}{\sum_{j=1}^n \mathrm{e}^{x_j}}, \quad i = 1, \dots, n$

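Properties:

Converts a vector of real scores (logits) into a probability distribution: every output lies in $(0, 1)$ and the outputs sum to $1$, which is why it is commonly used in the output layer for multi-class classification.
Unlike ReLU, it is differentiable everywhere (including at zero) and its gradient does not vanish for negative inputs: $\frac{\partial x'_i}{\partial x_j} = x'_i (\delta_{ij} - x'_j)$, where $\delta_{ij}$ is the Kronecker delta.
Adding the same constant to every input leaves the output unchanged, so implementations usually subtract $\max_j x_j$ before exponentiating to avoid overflow.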
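A numerically stable softmax sketch in plain Python. Subtracting $\max_j x_j$ from every input does not change the result (the common factor $\mathrm{e}^{-\max_j x_j}$ cancels between numerator and denominator) but keeps `math.exp` from overflowing on large logits. The example logits are arbitrary.

```python
import math

def softmax(xs):
    """Softmax over a list of floats, shifted by max(xs) for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs, sum(probs))

# Shift invariance: adding 1000 to every logit gives the same probabilities,
# even though a naive math.exp(1002.0) would raise OverflowError.
shifted = softmax([x + 1000.0 for x in logits])
print(shifted)
```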

8 Swish Function

Mathematical form:

$\mathrm{swish} (x) = x \cdot \mathrm{sigmoid} (x)$

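Properties:

Being unbounded above helps prevent the gradient from gradually approaching $0$ and saturating during slow training.
It is smooth and non-monotonic: it is bounded below, with a minimum of about $-0.278$ near $x \approx -1.28$, so its derivative is negative for inputs below that point.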
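The sketch below evaluates Swish and its derivative $\mathrm{sigmoid}(x) + x\,\mathrm{sigmoid}(x)(1 - \mathrm{sigmoid}(x))$ at a few arbitrary points, in plain Python. It shows the output growing without bound for positive $x$ and the derivative turning negative for sufficiently negative $x$.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def swish(x: float) -> float:
    return x * sigmoid(x)

def swish_grad(x: float) -> float:
    """Product rule: d/dx [x * sigmoid(x)] = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

for x in (-5.0, -2.0, -1.28, 0.0, 5.0):
    print(f"x={x:5.2f}  swish={swish(x):7.4f}  grad={swish_grad(x):7.4f}")

# Locate the minimum on a fine grid over [-5, 0).
lowest = min(swish(i / 1000) for i in range(-5000, 0))
print(lowest)
```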

9 Softplus Function

Mathematical form:

$\mathrm{softplus} (x) = \ln (1 + \mathrm{e}^x)$

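Properties:

Similar to ReLU, but smooth everywhere; its derivative is exactly the Sigmoid function, $\frac{\mathrm{d}}{\mathrm{d}x} \ln (1 + \mathrm{e}^x) = \mathrm{sigmoid} (x)$.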
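A sketch of Softplus in plain Python, written in the numerically stable form $x + \ln(1 + \mathrm{e}^{-x})$ for positive $x$ so that `math.exp` does not overflow. It compares Softplus with ReLU at a few points and uses a central finite difference (step size and test point chosen arbitrarily) to check that the derivative of Softplus is the Sigmoid function.

```python
import math

def softplus(x: float) -> float:
    """ln(1 + e^x), rewritten as x + ln(1 + e^-x) for x > 0 to avoid overflow."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Smooth version of ReLU: near 0 for very negative x, near x for large x,
# and ln 2 (not 0) at x = 0.
for x in (-10.0, 0.0, 10.0):
    print(x, softplus(x), max(0.0, x))

# Central finite difference of softplus compared with sigmoid.
h = 1e-6
x0 = 0.7
numeric = (softplus(x0 + h) - softplus(x0 - h)) / (2 * h)
print(numeric, sigmoid(x0))
```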