Activation Functions


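Sigmoid

Saturated sigmoid units kill gradients: for large |x| the derivative is close to zero, so almost no gradient flows back through the unit.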

Sigmoid outputs are not zero-centered.

The exponential function is somewhat computationally expensive.
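
To make the saturation point concrete, here is a minimal sketch in plain Python (standard library only). The function names sigmoid and sigmoid_grad are my own, not from any particular framework. It evaluates the sigmoid and its derivative s(x) * (1 - s(x)) at a few points.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, branching on the sign of x to avoid overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient is largest at x = 0 and shrinks quickly as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```

The derivative never exceeds 0.25 and is nearly zero at |x| = 10, which is the saturation problem. Every output lies strictly between 0 and 1, which is why the outputs are not zero-centered.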



tanh

Kills gradients when saturated.

Its outputs are zero-centered! :)
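
Both tanh properties can be checked with a small sketch using only the standard library. The helper name tanh_grad and the sample inputs are illustrative assumptions.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t

# Inputs symmetric around 0 give outputs symmetric around 0.
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
outputs = [math.tanh(x) for x in xs]
mean_output = sum(outputs) / len(outputs)

print("mean output:", mean_output)
print("grad at 0:  ", tanh_grad(0.0))
print("grad at 10: ", tanh_grad(10.0))
```

The mean output is (numerically) zero for symmetric inputs, while the gradient at x = 10 is almost zero, so a saturated tanh unit still passes back very little gradient.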



ReLU

Does not saturate (in the positive region).

Very computationally efficient.

Converges much faster than sigmoid/tanh in practice (roughly 6x).

Seems more biologically plausible than sigmoid.


Output is not zero-centered.

No gradient when x < 0: a unit whose input stays negative stops updating (it "dies").


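Be careful with the learning rate when using ReLU: too large a step can push a unit into the x < 0 region, where it receives no gradient and stops learning.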
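
The learning-rate warning can be illustrated with a toy sketch: a single weight w, one input, and a squared-error loss. The setup, the function names, and all the numbers are hypothetical and chosen only for illustration.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Subgradient of ReLU; the value at exactly x == 0 is a convention (0 here).
    return 1.0 if x > 0 else 0.0

def sgd_step(w, x, y, lr):
    """One SGD step on the loss 0.5 * (relu(w * x) - y)**2 for a single weight."""
    pre = w * x
    grad_w = (relu(pre) - y) * relu_grad(pre) * x
    return w - lr * grad_w

x, y = 1.0, 0.5  # toy input and target (hypothetical values)

w_small = sgd_step(1.0, x, y, lr=0.1)       # small step: the unit stays active
w_large = sgd_step(1.0, x, y, lr=10.0)      # large step: w jumps to a negative value
w_stuck = sgd_step(w_large, x, y, lr=10.0)  # gradient is now 0, so w does not move

print(w_small, w_large, w_stuck)
```

After the large step the pre-activation w * x is negative, the ReLU gradient is zero, and further updates leave the weight unchanged. This is the "dead" unit that the x < 0 drawback describes.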


Leaky ReLU

Does not saturate.

Very computationally efficient.

Converges much faster than sigmoid/tanh in practice (roughly 6x).

Will not "die": the gradient is non-zero for x < 0 as well.
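
A minimal sketch of Leaky ReLU in its common form f(x) = max(0.01x, x). The slope 0.01 is the usual default and is treated here as an assumption, not a required value.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for x > 0, a small linear slope otherwise."""
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    """Gradient is 1 on the positive side and `slope` (not 0) on the negative side."""
    return 1.0 if x > 0 else slope

for x in (-2.0, -0.5, 0.5, 2.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

Because the gradient on the negative side is small but non-zero, a unit that lands there can still be updated and move back to the positive region.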


Parametric ReLU
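
Parametric ReLU has the same shape as Leaky ReLU, f(x) = max(alpha * x, x), but the negative slope alpha is a parameter learned by backpropagation instead of a fixed constant. The sketch below is a hand-written illustration. The function names, the starting value of alpha, the learning rate, and the upstream gradient are all assumptions.

```python
def prelu(x, alpha):
    """PReLU: identity for x > 0, learned slope alpha otherwise."""
    return x if x > 0 else alpha * x

def prelu_grad_alpha(x):
    """d prelu / d alpha: x on the negative side, 0 on the positive side."""
    return 0.0 if x > 0 else x

alpha = 0.25     # assumed starting value for the learned slope
lr = 0.1         # hypothetical learning rate
upstream = 1.0   # hypothetical gradient arriving from the next layer
x = -2.0

# One gradient step on alpha: only negative inputs contribute to its gradient.
alpha_new = alpha - lr * upstream * prelu_grad_alpha(x)
print(prelu(x, alpha), alpha_new)
```

With alpha fixed at 0.01 this reduces to the Leaky ReLU above.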


Exponential Linear Unit
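
The Exponential Linear Unit is usually defined as f(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise. The sketch below uses only the standard library. alpha = 1.0 is a common default and is an assumption here.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)

def elu_grad(x, alpha=1.0):
    """Gradient: 1 for x > 0, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

for x in (-100.0, -1.0, 0.0, 2.0):
    print(x, elu(x), elu_grad(x))
```

For very negative inputs the output levels off near -alpha, so ELU does saturate on the negative side. Its outputs can be negative, which brings the mean activation closer to zero than ReLU, at the cost of computing an exponential.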

posted @ 2019-03-29 19:35  leizhao