SGD and activation functions in Caffe
In Caffe, the choice of activation function directly affects training speed and how SGD converges: each activation function contributes a different gradient to the SGD updates, so the activation layer's type is specified in the network configuration file. Currently the most widely used activation function in Caffe is ReLU.
The activation functions currently implemented in Caffe are absval, bnll, power, relu, sigmoid, and tanh, each provided as a separate layer. Rather than re-deriving each formula here, the descriptions below are taken from the Caffe tutorial.
ReLU / Rectified-Linear and Leaky-ReLU
- LayerType: RELU
- CPU implementation: ./src/caffe/layers/relu_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
- Parameters (ReLUParameter relu_param)
  - Optional
    - negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
- Sample (as seen in ./examples/imagenet/imagenet_train_val.prototxt)

  layers { name: "relu1" type: RELU bottom: "conv1" top: "conv1" }

Given an input value x, the RELU layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative slope parameter is not set, it is equivalent to the standard ReLU function, max(x, 0). It also supports in-place computation, meaning that the bottom and the top blob can be the same, which reduces memory consumption.
Sigmoid
- LayerType: SIGMOID
- CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu
- Sample (as seen in ./examples/imagenet/mnist_autoencoder.prototxt)

  layers { name: "encode1neuron" bottom: "encode1" top: "encode1neuron" type: SIGMOID }

The SIGMOID layer computes the output as sigmoid(x) for each input element x.
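Here sigmoid(x) = 1 / (1 + exp(-x)), which squashes any real input into the open interval (0, 1). A minimal sketch:

```python
import math

def sigmoid(x):
    # 1 / (1 + exp(-x)); output always lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5, the midpoint of the range
```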
TanH / Hyperbolic Tangent
- LayerType: TANH
- CPU implementation: ./src/caffe/layers/tanh_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: TANH }

The TANH layer computes the output as tanh(x) for each input element x.
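tanh squashes inputs into (-1, 1) and is a zero-centered, rescaled sigmoid; the standard identity tanh(x) = 2 * sigmoid(2x) - 1 (textbook math, nothing Caffe-specific) can be checked numerically:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# tanh(x) = 2 * sigmoid(2x) - 1 for all real x
for x in (-2.0, 0.0, 0.7):
    assert abs(math.tanh(x) - (2 * sigmoid(2 * x) - 1)) < 1e-12
```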
Absolute Value
- LayerType: ABSVAL
- CPU implementation: ./src/caffe/layers/absval_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: ABSVAL }

The ABSVAL layer computes the output as abs(x) for each input element x.
Power
- LayerType: POWER
- CPU implementation: ./src/caffe/layers/power_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
- Parameters (PowerParameter power_param)
  - Optional
    - power [default 1]
    - scale [default 1]
    - shift [default 0]
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: POWER power_param { power: 1 scale: 1 shift: 0 } }

The POWER layer computes the output as (shift + scale * x) ^ power for each input element x.
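Note that with the defaults (power = 1, scale = 1, shift = 0) the layer is the identity. A quick sketch of the formula:

```python
def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    # (shift + scale * x) ** power; the defaults reduce to the identity
    return (shift + scale * x) ** power

print(power_layer(3.0))                              # 3.0 (identity)
print(power_layer(3.0, power=2, scale=2, shift=1))   # (1 + 2*3)^2 = 49
```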
BNLL
- LayerType: BNLL
- CPU implementation: ./src/caffe/layers/bnll_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: BNLL }

The BNLL (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
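A naive log(1 + exp(x)) overflows for large positive x, so practical softplus implementations rewrite it using the identity log(1 + exp(x)) = x + log(1 + exp(-x)) for x > 0. A sketch of that standard trick (not a transcription of Caffe's source):

```python
import math

def bnll(x):
    # log(1 + exp(x)); for x > 0 rewrite as x + log(1 + exp(-x))
    # so that exp() never overflows on large positive inputs.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

print(bnll(0.0))     # log(2) ≈ 0.693
print(bnll(1000.0))  # ≈ 1000.0; naive log(1 + exp(1000)) would overflow
```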