SGD and activation functions in Caffe

In Caffe, the form of the activation function directly affects training speed and how SGD solves the optimization problem.

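Each activation function has its own gradient, so the SGD update it produces differs from one function to another. For that reason, the type of every activation layer is specified explicitly in the network configuration file. At present, ReLU is the most widely used activation function in Caffe.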

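Caffe currently implements the following activation functions, each as its own layer: absval, bnll, power, relu, sigmoid, and tanh.

Rather than rederive the formulas here, the descriptions below are taken directly from the Caffe tutorial.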

ReLU / Rectified-Linear and Leaky-ReLU

  • LayerType: RELU
  • CPU implementation: ./src/caffe/layers/relu_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
  • Parameters (ReLUParameter relu_param)
    • Optional
      • negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
  • Sample (as seen in ./examples/imagenet/imagenet_train_val.prototxt)

    layers {
      name: "relu1"
      type: RELU
      bottom: "conv1"
      top: "conv1"
    }
    

Given an input value x, the RELU layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative_slope parameter is not set, it is equivalent to the standard ReLU function max(x, 0). The layer also supports in-place computation, meaning that the bottom and top blobs can be the same, which reduces memory consumption.
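
The rule above, together with the gradient that SGD backpropagates through it, can be sketched in plain Python. This is only an illustration of the math, not Caffe's code: the function names relu_forward and relu_backward are hypothetical, and top_diff stands for the gradient arriving from the layer above.

```python
def relu_forward(xs, negative_slope=0.0):
    """Element-wise (leaky) ReLU: x if x > 0, else negative_slope * x."""
    return [x if x > 0 else negative_slope * x for x in xs]


def relu_backward(xs, top_diff, negative_slope=0.0):
    """Gradient w.r.t. the input: 1 where x > 0, negative_slope elsewhere."""
    return [d * (1.0 if x > 0 else negative_slope)
            for x, d in zip(xs, top_diff)]


if __name__ == "__main__":
    inputs = [-2.0, 0.5, 3.0]
    print(relu_forward(inputs))                      # standard ReLU
    print(relu_forward(inputs, negative_slope=0.1))  # leaky ReLU
    print(relu_backward(inputs, [1.0, 1.0, 1.0]))
```

With negative_slope left at 0, the gradient is exactly zero for every non-positive input, which is the case the leaky variant is meant to soften.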

Sigmoid

  • LayerType: SIGMOID
  • CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu
  • Sample (as seen in ./examples/mnist/mnist_autoencoder.prototxt)

    layers {
      name: "encode1neuron"
      bottom: "encode1"
      top: "encode1neuron"
      type: SIGMOID
    }
    

The SIGMOID layer computes the output as sigmoid(x) for each input element x.
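
As a rough illustration, sigmoid(x) = 1 / (1 + exp(-x)) and its derivative can be written as below. The branch on the sign of x is a common trick to avoid overflow in exp and is an assumption of this sketch, not a claim about how Caffe implements the layer; the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so exp(x) < 1 and cannot overflow
    return e / (1.0 + e)


def sigmoid_backward(y, top_diff):
    """Gradient w.r.t. the input, expressed via the output y = sigmoid(x)."""
    return top_diff * y * (1.0 - y)


if __name__ == "__main__":
    for x in (-5.0, 0.0, 5.0):
        y = sigmoid(x)
        print(x, y, sigmoid_backward(y, 1.0))
```

The derivative y * (1 - y) peaks at 0.25 and shrinks toward zero for large |x|, which is one reason sigmoid networks tend to train more slowly under SGD than ReLU networks.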

TanH / Hyperbolic Tangent

  • LayerType: TANH
  • CPU implementation: ./src/caffe/layers/tanh_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: TANH
    }
    

The TANH layer computes the output as tanh(x) for each input element x.
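
A minimal sketch of the same computation and its gradient, using only the standard library. The names tanh_forward and tanh_backward are hypothetical, and top_diff again denotes the incoming gradient.

```python
import math


def tanh_forward(xs):
    """Element-wise hyperbolic tangent."""
    return [math.tanh(x) for x in xs]


def tanh_backward(ys, top_diff):
    """Gradient w.r.t. the input, using d tanh(x)/dx = 1 - tanh(x)^2."""
    return [d * (1.0 - y * y) for y, d in zip(ys, top_diff)]


if __name__ == "__main__":
    outputs = tanh_forward([-1.0, 0.0, 1.0])
    print(outputs)
    print(tanh_backward(outputs, [1.0, 1.0, 1.0]))
```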

Absolute Value

  • LayerType: ABSVAL
  • CPU implementation: ./src/caffe/layers/absval_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: ABSVAL
    }
    

The ABSVAL layer computes the output as abs(x) for each input element x.
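
For illustration, abs(x) and a matching gradient might look like this. The function names are hypothetical, and using a gradient of 0 at x = 0, where abs is not differentiable, is a choice made for this sketch.

```python
def absval_forward(xs):
    """Element-wise absolute value."""
    return [abs(x) for x in xs]


def absval_backward(xs, top_diff):
    """Gradient w.r.t. the input: sign(x), taken as 0 at x == 0."""
    return [d * ((x > 0) - (x < 0)) for x, d in zip(xs, top_diff)]


if __name__ == "__main__":
    inputs = [-3.0, 0.0, 2.5]
    print(absval_forward(inputs))
    print(absval_backward(inputs, [1.0, 1.0, 1.0]))
```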

Power

  • LayerType: POWER
  • CPU implementation: ./src/caffe/layers/power_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
  • Parameters (PowerParameter power_param)
    • Optional
      • power [default 1]
      • scale [default 1]
      • shift [default 0]
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: POWER
      power_param {
        power: 1
        scale: 1
        shift: 0
      }
    }
    

The POWER layer computes the output as (shift + scale * x) ^ power for each input element x.
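
The formula and its derivative can be sketched as follows. The function names are hypothetical; the keyword arguments mirror the power, scale and shift fields of power_param and their defaults. This sketch assumes shift + scale * x stays non-negative whenever power is fractional.

```python
def power_forward(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (shift + scale * x) ** power for a single element."""
    return (shift + scale * x) ** power


def power_backward(x, top_diff, power=1.0, scale=1.0, shift=0.0):
    """Gradient w.r.t. x: power * scale * (shift + scale * x) ** (power - 1)."""
    if power == 0:
        return 0.0  # the output is constant, so no gradient flows back
    return top_diff * power * scale * (shift + scale * x) ** (power - 1.0)


if __name__ == "__main__":
    # Defaults reduce the layer to the identity.
    print(power_forward(3.0))
    # (1 + 2 * 3) ** 2
    print(power_forward(3.0, power=2.0, scale=2.0, shift=1.0))
    print(power_backward(3.0, 1.0, power=2.0, scale=2.0, shift=1.0))
```

With the default parameters, as in the sample above, the layer passes its input through unchanged and the gradient is simply top_diff.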

BNLL

  • LayerType: BNLL
  • CPU implementation: ./src/caffe/layers/bnll_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: BNLL
    }
    

The BNLL (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
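
Evaluating log(1 + exp(x)) literally overflows for large x, so a sketch of this function usually rewrites it as x + log(1 + exp(-x)) when x > 0. The version below does that with the standard library; the function names are hypothetical, and it is meant as an illustration of the formula, not a copy of Caffe's implementation.

```python
import math


def bnll(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        # log(1 + exp(x)) = x + log(1 + exp(-x)); exp(-x) cannot overflow
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def bnll_backward(x, top_diff):
    """Gradient w.r.t. x: exp(x) / (1 + exp(x)), i.e. the logistic sigmoid."""
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    return top_diff * s


if __name__ == "__main__":
    for x in (-50.0, 0.0, 50.0):
        print(x, bnll(x), bnll_backward(x, 1.0))
```

The function behaves like a smooth version of ReLU: it approaches 0 for very negative inputs and approaches x for very positive ones.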

 

posted @ deeplearner_allen