Caffe Source Code - LossLayer Classes (Part 1)
LossLayer Class Overview
The LossLayer class is the base class for all loss layers in Caffe. It does not compute any loss itself; it only establishes properties common to loss layers, such as setting the default loss weight of the output blob to 1 and checking that the prediction and label blobs have matching dimensions.
loss_layer.cpp source code
template <typename Dtype>
void LossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// LossLayers have a non-zero (1) loss by default.
if (this->layer_param_.loss_weight_size() == 0) { // a loss layer's loss weight defaults to 1
this->layer_param_.add_loss_weight(Dtype(1)); // if not set in the layer param, add a weight of 1
}
}
template <typename Dtype>
void LossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
CHECK_EQ(bottom[0]->shape(0), bottom[1]->shape(0))
<< "The data and label should have the same first dimension."; //第0维的值必须相等
vector<int> loss_shape(0); // Loss layers output a scalar; 0 axes.
top[0]->Reshape(loss_shape); // reshape the output blob to a scalar (0 axes, count() == 1; see the sketch after this listing)
}
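Note that a blob reshaped with an empty shape vector has zero axes but still holds exactly one element, which is where the scalar loss value is stored. A minimal stand-alone sketch of this behavior (my own illustration, assuming the usual Caffe headers are available; not part of the original source):
#include <cassert>
#include <vector>
#include "caffe/blob.hpp"
int main() {
  caffe::Blob<float> loss_blob;
  loss_blob.Reshape(std::vector<int>());   // same as loss_shape in Reshape() above
  assert(loss_blob.num_axes() == 0);       // zero axes ...
  assert(loss_blob.count() == 1);          // ... but the count (product over an empty shape) is 1
  loss_blob.mutable_cpu_data()[0] = 0.5f;  // the single scalar loss value lives here
  return 0;
}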
MultinomialLogisticLossLayer Class Overview
The MultinomialLogisticLossLayer class computes the multinomial logistic loss for single-label, multi-class classification: each sample carries exactly one label, which may take any one of several classes.
- The first bottom blob holds the network's predicted probabilities, with shape \(N \times C \times H \times W\) and values \(\hat{p}_{n,k} \in [0, 1]\); \(\hat{p}_{n,k}\) is the predicted probability that sample \(n\) belongs to class \(k\), with \(\forall n, \sum\limits_{k=1}^K \hat{p}_{n,k} = 1\).
- Here \(N\) is the number of samples and \(K=C \times H \times W\) is the number of classes.
- The second bottom blob holds the labels, with shape \(N \times 1 \times 1 \times 1\); each \(l_n\) is an integer in \(\{0, 1, 2, ..., K - 1\}\) giving the true class of sample \(n\).
- In the forward pass, the loss is computed as (see the numeric sketch after this list): \(E=-\frac{1}{N}\sum\limits_{n=1}^{N} \sum\limits_{k=1}^{K} y_{n,k}*\log \hat{p}_{n,k}= -\frac{1}{N}\sum\limits_{n=1}^{N} \log(\hat{p}_{n,l_n})\)
- \(y_{n,k}\) is the true probability that sample \(n\) belongs to class \(k\): \(y_{n,k}=\left\{\begin{matrix}1 & k=l_n\\0 & k\neq l_n\end{matrix}\right.\)
- In the backward pass, the gradient with respect to the prediction blob is: \(\frac{\partial J}{\partial {\hat{p}_{n,l_n}}} = \frac{\partial J}{\partial E}*\frac{\partial E}{\partial {\hat{p}_{n,l_n}}}=-\frac{1}{N}*\frac{\partial J}{\partial E}*\frac{1}{\hat{p}_{n,l_n}}\)
- \(J\) denotes the loss of the whole network.
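To make the two formulas concrete, here is a minimal stand-alone numeric sketch (my own illustration in plain C++ with made-up values, independent of Caffe; the threshold constant only mirrors the role of kLOG_THRESHOLD in the layer code below) that evaluates \(E\) and the gradient for \(N = 2\) samples, \(K = 3\) classes, and a top-blob diff \(\frac{\partial J}{\partial E}\) of 1:
#include <algorithm>
#include <cmath>
#include <cstdio>
int main() {
  const int N = 2, K = 3;
  // Made-up predicted probabilities; each row sums to 1.
  const double prob[2][3] = {{0.7, 0.2, 0.1}, {0.1, 0.3, 0.6}};
  const int label[2] = {0, 2};            // true classes l_n
  const double kThreshold = 1e-20;        // small floor, playing the role of kLOG_THRESHOLD
  double loss = 0.0;
  double diff[2][3] = {};                 // gradient w.r.t. the predictions
  for (int n = 0; n < N; ++n) {
    double p = std::max(prob[n][label[n]], kThreshold);
    loss -= std::log(p);                  // accumulate -log(p_{n,l_n})
    diff[n][label[n]] = -1.0 / (N * p);   // only the true class gets a nonzero gradient
  }
  loss /= N;                              // E = -(log 0.7 + log 0.6) / 2 ≈ 0.4337
  std::printf("E = %f, dE/dp for sample 0, class 0 = %f\n", loss, diff[0][label[0]]);
  return 0;
}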
multinomial_logistic_loss_layer.cpp source code
template <typename Dtype>
void MultinomialLogisticLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
LossLayer<Dtype>::Reshape(bottom, top); // call the base class Reshape(): check that axis 0 of the bottom blobs matches and reshape the top blob to a scalar
CHECK_EQ(bottom[1]->channels(), 1); // check that the label blob has shape [N,1,1,1]
CHECK_EQ(bottom[1]->height(), 1);
CHECK_EQ(bottom[1]->width(), 1);
}
template <typename Dtype>
void MultinomialLogisticLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // forward pass: compute the loss
const Dtype* bottom_data = bottom[0]->cpu_data(); // data pointer of the prediction blob
const Dtype* bottom_label = bottom[1]->cpu_data(); // data pointer of the label blob
int num = bottom[0]->num(); // number of samples
int dim = bottom[0]->count() / bottom[0]->num(); // K = C*H*W, the number of classes
Dtype loss = 0;
for (int i = 0; i < num; ++i) {
int label = static_cast<int>(bottom_label[i]); // label of the i-th sample, i.e. the sample belongs to class `label`
// bottom_data[i * dim + label] is the predicted probability of class `label` for the i-th sample;
// kLOG_THRESHOLD is a small constant that keeps |log(prob)| from blowing up
Dtype prob = std::max(bottom_data[i * dim + label], Dtype(kLOG_THRESHOLD));
loss -= log(prob); // accumulate the loss
}
top[0]->mutable_cpu_data()[0] = loss / num; // output the average loss
}
template <typename Dtype>
void MultinomialLogisticLossLayer<Dtype>::Backward_cpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) { // gradients must not be propagated to the label blob; abort with an error
LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) { // the prediction blob needs gradients
const Dtype* bottom_data = bottom[0]->cpu_data(); // data pointer of the prediction blob
const Dtype* bottom_label = bottom[1]->cpu_data(); // data pointer of the label blob
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff(); // gradient pointer of the prediction blob
int num = bottom[0]->num(); // number of samples
int dim = bottom[0]->count() / bottom[0]->num(); // K = C*H*W, the number of classes
caffe_set(bottom[0]->count(), Dtype(0), bottom_diff); // clear the prediction blob's gradient first (set it to 0)
const Dtype scale = - top[0]->cpu_diff()[0] / num; // scale factor, i.e. -\frac{1}{N}*\frac{\partial J}{\partial E}
for (int i = 0; i < num; ++i) {
int label = static_cast<int>(bottom_label[i]); // label of the i-th sample
Dtype prob = std::max(bottom_data[i * dim + label], Dtype(kLOG_THRESHOLD)); // predicted probability of class `label` for the i-th sample
bottom_diff[i * dim + label] = scale / prob; // gradient for the current sample
}
}
}
SoftmaxWithLossLayer Class Overview
The SoftmaxWithLossLayer class likewise computes the loss for single-label, multi-class classification. Conceptually it is equivalent to a SoftmaxLayer followed by a MultinomialLogisticLossLayer, but Caffe recommends SoftmaxWithLossLayer: the fused single-layer computation loses less precision and is numerically more stable than running the two layers separately.
- The first bottom blob holds the network's raw predictions, with shape \(\tilde{N} \times C \times \tilde H \times \tilde W\) and values \(x_{n,k} \in (-\infty, +\infty)\). The loss uses the softmax of these values as probabilities: \(\hat{p}_{n,k} = \frac{e^{x_{n,k}}}{\sum\limits_{k'=1}^{K} e^{x_{n,k'}}}\).
- In what follows the softmax is taken along axis 1 (the \(C\) dimension), so the size of axis \(C\) is the number of classes \(K\), and the total number of samples is the outer count (outer_num_ in the code) times the inner count (inner_num_), i.e. \(N=\tilde N * \tilde H * \tilde W\).
- The second bottom blob holds the labels, with shape \(N \times 1 \times 1 \times 1\); each \(l_n\) is an integer in \(\{0, 1, 2, ..., K - 1\}\) giving the true class of sample \(n\).
- The Caffe code does not strictly require the label blob to have shape \(N \times 1 \times 1 \times 1\); it only requires that the prediction and label blobs agree on axis 0 (enforced in LossLayer) and that the label blob's total count equal \(N\).
- In the forward pass, as in MultinomialLogisticLossLayer, the loss is: \(E=-\frac{1}{N} \sum\limits_{n=1}^N \log(\hat{p}_{n,l_n})\)
- In the backward pass, the gradient with respect to the prediction blob is derived as follows (a numeric sketch follows this list):
- \(\frac{\partial \hat{p}_{n,l_n}}{\partial x_{n,k}} = \left( \frac{e^{x_{n,l_n}}}{e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}} \right)'_{x_{n,k}} = \left\{\begin{matrix} \frac{-e^{x_{n,l_n}}*e^{x_{n,k}}}{\left(e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}\right)^2} + \frac{e^{x_{n,l_n}}}{e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}} = \hat{p}_{n,l_n} - \hat{p}_{n,l_n}*\hat{p}_{n,l_n} & k = l_n \\ \frac{-e^{x_{n,l_n}}*e^{x_{n,k}}}{\left(e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}\right)^2} = -\hat{p}_{n,l_n}*\hat{p}_{n,k} & k \neq l_n \end{matrix}\right.\)
- \(E = -\frac{1}{N}\sum\limits_{n=1}^N \log(\hat{p}_{n,l_n}) = -\frac{1}{N}\left(\log \hat{p}_{1,l_1} + \log \hat{p}_{2,l_2} + \dots + \log \hat{p}_{N,l_N}\right)\)
- Note that \(\frac{\partial E}{\partial \hat{p}_{n,k'}}\) is nonzero only when \(k'=l_n\).
- \(\frac{\partial E}{\partial x_{n,k}} = \sum\limits_{k'=1}^K \frac{\partial E}{\partial \hat{p}_{n,k'}} \frac{\partial \hat{p}_{n,k'}}{\partial x_{n,k}} = -\frac{1}{N}*\frac{1}{\hat{p}_{n,l_n}}*\frac{\partial \hat{p}_{n,l_n}}{\partial x_{n,k}} = \left\{\begin{matrix} -\frac{1}{N}*\frac{1}{\hat{p}_{n,l_n}}*\left(\hat{p}_{n,l_n} - \hat{p}_{n,l_n}*\hat{p}_{n,l_n}\right) = \frac{1}{N}\left(\hat{p}_{n,l_n} - 1\right) & k = l_n \\ -\frac{1}{N}*\frac{1}{\hat{p}_{n,l_n}}*\left(-\hat{p}_{n,l_n}*\hat{p}_{n,k}\right) = \frac{1}{N}\hat{p}_{n,k} & k \neq l_n \end{matrix}\right.\)
- Finally: \(\frac{\partial J}{\partial {x_{n,k}}} = \frac{\partial J}{\partial E}*\frac{\partial E}{\partial {x_{n,k}}}\)
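As a sanity check of the final expressions, here is a minimal stand-alone sketch (my own illustration in plain C++ with made-up values, independent of Caffe; a single sample, so \(N = 1\) and the top-blob diff is 1) that computes the softmax probabilities, the loss, and the gradient \(\hat{p}_{n,k} - [k = l_n]\):
#include <algorithm>
#include <cmath>
#include <cstdio>
int main() {
  const int K = 3;
  const double x[3] = {2.0, 0.5, -1.0};  // made-up raw predictions for one sample
  const int label = 0;                   // true class l_n
  // Softmax, subtracting the max first for numerical stability (as SoftmaxLayer does).
  double max_x = *std::max_element(x, x + K);
  double denom = 0.0, p[3];
  for (int k = 0; k < K; ++k) denom += std::exp(x[k] - max_x);
  for (int k = 0; k < K; ++k) p[k] = std::exp(x[k] - max_x) / denom;
  double loss = -std::log(p[label]);     // E for N = 1
  double diff[3];
  for (int k = 0; k < K; ++k)
    diff[k] = p[k] - (k == label ? 1.0 : 0.0);  // dE/dx_k = p_k - [k == l_n]
  std::printf("loss = %f, diff = {%f, %f, %f}\n", loss, diff[0], diff[1], diff[2]);
  return 0;
}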
softmax_loss_layer.cpp source code
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // layer initialization
LossLayer<Dtype>::LayerSetUp(bottom, top); // call the base class function
LayerParameter softmax_param(this->layer_param_); // copy the current layer's parameters
softmax_param.set_type("Softmax"); // set the layer type to "Softmax"
softmax_layer_ = LayerRegistry<Dtype>::CreateLayer(softmax_param); // create an internal Softmax layer from that type
softmax_bottom_vec_.clear();
softmax_bottom_vec_.push_back(bottom[0]); // the Softmax layer takes this layer's prediction blob as input
softmax_top_vec_.clear();
softmax_top_vec_.push_back(&prob_); // prob_ will store the Softmax layer's output
softmax_layer_->SetUp(softmax_bottom_vec_, softmax_top_vec_); // call the Softmax layer's SetUp()
has_ignore_label_ = this->layer_param_.loss_param().has_ignore_label(); // whether an ignore label has been set
if (has_ignore_label_) {
ignore_label_ = this->layer_param_.loss_param().ignore_label(); // save the ignore label from the param in ignore_label_
}
if (!this->layer_param_.loss_param().has_normalization() &&
this->layer_param_.loss_param().has_normalize()) { // the (newer) normalization param is not set but the (older) normalize param is
// normalize == true selects VALID normalization, false selects BATCH_SIZE normalization
normalization_ = this->layer_param_.loss_param().normalize() ?
LossParameter_NormalizationMode_VALID :
LossParameter_NormalizationMode_BATCH_SIZE;
} else {
normalization_ = this->layer_param_.loss_param().normalization(); // use the normalization setting directly
}
}
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // reshape the input/output blobs
LossLayer<Dtype>::Reshape(bottom, top); // call the base class function: check shapes and reshape the top blob to a scalar
softmax_layer_->Reshape(softmax_bottom_vec_, softmax_top_vec_); // reshape the internal Softmax layer's input/output blobs
// softmax_param().axis() may be positive or negative; the softmax is computed along that axis, and data along the other axes are treated independently
// the size of axis softmax_axis_ is also the number of classes, e.g. C for images
softmax_axis_ = bottom[0]->CanonicalAxisIndex(this->layer_param_.softmax_param().axis());
outer_num_ = bottom[0]->count(0, softmax_axis_); // product of axes [0, softmax_axis_), the outer count, e.g. N for images
inner_num_ = bottom[0]->count(softmax_axis_ + 1); // product of axes [softmax_axis_+1, end), the inner count, e.g. H*W for images
CHECK_EQ(outer_num_ * inner_num_, bottom[1]->count()) // outer count times inner count must equal the label blob's total count
<< "Number of labels must match number of predictions; "
<< "e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), "
<< "label count (number of labels) must be N*H*W, "
<< "with integer values in {0, 1, ..., C-1}.";
if (top.size() >= 2) { // a second top blob was requested
// softmax output
top[1]->ReshapeLike(*bottom[0]); // top[1] will expose the internal Softmax layer's output
}
}
template <typename Dtype>
Dtype SoftmaxWithLossLayer<Dtype>::get_normalizer(
LossParameter_NormalizationMode normalization_mode, int valid_count) { // compute the normalization factor from the normalization mode and the number of valid samples
Dtype normalizer; // normalization factor
switch (normalization_mode) { // normalization mode
case LossParameter_NormalizationMode_FULL:
normalizer = Dtype(outer_num_ * inner_num_); // FULL mode: the factor is the outer count times the inner count
break;
case LossParameter_NormalizationMode_VALID: // VALID mode: the factor is the number of valid samples
if (valid_count == -1) { // valid_count == -1 means all samples are valid, which is equivalent to FULL
normalizer = Dtype(outer_num_ * inner_num_);
} else {
normalizer = Dtype(valid_count);
}
break;
case LossParameter_NormalizationMode_BATCH_SIZE: // BATCH_SIZE mode: the factor is the outer count
normalizer = Dtype(outer_num_);
break;
case LossParameter_NormalizationMode_NONE: // NONE mode: the factor is 1
normalizer = Dtype(1);
break;
default:
LOG(FATAL) << "Unknown normalization mode: "
<< LossParameter_NormalizationMode_Name(normalization_mode);
}
// Some users will have no labels for some examples in order to 'turn off' a
// particular loss in a multi-task setup. The max prevents NaNs in that case.
return std::max(Dtype(1.0), normalizer); // guard against a valid-sample count of 0 by returning at least 1
}
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // forward pass
// The forward pass computes the softmax prob values.
softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_); // run the internal Softmax layer's forward pass first
// with a bottom blob of shape (N, C, H, W) and the softmax taken along axis 1, the Softmax output prob_ also has shape (N, C, H, W),
// the number of labels is N * H * W, outer_num_ = N, inner_num_ = H * W
const Dtype* prob_data = prob_.cpu_data(); // output of the Softmax layer
const Dtype* label = bottom[1]->cpu_data(); // label data
int dim = prob_.count() / outer_num_; // C * H * W
int count = 0; // number of valid samples
Dtype loss = 0;
for (int i = 0; i < outer_num_; ++i) {
for (int j = 0; j < inner_num_; j++) {
const int label_value = static_cast<int>(label[i * inner_num_ + j]); // label of the sample at position (i, j)
if (has_ignore_label_ && label_value == ignore_label_) {
continue; // skip if an ignore label is set and this label matches it
}
DCHECK_GE(label_value, 0); // check: the label value is not negative
DCHECK_LT(label_value, prob_.shape(softmax_axis_)); // check: the label value is less than the number of classes
// take the predicted probability of class label_value at position (i, j) and accumulate the loss
loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j], Dtype(FLT_MIN)));
++count; // increment the valid-sample count
}
}
top[0]->mutable_cpu_data()[0] = loss / get_normalizer(normalization_, count); // divide by the normalization factor to get the final loss
if (top.size() == 2) {
top[1]->ShareData(prob_); // expose the Softmax layer's output as top[1] of the SoftmaxWithLossLayer
}
}
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) { // likewise, backpropagation to the label blob is not allowed
LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) { // the prediction blob needs gradients
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff(); // gradient pointer of the prediction blob
const Dtype* prob_data = prob_.cpu_data(); // data pointer of the Softmax layer's output
caffe_copy(prob_.count(), prob_data, bottom_diff); //bottom_diff = prob_data
const Dtype* label = bottom[1]->cpu_data(); // data pointer of the label blob
int dim = prob_.count() / outer_num_; // C * H * W
int count = 0; // number of valid samples
for (int i = 0; i < outer_num_; ++i) {
for (int j = 0; j < inner_num_; ++j) {
const int label_value = static_cast<int>(label[i * inner_num_ + j]); // label of the sample at position (i, j)
if (has_ignore_label_ && label_value == ignore_label_) { // this label is ignored
for (int c = 0; c < bottom[0]->shape(softmax_axis_); ++c) { // every class along axis C
bottom_diff[i * dim + c * inner_num_ + j] = 0; // zero the gradient of every class at position (i, j) of the prediction blob
}
} else {
bottom_diff[i * dim + label_value * inner_num_ + j] -= 1; // subtract 1 only at class label_value along axis C: bottom_diff -= 1
++count;
}
}
}
// Scale gradient
Dtype loss_weight = top[0]->cpu_diff()[0] / get_normalizer(normalization_, count); // compute the scale factor
caffe_scal(prob_.count(), loss_weight, bottom_diff); //bottom_diff *= loss_weight
}
}
SigmoidCrossEntropyLossLayer Class Overview
The SigmoidCrossEntropyLossLayer class computes the cross-entropy loss for multi-label binary classification: each sample may carry several labels, and each label takes only one of two classes, 0 or 1.
- The first bottom blob holds the network's raw predictions, with shape \(N \times C \times H \times W\) and values \(x_n \in (-\infty, +\infty)\). The loss uses the sigmoid of these values as probabilities: \(\hat{p}_n = \sigma(x_n)\)
- The second bottom blob holds the targets, with shape \(N \times C \times H \times W\) and values \(p_n \in [0, 1]\).
- Again, the code does not strictly require the prediction and target blobs to have identical shapes; it only requires that they agree on axis 0 and have the same total count.
- In the forward pass, the loss is computed as: \(E = -\frac{1}{N} \sum\limits_{n=1}^N \left[p_n \log \hat{p}_n + (1 - p_n) \log(1 - \hat{p}_n)\right]\)
- \(N\) here is normalizer_ in the source below.
- Note that, to keep \(e^{-x}\) from becoming too large, the code rewrites the formula slightly (see the sketch after this list): \(E=-\frac{1}{N} \sum\limits_{n=1}^N [p_n \log \sigma(x_n) + (1 - p_n) \log(1 - \sigma(x_n))]=-\frac{1}{N} \sum\limits_{n=1}^N [x_n (p_n-1)+\log \sigma(x_n)] =\left\{\begin{matrix} -\frac{1}{N} \sum\limits_{n=1}^N [x_n (p_n-1)-\log (1+e^{-x_n})] & x_n \geqslant 0\\ -\frac{1}{N} \sum\limits_{n=1}^N [x_n p_n-\log (1+e^{x_n})] & x_n<0 \end{matrix}\right.\)
- In the backward pass, the gradient with respect to the prediction blob is: \(\frac{\partial J}{\partial {x_n}} = \frac{\partial J}{\partial E}*\frac{\partial E}{\partial {x_n}}=\frac{1}{N}*\frac{\partial J}{\partial E}*(\sigma(x_n)-p_n)\)
- \(J\) is the loss of the whole network, and \(\frac{\partial J}{\partial E}\) is top[0]->cpu_diff()[0] in the code.
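The case split on the sign of \(x_n\) above can be checked with a minimal stand-alone sketch (my own illustration in plain C++ with made-up values, independent of Caffe); the naive and the rewritten form should agree, but only the latter stays finite for large \(|x_n|\):
#include <cmath>
#include <cstdio>
// Stable per-element cross-entropy term, matching the case split above.
double stable_term(double x, double p) {
  if (x >= 0) return x * (p - 1.0) - std::log(1.0 + std::exp(-x));
  return x * p - std::log(1.0 + std::exp(x));
}
// Naive form, for comparison (can overflow/underflow for large |x|).
double naive_term(double x, double p) {
  double s = 1.0 / (1.0 + std::exp(-x));
  return p * std::log(s) + (1.0 - p) * std::log(1.0 - s);
}
int main() {
  const double x[] = {3.0, -2.0, 0.5};   // made-up predictions
  const double p[] = {1.0, 0.0, 1.0};    // made-up targets
  const int N = 3;
  double loss_stable = 0.0, loss_naive = 0.0;
  for (int n = 0; n < N; ++n) {
    loss_stable -= stable_term(x[n], p[n]);
    loss_naive  -= naive_term(x[n], p[n]);
  }
  std::printf("stable = %f, naive = %f\n", loss_stable / N, loss_naive / N);
  return 0;
}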
sigmoid_cross_entropy_loss_layer.cpp source code
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
LossLayer<Dtype>::LayerSetUp(bottom, top); // call the base class LayerSetUp() to set the top blob's loss weight
sigmoid_bottom_vec_.clear();
sigmoid_bottom_vec_.push_back(bottom[0]); // clear first, then add the prediction blob as the SigmoidLayer's input
sigmoid_top_vec_.clear();
sigmoid_top_vec_.push_back(sigmoid_output_.get()); // clear, then add sigmoid_output_ as the SigmoidLayer's output
sigmoid_layer_->SetUp(sigmoid_bottom_vec_, sigmoid_top_vec_); // set up the internal SigmoidLayer (reshape its output blob, etc.)
has_ignore_label_ = this->layer_param_.loss_param().has_ignore_label(); // whether an ignore label has been set
if (has_ignore_label_) {
ignore_label_ = this->layer_param_.loss_param().ignore_label(); // save the ignore label in this layer
}
if (this->layer_param_.loss_param().has_normalization()) { // if the normalization mode is set
normalization_ = this->layer_param_.loss_param().normalization(); // save it in this layer
} else if (this->layer_param_.loss_param().has_normalize()) { // otherwise, if the (older) normalize param is set
// normalize == true selects VALID normalization, false selects BATCH_SIZE normalization
normalization_ = this->layer_param_.loss_param().normalize() ?
LossParameter_NormalizationMode_VALID :
LossParameter_NormalizationMode_BATCH_SIZE;
} else {
// default to BATCH_SIZE; only SigmoidCrossEntropyLoss defaults to BATCH_SIZE, the other loss layers default to VALID
normalization_ = LossParameter_NormalizationMode_BATCH_SIZE;
}
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
LossLayer<Dtype>::Reshape(bottom, top); // call the base class Reshape() to check the input shapes and reshape the top blob
outer_num_ = bottom[0]->shape(0); // outer count N, i.e. the batch size
inner_num_ = bottom[0]->count(1); // inner count C*H*W, instance size: |output| == |target|
CHECK_EQ(bottom[0]->count(), bottom[1]->count()) <<
"SIGMOID_CROSS_ENTROPY_LOSS layer inputs must have the same count."; // check that the prediction and target blobs have the same total count
sigmoid_layer_->Reshape(sigmoid_bottom_vec_, sigmoid_top_vec_); // call the SigmoidLayer's Reshape() to adjust its shapes
}
// TODO(shelhamer) loss normalization should be pulled up into LossLayer,
// instead of duplicated here and in SoftMaxWithLossLayer
template <typename Dtype>
Dtype SigmoidCrossEntropyLossLayer<Dtype>::get_normalizer( // compute the normalization factor from the normalization mode and the number of valid samples
LossParameter_NormalizationMode normalization_mode, int valid_count) {
Dtype normalizer;
switch (normalization_mode) { // normalization mode
case LossParameter_NormalizationMode_FULL:
normalizer = Dtype(outer_num_ * inner_num_); // FULL mode: the factor is the outer count times the inner count
break;
case LossParameter_NormalizationMode_VALID: // VALID mode: the factor is the number of valid samples
if (valid_count == -1) { // valid_count == -1 means all samples are valid, which is equivalent to FULL
normalizer = Dtype(outer_num_ * inner_num_);
} else {
normalizer = Dtype(valid_count);
}
break;
case LossParameter_NormalizationMode_BATCH_SIZE: // BATCH_SIZE mode: the factor is the outer count
normalizer = Dtype(outer_num_);
break;
case LossParameter_NormalizationMode_NONE: // NONE mode: the factor is 1
normalizer = Dtype(1);
break;
default: // any other mode is an error
LOG(FATAL) << "Unknown normalization mode: " << LossParameter_NormalizationMode_Name(normalization_mode);
}
// Some users will have no labels for some examples in order to 'turn off' a
// particular loss in a multi-task setup. The max prevents NaNs in that case.
return std::max(Dtype(1.0), normalizer); // return at least 1; some samples may have no labels, so valid_count can be 0
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// The forward pass computes the sigmoid outputs.
sigmoid_bottom_vec_[0] = bottom[0]; // use the prediction blob as the SigmoidLayer's input
sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_); // compute the SigmoidLayer's output sigmoid_top_vec_
// Compute the loss (negative log likelihood)
// Stable version of loss computation from input data
const Dtype* input_data = bottom[0]->cpu_data(); // predictions
const Dtype* target = bottom[1]->cpu_data(); // targets
int valid_count = 0; // number of valid samples
Dtype loss = 0;
for (int i = 0; i < bottom[0]->count(); ++i) {
const int target_value = static_cast<int>(target[i]); // target of the i-th element
if (has_ignore_label_ && target_value == ignore_label_) { // if an ignore label is set and this target matches it
continue; // skip it
}
// for x = input_data[i] < 0: loss -= x*p - log(1+exp(x))
// for x >= 0: loss -= x*(p-1) - log(1+exp(-x)), which keeps exp() from overflowing
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
++valid_count; // increment the valid-sample count
}
normalizer_ = get_normalizer(normalization_, valid_count); // compute the normalization factor
top[0]->mutable_cpu_data()[0] = loss / normalizer_; // divide by the normalization factor to get the final loss
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Backward_cpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) { // likewise, backpropagation to the label blob is not allowed
LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) { // the prediction blob needs gradients
// First, compute the diff
const int count = bottom[0]->count(); // total number of elements
const Dtype* sigmoid_output_data = sigmoid_output_->cpu_data(); // output of the SigmoidLayer, σ(x)
const Dtype* target = bottom[1]->cpu_data(); // targets
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff(); // gradient pointer of the prediction blob
caffe_sub(count, sigmoid_output_data, target, bottom_diff); //bottom_diff = sigmoid_output_data - target
// Zero out gradient of ignored targets.
if (has_ignore_label_) { // if an ignore label is set
for (int i = 0; i < count; ++i) {
const int target_value = static_cast<int>(target[i]); // target of the i-th element
if (target_value == ignore_label_) { // this target is the ignore label
bottom_diff[i] = 0; // zero its gradient
}
}
}
// Scale down gradient
Dtype loss_weight = top[0]->cpu_diff()[0] / normalizer_; // scale factor
caffe_scal(count, loss_weight, bottom_diff); //bottom_diff *= loss_weight
}
}
Summary
- Gradient accumulation applies only to a layer's parameter blobs; the gradients of a layer's bottom/top (input/output) blobs are not accumulated.
References
http://freemind.pluskid.org/machine-learning/softmax-vs-softmax-loss-numerical-stability/
This is my first pass through the Caffe source code, and these notes were written as I read. My understanding and analysis may contain mistakes or omissions; corrections from readers are very welcome, and thank you for your support!