Proof that AdaBoost is equivalent to forward stagewise additive modeling with the exponential loss function

Why is AdaBoost equivalent to forward stagewise additive modeling using the loss function \(L(y, f(x))=\exp (-y f(x))\)?

First, consider forward stagewise additive modeling using the loss function \(L(y, f(x))=\exp (-y f(x))\).

Using the exponential loss function, one must solve:

\[\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} \exp \left[-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta G\left(x_{i}\right)\right)\right] \]

We denote \(w_{i}^{(m)}=\exp \left(-y_{i} f_{m-1}\left(x_{i}\right)\right)\), which depends on neither \(\beta\) nor \(G\); then:

\[\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right) \]

Since \(y_{i}, G\left(x_{i}\right) \in\{-1,+1\}\), we have \(y_{i} G\left(x_{i}\right)=1\) when \(y_{i}=G\left(x_{i}\right)\) and \(y_{i} G\left(x_{i}\right)=-1\) when \(y_{i} \neq G\left(x_{i}\right)\), so \(\exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\) equals \(e^{-\beta}\) in the first case and \(e^{\beta}\) in the second.

Hence we can rewrite \(\sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\) as:

\[e^{-\beta} \cdot \sum_{y_{i}=G\left(x_{i}\right)} w_{i}^{(m)}+e^{\beta} \cdot \sum_{y_{i} \neq G\left(x_{i}\right)} w_{i}^{(m)} \]

which is equivalent to:

\[\left(e^{\beta}-e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)+e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)} \]

Therefore, the optimizations over \(G\) and \(\beta\) can be carried out separately: for any fixed \(\beta>0\), the optimal \(G\) is

\[G_{m}=\arg \min _{G} \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right) \]
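As a concrete (non-essential) illustration, a decision stump is a common choice of weak classifier \(G\). Below is a minimal Python sketch, under my own assumptions (NumPy, labels in \(\{-1,+1\}\), a hypothetical helper named `fit_stump`), of selecting \(G_m\) by minimizing the weighted misclassification error:

```python
import numpy as np

def fit_stump(X, y, w):
    """Choose the decision stump G that minimizes the weighted
    misclassification error sum_i w_i * I(y_i != G(x_i))."""
    n, d = X.shape
    best_err, best_params = np.inf, None
    for j in range(d):                       # candidate feature
        for t in np.unique(X[:, j]):         # candidate threshold
            for sign in (+1, -1):            # orientation of the stump
                pred = np.where(X[:, j] <= t, sign, -sign)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best_params = err, (j, t, sign)
    return best_err, best_params             # weighted error and (feature, threshold, sign)
```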

Plugging this \(G_m\) into the objective function and setting the derivative with respect to \(\beta\) to zero, we have:

\[\left(e^{\beta}+e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_m\left(x_{i}\right)\right)=e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)} \]

\[e^{2\beta}+1=\frac{\sum_{i=1}^{N} w_{i}^{(m)}}{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_m\left(x_{i}\right)\right)} \]

Thus:

\[\beta_{m}=\frac{1}{2} \log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}} \]

where

\[\operatorname{err}_{m}=\frac{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}{\sum_{i=1}^{N} w_{i}^{(m)}} \]
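For example (an assumed value, just to make the formula concrete), a weighted error of \(\operatorname{err}_{m}=0.25\) gives \(\beta_{m}=\frac{1}{2} \log 3 \approx 0.549\):

```python
import numpy as np

err_m = 0.25                                # assumed weighted error of G_m
beta_m = 0.5 * np.log((1 - err_m) / err_m)  # step size from the formula above
print(beta_m)                               # ~0.5493
```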

Since we use forward stagewise additive modeling,

\[f_{m}(x)=f_{m-1}(x)+\beta_{m} G_{m}(x) \]

we have:

\[\begin{aligned} w_{i}^{(m+1)}&=\exp \left(-y_{i} f_{m}\left(x_{i}\right)\right)\\&=\exp \left(-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta_{m} G_{m}\left(x_{i}\right)\right)\right)\\&=w_{i}^{(m)} \cdot e^{-\beta_{m} y_{i} G_{m}\left(x_{i}\right)} \end{aligned} \]

Since \(-y_{i} G_{m}\left(x_{i}\right)=2 \cdot I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)-1\), we can rewrite the above update as

\[w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)} \cdot e^{-\beta_{m}} \]

where \(\alpha_{m}=2 \beta_{m}\).

We can drop the factor \(e^{-\beta_{m}}\), since it multiplies all the weights by the same value and therefore does not change their relative sizes:

\[w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)} \]
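A quick numeric check (a sketch on toy data of my own; the labels, predictions, and \(\beta_m\) are assumptions) that dropping \(e^{-\beta_{m}}\) leaves the normalized weight distribution unchanged:

```python
import numpy as np

y      = np.array([ 1, -1,  1,  1, -1])   # true labels in {-1, +1}
G_pred = np.array([ 1,  1,  1, -1, -1])   # weak-learner predictions (two mistakes)
w      = np.full(5, 0.2)                  # current weights w_i^(m)
beta_m = 0.5
alpha_m = 2 * beta_m

mis = (y != G_pred).astype(float)         # I(y_i != G_m(x_i))

w_exact   = w * np.exp(alpha_m * mis) * np.exp(-beta_m)  # update keeping the e^{-beta_m} factor
w_dropped = w * np.exp(alpha_m * mis)                    # update with the factor dropped

# after normalization the two updates coincide
print(np.allclose(w_exact / w_exact.sum(), w_dropped / w_dropped.sum()))  # True
```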

Also, since \(\alpha_{m}=2 \beta_{m}\), we have:

\[\alpha_{m}=\log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}} \]

Comparing this with AdaBoost, we find that the weight update and the coefficient \(\alpha_{m}\) are exactly those of AdaBoost, so the two algorithms are the same.
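Putting the pieces together, here is a minimal sketch of the full forward stagewise loop; it reuses the hypothetical `fit_stump` helper from the sketch above, and the stump form and numerical safeguards are my own assumptions rather than part of the proof:

```python
import numpy as np

def forward_stagewise_exp(X, y, M):
    """Forward stagewise additive modeling with exponential loss,
    which by the derivation above is the AdaBoost algorithm."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # initial weights w_i^(1)
    ensemble = []                               # list of (beta_m, stump parameters)
    for m in range(M):
        err, (j, t, sign) = fit_stump(X, y, w / w.sum())  # G_m: minimize weighted error
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0) / division by zero
        beta = 0.5 * np.log((1 - err) / err)    # beta_m from the closed form above
        pred = np.where(X[:, j] <= t, sign, -sign)
        w = w * np.exp(-beta * y * pred)        # w_i^(m+1) = w_i^(m) * exp(-beta_m y_i G_m(x_i))
        ensemble.append((beta, (j, t, sign)))
    return ensemble
```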
