Why is AdaBoost equivalent to forward stagewise additive modeling with the loss function \(L(y,f(x))=\exp(-yf(x))\)?
First we consider forward stagewise additive modeling with the exponential loss function \(L(y,f(x))=\exp(-yf(x))\).
At each step \(m\), one must solve:
\[\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} \exp \left[-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta G\left(x_{i}\right)\right)\right]
\]
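For intuition, here is a tiny Python sketch (my own illustration, not part of the original derivation) that evaluates this objective for one candidate pair \((\beta, G)\); the names `f_prev` (standing for \(f_{m-1}(x_i)\)) and `G_pred` (standing for \(G(x_i)\)) are made up for the example.

```python
import numpy as np

# Evaluate the stagewise objective sum_i exp(-y_i (f_{m-1}(x_i) + beta * G(x_i)))
# for one candidate (beta, G). Labels and weak-learner outputs are in {-1, +1}.
def stagewise_exp_loss(y, f_prev, beta, G_pred):
    return np.sum(np.exp(-y * (f_prev + beta * G_pred)))

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=10)        # true labels y_i
f_prev = rng.normal(size=10)            # current additive model f_{m-1}(x_i)
G_pred = rng.choice([-1, 1], size=10)   # candidate weak classifier outputs G(x_i)
print(stagewise_exp_loss(y, f_prev, beta=0.5, G_pred=G_pred))
```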
We denote \(w_{i}^{(m)}=\exp \left(-y_{i} f_{m-1}\left(x_{i}\right)\right)\); since this factor depends neither on \(\beta\) nor on \(G\), the objective becomes:
\[\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)
\]
Since \(y_{i}, G\left(x_{i}\right) \in\{-1,+1\}\), we have \(\exp \left(-\beta y_{i} G\left(x_{i}\right)\right)=e^{-\beta}\) when \(y_{i}=G\left(x_{i}\right)\), and \(\exp \left(-\beta y_{i} G\left(x_{i}\right)\right)=e^{\beta}\) when \(y_{i} \neq G\left(x_{i}\right)\).
Hence we can rewrite \(\sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\) as:
\[e^{-\beta} \cdot \sum_{y_{i}=G\left(x_{i}\right)} w_{i}^{(m)}+e^{\beta} \cdot \sum_{y_{i} \neq G\left(x_{i}\right)} w_{i}^{(m)}
\]
which is equivalent to:
\[\left(e^{\beta}-e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)+e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)}
\]
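As a quick sanity check (again my own illustration), the following snippet verifies numerically that the split sum and the rewritten form agree for arbitrary weights:

```python
import numpy as np

# Check that
#   e^{-beta} * sum_{y_i = G(x_i)} w_i + e^{beta} * sum_{y_i != G(x_i)} w_i
# equals
#   (e^{beta} - e^{-beta}) * sum_i w_i * I(y_i != G(x_i)) + e^{-beta} * sum_i w_i.
rng = np.random.default_rng(1)
y = rng.choice([-1, 1], size=20)
G_pred = rng.choice([-1, 1], size=20)
w = rng.random(20)
beta = 0.7

wrong = (y != G_pred)
split_form = np.exp(-beta) * w[~wrong].sum() + np.exp(beta) * w[wrong].sum()
rewritten = (np.exp(beta) - np.exp(-beta)) * (w * wrong).sum() + np.exp(-beta) * w.sum()
assert np.isclose(split_form, rewritten)
```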
Therefore, for any \(\beta>0\), the minimizing \(G\) does not depend on \(\beta\), so \(G\) and \(\beta\) can be optimized separately; \(G_m\) minimizes the weighted misclassification error:
\[G_{m}=\arg \min _{G} \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)
\]
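In practice this means \(G_m\) is obtained by fitting the weak learner to the data with weights \(w_i^{(m)}\). A minimal sketch, assuming the weak learners are decision stumps on a single one-dimensional feature (the name `fit_stump` is mine):

```python
import numpy as np

# Fit a decision stump by minimizing the weighted misclassification error
# sum_i w_i * I(y_i != G(x_i)) over thresholds and orientations.
def fit_stump(x, y, w):
    best = None
    for thresh in np.unique(x):
        for sign in (1, -1):
            pred = sign * np.where(x > thresh, 1, -1)
            err = np.sum(w * (pred != y))
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best  # (weighted error, threshold, sign)
```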
Plugging this \(G_{m}\) into the objective function and setting the derivative with respect to \(\beta\) to zero, we have:
\[\left(e^{\beta}+e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_m\left(x_{i}\right)\right)=e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)}
\]
\[e^{2 \beta}+1=\frac{\sum_{i=1}^{N} w_{i}^{(m)}}{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}
\]
Thus:
\[\beta_{m}=\frac{1}{2} \log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}}
\]
where
\[\operatorname{err}_{m}=\frac{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}{\sum_{i=1}^{N} w_{i}^{(m)}}
\]
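This closed form for \(\beta_m\) can be checked numerically; the snippet below (illustrative only) compares it against a grid search over the objective \(\left(e^{\beta}-e^{-\beta}\right)A+e^{-\beta}B\), where \(A=\sum_{i} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)\) and \(B=\sum_{i} w_{i}^{(m)}\):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.random(50)                       # weights w_i^{(m)}
wrong = rng.random(50) < 0.3             # which points G_m misclassifies
A, B = (w * wrong).sum(), w.sum()
err_m = A / B

# grid search over beta versus the closed-form solution
betas = np.linspace(0.01, 3, 10_000)
objective = (np.exp(betas) - np.exp(-betas)) * A + np.exp(-betas) * B
beta_grid = betas[np.argmin(objective)]
beta_closed = 0.5 * np.log((1 - err_m) / err_m)
assert abs(beta_grid - beta_closed) < 1e-2
```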
Since we use forward stagewise additive modeling,
\[f_{m}(x)=f_{m-1}(x)+\beta_{m} G_{m}(x)
\]
we have:
\[\begin{aligned}
w_{i}^{(m+1)}&=\exp \left(-y_{i} f_{m}\left(x_{i}\right)\right)\\&=\exp \left(-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta_{m} G_{m}\left(x_{i}\right)\right)\right)\\&=w_{i}^{(m)} \cdot e^{-\beta_{m} y_{i} G_{m}\left(x_{i}\right)}
\end{aligned}
\]
Since \(-y_{i} G_{m}\left(x_{i}\right)=2 \cdot I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)-1\), we can rewrite the above equation as:
\[w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)} \cdot e^{-\beta_{m}}
\]
where \(\alpha_{m}=2 \beta_{m}\).
We can drop the factor \(e^{-\beta_{m}}\), since it multiplies all the weights by the same value and therefore has no effect once the weights are normalized:
\[w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}
\]
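The following short check (my own) confirms that dropping \(e^{-\beta_m}\) is harmless once the weights are normalized, as AdaBoost does in practice:

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.random(30)                       # current weights w_i^{(m)}
wrong = rng.random(30) < 0.25            # points misclassified by G_m
beta_m = 0.6
alpha_m = 2 * beta_m

w_full = w * np.exp(alpha_m * wrong) * np.exp(-beta_m)  # update keeping e^{-beta_m}
w_drop = w * np.exp(alpha_m * wrong)                    # update dropping it
# after normalization the two updates are identical
assert np.allclose(w_full / w_full.sum(), w_drop / w_drop.sum())
```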
Also, substituting the expression for \(\beta_{m}\) into \(\alpha_{m}=2 \beta_{m}\), we have:
\[\alpha_{m}=\log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}}
\]
Comparing this with AdaBoost, we see that the weight update and the coefficient \(\alpha_{m}\) are exactly those used by AdaBoost; hence forward stagewise additive modeling with exponential loss is the same algorithm as AdaBoost.
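To make the correspondence concrete, here is a compact AdaBoost sketch (my own illustration; the names `adaboost` and `predict` are made up, and the weak learner is assumed to be a one-dimensional decision stump) built directly from the quantities derived above: \(\operatorname{err}_m\), \(\alpha_m=\log \frac{1-\operatorname{err}_m}{\operatorname{err}_m}\), and the multiplicative weight update.

```python
import numpy as np

def adaboost(x, y, n_rounds=10):
    """AdaBoost with decision stumps on a single 1-D feature x, labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []                                  # list of (alpha_m, threshold, sign)
    for _ in range(n_rounds):
        # weak learner G_m: stump minimizing the weighted misclassification error
        err_m, t, sign = min(
            ((np.sum(w * (s * np.where(x > t, 1, -1) != y)), t, s)
             for t in np.unique(x) for s in (1, -1)),
            key=lambda cand: cand[0],
        )
        err_m = max(err_m / w.sum(), 1e-12)        # normalized error, guard against 0
        alpha_m = np.log((1 - err_m) / err_m)      # alpha_m = 2 * beta_m
        pred = sign * np.where(x > t, 1, -1)
        w = w * np.exp(alpha_m * (pred != y))      # multiplicative weight update
        w = w / w.sum()                            # normalize (absorbs e^{-beta_m})
        ensemble.append((alpha_m, t, sign))
    return ensemble

def predict(ensemble, x):
    score = sum(a * s * np.where(x > t, 1, -1) for a, t, s in ensemble)
    return np.sign(score)

# toy usage: labels given by a threshold on x are recovered perfectly
x = np.linspace(-1, 1, 40)
y = np.where(x > 0.3, 1, -1)
model = adaboost(x, y, n_rounds=5)
print((predict(model, x) == y).mean())             # prints 1.0
```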