XGBoost

I. Principles of the XGBoost Algorithm

1 Final model form:

  XGBoost uses the same final additive-model form as GBDT and AdaBoost, where $\hat{f}_{m}$ is the weak model obtained in round $m$:

$$\hat{y}(x)=\sum_{m=1}^{M}\hat{f}_{m}(x)$$
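As a minimal sketch of this additive form (assuming `models` is a list of already-trained weak learners, each callable on a sample):

```python
# Minimal sketch of the additive model form. `models` is assumed to be a
# list of trained weak learners f_1 ... f_M, each mapping a sample x to a score.
def predict(models, x):
    # y_hat(x) = sum over m of f_m(x)
    return sum(f(x) for f in models)
```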

2 Objective function

$$L^{(m)}=\sum_{i=1}^{N}l(y_{i},\hat{y}_{i}^{(m-1)}+\hat{f}_{m}(x_{i}))+\Omega(\hat{f}_{m})+C$$

 

Let $T$ be the number of leaf nodes and $\omega_{j}$ the weight of leaf $j$; the regularization term is then:

$$\Omega(\hat{f}_{m})=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}$$
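In the xgboost library these two hyperparameters are exposed directly: $\gamma$ as `gamma` (alias `min_split_loss`) and $\lambda$ as `reg_lambda`. A minimal usage sketch (the values are illustrative, not tuned):

```python
import xgboost as xgb

# gamma penalizes each additional leaf (the gamma*T term);
# reg_lambda is the L2 penalty lambda on the leaf weights.
model = xgb.XGBRegressor(n_estimators=100, gamma=1.0, reg_lambda=1.0)
# model.fit(X_train, y_train)  # assumes X_train, y_train are available
```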

3 Second-order Taylor expansion of the objective (loss)

Second-order Taylor expansion: $$f(x+\Delta x)\approx f(x)+f'(x)\Delta x+\frac{1}{2}f''(x)\Delta x^{2}$$

Let:

$$g_{i}=\partial_{\hat{y}_{i}^{(m-1)}}l(y_{i},\hat{y}_{i}^{(m-1)})$$

$$h_{i}=\partial_{\hat{y}_{i}^{(m-1)}}^{2}l(y_{i},\hat{y}_{i}^{(m-1)})$$
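For example, with squared loss $l=\frac{1}{2}(y_{i}-\hat{y}_{i})^{2}$ we get $g_{i}=\hat{y}_{i}-y_{i}$ and $h_{i}=1$; with logistic loss on a raw score, $g_{i}=p_{i}-y_{i}$ and $h_{i}=p_{i}(1-p_{i})$. A minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def grad_hess_squared(y, y_pred):
    # l = 1/2 * (y - y_pred)^2  =>  g = y_pred - y, h = 1
    return y_pred - y, np.ones_like(y)

def grad_hess_logistic(y, y_pred):
    # Logistic loss on the raw score y_pred: p = sigmoid(y_pred)
    p = 1.0 / (1.0 + np.exp(-y_pred))
    return p - y, p * (1.0 - p)
```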

Therefore:

$$L^{(m)}=\sum_{i=1}^{N}l(y_{i},\hat{y}_{i}^{(m-1)}+\hat{f}_{m}(x_{i}))+\Omega(\hat{f}_{m})+C\\
\approx \sum_{i=1}^{N}[l(y_{i},\hat{y}_{i}^{(m-1)})+g_{i}\hat{f}_{m}(x_{i})+\frac{1}{2}h_{i}\hat{f}_{m}^{2}(x_{i})]+\Omega(\hat{f}_{m})+C$$

 

Since $l(y_{i},\hat{y}_{i}^{(m-1)})$ is a constant with respect to $\hat{f}_{m}$, the objective can be rewritten as:

 

$$Obj^{(m)} \approx \sum_{i=1}^{N}[g_{i}\hat{f}_{m}(x_{i})+\frac{1}{2}h_{i}\hat{f}_{m}^{2}(x_{i})]+\Omega(\hat{f}_{m})$$

Writing the tree's output as $\hat{f}_{m}(x_{i})=\omega_{q(x_{i})}$, where $q(x_{i})$ maps sample $x_{i}$ to the index of the leaf it falls into, and then regrouping the per-sample sum as a per-leaf sum:

$$Obj^{(m)} \approx \sum_{i=1}^{N}[g_{i}\omega_{q(x_{i})}+\frac{1}{2}h_{i}\omega_{q(x_{i})}^{2}]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}\\
= \sum_{j=1}^{T}[(\sum_{i\in I_{j}}g_{i})\omega_{j}+\frac{1}{2}(\sum_{i\in I_{j}}h_{i})\omega_{j}^{2}]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}\\
= \sum_{j=1}^{T}[G_{j}\omega_{j}+\frac{1}{2}H_{j}\omega_{j}^{2}]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}\\
= \sum_{j=1}^{T}[G_{j}\omega_{j}+\frac{1}{2}(H_{j}+\lambda)\omega_{j}^{2}]+\gamma T$$

 

where $I_{j}=\{i \mid q(x_{i})=j\}$ is the set of samples assigned to leaf $j$, and:

$$G_{j}=\sum_{i\in I_{j}}g_{i}$$

$$H_{j}=\sum_{i\in I_{j}}h_{i}$$

 

4 Taking the partial derivative of the objective with respect to the leaf weight $\omega_{j}$ and setting it to zero, $\partial Obj^{(m)}/\partial\omega_{j}=G_{j}+(H_{j}+\lambda)\omega_{j}=0$, gives the optimal leaf weight:

$$\omega_{j}^{*}=-\frac{G_{j}}{H_{j}+\lambda}$$

 

5 Substituting this optimum back into the objective gives its minimum value:

$$Obj^{(m)}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_{j}^{2}}{H_{j}+\lambda}+\gamma T$$
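Both closed forms are straightforward to evaluate; a minimal sketch (assuming `G` and `H` are numpy arrays of per-leaf gradient and Hessian sums):

```python
import numpy as np

def leaf_weights(G, H, lam):
    # w_j = -G_j / (H_j + lambda), computed for all leaves at once
    return -G / (H + lam)

def tree_objective(G, H, lam, gamma):
    # Obj = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    return -0.5 * np.sum(G**2 / (H + lam)) + gamma * len(G)
```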

 

6 Split point selection ($G$ and $H$ without subscripts are the values without the split, i.e. $G=G_{L}+G_{R}$ and $H=H_{L}+H_{R}$). The gain of a candidate split is the reduction in the objective minus the complexity cost $\gamma$ of the added leaf:

$$L_{split}=\frac{1}{2}\left[\frac{G_{L}^{2}}{H_{L}+\lambda}+\frac{G_{R}^{2}}{H_{R}+\lambda}-\frac{G^{2}}{H+\lambda}\right]-\gamma$$
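A sketch of the exact greedy split search over one feature, in the spirit of the algorithm in the XGBoost paper (all names here are illustrative): sort the samples by feature value, maintain prefix sums $G_{L}$, $H_{L}$, and keep the split with the largest gain.

```python
import numpy as np

def best_split(x, g, h, lam, gamma):
    """Scan one feature for the best split; returns (gain, threshold)."""
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()            # totals without the split
    GL = HL = 0.0
    best_gain, best_thr = 0.0, None
    for i in range(len(x) - 1):
        GL += g[i]
        HL += h[i]
        if x[i] == x[i + 1]:           # can only split between distinct values
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i] + x[i + 1]) / 2.0
    return best_gain, best_thr
```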

 
