XGBoost
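I. Principles of the XGBoost Algorithm
1 Final model form:
XGBoost uses the same additive final model form as GBDT and AdaBoost, where $\hat{f}_{m}$ is the weak learner obtained in round $m$ and $M$ is the total number of rounds:
$$\hat{y}(x)=\sum_{m=1}^{M}\hat{f}_{m}(x)$$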
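The additive form above can be sketched in a few lines of Python. The two stumps below are hypothetical stand-ins for trained weak learners, and `predict_additive` is an illustrative name, not part of any library:

```python
from typing import Callable, Sequence

def predict_additive(x: float, learners: Sequence[Callable[[float], float]]) -> float:
    """Return the ensemble prediction: the sum of every weak learner's output at x."""
    return sum(f(x) for f in learners)

# Hypothetical decision stumps standing in for f_1 and f_2.
def stump_1(x: float) -> float:
    return 0.5 if x < 2.0 else 1.5

def stump_2(x: float) -> float:
    return -0.2 if x < 3.0 else 0.3

learners = [stump_1, stump_2]
prediction = predict_additive(2.5, learners)  # 1.5 + (-0.2)
```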
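2 Objective function
In round $m$, the objective sums the loss $l$ over the $N$ training samples and adds a regularization term $\Omega$ for the new tree; $C$ is a constant term:
$$L^{(m)}=\sum_{i=1}^{N}l(y_{i},\hat{y}_{i}^{m-1}+\hat{f}_{m}(x_{i}))+\Omega(f_{m})+C$$
With $T$ the number of leaf nodes and $\omega_{j}$ the weight of leaf $j$, the regularization term is:
$$\Omega(f_{m})=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}$$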
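As a small illustration of the regularization term, the sketch below evaluates $\gamma T+\frac{1}{2}\lambda\sum_{j}\omega_{j}^{2}$ for a tree described only by its leaf weights. The function name and the example numbers are made up for illustration:

```python
from typing import Sequence

def tree_penalty(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Regularization term: gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    T = len(leaf_weights)  # number of leaves
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)

# A hypothetical tree with three leaves.
penalty = tree_penalty([0.5, -1.0, 2.0], gamma=1.0, lam=2.0)
# gamma * T = 3.0 and 0.5 * 2.0 * (0.25 + 1.0 + 4.0) = 5.25, so penalty = 8.25
```

A larger $\gamma$ penalizes every additional leaf, while a larger $\lambda$ shrinks the leaf weights toward zero.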
3 Second-order Taylor expansion of the objective (loss)
Second-order Taylor expansion:$$f(x+\Delta x)\approx f(x)+f'(x)\Delta x+\frac{1}{2}f''(x)\Delta x^{2}$$
Let:
$$g_{i}=\partial_{\hat{y}_{i}^{m-1}}l(y_{i},\hat{y}_{i}^{m-1})$$
$$h_{i}=\partial_{\hat{y}_{i}^{m-1}}^{2}l(y_{i},\hat{y}_{i}^{m-1})$$
Therefore, expanding the loss around $\hat{y}_{i}^{m-1}$ with $\Delta x=\hat{f}_{m}(x_{i})$:$$\begin{aligned}
L^{(m)}&=\sum_{i=1}^{N}l(y_{i},\hat{y}_{i}^{m-1}+\hat{f}_{m}(x_{i}))+\Omega(f_{m})+C\\
&\approx \sum_{i=1}^{N}[l(y_{i},\hat{y}_{i}^{m-1})+g_{i}\hat{f}_{m}(x_{i})+\frac{1}{2}h_{i}\hat{f}_{m}^{2}(x_{i})]+\Omega(f_{m})+C
\end{aligned}$$
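To make $g_{i}$ and $h_{i}$ concrete, here is a minimal sketch for two common losses, chosen for illustration only: squared error written as $\frac{1}{2}(y-\hat{y})^{2}$, and logistic loss on a raw score. It also compares the second-order approximation with the exact loss for one small step. All function names are illustrative:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(y: float, score: float) -> float:
    """Log loss for a label y in {0, 1} and a raw score (log-odds)."""
    p = sigmoid(score)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def grad_hess_logistic(y: float, score: float) -> tuple:
    """First and second derivative of the log loss with respect to the raw score."""
    p = sigmoid(score)
    return p - y, p * (1.0 - p)

def grad_hess_squared(y: float, score: float) -> tuple:
    """First and second derivative of 0.5 * (y - score)**2 with respect to score."""
    return score - y, 1.0

# Compare the exact loss after a small step with its second-order approximation.
y, prev_score, step = 1.0, 0.3, 0.1
g, h = grad_hess_logistic(y, prev_score)
exact = logistic_loss(y, prev_score + step)
approx = logistic_loss(y, prev_score) + g * step + 0.5 * h * step ** 2
```

For squared error the expansion is exact because the loss is already quadratic; for logistic loss the two values agree closely when the step is small.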
Since $l(y_{i},\hat{y}_{i}^{m-1})$ and $C$ are constants in round $m$, they can be dropped and the objective can be rewritten as follows. Here $q(x_{i})$ is the index of the leaf that sample $x_{i}$ falls into, so $\hat{f}_{m}(x_{i})=\omega_{q(x_{i})}$, and $I_{j}=\{i\mid q(x_{i})=j\}$ is the set of samples in leaf $j$:
$$\begin{aligned}
Obj^{m} &\approx \sum_{i=1}^{N}[g_{i}\hat{f}_{m}(x_{i})+\frac{1}{2}h_{i}\hat{f}_{m}^{2}(x_{i})]+\Omega(f_{m})\\
&= \sum_{i=1}^{N}[g_{i}\omega_{q(x_{i})}+\frac{1}{2}h_{i}\omega_{q(x_{i})}^{2}]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}\\
&= \sum_{j=1}^{T}[\sum_{i\in I_{j}}g_{i}\omega_{j}+\frac{1}{2}\sum_{i\in I_{j}}h_{i}\omega_{j}^{2}]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}\\
&= \sum_{j=1}^{T}[G_{j}\omega_{j}+\frac{1}{2}H_{j}\omega_{j}^{2}]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_{j}^{2}\\
&= \sum_{j=1}^{T}[G_{j}\omega_{j}+\frac{1}{2}(H_{j}+\lambda)\omega_{j}^{2}]+\gamma T
\end{aligned}$$
where:
$$G_{j}=\sum_{i\in I_{j}}g_{i}$$
$$H_{j}=\sum_{i\in I_{j}}h_{i}$$
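The regrouping from a sum over samples to a sum over leaves can be sketched as below. The list `leaf_index` plays the role of the sample-to-leaf assignment, and the numbers are invented for illustration:

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

def leaf_sums(g: Sequence[float], h: Sequence[float],
              leaf_index: Sequence[int]) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Accumulate G_j and H_j: the per-leaf sums of first and second derivatives."""
    G: Dict[int, float] = defaultdict(float)
    H: Dict[int, float] = defaultdict(float)
    for g_i, h_i, j in zip(g, h, leaf_index):
        G[j] += g_i
        H[j] += h_i
    return dict(G), dict(H)

# Four hypothetical samples: the first two fall in leaf 0, the last two in leaf 1.
g = [-1.0, 0.5, 2.0, -0.5]
h = [1.0, 1.0, 1.0, 1.0]
leaf_index = [0, 0, 1, 1]
G, H = leaf_sums(g, h, leaf_index)
# G == {0: -0.5, 1: 1.5} and H == {0: 2.0, 1: 2.0}
```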
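4 Taking the partial derivative of the objective with respect to each leaf weight $\omega_{j}$ and setting it to 0, i.e. $G_{j}+(H_{j}+\lambda)\omega_{j}=0$, gives the optimal leaf weight:
$$\omega_{j}=-\frac{G_{j}}{H_{j}+\lambda}$$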
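A quick numerical check of this closed form: the sketch below evaluates the per-leaf quadratic $G_{j}\omega+\frac{1}{2}(H_{j}+\lambda)\omega^{2}$ at the closed-form weight and at two nearby weights. The values of `G`, `H`, and `lam` are arbitrary illustrative numbers:

```python
def leaf_objective(w: float, G: float, H: float, lam: float) -> float:
    """Contribution of one leaf to the objective for a given leaf weight w."""
    return G * w + 0.5 * (H + lam) * w * w

def optimal_leaf_weight(G: float, H: float, lam: float) -> float:
    """Closed-form minimizer of leaf_objective with respect to w."""
    return -G / (H + lam)

G, H, lam = 1.5, 2.0, 1.0
w_star = optimal_leaf_weight(G, H, lam)    # -1.5 / 3.0 = -0.5
at_optimum = leaf_objective(w_star, G, H, lam)
slightly_left = leaf_objective(w_star - 0.01, G, H, lam)
slightly_right = leaf_objective(w_star + 0.01, G, H, lam)
```

Both neighbouring weights give a larger objective than `w_star`, as expected for a minimum of a convex quadratic (assuming $H_{j}+\lambda>0$).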
5 Substituting this optimum back into the objective gives its minimum value for a fixed tree structure:
$$Obj^{m}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_{j}^{2}}{H_{j}+\lambda}+\gamma T$$
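This value acts as a score for a tree structure: the lower it is, the better the structure. A minimal sketch, with made-up per-leaf sums and an illustrative function name:

```python
from typing import Sequence

def structure_score(G: Sequence[float], H: Sequence[float],
                    gamma: float, lam: float) -> float:
    """Minimum objective for a fixed structure, given the per-leaf sums G_j and H_j."""
    T = len(G)  # number of leaves
    return -0.5 * sum(G_j * G_j / (H_j + lam) for G_j, H_j in zip(G, H)) + gamma * T

# Hypothetical tree with two leaves.
score = structure_score([-0.5, 1.5], [2.0, 2.0], gamma=1.0, lam=1.0)
```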
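6 Choosing the split point ($G$ and $H$ without subscripts are the values of the node before splitting; the subscripts $L$ and $R$ denote the left and right child)
The gain of a split is the resulting reduction in the objective. Since a split adds one leaf, it also incurs the cost $\gamma$:
$$L_{split}=\frac{1}{2}[\frac{G_{L}^{2}}{H_{L}+\lambda}+\frac{G_{R}^{2}}{H_{R}+\lambda}-\frac{G^{2}}{H+\lambda}]-\gamma$$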
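One way to use this gain is an exhaustive scan over the sorted values of a single feature, sketched below. This is a simplified illustration under stated assumptions, not XGBoost's actual implementation. The `gamma` argument is the cost of the extra leaf implied by the $\gamma T$ term and defaults to 0.0; all names and data are hypothetical:

```python
from typing import Optional, Sequence, Tuple

def split_gain(G_L: float, H_L: float, G_R: float, H_R: float,
               lam: float, gamma: float = 0.0) -> float:
    """Gain of splitting a node with sums (G_L + G_R, H_L + H_R) into two children."""
    G, H = G_L + G_R, H_L + H_R
    bracket = G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam) - G ** 2 / (H + lam)
    return 0.5 * bracket - gamma

def best_split(x: Sequence[float], g: Sequence[float], h: Sequence[float],
               lam: float, gamma: float = 0.0) -> Tuple[Optional[float], float]:
    """Scan candidate thresholds on one feature; return (threshold, gain).

    The threshold is None when no candidate has a positive gain.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    G_L = H_L = 0.0
    best_threshold, best_gain = None, 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += g[i]
        H_L += h[i]
        if x[i] == x[nxt]:
            continue  # equal feature values cannot be separated by a threshold
        gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if gain > best_gain:
            best_threshold, best_gain = 0.5 * (x[i] + x[nxt]), gain
    return best_threshold, best_gain

# Hypothetical data: the gradients change sign between x = 2 and x = 3.
x = [1.0, 2.0, 3.0, 4.0]
g = [-1.0, -1.0, 1.0, 1.0]
h = [1.0, 1.0, 1.0, 1.0]
threshold, gain = best_split(x, g, h, lam=1.0)
```

On this data the scan picks the midpoint 2.5, where the left and right children have gradient sums of opposite sign.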