XGBoost Series (Part 1)

Installation

Training with multiple GPUs is supported only on Linux. This post covers only the Python package.

  1. Installing the prebuilt binary package
pip install xgboost -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

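The Douban PyPI mirror is used here because, in my experience, it is faster and more stable for installing and upgrading packages.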

Platform        GPU   Multi-Node-Multi-GPU
Linux x86_64    yes   yes
Linux aarch64   no    no
MacOS           no    no
Windows         yes   no
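
After the pip install, it helps to confirm that the package actually landed in the current environment. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of XGBoost.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("xgboost")
    if version is None:
        print("xgboost is not installed in this environment")
    else:
        print(f"xgboost {version} is installed")
```

Running it with the same interpreter that pip installed into avoids the common mistake of checking a different environment.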
  2. Building from source

Step 1: Get the source code

git clone --recursive https://github.com/dmlc/xgboost

Step 2: Build the shared library

  • On Linux and other UNIX-like systems, the target library is libxgboost.so
  • On MacOS, the target library is libxgboost.dylib
  • On Windows the target library is xgboost.dll
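
The platform-to-file-name mapping above can be sketched as a small lookup, which is handy when a script needs to check whether the build produced the expected file. The helper name `shared_library_name` is hypothetical; only the three file names come from the list above.

```python
import platform

# File names taken from the list above; keys are the values that
# platform.system() returns on each operating system.
_LIB_NAMES = {
    "Linux": "libxgboost.so",
    "Darwin": "libxgboost.dylib",   # macOS reports itself as "Darwin"
    "Windows": "xgboost.dll",
}


def shared_library_name(system=None):
    """Return the expected XGBoost library file name for `system`.

    Falls back to the .so name for other UNIX-like systems.
    """
    system = system or platform.system()
    return _LIB_NAMES.get(system, "libxgboost.so")


if __name__ == "__main__":
    print(shared_library_name())
```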

Build requirements

  • A recent C++ compiler supporting C++11 (g++-5.0 or higher)
  • CMake 3.13 or higher.
cd xgboost
mkdir build
cd build
cmake .. -DUSE_CUDA=ON  # requires the CUDA toolkit; for a build without GPU support, run plain `cmake ..`
make -j4

How to install CMake: first install make, gcc, and g++.

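# Then remove the locally installed cmake
sudo apt-get autoremove cmake
# Download the installer
wget https://github.com/Kitware/CMake/releases/download/v3.21.0-rc2/cmake-3.21.0-rc2-linux-x86_64.sh
# Make it executable
chmod +x cmake-3.21.0-rc2-linux-x86_64.sh
# Run the installer; it unpacks a directory of prebuilt cmake binaries
./cmake-3.21.0-rc2-linux-x86_64.sh
# Move the directory under /opt and create symlinks
sudo mv cmake-3.21.0-rc2-linux-x86_64 /opt/cmake-3.21.0
sudo ln -sf /opt/cmake-3.21.0/bin/*  /usr/bin/
# Check the installed version
cmake --version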
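
To check the reported version against the CMake 3.13 minimum listed in the build requirements, the first line of the `cmake --version` output can be parsed. This is a rough sketch; the function names are my own, and it assumes the output starts with the usual `cmake version X.Y.Z` line.

```python
import re

MIN_CMAKE = (3, 13)  # minimum from the build requirements above


def parse_cmake_version(output):
    """Extract (major, minor, patch) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("no CMake version found in output")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(output, minimum=MIN_CMAKE):
    """Return True if the parsed version is at least `minimum`."""
    return parse_cmake_version(output)[:len(minimum)] >= minimum


if __name__ == "__main__":
    sample = "cmake version 3.21.0-rc2\n"
    print(parse_cmake_version(sample), meets_minimum(sample))
```

In a real script the text would come from `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.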

Step 3: Build the XGBoost Python package

The Python package lives in python-package/.

Method 1: use the default toolchain

cd xgboost/python-package
python setup.py install  # Install the XGBoost to your current Python environment.
python setup.py build    # Build the Python package.
python setup.py build_ext # Build only the C++ core.
python setup.py sdist     # Create a source distribution
python setup.py bdist     # Create a binary distribution
python setup.py bdist_wheel # Create a binary distribution with wheel format
# --use-cuda enables GPU acceleration; --use-nccl enables distributed GPU training
python setup.py install --use-cuda --use-nccl

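For the complete list of available options, refer to setup.py.

The other methods are not covered here, since I have not tried them myself.

How I understand the installation process: first the shared library (.so) is compiled, then the Python package is installed. Python acts as a glue language here, providing the interface for calling into the shared library.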
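
The glue-language idea can be illustrated with the standard `ctypes` module. The sketch below loads the system C math library as a stand-in for libxgboost.so and calls one of its functions from Python; it assumes a POSIX system (Linux or macOS), and the wrapper name `c_sqrt` is made up for illustration.

```python
import ctypes
import ctypes.util

# Locate the C math library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x):
    """Thin Python wrapper around the compiled C function."""
    return libm.sqrt(x)


if __name__ == "__main__":
    print(c_sqrt(9.0))
```

The Python side only declares types and forwards the call; the actual work happens in compiled code, which is the same division of labor as between the xgboost Python package and the shared library.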

Usage example

Other reference links:

  1. https://xgboost.readthedocs.io/en/latest/tutorials/input_format.html
  2. https://xgboost.readthedocs.io/en/latest/tutorials/index.html
  3. https://github.com/dmlc/xgboost/tree/master/demo
import xgboost as xgb
# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'max_depth':2, 'eta':1, 'objective':'binary:logistic' }
num_round = 2
evallist = [(dtest, 'eval'), (dtrain, 'train')]
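# study the next line carefully: train for num_round rounds, evaluating on every set in evallist
bst = xgb.train(param, dtrain, num_round, evallist)
# save the model
bst.save_model('0001.model')
# load the model
bst = xgb.Booster({'nthread': 4})  # init model
bst.load_model('0001.model')  # load the saved model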
# make prediction
preds = bst.predict(dtest)
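
With the `binary:logistic` objective, `preds` holds predicted probabilities rather than class labels. A minimal sketch of turning them into an error rate follows; the 0.5 threshold and the function name `error_rate` are my own choices, and the true labels would typically come from `dtest.get_label()`.

```python
def error_rate(preds, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(preds) == 0:
        raise ValueError("preds must not be empty")
    # A probability above the threshold counts as the positive class (1).
    wrong = sum(int(p > threshold) != int(y) for p, y in zip(preds, labels))
    return wrong / len(preds)


if __name__ == "__main__":
    # Made-up probabilities and labels, for illustration only.
    print(error_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))
```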

Python usage tutorial

Ways to set parameters

  1. As a dictionary
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
param['nthread'] = 4
param['eval_metric'] = 'auc'
  2. Multiple evaluation metrics
param['eval_metric'] = ['auc', 'ams@0']

# alternatively, as a list of pairs (in Python 3, items() must be wrapped in list() before +=):
# plst = list(param.items())
# [('max_depth', 2), ('eta', 1), ('objective', 'binary:logistic')]
# plst += [('eval_metric', 'ams@0')]
  3. Set the validation sets as a list of (DMatrix, name) pairs
evallist = [(dtest, 'eval'), (dtrain, 'train')]
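
The list-of-pairs form of the parameters mentioned above allows the same key, such as `eval_metric`, to appear more than once. Here is a small pure-Python sketch of that conversion; the helper name `to_param_pairs` is hypothetical and not part of XGBoost, and whether a given XGBoost version still accepts a list of pairs should be checked against its documentation.

```python
def to_param_pairs(param, extra_metrics=()):
    """Flatten a parameter dict into (key, value) pairs.

    A list-valued 'eval_metric' is expanded into one pair per metric,
    and any `extra_metrics` are appended as further 'eval_metric' pairs.
    """
    pairs = []
    for key, value in param.items():
        if key == "eval_metric" and isinstance(value, (list, tuple)):
            pairs.extend(("eval_metric", metric) for metric in value)
        else:
            pairs.append((key, value))
    pairs.extend(("eval_metric", metric) for metric in extra_metrics)
    return pairs


if __name__ == "__main__":
    param = {"max_depth": 2, "eta": 1, "objective": "binary:logistic",
             "eval_metric": ["auc"]}
    for pair in to_param_pairs(param, extra_metrics=["ams@0"]):
        print(pair)
```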

Most complete introduction: https://xgboost.readthedocs.io/en/latest/python/python_intro.html#setting-parameters

API doc: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.core

posted @ 2021-07-20 15:14  小肚腩的世界