docker run sqlfow xgboost demo
Run SQLFlow Using Docker
SQLFlow releases several Docker images that contains the SQLFlow server, MySQL server, sample datasets, Jupyter Notebook server, and the SQLFlow plugin for Jupyter.
You can use these Docker images for either local trying out or production deployment.
Preparation
-
Install Docker Community Edition on your PC/Macbook/Server.
-
Pull the latest SQLFlow Docker images. Or you can also build the Docker image from source code following this guide.
docker pull sqlflow/sqlflow docker pull sqlflow/sqlflow:mysql docker pull sqlflow/sqlflow:jupyter
Try Out SQLFlow Using Notebook
-
Type the below command to start three containers to start a MySQL server, SQLFlow server and a Jupyter notebook server.
docker run --name=sqlflow-mysql -d -p 8888:8888 sqlflow/sqlflow:mysql docker run --net=container:sqlflow-mysql -d sqlflow/sqlflow:latest sqlflowserver docker run --net=container:sqlflow-mysql -d sqlflow/sqlflow:jupyter
-
You can also use a specified version (e.g.
v0.4.0
) of the SQLFlow server by changing the second line above todocker run --net=container:sqlflow-mysql -d sqlflow/sqlflow:v0.4.0 sqlflowserver
. -
Open a web browser, go to
localhost:8888
, open any tutorial notebook likeiris-dnn.ipynb
file, then you can follow the tutorial and run the SQL statements to run the training and prediction.
XGBoost on SQLFlow Tutorial
This is a tutorial on train/predict XGBoost model in SQLFLow, you can find more SQLFlow usage from the Language Guide, in this tutorial you will learn how to:
- Train a XGBoost model to fit the boston housing dataset; and
- Predict the housing price using the trained model;
The Dataset
This tutorial would use the Boston Housing as the demonstration dataset. The database contains 506 lines and 14 columns, the meaning of each column is as follows:
Column | Explain |
---|---|
crim | per capita crime rate by town. |
zn | proportion of residential land zoned for lots over 25,000 sq.ft. |
indus | proportion of non-retail business acres per town. |
chas | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise). |
nox | nitrogen oxides concentration (parts per 10 million). |
rm | average number of rooms per dwelling. |
age | proportion of owner-occupied units built prior to 1940. |
dis | weighted mean of distances to five Boston employment centres. |
rad | index of accessibility to radial highways. |
tax | full-value property-tax rate per $10,000. |
ptratio | pupil-teacher ratio by town. |
black | 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town. |
lstat | lower status of the population (percent). |
medv | median value of owner-occupied homes in $1000s. |
We separated the dataset into train/test dataset, which is used to train/predict our model. SQLFlow would automatically split the training dataset into train/validation dataset while training progress.
%%sqlflow
describe boston.train;
%%sqlflow
describe boston.test;
Fit Boston Housing Dataset
First, let's train an XGBoost regression model to fit the boston housing dataset, we prefer to train the model for 30 rounds
, and using squarederror
loss function that the SQLFLow extended SQL can be like:
TO TRAIN xgboost.gbtree
WITH
train.num_boost_round=30,
objective="reg:squarederror"
xgboost.gbtree
is the estimator name, gbtree
is one of the XGBoost booster, you can find more information from here.
We can specify the training data columns in COLUMN clause
, and the label by LABEL
keyword:
COLUMN crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat
LABEL medv
To save the trained model, we can use INTO clause
to specify a model name:
INTO sqlflow_models.my_xgb_regression_model
Second, let's use a standard SQL to fetch the training data from table boston.train
:
SELECT * FROM boston.train
Finally, the following is the SQLFlow Train statement of this regression task, you can run it in the cell:
%%sqlflow
SELECT * FROM boston.train
TO TRAIN xgboost.gbtree
WITH
objective="reg:squarederror",
train.num_boost_round = 30
COLUMN crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat
LABEL medv
INTO sqlflow_models.my_xgb_regression_model;
Predict the Housing Price
After training the regression model, let's predict the house price using the trained model.
First, we can specify the trained model by USING clause
:
USING sqlflow_models.my_xgb_regression_model
Than, we can specify the prediction result table by TO PREDICT clause
:
TO PREDICT boston.predict.medv
And using a standard SQL to fetch the prediction data:
SELECT * FROM boston.test
Finally, the following is the SQLFLow Prediction statement:
%%sqlflow
SELECT * FROM boston.test
TO PREDICT boston.predict.medv
USING sqlflow_models.my_xgb_regression_model;
Let's have a glance at prediction results.
%%sqlflow
SELECT * FROM boston.predict;
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· DeepSeek 开源周回顾「GitHub 热点速览」
2017-09-28 机器学习样本标记 示意代码
2017-09-28 kibana 5.5 源码编译踩坑记录