TPC-DS数据压测

前言

TPC-DS是一套决策支持系统测试基准,主要针对零售行业。提供99个SQL查询(SQL99或2003),分析数据量大,测试数据与实际商业数据高度相似,同时具有各种业务模型(分析报告型,数据挖掘型等等)。

使用说明

1、下载工具及安装

注意:必须输入邮箱,他会发下载地址到邮箱中,点击下载地址即可下载,以上2个地址任选一即可。

2、编译

进入TPC-DS工具包所在目录,由于下载的是源码,需要编译后才能使用。进入/TPC-DS/tools文件夹中

  • Ubuntu:
    sudo apt-get install gcc make flex bison byacc git
  • CentOS/RHEL:
    sudo yum install gcc make flex bison byacc git
  • macOS:
    make OS=MACOS

3、生成数据

在tools目录下使用./dsdgen生成数据, 执行命令参数如下:

DIR:数据存放目录。
SCALE:数据量,以GB为单位。
TABLE:生成哪张表的数据,一共有24张表。
PARALLEL:生成的数据一共分为多少份,一般生成TB级数据才会用到。
CHILD:当前数据是第几份,与PARALLEL配对使用。
FORCE:强制写入数据。

示例1: 生成1G数据,存放在/TPC-DS/data文件夹下
./dsdgen -scale 1 -dir ../data/
示例2: 生成1TB数据,存放在/TPC-DS/data文件夹下
./dsdgen -scale 1000 -dir ../data/
示例3: 生成30TB数据,存放在/TPC-DS/data文件夹下
./dsdgen -scale 30000 -dir ../data/
示例4: 指定数据表名,生成1G数据,存放在/TPC-DS/data文件夹下
./dsdgen -SCALE 1 -DISTRIBUTIONS tpcds.idx -TERMINATE N -TABLE web_sales -dir ../data/
示例5: 分块生成1G数据,存放在/TPC-DS/data文件夹下,效率更高
./dsdgen -scale 1 -dir ../data/ -parallel 4 -child 1

4、建表语句

/tpcds-kit/tools目录下,tpcds.sql、tpcds_ri.sql
很多数据平台可能不能直接使用,需要修改。建表语句的修改主要是依据不同环境支持的数据类型修改和一些基础语法修正,还需依照生成的数据的分割符在建表时指定分隔符。

表名 header
CALL_CENTER cell
CATALOG_PAGE cell
CATALOG_RETURNS cell
CATALOG_SALES cell
CUSTOMER cell
CUSTOMER_ADDRESS cell
CUSTOMER_DEMOGRAPHICS cell
DATE_DIM cell
DBGEN_VERSION cell
HOUSEHOLD_DEMOGRAPHICS cell
INCOME_BAND cell
INVENTORY cell
ITEM cell
PROMOTION cell
REASON cell
SHIP_MODE cell
STORE cell
STORE_RETURNS cell
STORE_SALES cell
TIME_DIM cell
WAREHOUSE cell
WEB_PAGE cell
WEB_RETURNS cell
WEB_SALES cell
WEB_SITE cell

5、将生成数据导入

copy call_center from '/part2/tpcds/v2.6.0/datas/handled/call_center.dat' with delimiter as '|' NULL '';
copy catalog_page from '/part2/tpcds/v2.6.0/datas/handled/catalog_page.dat' with delimiter as '|' NULL '';
copy catalog_returns from '/part2/tpcds/v2.6.0/datas/handled/catalog_returns.dat' with delimiter as '|' NULL '';
copy catalog_sales from '/part2/tpcds/v2.6.0/datas/handled/catalog_sales.dat' with delimiter as '|' NULL '';
copy customer from '/part2/tpcds/v2.6.0/datas/handled/customer.dat' with delimiter as '|' NULL '';
copy customer_address from '/part2/tpcds/v2.6.0/datas/handled/customer_address.dat' with delimiter as '|' NULL '';
copy customer_demographics from '/part2/tpcds/v2.6.0/datas/handled/customer_demographics.dat' with delimiter as '|' NULL '';
copy date_dim from '/part2/tpcds/v2.6.0/datas/handled/date_dim.dat' with delimiter as '|' NULL '';
copy dbgen_version from '/part2/tpcds/v2.6.0/datas/handled/dbgen_version.dat' with delimiter as '|' NULL '';
copy household_demographics from '/part2/tpcds/v2.6.0/datas/handled/household_demographics.dat' with delimiter as '|' NULL '';
copy income_band from '/part2/tpcds/v2.6.0/datas/handled/income_band.dat' with delimiter as '|' NULL '';
copy inventory from '/part2/tpcds/v2.6.0/datas/handled/inventory.dat' with delimiter as '|' NULL '';
copy item from '/part2/tpcds/v2.6.0/datas/handled/item.dat' with delimiter as '|' NULL '';
copy promotion from '/part2/tpcds/v2.6.0/datas/handled/promotion.dat' with delimiter as '|' NULL '';
copy reason from '/part2/tpcds/v2.6.0/datas/handled/reason.dat' with delimiter as '|' NULL '';
copy ship_mode from '/part2/tpcds/v2.6.0/datas/handled/ship_mode.dat' with delimiter as '|' NULL '';
copy store from '/part2/tpcds/v2.6.0/datas/handled/store.dat' with delimiter as '|' NULL '';
copy store_returns from '/part2/tpcds/v2.6.0/datas/handled/store_returns.dat' with delimiter as '|' NULL '';
copy store_sales from '/part2/tpcds/v2.6.0/datas/handled/store_sales.dat' with delimiter as '|' NULL '';
copy time_dim from '/part2/tpcds/v2.6.0/datas/handled/time_dim.dat' with delimiter as '|' NULL '';
copy warehouse from '/part2/tpcds/v2.6.0/datas/handled/warehouse.dat' with delimiter as '|' NULL '';
copy web_page from '/part2/tpcds/v2.6.0/datas/handled/web_page.dat' with delimiter as '|' NULL '';
copy web_returns from '/part2/tpcds/v2.6.0/datas/handled/web_returns.dat' with delimiter as '|' NULL '';
copy web_sales from '/part2/tpcds/v2.6.0/datas/handled/web_sales.dat' with delimiter as '|' NULL '';
copy web_site from '/part2/tpcds/v2.6.0/datas/handled/web_site.dat' with delimiter as '|' NULL '';

6、生成查询SQL

进入/tpcds-kit/tools目录下,常用参数:

-input 输入,读取测试用例包含的模板,一般使用/query_templates/templates.lst即可。
-directory 模板所在目录, 一般使用-directory../query_templates即可。
-dialect 生成某个数据库的语言,可选项可以查看/query_templates目录,有oracle、db2、SqlServer等。

执行以下shell命令

for id in `seq 1 99`; do ./dsqgen -DIRECTORY ../query_templates -TEMPLATE "query$id.tpl" -DIALECT oracle -FILTER Y > ./sql/"query$id.sql"; done

示例SQL-1:

with customer_total_return as
(select sr_customer_sk as ctr_customer_sk
,sr_store_sk as ctr_store_sk
,sum(SR_FEE) as ctr_total_return
from store_returns
,date_dim
where sr_returned_date_sk = d_date_sk
and d_year =2000
group by sr_customer_sk
,sr_store_sk)
 select  c_customer_id
from customer_total_return ctr1
,store
,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
from customer_total_return ctr2
where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
and s_store_sk = ctr1.ctr_store_sk
and s_state = 'TN'
and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;
posted @ 2022-01-07 10:03  小喜_ww  阅读(1657)  评论(0编辑  收藏  举报