This post follows the TPC-H benchmark procedure published on the official StarRocks website.
Preparation
Hardware
Item | Value |
---|---|
Machines | 3 Huawei Cloud servers |
CPU | 16 cores |
Memory | 64 GB |
Network | 1 Gbit/s |
Disk | High-efficiency cloud disk, 200 GB |
Software
Kernel version: Linux 3.10.0-1160.59.1.el7.x86_64
OS version: CentOS Linux release 7.9.2009 (Core)
Software version: StarRocks 2.5.1
- Note: 1 FE + 3 BE (co-located on the 3 machines)
Deploying StarRocks
The detailed deployment steps are omitted here.
Architecture
1 FE + 3 BE
Running the DDL
Run the SQL file sql/tpch/ddl_100/tpch_create.sql to create the TPC-H tables.
Test data
Download the StarRocks TPC-H test toolkit from the official StarRocks website:
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/tpch-poc-0.1.2.zip
unzip tpch-poc-0.1.2.zip
cd tpch-poc-0.1.2
cd benchmark
Generating data
- This is just a trial run, so generate 10 GB of data first (the output directory is still named data_100).
[root@ecs-0003 benchmark]# ./bin/gen_data/gen-tpch.sh 10 data_100
[INFO] gen 10GB data under /root/tpch-poc-0.1.2/benchmark/data_100
[INFO] generate data...
[INFO] gen data of table: customer
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: lineitem
[INFO] gen <1>th part data of table: lineitem
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen <2>th part data of table: lineitem
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: nation
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: orders
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: parts
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: partsupp
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: region
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: suppliers
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] refine the data in /root/tpch-poc-0.1.2/benchmark/data_100
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
233M /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
3.6G /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
3.6G /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
4.0K /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
1.7G /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
1.2G /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
231M /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
4.0K /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
14M /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
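As a quick sanity check on the listing above, the per-file sizes can be added up. The sketch below is only an estimate: the sizes are copied by hand from the listing, which rounds each value, and the binary units are an assumption about how the listing was produced.

```python
# Sizes copied from the listing above (file name -> human-readable size).
SIZES = {
    "customer.tbl": "233M",
    "lineitem.tbl.1": "3.6G",
    "lineitem.tbl.2": "3.6G",
    "nation.tbl": "4.0K",
    "orders.tbl": "1.7G",
    "partsupp.tbl": "1.2G",
    "part.tbl": "231M",
    "region.tbl": "4.0K",
    "supplier.tbl": "14M",
}

# Assumption: the listing uses binary units (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(text: str) -> float:
    """Convert a size such as '3.6G' into a number of bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


total_gib = sum(to_bytes(size) for size in SIZES.values()) / 1024**3
print(f"total: {total_gib:.1f} GiB")
```

The result comes out at a little over 10 GiB, which is consistent with the 10 GB requested from gen-tpch.sh.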
Loading data
[root@ecs-0003 benchmark]# python3 src/db_table_operation.py stream_load data_100
[INFO] 2023-02-16 14:54:16 db_table_operation.py[101] stream load from dir:/root/tpch-poc-0.1.2/benchmark/data_100
[INFO] 2023-02-16 14:54:16 config_util.py[43] concurrency load number for table: customer is not set, use 1 by default.
[INFO] 2023-02-16 14:54:16 db_table_operation.py[25] stream load start. table: customer, path: /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
[INFO] 2023-02-16 14:54:16 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl -H "column_separator:|" -H "columns:C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT" http://10.201.0.198:18030/api/tpch/customer/_stream_load}
[INFO] 2023-02-16 14:54:18 db_table_operation.py[41] stream load success. table: customer, path: /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
[INFO] 2023-02-16 14:54:18 config_util.py[41] concurrency load number for table: lineitem is 10.
[INFO] 2023-02-16 14:54:18 db_table_operation.py[25] stream load start. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] 2023-02-16 14:54:18 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1 -H "column_separator:|" -H "columns:l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment" http://10.201.0.198:18030/api/tpch/lineitem/_stream_load}
[INFO] 2023-02-16 14:54:18 db_table_operation.py[25] stream load start. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] 2023-02-16 14:54:18 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2 -H "column_separator:|" -H "columns:l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment" http://10.201.0.198:18030/api/tpch/lineitem/_stream_load}
[INFO] 2023-02-16 14:54:52 db_table_operation.py[41] stream load success. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] 2023-02-16 14:54:53 db_table_operation.py[41] stream load success. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] 2023-02-16 14:54:53 config_util.py[41] concurrency load number for table: orders is 5.
[INFO] 2023-02-16 14:54:53 db_table_operation.py[25] stream load start. table: orders, path: /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
[INFO] 2023-02-16 14:54:53 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl -H "column_separator:|" -H "columns:o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment" http://10.201.0.198:18030/api/tpch/orders/_stream_load}
[INFO] 2023-02-16 14:55:07 db_table_operation.py[41] stream load success. table: orders, path: /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
[INFO] 2023-02-16 14:55:07 config_util.py[43] concurrency load number for table: part is not set, use 1 by default.
[INFO] 2023-02-16 14:55:07 db_table_operation.py[25] stream load start. table: part, path: /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
[INFO] 2023-02-16 14:55:07 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl -H "column_separator:|" -H "columns:p_partkey,p_name,p_mfgr,p_brand,p_type,p_size,p_container,p_retailprice,p_comment" http://10.201.0.198:18030/api/tpch/part/_stream_load}
[INFO] 2023-02-16 14:55:09 db_table_operation.py[41] stream load success. table: part, path: /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
[INFO] 2023-02-16 14:55:09 config_util.py[43] concurrency load number for table: region is not set, use 1 by default.
[INFO] 2023-02-16 14:55:09 db_table_operation.py[25] stream load start. table: region, path: /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
[INFO] 2023-02-16 14:55:09 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl -H "column_separator:|" http://10.201.0.198:18030/api/tpch/region/_stream_load}
[INFO] 2023-02-16 14:55:09 db_table_operation.py[41] stream load success. table: region, path: /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
[INFO] 2023-02-16 14:55:09 config_util.py[43] concurrency load number for table: nation is not set, use 1 by default.
[INFO] 2023-02-16 14:55:09 db_table_operation.py[25] stream load start. table: nation, path: /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
[INFO] 2023-02-16 14:55:09 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl -H "column_separator:|" http://10.201.0.198:18030/api/tpch/nation/_stream_load}
[INFO] 2023-02-16 14:55:09 db_table_operation.py[41] stream load success. table: nation, path: /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
[INFO] 2023-02-16 14:55:09 config_util.py[43] concurrency load number for table: partsupp is not set, use 1 by default.
[INFO] 2023-02-16 14:55:09 db_table_operation.py[25] stream load start. table: partsupp, path: /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
[INFO] 2023-02-16 14:55:09 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl -H "column_separator:|" -H "columns:ps_partkey,ps_suppkey,ps_availqty,ps_supplycost,ps_comment" http://10.201.0.198:18030/api/tpch/partsupp/_stream_load}
[INFO] 2023-02-16 14:55:16 db_table_operation.py[41] stream load success. table: partsupp, path: /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
[INFO] 2023-02-16 14:55:16 config_util.py[43] concurrency load number for table: supplier is not set, use 1 by default.
[INFO] 2023-02-16 14:55:16 db_table_operation.py[25] stream load start. table: supplier, path: /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
[INFO] 2023-02-16 14:55:16 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl -H "column_separator:|" -H "columns:S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT" http://10.201.0.198:18030/api/tpch/supplier/_stream_load}
[INFO] 2023-02-16 14:55:16 db_table_operation.py[41] stream load success. table: supplier, path: /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
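The timestamps in the log above are enough to work out how long each file took to load. The following is a minimal sketch, not part of the toolkit: the function name `load_durations` and the regular expression are illustrative assumptions based only on the line format visible in this log.

```python
import re
from datetime import datetime

# Matches the "stream load start." and "stream load success." lines of the log.
LINE_RE = re.compile(
    r"\[INFO\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"stream load (start|success)\. table: \w+, path: (\S+)"
)


def load_durations(log_text: str) -> dict:
    """Return {file path: seconds between its start line and its success line}."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip curl command lines and config_util lines
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        event, path = match.group(2), match.group(3)
        if event == "start":
            started[path] = ts
        elif path in started:
            durations[path] = (ts - started[path]).total_seconds()
    return durations


# Four lines copied from the log above (the two lineitem parts).
SAMPLE = """\
[INFO] 2023-02-16 14:54:18 db_table_operation.py[25] stream load start. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] 2023-02-16 14:54:18 db_table_operation.py[25] stream load start. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] 2023-02-16 14:54:52 db_table_operation.py[41] stream load success. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] 2023-02-16 14:54:53 db_table_operation.py[41] stream load success. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
"""

for path, seconds in load_durations(SAMPLE).items():
    print(f"{path.rsplit('/', 1)[-1]}: {seconds:.0f} s")
```

The two lineitem parts dominate the load. Going by the first and last timestamps in the log (14:54:16 to 14:55:16), the whole load took about one minute.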
Checking the load results
Running the test
10 GB of data
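One possible way to check the load, offered here as an assumption rather than something the toolkit does, is to compare `SELECT COUNT(*)` for each table against the row counts the TPC-H specification defines for a given scale factor. The sketch below computes those expected counts for scale factor 10; the `expected_rows` helper is hypothetical, and the lineitem figure is only approximate because the specification does not scale that table exactly linearly.

```python
# Rows per unit of scale factor, as defined by the TPC-H specification.
ROWS_PER_SF = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
}
# nation and region have a fixed size regardless of scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}
# lineitem is only approximately 6,000,000 rows per unit of scale factor.
LINEITEM_APPROX_PER_SF = 6_000_000


def expected_rows(scale_factor: int) -> dict:
    """Return the expected (lineitem: approximate) row count of each table."""
    rows = {table: n * scale_factor for table, n in ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    rows["lineitem"] = LINEITEM_APPROX_PER_SF * scale_factor
    return rows


for table, count in sorted(expected_rows(10).items()):
    print(f"{table:<9} {count:>12,}")
```

A count that differs from these numbers for any table other than lineitem would suggest a partial or duplicated load.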
[root@ecs-0003 benchmark]# python3 src/benchmark.py -p -d tpch
[INFO] 2023-02-16 15:48:09 benchmark.py[217] benchmark args:Namespace(check_result=False, dataset='tpch', log_quiet=False, log_verbose=False, performance=True, scale=100, sql_file='')
[INFO] 2023-02-16 15:48:09 benchmark.py[97] test sql in dirs:[tpch]
[INFO] 2023-02-16 15:48:09 starrocks_lib.py[266] get sql info from sql_dir:/root/tpch-poc-0.1.2/benchmark/sql/tpch/query/tpch
------ dataset: tpch, concurrency: 1 ------
sql\time(ms)\parallel_num 8
q1 378.0
q2 157.0
q3 95.0
q4 77.0
q5 186.0
q6 41.0
q7 192.0
q8 195.0
q9 336.0
q10 96.0
q11 81.0
q12 84.0
q13 208.0
q14 56.0
q15 89.0
q16 69.0
q17 84.0
q18 369.0
q19 77.0
q20 84.0
q21 342.0
q22 110.0
Total time: 3.4 s
- I also ran the TPC-H test against Hive external tables. It took 24 s, which is still far faster than native Hive.
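The total can be verified by summing the 22 per-query times. This is a small sketch with the values copied by hand from the results above.

```python
# Per-query latencies in milliseconds, copied from the benchmark output.
TIMES_MS = {
    "q1": 378.0, "q2": 157.0, "q3": 95.0, "q4": 77.0, "q5": 186.0,
    "q6": 41.0, "q7": 192.0, "q8": 195.0, "q9": 336.0, "q10": 96.0,
    "q11": 81.0, "q12": 84.0, "q13": 208.0, "q14": 56.0, "q15": 89.0,
    "q16": 69.0, "q17": 84.0, "q18": 369.0, "q19": 77.0, "q20": 84.0,
    "q21": 342.0, "q22": 110.0,
}

total_ms = sum(TIMES_MS.values())
# The three slowest queries, from slowest to fastest.
slowest = sorted(TIMES_MS, key=TIMES_MS.get, reverse=True)[:3]

print(f"total: {total_ms / 1000:.1f} s")
print("slowest:", slowest)
```

The sum is 3406 ms, which matches the reported 3.4 s. The slowest queries are q1, q18 and q21.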