数据库实验 - 1. TPC-H 数据生成和导入

1. TPC-H 数据生成和导入

实验环境 PostgreSQL 12
参数 ScaleFactor = 1G, QuerySeed = 20190909

制作完成的数据和查询如下:
链接: https://pan.baidu.com/s/1-2VcQcrSZhz1yFd1Cq4m9Q 提取码: q8sj

1.1. 生成数据

参考 TPC-H数据导入postgresql教程

编辑 dbgen/makefile.suite 修改其中各部分如下

CC = gcc
DATABASE = SQLSERVER 
MACHINE = LINUX 
WORKLOAD = TPCH
$ #make clean

$ make

$ ./dbgen -s 1 -f # ScaleFactor=1(Gigabytes), Overwrite

生成如下文件

dss.ddl # 表定义
dss.ri  # 主键和外键定义

# 数据
customer.tbl
lineitem.tbl
nation.tbl
orders.tbl
partsupp.tbl
part.tbl
region.tbl
supplier.tbl

数据处理

sed -i 's/|$//g' `find *.tbl` # 去除末尾的 DELIMITER

1.2. 导入数据

先导入表定义,直接运行一遍 dss.ddl

\i /home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/dss.ddl

再导入表数据

chmod 777 *.tbl # 使 PostgreSQL 可以读文件
copy nation from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/nation.tbl' with DELIMITER as '|';
copy part from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/part.tbl' with DELIMITER as '|';
copy region from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/region.tbl' with DELIMITER as '|';
copy partsupp from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/partsupp.tbl' with DELIMITER as '|';
copy customer from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/customer.tbl' with DELIMITER as '|';
copy supplier from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/supplier.tbl' with DELIMITER as '|';
copy lineitem from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/lineitem.tbl' with DELIMITER as '|';
copy orders from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/orders.tbl' with DELIMITER as '|';

另一种导入方法(上面一种没 work,下面这种可以):

cat nation.tbl | psql -U tpch -d tpch -c "copy nation from stdin with DELIMITER as '|';"
cat part.tbl | psql -U tpch -d tpch -c "copy part from stdin with DELIMITER as '|';"
cat region.tbl | psql -U tpch -d tpch -c "copy region from stdin with DELIMITER as '|';"
cat partsupp.tbl | psql -U tpch -d tpch -c "copy partsupp from stdin with DELIMITER as '|';"
cat customer.tbl | psql -U tpch -d tpch -c "copy customer from stdin with DELIMITER as '|';"
cat supplier.tbl | psql -U tpch -d tpch -c "copy supplier from stdin with DELIMITER as '|';"
cat lineitem.tbl | psql -U tpch -d tpch -c "copy lineitem from stdin with DELIMITER as '|';"
cat orders.tbl | psql -U tpch -d tpch -c "copy orders from stdin with DELIMITER as '|';"

1.3. 添加外键

dss.ri 做相应修改如下,执行。

-- For table REGION
ALTER TABLE REGION
ADD PRIMARY KEY (R_REGIONKEY);

-- For table NATION
ALTER TABLE NATION
ADD PRIMARY KEY (N_NATIONKEY);

ALTER TABLE NATION
ADD FOREIGN KEY (N_REGIONKEY) references REGION;

COMMIT WORK;

-- For table PART
ALTER TABLE PART
ADD PRIMARY KEY (P_PARTKEY);

COMMIT WORK;

-- For table SUPPLIER
ALTER TABLE SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY);

ALTER TABLE SUPPLIER
ADD FOREIGN KEY (S_NATIONKEY) references NATION;

COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);

COMMIT WORK;

-- For table CUSTOMER
ALTER TABLE CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY);

ALTER TABLE CUSTOMER
ADD FOREIGN KEY (C_NATIONKEY) references NATION;

COMMIT WORK;

-- For table LINEITEM
ALTER TABLE LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);

COMMIT WORK;

-- For table ORDERS
ALTER TABLE ORDERS
ADD PRIMARY KEY (O_ORDERKEY);

COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD FOREIGN KEY (PS_SUPPKEY) references SUPPLIER;

COMMIT WORK;

ALTER TABLE PARTSUPP
ADD FOREIGN KEY (PS_PARTKEY) references PART;

COMMIT WORK;

-- For table ORDERS
ALTER TABLE ORDERS
ADD FOREIGN KEY (O_CUSTKEY) references CUSTOMER;

COMMIT WORK;

-- For table LINEITEM
ALTER TABLE LINEITEM
ADD FOREIGN KEY (L_ORDERKEY)  references ORDERS;

COMMIT WORK;

ALTER TABLE LINEITEM
ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP;

COMMIT WORK;

2. 检查结果

psql 客户端下输入 \d+ 返回:

tpch=> \d+
                         关联列表
 架构模式 |   名称   |  类型  | 拥有者 |    大小    | 描述 
----------+----------+--------+--------+------------+------
 public   | customer | 数据表 | tpch   | 28 MB      | 
 public   | lineitem | 数据表 | tpch   | 879 MB     | 
 public   | nation   | 数据表 | tpch   | 8192 bytes | 
 public   | orders   | 数据表 | tpch   | 204 MB     | 
 public   | part     | 数据表 | tpch   | 32 MB      | 
 public   | partsupp | 数据表 | tpch   | 136 MB     | 
 public   | region   | 数据表 | tpch   | 8192 bytes | 
 public   | supplier | 数据表 | tpch   | 1800 kB    | 
(8 行记录)

3. 生成查询

拷贝 dists.dssqueries/*.sql 到同一目录下,运行如下代码:

for i in {1..22}
do
    ./qgen -r 20190909 $i > query/q$i.sql # 20190909 是随机数种子
done
posted @ 2019-10-06 13:20  nlp-in-shell  阅读(1048)  评论(0编辑  收藏  举报