数据库实验 - 1. TPC-H 数据生成和导入
1. TPC-H 数据生成和导入
实验环境 PostgreSQL 12
参数 ScaleFactor = 1G, QuerySeed = 20190909
制作完成的数据和查询如下:
链接: https://pan.baidu.com/s/1-2VcQcrSZhz1yFd1Cq4m9Q 提取码: q8sj
1.1. 生成数据
编辑 dbgen/makefile.suite
修改其中各部分如下
CC = gcc
DATABASE = SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH
$ #make clean
$ make
$ ./dbgen -s 1 -f # ScaleFactor=1(Gigabytes), Overwrite
生成如下文件
dss.ddl # 表定义
dss.ri # 主键和外键定义
# 数据
customer.tbl
lineitem.tbl
nation.tbl
orders.tbl
partsupp.tbl
part.tbl
region.tbl
supplier.tbl
数据处理
sed -i 's/|$//g' `find *.tbl` # 去除末尾的 DELIMITER
1.2. 导入数据
先导入表定义,直接运行一遍 dss.ddl
\i /home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/dss.ddl
再导入表数据
chmod 777 *.tbl # 使 PostgreSQL 可以读文件
copy nation from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/nation.tbl' with DELIMITER as '|';
copy part from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/part.tbl' with DELIMITER as '|';
copy region from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/region.tbl' with DELIMITER as '|';
copy partsupp from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/partsupp.tbl' with DELIMITER as '|';
copy customer from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/customer.tbl' with DELIMITER as '|';
copy supplier from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/supplier.tbl' with DELIMITER as '|';
copy lineitem from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/lineitem.tbl' with DELIMITER as '|';
copy orders from '/home/monkey/Research/DBAcc/TPCH/2.18.0_rc2/dbgen/orders.tbl' with DELIMITER as '|';
另一种导入方法(上面一种没 work,下面这种可以):
cat nation.tbl | psql -U tpch -d tpch -c "copy nation from stdin with DELIMITER as '|';"
cat part.tbl | psql -U tpch -d tpch -c "copy part from stdin with DELIMITER as '|';"
cat region.tbl | psql -U tpch -d tpch -c "copy region from stdin with DELIMITER as '|';"
cat partsupp.tbl | psql -U tpch -d tpch -c "copy partsupp from stdin with DELIMITER as '|';"
cat customer.tbl | psql -U tpch -d tpch -c "copy customer from stdin with DELIMITER as '|';"
cat supplier.tbl | psql -U tpch -d tpch -c "copy supplier from stdin with DELIMITER as '|';"
cat lineitem.tbl | psql -U tpch -d tpch -c "copy lineitem from stdin with DELIMITER as '|';"
cat orders.tbl | psql -U tpch -d tpch -c "copy orders from stdin with DELIMITER as '|';"
1.3. 添加外键
对 dss.ri
做相应修改如下,执行。
-- For table REGION
ALTER TABLE REGION
ADD PRIMARY KEY (R_REGIONKEY);
-- For table NATION
ALTER TABLE NATION
ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE NATION
ADD FOREIGN KEY (N_REGIONKEY) references REGION;
COMMIT WORK;
-- For table PART
ALTER TABLE PART
ADD PRIMARY KEY (P_PARTKEY);
COMMIT WORK;
-- For table SUPPLIER
ALTER TABLE SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE SUPPLIER
ADD FOREIGN KEY (S_NATIONKEY) references NATION;
COMMIT WORK;
-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;
-- For table CUSTOMER
ALTER TABLE CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE CUSTOMER
ADD FOREIGN KEY (C_NATIONKEY) references NATION;
COMMIT WORK;
-- For table LINEITEM
ALTER TABLE LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
COMMIT WORK;
-- For table ORDERS
ALTER TABLE ORDERS
ADD PRIMARY KEY (O_ORDERKEY);
COMMIT WORK;
-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD FOREIGN KEY (PS_SUPPKEY) references SUPPLIER;
COMMIT WORK;
ALTER TABLE PARTSUPP
ADD FOREIGN KEY (PS_PARTKEY) references PART;
COMMIT WORK;
-- For table ORDERS
ALTER TABLE ORDERS
ADD FOREIGN KEY (O_CUSTKEY) references CUSTOMER;
COMMIT WORK;
-- For table LINEITEM
ALTER TABLE LINEITEM
ADD FOREIGN KEY (L_ORDERKEY) references ORDERS;
COMMIT WORK;
ALTER TABLE LINEITEM
ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP;
COMMIT WORK;
2. 检查结果
psql 客户端下输入 \d+
返回:
tpch=> \d+
关联列表
架构模式 | 名称 | 类型 | 拥有者 | 大小 | 描述
----------+----------+--------+--------+------------+------
public | customer | 数据表 | tpch | 28 MB |
public | lineitem | 数据表 | tpch | 879 MB |
public | nation | 数据表 | tpch | 8192 bytes |
public | orders | 数据表 | tpch | 204 MB |
public | part | 数据表 | tpch | 32 MB |
public | partsupp | 数据表 | tpch | 136 MB |
public | region | 数据表 | tpch | 8192 bytes |
public | supplier | 数据表 | tpch | 1800 kB |
(8 行记录)
3. 生成查询
拷贝 dists.dss
和 queries/*.sql
到同一目录下,运行如下代码:
for i in {1..22}
do
./qgen -r 20190909 $i > query/q$i.sql # 20190909 是随机数种子
done