postgresql/lightdb bulk data import and export with copy/ltuldr/ltldr/lt_bulkload, and recommended best practices
Note: both COPY TO and COPY FROM share a drawback: the file must reside on the server, which limits where they can be used. For export, lightdb therefore provides a high-performance alternative, ltuldr. For import, lightdb shipped lt_bulkload before 23.1 (see below); starting with 23.1 it also supports ltldr, the counterpart of Oracle SQL*Loader.
copy can be used for fast data import and export. Its main characteristics are:
- The COPY command moves data between PostgreSQL tables and standard file-system files.
- COPY TO copies the contents of a table to a file.
- COPY TO can also copy the results of a SELECT query; if a column list is specified, COPY TO copies only the data in the specified columns to the file.
- The COPY command instructs the PostgreSQL server to read from or write to the file directly, so the file must be accessible to the PostgreSQL server user.
- COPY FROM copies data from a file to a table.
- When using COPY FROM, each field in the file is inserted sequentially into the specified column. Table columns not specified in the COPY FROM column list get their default values.
- SELECT privilege is required on the table read by COPY TO, and INSERT privilege on the table into which values are inserted with COPY FROM.
- COPY TO can only be used with tables, not views. However, to copy the contents of a view, we can feed the COPY command an SQL query:
```sql
COPY (SELECT * FROM country) TO 'list_countries.copy';
```
- Files named in the COPY command are read or written directly by the server, not by the client application, so they must be located on, or accessible to, the database server machine rather than the client.
- COPY should not be confused with \copy in psql. \copy invokes COPY FROM STDIN or COPY TO STDOUT and then reads or writes a file accessible to the psql client, so file accessibility and access rights depend on the client rather than the server when \copy is used.
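A minimal sketch of that distinction, reusing the `country` table from above (file paths are illustrative): the first statement resolves its path on the database server, the second on the machine running psql.

```sql
-- server-side: the path is resolved on the database server,
-- and the server OS user must be able to write it
COPY country TO '/tmp/country_server.csv' WITH (FORMAT csv, HEADER true);

-- client-side psql meta-command: the path is resolved on the client;
-- under the hood psql runs COPY country TO STDOUT and writes the file itself
\copy country TO 'country_client.csv' WITH (FORMAT csv, HEADER true)
```

Note that \copy is a psql meta-command, not SQL, so it must fit on one line and takes no trailing semicolon.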
Typical copy usage:
```sql
-- from the console; this is what lt_restore does
zjh@postgres=# copy big_table from stdin(delimiter ' ');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 abc
>> 2 bcd
>> \.
COPY 2
-- from a program, typically used to pipe through gzip without an intermediate file
zjh@postgres=# copy big_table from program 'echo -e "1 abc\\n2 bcsfws"' (delimiter ' ');
COPY 2
zjh@postgres=# select * from big_table ;
 id |   v
----+--------
  1 | abc
  2 | bcd
  1 | abc
  1 | abc
  2 | bcsfws
(5 rows)
```
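Building on the `from program` example above, export can be compressed the same way without an intermediate plain-text file. A sketch, assuming gzip is on the server's PATH (file paths are illustrative); both commands run the program as the server OS user, so the usual server-side accessibility rules apply.

```sql
-- compress on the way out
COPY big_table TO PROGRAM 'gzip -c > /home/zjh/big_table.csv.gz' WITH (FORMAT csv);

-- decompress on the way back in
COPY big_table FROM PROGRAM 'gzip -dc /home/zjh/big_table.csv.gz' WITH (FORMAT csv);
```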
copy throughput is also high:
```sql
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# COPY big_table TO '/home/zjh/big_table.csv' with csv ;
COPY 110000001
Time: 54152.162 ms (00:54.152)
zjh@postgres=# truncate table big_table ;
TRUNCATE TABLE
Time: 3267.019 ms (00:03.267)
zjh@postgres=# COPY big_table from '/home/zjh/big_table.csv' with csv ;
COPY 110000001
Time: 125007.676 ms (02:05.008)
zjh@postgres=# truncate table big_table ;
TRUNCATE TABLE
Time: 2728.090 ms (00:02.728)
zjh@postgres=# COPY big_table from '/home/zjh/big_table.out' with binary;
COPY 110000001
Time: 120169.918 ms (02:00.170)
```
copy from also supports multi-process parallelism; two concurrent sessions improve total throughput by roughly 50%:
```sql
-- session 1
zjh@postgres=# COPY big_table from '/home/zjh/big_table.out' with binary;
COPY 110000001
Time: 182382.390 ms (03:02.382)
-- session 2, started almost simultaneously
zjh@postgres=# COPY big_table FROM '/home/zjh/big_table.out' with binary;
COPY 110000001
Time: 182251.170 ms (03:02.251)
```
The raw files are around 13 GB each (12011111262 Apr 8 13:03 big_table.csv; -rw-r--r-- 1 zjh zjh 13375555719 Apr 8 12:50 big_table.out). Which format is smaller depends on the data types: with mostly integers, binary is usually more compact; with mostly strings, text/CSV is.
```sql
zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table'));
 pg_size_pretty
----------------
 14 GB
(1 row)
```
That averages roughly 147 MB/s, for about 110 million rows per session.
Besides plain imports, copy from can also filter rows with a WHERE clause:
```sql
zjh@postgres=# create table big_table(id int, v varchar(100));
CREATE TABLE
zjh@postgres=# insert into big_table select i, i || '-' || i from generate_series(1,100000) i;
INSERT 0 100000
zjh@postgres=# exit
[zjh@hs-10-20-30-193 ~]$ ltsql -p14000 postgres
ltsql (13.3-22.2)
Type "help" for help.
zjh@postgres=# \copy ( select * from big_table) to '/home/zjh/big_table.csv' delimiter '|' csv header;
COPY 100000
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 44.725 ms
```
```sql
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 45.757 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
COPY 100000
Time: 41.660 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
COPY 100000
Time: 41.146 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.736 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.753 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.475 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v like '99%';
COPY 1111
Time: 28.995 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
COPY 98889
Time: 43.442 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
COPY 98889
Time: 40.352 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';
COPY 98889
Time: 60.274 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%'; -- the choice of function has a large performance impact
COPY 98889
Time: 61.503 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%'; -- the choice of function has a large performance impact
COPY 98889
Time: 53.131 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';
COPY 98889
Time: 49.655 ms
```
pg_bulkload does not have this feature: its FILTER function returns a null record when a row does not meet the condition, instead of discarding the row, so it does not provide the same semantics.
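For reference, a pg_bulkload control-file sketch using FILTER (the file path and the `public.my_filter` function are hypothetical); as noted above, the filter transforms or nulls out each record rather than dropping it:

```
OUTPUT = public.big_table         # target table
INPUT = /home/zjh/big_table.csv   # input data file
TYPE = CSV
DELIMITER = "|"
SKIP = 1                          # skip the header line
FILTER = public.my_filter         # SQL/PL function applied to every input record
```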
Compared with Oracle SQL*Loader, both copy and pg_bulkload still fall well short. Commonly missed features include:

- no support for fixed-position (absolute-offset) fields
- pg_bulkload cannot filter out records
- neither pg_bulkload nor copy can apply functions to fields during load
- no REPLACE mode
- copy has no parallel/direct-path loading
- no way to specify per-column default values
- no buffer-size tuning
- no support for a partitioned target table
lightdb 23.1 fully supports all of the features above, via ltldr.
In current versions, multi-row INSERT ... VALUES can approximate filtering during load:
```sql
insert into big_table select * from (VALUES(1, 200000, 1.2), (2, 400000, 1.4)) AS v (depno, target, increase) where depno = 1;
```
Note that the number of bind parameters per statement is limited (the PostgreSQL extended protocol caps it at 32,767), which greatly limits the value of multi-row INSERT ... VALUES with bound parameters; it can even compare unfavorably with JDBC batch optimization.
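Another current-version workaround for the filtering and per-field-function limitations is the file_fdw extension (linked below), which exposes a CSV file as a queryable table so arbitrary SQL expressions can be applied during load. A sketch, reusing the big_table CSV from earlier (the server and foreign-table names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE big_table_csv (id int, v varchar(100))
  SERVER csv_server
  OPTIONS (filename '/home/zjh/big_table.csv', format 'csv', delimiter '|', header 'true');

-- functions and filters are now ordinary SQL
INSERT INTO big_table SELECT id, upper(v) FROM big_table_csv WHERE id % 100 != 99;
```

As with COPY, the file path is resolved on the server, so this does not remove the server-side file requirement.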
http://www.light-pg.com/docs/lightdb/13.3-22.2/file-fdw.html
http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-values.html
https://ossc-db.github.io/pg_bulkload/pg_bulkload.html
http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-copy.html
https://wiki.postgresql.org/wiki/COPY
http://www.light-pg.com/docs/lightdb/current/sql-copy.html
https://blog.csdn.net/neweastsun/article/details/112758060 copy from/to usage examples
https://pgloader.readthedocs.io/en/latest/index.html pgloader: imports into postgresql from mysql/pg/txt/dbf, also implemented on top of COPY
Summary
| Source | Target | Recommendation |
| --- | --- | --- |
| csv/txt | lightdb-x | lt_bulkload/ltldr/copy |
| lt_dump with many small files | lightdb-x | lt_restore, restored in parallel by exclude_schema |
| oracle | lightdb-x | many small tables: ora2pg; a few large tables: export csv with sqluldr2, import with ltldr/lt_bulkload/copy |
| mysql/mariadb | lightdb-x | many small tables: ltloader; a few large tables: export csv with select into outfile, import with ltldr/lt_bulkload/copy |
| postgresql/lightdb-x | lightdb-x | lt_dump/lt_restore |
| postgresql/lightdb-x | lightdb-a | many small tables: postgresql_fdw/ltloader; a few large tables: export csv with ltuldr, import with ltldr/lt_bulkload/copy; adjust the distribution key to ensure good performance |
| oracle | lightdb-a | many small tables: ora2pg; a few large tables: export csv with sqluldr2, import with ltldr/lt_bulkload/copy/gpfdist; adjust the distribution key to ensure good performance |
| oracle | lightdb-x distributed | many small tables: ora2pg; a few large tables: export csv with sqluldr2, import with ltldr/lt_bulkload/gpfdist; remember to call create_distributed_table to set the distribution key for good performance |
| mysql/mariadb | lightdb-x distributed | many small tables: ltloader; a few large tables: export csv with select into outfile, import with ltldr/lt_bulkload |