DataX error: bucketId out of range: -1 (state=,code=0)
I. Background
DataX synced table data from a PostgreSQL database to a Hive table; after the sync finished, a select on the Hive table failed with:
java.io.IOException: java.lang.IllegalArgumentException: Bucket ID out of range: -1
II. Code
1. Hive DDL
create table if not exists hive_table_name (
  col1 string,
  col2 string
)
row format delimited fields terminated by '\t'
stored as orc;
2. DataX job JSON
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "postgresqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:postgresql://ip:port/db"],
                "querySql": ["select col1,col2 from pg_table_name"]
              }
            ],
            "username": "user",
            "password": "pwd"
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "defaultFS": "hdfs://ip:port",
            "fileType": "orc",
            "path": "/user/hive/warehouse/db_name.db/hive_table_name",
            "fileName": "hive_table_name",
            "column": [
              {"name": "col1", "type": "string"},
              {"name": "col2", "type": "string"}
            ],
            "writeMode": "append",
            "fieldDelimiter": "\t",
            "encoding": "utf-8"
          }
        }
      }
    ],
    "setting": {
      "speed": {"channel": "1"},
      "errorLimit": {"record": 0, "percentage": 0.02}
    }
  }
}
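A common pitfall in hand-written job files is a trailing comma (e.g. after the querySql array), which strict JSON parsers reject; DataX's own parser may be more lenient, but it is safer to keep the file strict. A minimal sketch of a pre-submit check, using only the Python standard library:

```python
import json

# Hypothetical fragment of a DataX job file with a trailing comma
# after the querySql array -- strict JSON parsers reject this.
fragment = '''
{
  "jdbcUrl": ["jdbc:postgresql://ip:port/db"],
  "querySql": ["select col1,col2 from pg_table_name"],
}
'''

try:
    json.loads(fragment)
    print("valid JSON")
except json.JSONDecodeError as e:
    print("invalid JSON:", e.msg)
```

On this fragment, `json.loads` raises `JSONDecodeError`, so the check reports the file as invalid before DataX is ever invoked.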
3. Query
select * from hive_table_name;
III. Root cause
Hive 3.1 creates managed tables as transactional (ACID) tables by default, and full ACID tables only support the ORC format. Rows in an ACID table carry internal metadata that includes a bucket ID; the ORC files that DataX's hdfswriter wrote straight into the table's HDFS directory were never written through Hive, so they lack this metadata and the reader reports a bucket ID of -1.
In short, the HDFS files are not readable through Hive because they were dropped into the warehouse directory directly instead of being loaded into the table through Hive.
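To confirm that the table really is transactional, you can inspect its properties (a sketch; run in beeline or the Hive CLI, table name taken from the example above):

```sql
-- For a Hive 3 default managed table, the Table Parameters section
-- typically shows transactional=true (and transactional_properties=default).
describe formatted hive_table_name;
```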
IV. Solutions
1. LOAD DATA
Manually load the HDFS files into the Hive table:
load data inpath '/user/hive/warehouse/db_name.db/hive_table_name' overwrite into table hive_table_name;
2. Set a parameter
hive.fetch.task.conversion
This property has three possible values:
none: disable fetch-task optimization; every query goes through the execution engine
minimal: optimize only select *, filters on partition columns, and statements with a limit
more: on top of minimal, the select list may name individual columns instead of *, filters are no longer limited to partition columns, and virtual columns are supported
set hive.fetch.task.conversion=none;
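For context, `more` has been the default in recent Hive versions, so simple selects normally run as a client-side fetch; setting `none` forces every query through the execution engine. A session-level sketch (it does not change hive-site.xml):

```sql
set hive.fetch.task.conversion;        -- print the current value
set hive.fetch.task.conversion=none;   -- disable fetch-task optimization
select * from hive_table_name;         -- now runs as a job instead of a direct fetch
```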
3. File format
Create the table as textfile instead (fixes the problem once and for all):
create table if not exists hive_table_name (
  col1 string,
  col2 string
)
row format delimited fields terminated by '\t'
stored as textfile;
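If you want to keep ORC storage, an alternative (not from the original post) is to opt the table out of ACID explicitly, so files written straight to its directory by external tools stay readable:

```sql
-- Keeps ORC but creates a non-transactional managed table.
create table if not exists hive_table_name (
  col1 string,
  col2 string
)
row format delimited fields terminated by '\t'
stored as orc
tblproperties ('transactional'='false');
```

Creating the table as an external table has a similar effect, since external tables are never transactional.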
Original post: https://blog.csdn.net/chimchim66/article/details/127306543