【presto】presto查询报错整理

问题一:

报错内容:

在这里插入图片描述

Error running query: Corrupted statistics for column "[firstcontactsname] optional binary " in Parquet file 
"hdfs://nameservice1/user/hive/warehouse/riskn.db/ads_baihang_order_001_01/etl_dt=2021-09-12/000014_0": 
[min: 0xE4B881E4B89AE697BA, max: 0x3138373631333135363538, num_nulls: 182605]

报错原因:

原因是存储的parquet文件有部分是损坏的,而presto在0.216开始添加了一个文件完整性检测,这个默认是开启的,可以直接关掉。

解决方案:

  1. 修改hivecatalog配置文件
  2. 修改session级别配置
#全局关闭 :vim  catalog/hive.properties
hive.parquet.fail-on-corrupted-statistics=false
 
#在会话中关闭,在sql前添加配置,推荐此种方式 
set session hive.parquet_fail_with_corrupted_statistics=false;

问题二:

问题描述

presto 查询分桶表报错, 参考:https://blog.csdn.net/Mrerlou/article/details/119614586


问题三:

sql 内容

select 
a.mid, 
a.loan_date, 
a.channel,
count(a.loan_date), 
sum(b.loan_amount) 
from inods.t04_ferryorderindia_orders a 
join inods.t04_ferrycoreindia_loan b 
on cast(a.order_no as varchar) = cast(b.order_no as varchar) 
where a.loan_date > '2021-12-06'
and a.loan_date < cast('2021-12-12' as date)
group by a.mid,a.loan_date,a.channel

报错内容

Error running query: line 10:19: '>' cannot be applied to date, varchar(10)

在这里插入图片描述

问题原因

Presto不支持自动类型转换。

SQL中loan_date'2021-12-06' 分别是date类型和varchar类型。

解决方案

比较符号左右字段的类型需要设置为一致。

如:

cast(a.loan_date as varchar) > '2021-12-06'

或者

a.loan_date < cast('2021-12-12' as date)

最终sql

select 
a.mid, 
a.loan_date, 
a.channel,
count(a.loan_date), 
sum(b.loan_amount) 
from inods.t04_ferryorderindia_orders a 
join inods.t04_ferrycoreindia_loan b 
on cast(a.order_no as varchar) = cast(b.order_no as varchar) 
where a.loan_date  > cast('2021-12-06' as date)
and a.loan_date < cast('2021-12-12' as date)
group by a.mid,a.loan_date,a.channel


问题四

问题描述

删除HIVE的部分分区后,PRESTO查询数据失败,但是HIVE查询却有数据

报错内容

查询使用presto查询hive表,报错内容如下:

Error running query: Partition location does not exist: hdfs://nameservice1/user/hive/warehouse/rsc.db/bd_banker_coupon_log/etl_dt=2021-12-31

在这里插入图片描述

报错原因

对hive数据分区进行了物理删除,但是hive查询时可以屏蔽该错误,但是Presto查询的时候不能屏蔽该错误,而是将该错误抛出。

解决方案

手动删除hive的元数据中的分区信息,如下所示:

alter table rsc.bd_banker_coupon_log drop partition(etl_dt=2021-12-30);

或者是指定分区来查询

select * from rsc.bd_banker_coupon_log where etl_dt='20220112' limit 1

问题五

报错内容

com.facebook.presto.operator.PageTooLargeException: Remote page is too large
	at com.facebook.presto.operator.HttpRpcShuffleClient.rewriteException(HttpRpcShuffleClient.java:130)
	at com.facebook.presto.operator.PageBufferClient$1.onFailure(PageBufferClient.java:361)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1052)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.facebook.airlift.http.client.ResponseTooLargeException: Maximum response size exceeded
	at com.facebook.airlift.http.client.jetty.BufferingResponseListener.onHeaders(BufferingResponseListener.java:55)
	at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:98)
	at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:90)
	at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:268)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:256)
	at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1218)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1502)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:172)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:135)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73)
	at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133)
	at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:155)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
	... 1 more

报错重点

Remote page is too large

报错原因

可能是有个很大的row,或者很大的数据/JSON 导致的。body.length+content.length长度不能超过102400

参考:https://github.com/prestodb/presto/issues/5350

解决方案

修改/etc/config.propertiesexchange.http-client.max-content-length 参数由默认的 32M 改为 128MB

exchange.http-client.max-content-length=128MB

闲话

在0.67版本中看上去这个bug 已经很大程度避免了,但是不知道为什么新版本(0.269)又有了。

在这里插入图片描述

参考: https://prestodb.io/docs/current/release/release-0.76.html

posted @ 2022-11-10 19:25  彬在俊  阅读(347)  评论(0编辑  收藏  举报