【presto】presto查询报错整理
问题一:
报错内容:
Error running query: Corrupted statistics for column "[firstcontactsname] optional binary " in Parquet file
"hdfs://nameservice1/user/hive/warehouse/riskn.db/ads_baihang_order_001_01/etl_dt=2021-09-12/000014_0":
[min: 0xE4B881E4B89AE697BA, max: 0x3138373631333135363538, num_nulls: 182605]
报错原因:
原因是存储的parquet文件有部分是损坏的,而presto在0.216开始添加了一个文件完整性检测,这个默认是开启的,可以直接关掉。
解决方案:
- 修改
hive
的catalog
配置文件 - 修改
session
级别配置
#全局关闭 :vim catalog/hive.properties
hive.parquet.fail-on-corrupted-statistics=false
#在会话中关闭,在sql前添加配置,推荐此种方式
set session hive.parquet_fail_with_corrupted_statistics=false;
问题二:
问题描述
presto 查询分桶表报错, 参考:https://blog.csdn.net/Mrerlou/article/details/119614586
问题三:
sql 内容
select
a.mid,
a.loan_date,
a.channel,
count(a.loan_date),
sum(b.loan_amount)
from inods.t04_ferryorderindia_orders a
join inods.t04_ferrycoreindia_loan b
on cast(a.order_no as varchar) = cast(b.order_no as varchar)
where a.loan_date > '2021-12-06'
and a.loan_date < cast('2021-12-12' as date)
group by a.mid,a.loan_date,a.channel
报错内容
Error running query: line 10:19: '>' cannot be applied to date, varchar(10)
问题原因
Presto
不支持自动类型转换。
SQL中loan_date
和 '2021-12-06'
分别是date类型和varchar类型。
解决方案
比较符号左右字段的类型需要设置为一致。
如:
cast(a.loan_date as varchar) > '2021-12-06'
或者
a.loan_date < cast('2021-12-12' as date)
最终sql
select
a.mid,
a.loan_date,
a.channel,
count(a.loan_date),
sum(b.loan_amount)
from inods.t04_ferryorderindia_orders a
join inods.t04_ferrycoreindia_loan b
on cast(a.order_no as varchar) = cast(b.order_no as varchar)
where a.loan_date > cast('2021-12-06' as date)
and a.loan_date < cast('2021-12-12' as date)
group by a.mid,a.loan_date,a.channel
问题四
问题描述
删除HIVE的部分分区后,PRESTO查询数据失败,但是HIVE查询却有数据
报错内容
查询使用presto查询hive表,报错内容如下:
Error running query: Partition location does not exist: hdfs://nameservice1/user/hive/warehouse/rsc.db/bd_banker_coupon_log/etl_dt=2021-12-31
报错原因
对hive数据分区进行了物理删除,但是hive查询时可以屏蔽该错误,但是Presto查询的时候不能屏蔽该错误,而是将该错误抛出。
解决方案
手动删除hive的元数据中的分区信息,如下所示:
alter table rsc.bd_banker_coupon_log drop partition(etl_dt=2021-12-30);
或者是指定分区来查询
select * from rsc.bd_banker_coupon_log where etl_dt='20220112' limit 1
问题五
报错内容
com.facebook.presto.operator.PageTooLargeException: Remote page is too large
at com.facebook.presto.operator.HttpRpcShuffleClient.rewriteException(HttpRpcShuffleClient.java:130)
at com.facebook.presto.operator.PageBufferClient$1.onFailure(PageBufferClient.java:361)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1052)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.facebook.airlift.http.client.ResponseTooLargeException: Maximum response size exceeded
at com.facebook.airlift.http.client.jetty.BufferingResponseListener.onHeaders(BufferingResponseListener.java:55)
at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:98)
at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:90)
at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:268)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:256)
at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1218)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1502)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:172)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:135)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73)
at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133)
at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:155)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
... 1 more
报错重点
Remote page is too large
报错原因
可能是有个很大的row,或者很大的数据/JSON 导致的。body.length+content.length
长度不能超过102400
参考:https://github.com/prestodb/presto/issues/5350
解决方案
修改/etc/config.properties
中exchange.http-client.max-content-length
参数由默认的 32M 改为 128MB
exchange.http-client.max-content-length=128MB
闲话
在0.67版本中看上去这个bug 已经很大程度避免了,但是不知道为什么新版本(0.269)又有了。
参考: https://prestodb.io/docs/current/release/release-0.76.html