Hive issue: client-side out-of-memory

Problem:

hive> create table goods_sail_info row format delimited fields terminated by ',' as select * from iphone_sail_info a where a.operate_system <> '';
Query ID = hadoop_20220301104405_374c43ea-6b91-41a2-8fd7-a06caec6c6b5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-01 10:44:06,454 INFO  [dd4d4767-e868-4205-8e44-6c1a8c85112a main] client.RMProxy: Connecting to ResourceManager at node01/192.168.51.100:8032
2022-03-01 10:44:06,472 INFO  [dd4d4767-e868-4205-8e44-6c1a8c85112a main] client.RMProxy: Connecting to ResourceManager at node01/192.168.51.100:8032
Starting Job = job_1646098006070_0003, Tracking URL = http://node01:8088/proxy/application_1646098006070_0003/
Kill Command = /kkb/install/hadoop-3.1.4/bin/mapred job  -kill job_1646098006070_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-01 10:44:29,516 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1646098006070_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

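Cause of the error:
This statement scans the entire table across all of its partitions. If the table has many partitions and a large volume of data, the client can run out of memory.
Note: to tell that the out-of-memory error comes from the client, look at the client's log output. The exception is thrown before the job's application id is printed (an id of the form application_1646098006070_0003, as in the Tracking URL line of the log above), and the ResourceManager shows no information about the job at all.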
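
The check described in the note can be sketched as a small shell helper that reads a saved copy of the client output. The function name `classify_oom` and the idea of saving the output to a log file are assumptions made for illustration; they are not part of the original post.

```shell
# Hypothetical helper: classify an OutOfMemoryError found in a saved Hive client log.
# Usage: classify_oom /path/to/saved-client-output.log
classify_oom() {
  log="$1"
  # Line number of the first OutOfMemoryError in the log, if there is one.
  oom_line=$(grep -n -m1 'java.lang.OutOfMemoryError' "$log" | cut -d: -f1)
  if [ -z "$oom_line" ]; then
    echo "no OOM found"
    return 0
  fi
  # Look for an application id anywhere up to and including the OOM line.
  if head -n "$oom_line" "$log" | grep -q 'application_[0-9]*_[0-9]*'; then
    echo "OOM after job submission"
  else
    echo "client-side OOM"
  fi
}
```

A result of "client-side OOM" matches the first half of the criterion above. The second half can be checked on the cluster, for example with `yarn application -list -appStates ALL`, which should not list the job.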

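Because the error occurs in the client, its JVM parameters have to be set when Hive starts and cannot be changed afterwards. The environment variables therefore need to be modified before running the hive command.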
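
A minimal sketch of that step follows. The original post does not name the variable, so using `HADOOP_CLIENT_OPTS` is an assumption here, and the `4g` heap size is only an illustrative value to be adjusted to the client machine.

```shell
# Assumption: HADOOP_CLIENT_OPTS carries JVM options to the Hive CLI
# (the original post does not name the variable); 4g is an illustrative size.
# Appending keeps any options already set; when -Xmx appears more than once,
# the JVM typically honors the last one.
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} -Xmx4g"

# Show what a client started from this shell will inherit.
echo "HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS}"
```

Start hive from the same shell afterwards so that the new value is inherited. To make the setting permanent, the same export line could be placed in the Hive environment script, if the installation uses one.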

References: https://segmentfault.com/a/1190000037604212

https://www.cnblogs.com/jiangxiaoxian/p/6377471.html

posted @ 2022-03-01 10:49  yasai