Hive issue: client-side out-of-memory
Problem:
hive> create table goods_sail_info row format delimited fields terminated by ',' as select * from iphone_sail_info a where a.operate_system <> '';
Query ID = hadoop_20220301104405_374c43ea-6b91-41a2-8fd7-a06caec6c6b5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-01 10:44:06,454 INFO [dd4d4767-e868-4205-8e44-6c1a8c85112a main] client.RMProxy: Connecting to ResourceManager at node01/192.168.51.100:8032
2022-03-01 10:44:06,472 INFO [dd4d4767-e868-4205-8e44-6c1a8c85112a main] client.RMProxy: Connecting to ResourceManager at node01/192.168.51.100:8032
Starting Job = job_1646098006070_0003, Tracking URL = http://node01:8088/proxy/application_1646098006070_0003/
Kill Command = /kkb/install/hadoop-3.1.4/bin/mapred job -kill job_1646098006070_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-01 10:44:29,516 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1646098006070_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
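Cause of the error:
This statement scans the whole table across all of its partitions. If the table has many partitions and a large volume of data, the client can run out of memory.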
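Note: a client-side out-of-memory error can be recognized from the log the client prints. The out-of-memory exception is thrown before the job's application id (an identifier such as application_1646098006070_0003) has been printed, and ResourceManager shows no information about the job at all.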
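The check described in the note can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name classify_hive_oom is hypothetical, the client output is assumed to have been saved to a file, and the pattern for the application id is modeled on the id seen in the log above.

```shell
# Hypothetical helper: inspect a saved Hive client log.
# Prints "client-side" when an OutOfMemoryError appears and no YARN
# application id was ever printed; otherwise prints "inconclusive".
classify_hive_oom() {
    log_file="$1"
    if grep -q 'OutOfMemoryError' "$log_file" \
        && ! grep -Eq 'application_[0-9]+_[0-9]+' "$log_file"; then
        echo "client-side"
    else
        echo "inconclusive"
    fi
}
```

A result of "client-side" only mirrors the rule of thumb in the note; it is still worth confirming on ResourceManager that the job was never submitted.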
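Because the failure is in the client, its memory parameters have to be fixed when Hive starts and cannot be changed afterwards. The environment variable must therefore be changed before the hive command is run.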
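A minimal sketch of that step follows. The note does not name the variable; HADOOP_CLIENT_OPTS is a standard Hadoop setting for client JVM options and is assumed here to be the one the Hive CLI picks up. The 4g heap size is purely illustrative and should be sized to the client host.

```shell
# Assumption: 4g is an example value, not a recommendation.
CLIENT_HEAP="-Xmx4g"

# Append to any existing client options instead of overwriting them.
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} ${CLIENT_HEAP}"

# Show what a hive process started from this shell would inherit.
echo "HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS}"

# Then start the Hive CLI from this same shell so it sees the variable:
# hive
```

The export has to happen in the shell that launches hive (or in a file that shell sources), since a running client does not re-read the variable.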
References: https://segmentfault.com/a/1190000037604212
https://www.cnblogs.com/jiangxiaoxian/p/6377471.html