Apache Hive: error when executing an HQL statement ( 10G )
# Problem description:
hive > select substring(request_body["uuid"], -1, 1) as uuid,
              count(distinct(request_body["uuid"])) as count
       from log_bftv_api
       where year=2017 and month=11 and day=1
         and request_body["method"] = "bv.lau.urecommend"
         and length(request_body["uuid"]) = 25
       group by 1
       order by uuid;

# Hive fails with the error below when executing this HQL statement (there is no problem when the data volume is small).
# Error output:
MapReduce Total cumulative CPU time: 1 minutes 46 seconds 70 msec
Ended Job = job_1510050683827_0137 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1510050683827_0137_m_000002 (and more) from job job_1510050683827_0137

Task with the most failures(4):
-----
Task ID:
  task_1510050683827_0137_m_000000

URL:
  http://namenode:8088/taskdetails.jsp?jobid=job_1510050683827_0137&tipid=task_1510050683827_0137_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 5 Cumulative CPU: 106.07 sec HDFS Read: 223719539 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 minutes 46 seconds 70 msec
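# Cause analysis:
The log reports "Error: Java heap space" and "return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask". From what I could find, this is a memory problem. Because HQL is actually converted into MapReduce Java tasks, the heap available to those tasks is what has to grow, so I made the following changes.
# Solution: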
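One detail in the log helps decide which settings matter most: MapReduce task IDs carry an `m` or `r` marker for map and reduce tasks. The sketch below reads that marker from the failing task ID. The helper name `task_kind` is made up for illustration, and the regular expression assumes the usual `task_<timestamp>_<job>_<m|r>_<number>` layout seen in the log.

```python
import re

# Assumed layout: task_<cluster timestamp>_<job number>_<m|r>_<task number>
TASK_ID = re.compile(r"task_(\d+)_(\d+)_([mr])_(\d+)")


def task_kind(task_id: str) -> str:
    """Return 'map' or 'reduce' for a MapReduce task ID (hypothetical helper)."""
    match = TASK_ID.fullmatch(task_id)
    if match is None:
        raise ValueError(f"not a MapReduce task ID: {task_id!r}")
    return "map" if match.group(3) == "m" else "reduce"


# "Task with the most failures" copied from the error output
failed = "task_1510050683827_0137_m_000000"
print(task_kind(failed))
```

Here the failing task is a map task, which suggests the map-side heap (`mapreduce.map.java.opts`) is the first value to look at, although the reduce side may need the same treatment as data grows.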
hadoop shell > vim etc/hadoop/hadoop-env.sh

# default: 1000
export HADOOP_HEAPSIZE=4096

hadoop shell > vim etc/hadoop/yarn-env.sh

# default: 1000
YARN_HEAPSIZE=4096

# Adjust as needed for your own environment!

hadoop shell > vim etc/hadoop/mapred-site.xml

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>50</value>
</property>

# Add these parameters (scale them in multiples according to what the machines actually have).
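# My test environment is four KVM virtual machines with 8 cores and 8 GB of RAM each: one NameNode and three DataNodes.
# Since this tuning, a 600G dataset has run without problems so far, while historical and new data keep being written to HDFS.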
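When scaling the mapred-site.xml values for a different machine, it may help to check that they still fit together. The JVM heap (`-Xmx`) has to be smaller than the container size (`*.memory.mb`), and the map-side sort buffer (`mapreduce.task.io.sort.mb`) is allocated inside the map task's heap. The following is a minimal sketch of such a check, not an official tool. The helper names `xmx_mb` and `check` are made up for illustration, and only `m`/`g` suffixes on `-Xmx` are handled.

```python
import re

# Values copied from the mapred-site.xml settings in this post
settings = {
    "mapreduce.map.memory.mb": "1536",
    "mapreduce.map.java.opts": "-Xmx1024M",
    "mapreduce.reduce.memory.mb": "3072",
    "mapreduce.reduce.java.opts": "-Xmx2560M",
    "mapreduce.task.io.sort.mb": "512",
}


def xmx_mb(java_opts: str) -> int:
    """Extract -Xmx from a JVM options string, in MB (m/g suffixes only)."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError(f"no -Xmx with an m/g suffix in {java_opts!r}")
    value = int(match.group(1))
    return value * 1024 if match.group(2) in "gG" else value


def check(conf: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for kind in ("map", "reduce"):
        container = int(conf[f"mapreduce.{kind}.memory.mb"])
        heap = xmx_mb(conf[f"mapreduce.{kind}.java.opts"])
        # The heap must leave room in the container for non-heap JVM memory.
        if heap >= container:
            problems.append(f"{kind}: heap {heap} MB >= container {container} MB")
    sort_mb = int(conf["mapreduce.task.io.sort.mb"])
    map_heap = xmx_mb(conf["mapreduce.map.java.opts"])
    # The sort buffer lives inside the map task's heap.
    if sort_mb >= map_heap:
        problems.append(f"io.sort.mb {sort_mb} MB >= map heap {map_heap} MB")
    return problems


print(check(settings))
```

With the values from this post the check finds nothing to complain about. It only catches settings that contradict each other; whether the heap is large enough for a given query still has to be found out by running it.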