Hive ERROR: Out of memory due to hash maps used in map-side aggregation

当hive在执行大数据量的统计查询语句时，经常会出现下面OOM错误，具体错误提示如下：

Possible error: Out of memory due to hash maps used in map-side aggregation.

Solution: Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'

查看task的失败信息为：

Error:GC overhead limit exceeded

对于这个错误，通常是由两种情况造成的：(1) hive sql写的不合理，导致执行时hash map过大；（2）hive sql没有优化的余地了（要得到想要的数据只能写这样的sql）。

对于（1）则改变sql语句，从而降低hash map的大小。对于（2）则可以调整参数。

下面分别说明（1）和（2）的情况：

（1）改变sql语句

select count(distinct v) from tbl;
可以改为select count(1) from (select v from tbl group by v) t;

说明：减少了hash map的key个数

select collect_set(messageDate)[0],count(*) from incidents_hive group by substr(messageDate,8,2);
可以改为select hourNum, count(1) from (select substr(messageDate,9,2) as hourNum from incidents_hive ) t group by hourNum;

说明：没有减少hash map的key个数，但是减少了value的大小

（2）调整参数

对于这个sql语句，是没办法进行优化(因为keywords的重复率很低，导致map阶段里面维护的一个内存Map对象非常巨大)来降低hash map大小的：

INSERT OVERWRITE TABLE hbase_table_poi_keywords_count SELECT concat(substr(key,0,8), svccode, keywords), substr(key,0,8), svccode, keywords, count(*) where substr(key,0,8)=\"$yesterday\" AND length(keywords)>0 AND svccode is not null GROUP BY substr(key,0,8),svccode,keywords;

与mapjoin和map aggregate相关的优化参数有：

hive.map.aggr

hive.groupby.mapaggr.checkinterval

hive.map.aggr.hash.min.reduction

hive.map.aggr.hash.percentmemory

hive.groupby.skewindata

以上参数可以查看配置文件说明即文档进行调整。如果需求确实没法通过调整这些参数来达到，那么set hive.map.aggr=false便是最终的方案，它肯定能满足你需求，只是执行速度比map join 和 map aggr慢些，但通过实际跑数据你很可能发现其实它也不慢哈。

参考文章：

http://blog.csdn.net/macyang/article/details/9260777
http://www.myexception.cn/open-source/1487747.html
http://blog.csdn.net/lixucpf/article/details/20458617

INSERT OVERWRITE TABLE hbase_table_poi_keywords_count SELECT concat(substr(key,0,8), svccode, keywords), substr(key,0,8), svccode, keywords, count(*) where substr(key,0,8)=\"$yesterday\" AND length(keywords)>0 AND svccode is not null GROUP BY substr(key,0,8),svccode,keywords;

posted @ 2014-05-09 18:52 JerryShao 阅读(205) 评论(0) 编辑收藏举报

刷新页面返回顶部

Hive ERROR: Out of memory due to hash maps used in map-side aggregation

公告