Learning Hive in depth.
Reference: http://www.aboutyun.com/thread-7444-1-1.html
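As we all know, launching a MapReduce job carries overhead. To address this, starting with Hive 0.10.0, simple queries that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, do not have to start a MapReduce job; the data is read directly by a Fetch task. This can be enabled in any of the following ways: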
Method 1
hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1	122
1	185
1	231
1	292
1	316
1	329
1	355
1	356
1	362
1	364
Time taken: 0.138 seconds, Fetched: 10 row(s)
set hive.fetch.task.conversion=more; enables the Fetch task, so a simple column query like the one above no longer starts a MapReduce job.
Method 2
bin/hive --hiveconf hive.fetch.task.conversion=more
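When the Hive CLI is launched from a script, the same --hiveconf override can be assembled programmatically. The following is only a minimal sketch: the helper name build_hive_command and its parameters are hypothetical, and it assumes the CLI lives at bin/hive as in the command above. It relies on the Hive CLI accepting one --hiveconf key=value pair per flag and -e for an inline query.

```python
import shlex


def build_hive_command(overrides, query=None, hive_bin="bin/hive"):
    """Build an argv list for the Hive CLI with per-invocation config overrides.

    `overrides` maps property names to values; `hive_bin` is an assumed path.
    """
    argv = [hive_bin]
    for key, value in overrides.items():
        # Each override gets its own --hiveconf flag in key=value form.
        argv += ["--hiveconf", f"{key}={value}"]
    if query is not None:
        # -e runs the given query string instead of opening the interactive shell.
        argv += ["-e", query]
    return argv


if __name__ == "__main__":
    cmd = build_hive_command(
        {"hive.fetch.task.conversion": "more"},
        query="SELECT id, money FROM m LIMIT 10;",
    )
    # Print a shell-quoted form for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Like the command above, an override passed this way applies only to that one invocation.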
Method 3
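Both methods above enable the Fetch task, but only temporarily, for the current session or invocation. To keep the feature enabled permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml: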
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task minimizing latency.
    Currently the query should be single sourced not having any subquery and should not have
    any aggregations or distincts (which incurrs RS), lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
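After editing hive-site.xml, it can be handy to confirm that the property is actually present in the file. The sketch below is one possible check, not part of Hive itself: the function name read_hive_property is hypothetical, and it assumes the file uses the usual <property><name>/<value> layout shown above. It reads only the file on disk, so it does not reflect overrides made with set or --hiveconf.

```python
import os
import xml.etree.ElementTree as ET


def read_hive_property(path, name):
    """Return the <value> of the named <property> in a Hive XML config, or None.

    Raises FileNotFoundError if `path` does not exist.
    """
    root = ET.parse(path).getroot()
    # iter() walks the root and all descendants, so nesting depth does not matter.
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    # Assumes HIVE_HOME is set; falls back to the current directory otherwise.
    site = os.path.join(os.environ.get("HIVE_HOME", "."), "conf", "hive-site.xml")
    print(read_hive_property(site, "hive.fetch.task.conversion"))
```

If the property is missing from the file, the function returns None, which means Hive will use its built-in default for that version.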