Hive: Running Simple Queries Without MapReduce

Suppose you want to query a column of some table. By default, Hive launches a MapReduce job to do this, as shown below:

    hive> SELECT id, money FROM m limit 10;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
    Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
    Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
    hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2013-11-13 11:35:16,167 Stage-1 map = 0%,  reduce = 0%
    2013-11-13 11:35:21,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    2013-11-13 11:35:22,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    MapReduce Total cumulative CPU time: 1 seconds 260 msec
    Ended Job = job_1384246387966_0229
    MapReduce Jobs Launched:
    Job 0: Map: 1   Cumulative CPU: 1.26sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 260 msec
    OK
    1       122
    1       185
    1       231
    1       292
    1       316
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 16.802 seconds, Fetched: 10 row(s)

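As we all know, launching a MapReduce job carries significant overhead. To address this, starting with Hive 0.10.0, simple queries that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, do not have to launch a MapReduce job: the data can be retrieved directly by a Fetch task. There are several ways to enable this: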
Method 1:

    hive> set hive.fetch.task.conversion=more;
    hive> SELECT id, money FROM m limit 10;
    OK
    1       122
    1       185
    1       231
    1       292
    1       316
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 0.138 seconds, Fetched: 10 row(s)

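The statement set hive.fetch.task.conversion=more; above enables the Fetch task, so the simple column query no longer launches a MapReduce job: it now finishes in 0.138 seconds instead of 16.802 seconds.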
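Why did the same query need MapReduce before but not after? Hive's own description of hive.fetch.task.conversion says the query must be single-sourced, with no subquery, aggregation, DISTINCT, lateral view or join; "minimal" then allows only SELECT *, filters on partition columns and LIMIT, while "more" allows any SELECT, FILTER and LIMIT. The sketch below encodes that simplified reading in Python. It is an illustration, not Hive's actual planner logic: QueryShape and can_use_fetch_task are hypothetical names, the query is described by hand rather than parsed, and the partition column "dt" is an assumption (the article does not say how table m is partitioned).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical, hand-filled summary of a query (no SQL parsing here)."""
    select_star: bool                        # SELECT * rather than a column list
    filter_columns: frozenset = frozenset()  # columns referenced in WHERE
    single_source: bool = True               # one table, no subquery
    has_aggregation: bool = False            # GROUP BY, aggregate functions, DISTINCT
    has_join: bool = False
    has_lateral_view: bool = False


def can_use_fetch_task(q: QueryShape, partition_columns: frozenset, mode: str) -> bool:
    """Simplified reading of the hive.fetch.task.conversion rules, not Hive's code."""
    # Shape restrictions that apply in both modes.
    if (not q.single_source or q.has_aggregation
            or q.has_join or q.has_lateral_view):
        return False
    if mode == "more":
        # Any SELECT list, FILTER and LIMIT are allowed.
        return True
    if mode == "minimal":
        # Only SELECT *, and WHERE may touch partition columns only.
        return q.select_star and q.filter_columns <= partition_columns
    raise ValueError(f"unsupported mode: {mode!r}")


# SELECT id, money FROM m LIMIT 10: explicit columns, no WHERE clause.
article_query = QueryShape(select_star=False)
for mode in ("minimal", "more"):
    print(mode, can_use_fetch_task(article_query, frozenset({"dt"}), mode))
```

Under this model the article's query is rejected by "minimal" (it selects named columns, not *) and accepted by "more", which is consistent with the two runs shown above if the cluster's default was "minimal".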

Method 2 (pass the setting when starting the CLI):

    bin/hive --hiveconf hive.fetch.task.conversion=more
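If Hive is driven from a script, the same --hiveconf override can be attached to each invocation. A minimal Python sketch, assuming the CLI lives at bin/hive relative to the Hive home and that the query is passed with the CLI's -e option; build_hive_command is a hypothetical helper that only builds the argument list and does not run anything.

```python
import shlex
from typing import List


def build_hive_command(query: str, conversion: str = "more",
                       hive_bin: str = "bin/hive") -> List[str]:
    """Build (but do not run) a Hive CLI invocation with a per-run override."""
    return [
        hive_bin,
        "--hiveconf", f"hive.fetch.task.conversion={conversion}",
        "-e", query,  # -e: execute this query string, then exit
    ]


cmd = build_hive_command("SELECT id, money FROM m LIMIT 10")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with the query text; shlex.join is used only to print a copy-pasteable form.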

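Method 3:
Both methods above enable the Fetch task, but only temporarily, for the current session. If you want the feature always on, add the following property to ${HIVE_HOME}/conf/hive-site.xml (inside its <configuration> element):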

    <property>
      <name>hive.fetch.task.conversion</name>
      <value>more</value>
      <description>
        Some select queries can be converted to single FETCH task minimizing
        latency. Currently the query should be single
        sourced not having any subquery and should not have
        any aggregations or distincts (which incur RS),
        lateral views and joins.
        1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
        2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
      </description>
    </property>

With this in place, the Fetch task stays enabled permanently. Pretty handy, isn't it? Go give it a try!
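If you manage hive-site.xml with scripts rather than by hand, the property can be added or updated programmatically. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the file has the usual <configuration> root with <property><name>/<value> children; set_hive_property is a hypothetical helper that works on the file's text, so reading and writing the actual file is left to you.

```python
import xml.etree.ElementTree as ET


def set_hive_property(xml_text: str, name: str, value: str) -> str:
    """Return hive-site.xml text with property `name` set to `value` (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # expected root element: <configuration>
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            # Property already present: overwrite its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Not present: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Caveat: ElementTree drops the XML declaration and comments on round-trip.
    return ET.tostring(root, encoding="unicode")


original = ("<configuration><property><name>hive.fetch.task.conversion</name>"
            "<value>minimal</value></property></configuration>")
print(set_hive_property(original, "hive.fetch.task.conversion", "more"))
```

Because comments and the XML declaration are not preserved, keep a backup of the original file before overwriting it with the result.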

posted on 2017-04-23 08:56 mthoutai
