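Using CapacityTaskScheduler
Hadoop version: 0.19.2
For a detailed introduction to this scheduler, see: http://hadoop.apache.org/common/docs/r0.19.2/capacity_scheduler.html
This article only covers how to set up a system that uses CapacityTaskScheduler.
Perform the following steps on the Master machine:
1 Copy contrib/capacity-scheduler/hadoop-0.19.2-capacity-scheduler.jar into the lib directory (note: if a FairScheduler jar is already there, delete it first).
2 Add the following to hadoop-site.xml: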
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
<name>mapred.queue.names</name>
<value>logQueue1,logQueue2,algQueue1,algQueue2,default</value>
</property>
3 Put the following content in capacity-scheduler.xml:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapred.capacity-scheduler.queue.logQueue1.guaranteed-capacity</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.logQueue1.reclaim-time-limit</name>
<value>5</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.logQueue1.supports-priority</name>
<value>true</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.logQueue2.guaranteed-capacity</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.logQueue2.reclaim-time-limit</name>
<value>5</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.logQueue2.supports-priority</name>
<value>true</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.algQueue1.guaranteed-capacity</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.algQueue1.reclaim-time-limit</name>
<value>5</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.algQueue1.supports-priority</name>
<value>true</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.algQueue2.guaranteed-capacity</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.algQueue2.reclaim-time-limit</name>
<value>5</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.algQueue2.supports-priority</name>
<value>true</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.guaranteed-capacity</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.reclaim-time-limit</name>
<value>5</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.supports-priority</name>
<value>true</value>
</property>
</configuration>
4 Restart Hadoop.
5 In the job's code, set the queue that the job belongs to:
conf.setQueueName("QueueName");
After these five steps, the scheduler is configured.
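The capacity-scheduler.xml in step 3 repeats the same three properties for every queue, so it can be generated instead of typed by hand. The following is only a minimal sketch under stated assumptions: the class name `CapacitySchedulerXmlSketch` and its methods are hypothetical helpers written for this article, not part of Hadoop, and it assumes every queue gets identical settings as in the example above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): builds the per-queue
// <property> blocks shown in step 3 using plain string concatenation.
final class CapacitySchedulerXmlSketch {

    // Render one <property> element with the given name and value.
    static String property(String name, String value) {
        return "<property>\n"
             + "<name>" + name + "</name>\n"
             + "<value>" + value + "</value>\n"
             + "</property>\n";
    }

    // Render the whole file, giving every queue the same three settings.
    static String render(List<String> queues, int capacityPercent,
                         int reclaimTimeLimit, boolean supportsPriority) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (String queue : queues) {
            String prefix = "mapred.capacity-scheduler.queue." + queue + ".";
            xml.append(property(prefix + "guaranteed-capacity", String.valueOf(capacityPercent)));
            xml.append(property(prefix + "reclaim-time-limit", String.valueOf(reclaimTimeLimit)));
            xml.append(property(prefix + "supports-priority", String.valueOf(supportsPriority)));
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Same queue names as in mapred.queue.names from step 2.
        List<String> queues = Arrays.asList(
            "logQueue1", "logQueue2", "algQueue1", "algQueue2", "default");
        System.out.print(render(queues, 20, 5, true));
    }
}
```

Redirecting the program's output to conf/capacity-scheduler.xml should yield a file equivalent to the one in step 3. Queues that need different values would require a small extension, for example passing a per-queue settings map.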
The JobTracker web interface then shows the following:
Scheduling Information
Queue Name
Scheduling Information
algQueue1
Guaranteed Capacity (%) : 20.0
Guaranteed Capacity Maps : 7
Guaranteed Capacity Reduces : 7
User Limit : 100
Reclaim Time limit : 5
Number of Running Maps : 0
Number of Running Reduces : 0
Number of Waiting Maps : 0
Number of Waiting Reduces : 0
Priority Supported : YES
algQueue2
Guaranteed Capacity (%) : 20.0
Guaranteed Capacity Maps : 7
Guaranteed Capacity Reduces : 7
User Limit : 100
Reclaim Time limit : 5
Number of Running Maps : 0
Number of Running Reduces : 0
Number of Waiting Maps : 0
Number of Waiting Reduces : 0
Priority Supported : YES
default
Guaranteed Capacity (%) : 20.0
Guaranteed Capacity Maps : 7
Guaranteed Capacity Reduces : 7
User Limit : 100
Reclaim Time limit : 0
Number of Running Maps : 0
Number of Running Reduces : 0
Number of Waiting Maps : 0
Number of Waiting Reduces : 0
Priority Supported : YES
logQueue1
Guaranteed Capacity (%) : 20.0
Guaranteed Capacity Maps : 7
Guaranteed Capacity Reduces : 7
User Limit : 100
Reclaim Time limit : 5
Number of Running Maps : 0
Number of Running Reduces : 0
Number of Waiting Maps : 0
Number of Waiting Reduces : 0
Priority Supported : YES
logQueue2
Guaranteed Capacity (%) : 20.0
Guaranteed Capacity Maps : 7
Guaranteed Capacity Reduces : 7
User Limit : 100
Reclaim Time limit : 5
Number of Running Maps : 0
Number of Running Reduces : 0
Number of Waiting Maps : 0
Number of Waiting Reduces : 0
Priority Supported : YES
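The five queues above each hold a guaranteed capacity of 20%, which adds up to exactly 100%. As I understand the scheduler documentation linked at the top, the guaranteed capacities of all queues should sum to at most 100. A minimal sketch of such a sanity check follows; `QueueCapacityCheckSketch` is a hypothetical stand-alone helper, not a Hadoop API, and the values are typed in by hand rather than read from capacity-scheduler.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): checks that the configured
// guaranteed-capacity percentages do not add up to more than 100.
final class QueueCapacityCheckSketch {

    // Sum the guaranteed-capacity percentages of all queues.
    static double totalCapacity(Map<String, Double> capacities) {
        double total = 0.0;
        for (double percent : capacities.values()) {
            total += percent;
        }
        return total;
    }

    // True when the total does not exceed 100 percent.
    static boolean isValid(Map<String, Double> capacities) {
        return totalCapacity(capacities) <= 100.0;
    }

    public static void main(String[] args) {
        // Values copied by hand from the configuration in this article.
        Map<String, Double> capacities = new LinkedHashMap<>();
        capacities.put("logQueue1", 20.0);
        capacities.put("logQueue2", 20.0);
        capacities.put("algQueue1", 20.0);
        capacities.put("algQueue2", 20.0);
        capacities.put("default", 20.0);

        System.out.println("Total guaranteed capacity: " + totalCapacity(capacities));
        System.out.println("Within limit: " + isValid(capacities));
    }
}
```

Running a check like this before restarting Hadoop can catch an oversubscribed configuration early, for example after adding a sixth queue without lowering the other percentages.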