How Oozie Works, with a Brief Analysis of Its Action Execution Model

I. A Brief Analysis of Oozie's Internal Structure (Oozie Internals)

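Oozie is the workflow management system for Hadoop. The paper "Oozie: towards a scalable workflow management system for Hadoop" puts it this way: workflows provide a declarative framework for effectively managing many kinds of jobs. The paper names four major requirements: scalability, multi-tenancy, Hadoop security, and operability.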

Oozie's architecture is shown in the figure below:

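Oozie exposes a RESTful API that accepts users' submission requests (that is, workflow job submissions). Submitting a job from the command line with the oozie job xxx command also comes down to an HTTP request sent to the Oozie server.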
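
The Python sketch below shows roughly what that HTTP request looks like. It builds the request but does not send it. It assumes Oozie's v1 jobs endpoint: a POST to .../v1/jobs with a Hadoop-style XML configuration as the body, where ?action=start starts the job right away. The host, port 11000, user name and HDFS application path are placeholders, and build_config_xml and build_submit_request are hypothetical helper names.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_config_xml(props):
    """Render job properties in Hadoop's <configuration> XML format."""
    parts = ["<configuration>"]
    for name, value in props.items():
        parts.append(
            f"<property><name>{escape(name)}</name>"
            f"<value>{escape(value)}</value></property>"
        )
    parts.append("</configuration>")
    return "".join(parts)


def build_submit_request(oozie_url, props, start=True):
    """Build (but do not send) a POST to the assumed /v1/jobs endpoint."""
    url = oozie_url.rstrip("/") + "/v1/jobs"
    if start:
        url += "?action=start"  # submit and start in a single call
    body = build_config_xml(props).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml;charset=UTF-8"},
    )


# Placeholder values: adjust host, user and HDFS path for a real cluster.
req = build_submit_request(
    "http://localhost:11000/oozie",
    {
        "user.name": "alice",
        "oozie.wf.application.path": "hdfs://namenode:8020/user/alice/my-wf",
    },
)
# urllib.request.urlopen(req) would actually send it; the server replies with
# a job id right away instead of waiting for the workflow to finish.
```

The job id comes back immediately, and nothing in the request waits for the workflow itself. This is the asynchronous behavior described further on.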

After the workflow submission to Oozie, workflow engine layer drives the execution and associated transitions. 
The workflow engine accomplishes these through a set of pre-defined internal sub-tasks called Commands.

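Once a workflow has been submitted, the workflow engine is responsible for executing it and for its state transitions. Examples are moving from one Action to the next, or changing the workflow's state from SUSPENDED to KILLED.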
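
Here is a minimal Python sketch of such a state machine. The transition table follows the workflow-job status transitions in the Oozie workflow specification (PREP, RUNNING, SUSPENDED and the terminal states). WorkflowJob and its methods are hypothetical names, and the "next Action" logic is simplified to a straight sequence with no forks or decision nodes.

```python
# Workflow-job status transitions (after the Oozie workflow specification)
TRANSITIONS = {
    "PREP": {"RUNNING", "KILLED"},
    "RUNNING": {"SUSPENDED", "SUCCEEDED", "KILLED", "FAILED"},
    "SUSPENDED": {"RUNNING", "KILLED"},
    "SUCCEEDED": set(),  # terminal states: no way out
    "KILLED": set(),
    "FAILED": set(),
}


class WorkflowJob:
    """Hypothetical sketch: a linear workflow whose engine moves action by action."""

    def __init__(self, job_id, actions):
        self.job_id = job_id
        self.actions = list(actions)  # action names in execution order (simplified)
        self.current = 0              # index of the action currently running
        self.status = "PREP"

    def transition(self, new_status):
        """Change the job status, rejecting transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status} -> {new_status} is not allowed")
        self.status = new_status

    def action_completed(self):
        """Called when the current action ends OK: advance to the next action."""
        if self.status != "RUNNING":
            raise ValueError("job is not running")
        self.current += 1
        if self.current == len(self.actions):
            self.transition("SUCCEEDED")  # last action done: the workflow succeeds
```

In Oozie itself this logic is spread over internal Commands that run on worker threads. The point here is only that every status change goes through one table of legal transitions.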

Most of the commands are stored in an internal priority queue from where a pool of worker threads picks up and executes those commands. There are two types of commands: some are executed when the user submits the request and others are executed asynchronously.

So there are two types of Commands: some are executed synchronously and others asynchronously.
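
The Python sketch below models the two kinds of Commands. A synchronous command runs in the caller's thread. An asynchronous one goes into a priority queue that a pool of worker threads drains. CommandService and its methods are hypothetical. The convention "lower number = higher priority" is this sketch's choice and is not taken from Oozie's code.

```python
import itertools
import queue
import threading


class CommandService:
    """Hypothetical sketch: sync commands run inline, async ones are queued."""

    def __init__(self, num_workers):
        self._queue = queue.PriorityQueue()
        self._seq = itertools.count()  # tie-breaker keeps equal priorities FIFO
        self._workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(num_workers)
        ]

    def run_sync(self, command):
        """Execute right away in the caller's thread and return the result."""
        return command()

    def queue_async(self, priority, command):
        """Enqueue for a worker thread; lower number means higher priority."""
        self._queue.put((priority, next(self._seq), command))

    def start(self):
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            _, _, command = self._queue.get()  # blocks until a command is available
            try:
                command()
            finally:
                self._queue.task_done()

    def wait_idle(self):
        """Block until every queued command has been executed."""
        self._queue.join()
```

The sequence counter stops the queue from ever comparing two command objects. Python functions cannot be ordered, so such a comparison would raise an error when two priorities are equal.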

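The user first deploys the job (for example an MR job) on HDFS and then submits the workflow to Oozie. Oozie then submits the job to Hadoop asynchronously. This is why a call to Oozie's RESTful submission API returns a jobId immediately: the client program does not have to wait for the job to finish, and some large jobs run for hours or even days. In the background, Oozie asynchronously submits the workflow's Actions to Hadoop for execution.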
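
Here is a minimal Python sketch of "return the id now, do the work later". MiniOozie is a hypothetical name, and a ThreadPoolExecutor stands in for Oozie's own command queue and for the handoff to Hadoop.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class MiniOozie:
    """Hypothetical sketch: submit() returns a job id at once; the job runs later."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._lock = threading.Lock()
        self.status = {}  # job_id -> RUNNING / SUCCEEDED / FAILED

    def submit(self, run_job):
        with self._lock:
            job_id = f"job-{next(self._ids):04d}"
            self.status[job_id] = "RUNNING"
        self._pool.submit(self._execute, job_id, run_job)
        return job_id  # the caller never waits for run_job

    def _execute(self, job_id, run_job):
        try:
            run_job()
            result = "SUCCEEDED"
        except Exception:
            result = "FAILED"
        with self._lock:
            self.status[job_id] = result

    def shutdown(self):
        """Wait for background jobs to finish and stop the pool."""
        self._pool.shutdown(wait=True)
```

The client gets "job-0001" back while the job itself may still be blocked for a long time. It then checks the status later, much like querying Oozie for a job's status by its id.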

Oozie splits larger workflow management tasks (not Hadoop jobs) into smaller manageable subtasks and asynchronously processes them using a pre-defined state transition model.

In addition, Oozie provides an access layer for reaching the underlying cluster resources. This is one aspect of Hadoop security.

Oozie provides a generic Hadoop access layer restricted through Kerberos authentication to access Hadoop’s Job Tracker and Name Node components.

Oozie's horizontal and vertical scalability

Horizontal scalability shows up in the following ways:

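① The actual job execution context is not inside the Oozie server process. This comes up again in Oozie's Action execution model. The Oozie server only drives the workflow. The workflow's Actions, such as a MapReduce Action or a Java Action, run on the Hadoop cluster. The Oozie server only queries their execution status and results, which keeps its own load low.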

Oozie needs to execute different types of jobs as part of workflow processing. If the jobs are executed in the context of the server process, there will be two issues: 1) fewer jobs could run simultaneously due to limited resources in a server process causing significant penalty in scalability and 2) the user application could directly impact the Oozie server performance.

Hadoop manages and runs the actual jobs (MR Actions or Java Actions), so the Oozie server only has to query their status. When more workflows are submitted, you can simply add more Oozie servers.

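② Job state is persisted to a relational database (ZooKeeper may be considered in the future). Job state, such as the state of an MR Action, lives in a database rather than in one machine's memory, so the system is easy to scale out. And, as mentioned above, Hadoop does the actual execution of the jobs.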

Oozie stores the job states into a persistent store. This approach enables multiple Oozie servers to run simultaneously from different machines.
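
The Python sketch below shows why a shared store makes the servers interchangeable. OozieServer holds no job state of its own. Every read and write goes to the database, so a job submitted through one server can be updated or queried through another. OozieServer and make_store are hypothetical names, and an in-memory SQLite connection shared by both objects stands in for the real relational database.

```python
import sqlite3


def make_store():
    """In-memory SQLite standing in for Oozie's shared relational database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT)")
    return db


class OozieServer:
    """Hypothetical stateless server: all job state lives in the shared store."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def submit(self, job_id):
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT INTO jobs (id, status) VALUES (?, 'RUNNING')", (job_id,)
            )

    def update(self, job_id, status):
        with self.db:
            self.db.execute(
                "UPDATE jobs SET status = ? WHERE id = ?", (status, job_id)
            )

    def status(self, job_id):
        row = self.db.execute(
            "SELECT status FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        return row[0] if row else None
```

For example, server "a" submits a job and server "b" later marks it SUCCEEDED. Either server then reports the same status, because neither keeps a private copy.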

Vertical scalability shows up in:

① Proper configuration and use of the thread pool and of the Commands in the queue.

② An asynchronous job submission model, which reduces thread blocking.

Oozie often uses a pre-defined timeout for any external communication. Oozie follows an asynchronous job execution pattern for interaction with external systems. For example, when a job is submitted to the Hadoop Job Tracker, Oozie does not wait for the job to finish since it may take a long time. Instead Oozie quickly returns the worker thread back to the thread pool and later checks for job completion in a separate interaction using a different thread.

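③ A transaction model based on in-memory locks rather than on the persistent store. My reading of the quote below is this: locks held in the server's memory serialize the commands on a given job. A database connection is then opened only for the brief moments needed to read or write state. It is not held, with a transaction open, for the whole duration of a command.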

In order to maximize resource usage, the persistent store connections are held for the shortest possible duration. To this end, we chose a memory lock based transaction model instead of a persistent store based one; the latter is often more expensive to hold for long time.
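
The Python sketch below is one way to read "memory lock based transaction model". It is an interpretation, not Oozie's implementation. A per-job threading.Lock in memory serializes commands on the same job. The database connection is opened, used for one statement and closed, so slow work inside a command never holds a connection. JobStore, run_command and the counter column are hypothetical, and a temporary SQLite file stands in for the real store.

```python
import os
import sqlite3
import tempfile
import threading
from collections import defaultdict


class JobStore:
    """Hypothetical sketch: in-memory per-job locks, short-lived DB connections."""

    def __init__(self, path=None):
        self.path = path or os.path.join(tempfile.mkdtemp(), "jobs.db")
        self._locks = defaultdict(threading.Lock)  # one in-memory lock per job id
        self._locks_guard = threading.Lock()       # protects the dict itself
        self._execute(
            "CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, counter INTEGER)"
        )

    def _execute(self, sql, params=()):
        """Open a connection, run one statement, commit, close."""
        db = sqlite3.connect(self.path)
        try:
            with db:  # commits on success
                return db.execute(sql, params).fetchall()
        finally:
            db.close()

    def value(self, job_id):
        rows = self._execute("SELECT counter FROM jobs WHERE id = ?", (job_id,))
        return rows[0][0] if rows else None

    def run_command(self, job_id, compute):
        """Read state, compute the new state in memory, write it back."""
        with self._locks_guard:
            lock = self._locks[job_id]
        with lock:                              # held in memory for the whole command
            current = self.value(job_id) or 0   # short read
            new_value = compute(current)        # no DB connection open during this
            self._execute(                      # short write
                "INSERT OR REPLACE INTO jobs (id, counter) VALUES (?, ?)",
                (job_id, new_value),
            )
            return new_value
```

Twenty threads that each increment the same job end with a counter of 20. The in-memory lock prevents lost updates, even though no database transaction spans the read-compute-write sequence.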

Finally, how does Oozie get a job's execution result from the Hadoop cluster? It uses callbacks and polling together.

Callbacks keep the overhead low, and polling guarantees reliability.

When Oozie starts a MapReduce job, it provides a unique callback URL as part of the MapReduce job configuration; the Hadoop Job Tracker invokes the given URL to notify the completion of the job. For cases where the Job Tracker failed to invoke the callback URL for any reason (i.e. a transient network failure), the system has a mechanism to poll the Job Tracker for determining the completion of the MapReduce job.
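
Here is a minimal Python sketch of combining the two. Completion is recorded exactly once, by whichever path sees it first. The callback handles the common case cheaply, and a periodic poll catches jobs whose callback was lost. CompletionTracker is hypothetical, and poll_status stands in for asking the Job Tracker about a job.

```python
import threading


class CompletionTracker:
    """Hypothetical sketch: callback first, polling as the safety net."""

    def __init__(self, poll_status):
        # poll_status(job_id) -> final status string, or None if still running
        self._poll_status = poll_status
        self._lock = threading.Lock()
        self.pending = set()   # jobs started but not yet known to be finished
        self.completed = {}    # job_id -> (status, "callback" or "poll")

    def job_started(self, job_id):
        with self._lock:
            self.pending.add(job_id)

    def on_callback(self, job_id, status):
        """Invoked when the cluster hits the job's callback URL."""
        self._record(job_id, status, "callback")

    def poll_once(self):
        """One polling round over every job still pending."""
        with self._lock:
            candidates = list(self.pending)
        for job_id in candidates:
            status = self._poll_status(job_id)
            if status is not None:
                self._record(job_id, status, "poll")

    def _record(self, job_id, status, source):
        with self._lock:
            if job_id in self.pending:  # first notification wins, duplicates ignored
                self.pending.discard(job_id)
                self.completed[job_id] = (status, source)
```

If a callback arrives after the poller has already recorded the job, it is simply ignored. The two mechanisms can therefore run side by side without double-processing a completion.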

II. Oozie's Action Execution Model

A fundamental design principle in Oozie is that the Oozie server never runs user code other than the execution of the workflow itself. This ensures better service stability by isolating user code away from Oozie’s code. The Oozie server is also stateless and the launcher job makes it possible for it to stay that way. By leveraging Hadoop for running the launcher, handling job failures and recoverability becomes easier for the stateless Oozie server.

① Oozie never runs user code other than the execution of the workflow itself. Here, "user code" means, for example, a MapReduce program the user has written.

The Oozie server is also stateless and the launcher job.....

The Oozie server's statelessness really means that it persists job execution information to the database.

The Action execution model is shown in the figure below:

Oozie runs the actual actions through a launcher job, which itself is a Hadoop Map-Reduce job that runs on the Hadoop cluster. The launcher is a map-only job that runs only one mapper.

Oozie runs each concrete Action through a launcher job. The launcher job is a map-only MR job, and Oozie does not know in advance which machine in the cluster will run it.
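
The Python sketch below captures the shape of this arrangement. The server only describes the work as a map-only job with one mapper. The "cluster" chooses a node and runs the user's code there. LauncherJob, oozie_server_start_action and cluster_run are hypothetical names. A Python callable stands in for the action's main class, and random.choice stands in for Hadoop's task placement.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LauncherJob:
    action_id: str
    main: object                               # callable standing in for the user's main class
    args: list = field(default_factory=list)
    num_mappers: int = 1                       # launcher: a single mapper...
    num_reducers: int = 0                      # ...and no reducers (map-only)


def oozie_server_start_action(action_id, main, args):
    """The server only describes the work; it never calls main itself."""
    return LauncherJob(action_id, main, list(args))


def cluster_run(job, nodes):
    """Stand-in for Hadoop: run the single map task on whichever node it picks."""
    if job.num_mappers != 1 or job.num_reducers != 0:
        raise ValueError("a launcher must be map-only with exactly one mapper")
    node = random.choice(nodes)     # the server cannot know this in advance
    result = job.main(*job.args)    # user code runs on the cluster, not in the server
    return node, result
```

If the user code crashes or hogs memory, the damage stays inside the map task on some cluster node, and the Oozie server process is unaffected.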

In the figure above, the Oozie client submits a workflow to the Oozie server. This workflow contains a Hive job to run (a Hive Action).

The Oozie server first starts an MR job, the launcher job, and the launcher job in turn starts the actual Hive job. (A Hive job is essentially an MR job.)

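The launcher job is itself an MR job, so it occupies a slot. Every Action that a submitted workflow runs on the cluster gets its own launcher job, and each launcher takes a slot. Suppose the underlying Hadoop cluster has only a few slots while Oozie submits many jobs. The launcher jobs can then use up all the slots and leave none for the actual Actions. Each launcher is waiting for an Action that can never start, which is a deadlock. This can be avoided by configuring the relevant Oozie parameters so that Oozie does not start too many launcher jobs at once.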
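
A toy Python simulation makes the deadlock concrete. The model is my own simplification, not Oozie's or Hadoop's scheduler. All slots are interchangeable, and pending launchers are scheduled before any Action gets a slot. Each launcher keeps its slot until its Action finishes, which holds for launchers that wait for their Action (the <map-reduce> launcher is an exception, as discussed next). The hypothetical max_launchers parameter stands in for whatever setting caps concurrent launchers.

```python
def simulate(slots, workflows, max_launchers=None):
    """Return True if every workflow's action eventually runs, False on deadlock.

    Each workflow needs one slot for its launcher (held until the action ends)
    plus one more slot for the action itself.
    """
    pending = workflows     # workflows whose launcher has not started yet
    waiting = 0             # launchers holding a slot, action not started yet
    running_actions = 0
    done = 0
    free = slots
    while done < workflows:
        progressed = False
        # 1. Start launchers while slots (and the optional cap) allow.
        while pending and free > 0 and (
            max_launchers is None or waiting + running_actions < max_launchers
        ):
            pending -= 1
            waiting += 1
            free -= 1
            progressed = True
        # 2. Actions take whatever slots are left.
        while waiting and free > 0:
            waiting -= 1
            running_actions += 1
            free -= 1
            progressed = True
        # 3. Running actions finish, releasing their slot and their launcher's slot.
        if running_actions:
            free += 2 * running_actions
            done += running_actions
            running_actions = 0
            progressed = True
        if not progressed:
            return False    # every slot is held by a launcher waiting for a slot
    return True
```

With 4 slots, 2 workflows finish, but 4 or more workflows deadlock: the launchers take every slot in the first round and none is left for an Action. Capping concurrent launchers at 2 lets even 10 workflows complete.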

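Also, the launcher of an MR Action does not wait for the job it started to finish before exiting: the <map-reduce> launcher exits as soon as the MR job has been submitted. This is the exception. Other launchers, such as those for Hive Actions, stay alive until the work they started has completed.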

The <map-reduce> launcher is the exception and it exits right after launching the actual job instead of waiting for it to complete.
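
The difference in launcher lifetimes can be sketched in a few lines of Python. run_launcher is hypothetical. submit_child stands in for starting the real work and returns a handle, and wait_for blocks until that work ends and returns its status.

```python
def run_launcher(action_type, submit_child, wait_for):
    """Hypothetical sketch of a launcher's lifetime.

    submit_child() starts the real work and returns a handle;
    wait_for(handle) blocks until that work finishes and returns its status.
    """
    handle = submit_child()
    if action_type == "map-reduce":
        # <map-reduce> launcher: exit right after submitting;
        # Oozie then tracks the MR job itself (callback / polling)
        return ("EXITED_AFTER_SUBMIT", handle)
    # other launchers (e.g. hive) stay alive until the work completes
    return ("EXITED_AFTER_COMPLETION", wait_for(handle))
```

A practical consequence is that the <map-reduce> launcher gives its slot back almost immediately. That eases the slot pressure described above for this one action type.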

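This launcher job mechanism also explains a deployment rule. Before handing a job to Oozie to manage and run, you must first deploy the job's configuration files on HDFS, and only then send a RESTful request to the Oozie server to submit it. The launcher may run on any node of the cluster, so everything it needs has to be readable from HDFS.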

posted @ 2016-07-26 10:59 大熊猫同学