Oozie notes
1. Download
   https://mirrors.tuna.tsinghua.edu.cn/apache/  oozie-4.3.1.tar.gz
   Note: I first downloaded version 5.1.0, but after installation the web service backend kept reporting errors, most likely a version problem.
   Hadoop here is Apache Hadoop 2.7.7.
2. Extract
   tar -zxf oozie-4.3.1.tar.gz
3. Install Maven
   https://maven.apache.org/download.cgi?Preferred=http%3A%2F%2Fmirrors.tuna.tsinghua.edu.cn%2Fapache%2F  apache-maven-3.6.3-bin.tar.gz
   Set the environment variables.
4. Build oozie
   cd oozie-4.3.1
   mvn clean test -X
   ./bin/mkdistro.sh -DskipTests -Puber -Dhadoop.version=2.7.7 -Dhadoop.auth.version=2.7.7 -X
   The build takes quite a while; when it finishes, oozie-4.3.1-distro.tar.gz is generated under oozie-4.3.1/distro/target/.
5. Configure hadoop
   ===> core-site.xml, add:
   <property>
       <name>hadoop.proxyuser.hadoop.hosts</name>
       <value>*</value>
   </property>
   <property>
       <name>hadoop.proxyuser.hadoop.groups</name>
       <value>*</value>
   </property>
   ===> mapred-site.xml, add:
   <property>
       <name>mapreduce.jobhistory.address</name>
       <value>hadoop01:10020</value>
   </property>
   <property>
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>hadoop02:19888</value>
   </property>
   ===> yarn-site.xml, add:
   <property>
       <name>yarn.log.server.url</name>
       <value>http://hadoop02:19888/jobhistory/logs/</value>
   </property>
   scp these configuration files to every node.
6. Commands to reload the configuration without restarting HDFS/YARN
   hdfs dfsadmin -refreshSuperUserGroupsConfiguration
   yarn rmadmin -refreshSuperUserGroupsConfiguration
   Start the jobhistory server:
   sbin/mr-jobhistory-daemon.sh start historyserver
7. Copy and unpack oozie-4.3.1-distro.tar.gz
   tar -zxf oozie-4.3.1-distro.tar.gz -C /home/hadoop
   cd oozie-4.3.1
   mkdir libext
   cd libext
   find /home/hadoop/hadoop-2.7.7/share/hadoop/ -name '*.jar' -exec cp {} . \;
   rm -rf jsp-api-2.1.jar
   Download http://archive.cloudera.com/gplextras/misc/ext-2.2.zip and put it in libext.
8. Configure oozie's MySQL database settings
   ===> oozie-4.3.1/conf/oozie-site.xml
   <property>
       <name>oozie.service.JPAService.jdbc.driver</name>
       <value>com.mysql.jdbc.Driver</value>
   </property>
   <property>
       <name>oozie.service.JPAService.jdbc.url</name>
       <value>jdbc:mysql://192.168.15.45:3307/oozie</value>
   </property>
   <property>
       <name>oozie.service.JPAService.jdbc.username</name>
       <value>root</value>
   </property>
   <property>
       <name>oozie.service.JPAService.jdbc.password</name>
       <value>root</value>
   </property>
   <property>
       <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
       <value>*=/home/hadoop/hadoop-2.7.7/etc/hadoop/</value>
   </property>
9. Initialize the database
   First create the oozie database in MySQL, then run, from the oozie-4.3.1 directory:
   bin/ooziedb.sh create -sqlfile oozie.sql -run
   An oozie.sql file is generated in the current directory.
10. Initialize the oozie sharelib
    Upload oozie-sharelib-4.3.1.tar.gz (in the oozie-4.3.1 directory) to HDFS:
    bin/oozie-setup.sh sharelib create -fs hdfs://hadoop01:9000 -locallib oozie-sharelib-4.3.1.tar.gz
11. Package the project and generate the war (this errors out if zip is not installed)
    bin/oozie-setup.sh prepare-war
12. Start
    bin/oozied.sh run   or   bin/oozied.sh start
    Stop
    bin/oozied.sh stop
13. Verify
    bin/oozie admin -oozie http://localhost:11000/oozie -status
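Note: besides the -status check, it is worth confirming that the sharelib from step 10 actually landed on HDFS and that the server answers job queries. A quick sketch (the share/lib path assumes the default layout under the hadoop user used above):
=======================================
# sharelib create uploads the jars under /user/<user>/share/lib/lib_<timestamp>/
hadoop fs -ls /user/hadoop/share/lib
# list workflow jobs known to the server (the list is empty right after installation)
bin/oozie jobs -oozie http://localhost:11000/oozie -jobtype wf
# the web console is served at http://hadoop01:11000/oozie (needs ext-2.2.zip from step 7)
=======================================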
14. Current processes in the 3 containers simulating the cluster
    hadoop01 (172.17.0.3)
    =================
    Dzookeeper
    Dproc_journalnode
    Dproc_zkfc
    Dproc_namenode
    Dproc_resourcemanager
    oozie
    Dproc_historyserver
    =================
    hadoop02 (172.17.0.5)
    =================
    Dzookeeper
    Dproc_journalnode
    Dproc_zkfc
    Dproc_datanode
    Dproc_namenode
    Dproc_nodemanager   <---- oozie jobs execute here
    =================
    hadoop03 (172.17.0.7)
    =================
    Dzookeeper
    Dproc_journalnode
    Dproc_datanode
    Dproc_nodemanager   <---- oozie jobs execute here
    =================
15. Case 1: run a shell command to create a directory
    In the oozie-4.3.1 directory:
    mkdir -p oozie-apps/shell
    vi oozie-apps/shell/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop01:8032
    queueName=default
    examplesRoot=oozie-apps
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
    =======================================
    vi oozie-apps/shell/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>mkdir</exec>
                <argument>/home/hadoop/xxxxxxyyyyyy</argument>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the oozie-apps directory to HDFS:
    hadoop fs -put oozie-apps/ /user/hadoop
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/shell/job.properties -run
16. Case 2: schedule and chain multiple jobs
    In the oozie-4.3.1 directory:
    mkdir oozie-apps/manyjob
    vi oozie-apps/manyjob/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop01:8032
    queueName=default
    examplesRoot=oozie-apps
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/manyjob
    =======================================
    vi oozie-apps/manyjob/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="p1-shell-node"/>
        <action name="p1-shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>mkdir</exec>
                <argument>/home/hadoop/xxxxxyyyyyzzzzz</argument>
                <capture-output/>
            </shell>
            <ok to="p2-shell-node"/>
            <error to="fail"/>
        </action>
        <action name="p2-shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>mkdir</exec>
                <argument>/home/hadoop/aaaaabbbbbccccc</argument>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the manyjob directory to HDFS:
    hadoop fs -put oozie-apps/manyjob /user/hadoop/oozie-apps
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/manyjob/job.properties -run
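Note: the -run command only prints the new job id. A few commands for tracking a submitted workflow, sketched against the same server URL:
=======================================
# replace <jobId> with the id printed by -run
bin/oozie job -oozie http://hadoop01:11000/oozie -info <jobId>
bin/oozie job -oozie http://hadoop01:11000/oozie -log <jobId>
# stop a misbehaving workflow
bin/oozie job -oozie http://hadoop01:11000/oozie -kill <jobId>
=======================================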
17. Case 3: Oozie schedules a MapReduce job
    In the oozie-4.3.1 directory:
    mkdir oozie-apps/map-reduce
    Create the directory that holds the jar:
    mkdir oozie-apps/map-reduce/lib
    Copy the jar, using hadoop's wordcount example:
    cp /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar oozie-apps/map-reduce/lib
    Create an /input/ directory on HDFS and prepare a someword.txt text file:
    hadoop fs -mkdir /input
    hadoop fs -put someword.txt /input/someword.txt
    vi oozie-apps/map-reduce/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop03:8032
    queueName=default
    examplesRoot=oozie-apps
    #hdfs://hadoop02:8020/user/admin/oozie-apps/map-reduce/workflow.xml
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
    outputDir=map-reduce
    =======================================
    vi oozie-apps/map-reduce/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/output/"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                    <property>
                        <name>mapred.mapper.new-api</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapred.reducer.new-api</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapreduce.job.output.key.class</name>
                        <value>org.apache.hadoop.io.Text</value>
                    </property>
                    <property>
                        <name>mapreduce.job.output.value.class</name>
                        <value>org.apache.hadoop.io.IntWritable</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/input/</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/output/</value>
                    </property>
                    <property>
                        <name>mapreduce.job.map.class</name>
                        <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                    </property>
                    <property>
                        <name>mapreduce.job.reduce.class</name>
                        <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                    </property>
                    <property>
                        <name>mapred.map.tasks</name>
                        <value>1</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the map-reduce directory to HDFS (under the path referenced by oozie.wf.application.path):
    hadoop fs -put oozie-apps/map-reduce /user/hadoop/oozie-apps
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/map-reduce/job.properties -run
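Note: once the workflow reaches SUCCEEDED, the wordcount result sits in the /output/ directory configured in workflow.xml. A quick check (the part-r-00000 file name is the usual default for a single reducer):
=======================================
bin/oozie job -oozie http://hadoop01:11000/oozie -info <jobId>
hadoop fs -ls /output
hadoop fs -cat /output/part-r-00000
=======================================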
18. Case 4: Oozie scheduled/recurring job
    In the oozie-4.3.1 directory:
    mkdir oozie-apps/cron
    Create a hello.sh script in oozie-apps/cron:
    =======================================
    #!/bin/bash
    echo "hello " >> /home/hadoop/yyyyy.txt
    =======================================
    vi oozie-apps/cron/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop03:8032
    queueName=default
    examplesRoot=oozie-apps
    oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
    start=2019-12-11T11:10+0800
    end=2019-12-12T10:30+0800
    workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron
    EXEC=hello.sh
    =======================================
    vi oozie-apps/cron/coordinator.xml
    =======================================
    <coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <app-path>${workflowAppUri}</app-path>
                <configuration>
                    <property>
                        <name>jobTracker</name>
                        <value>${jobTracker}</value>
                    </property>
                    <property>
                        <name>nameNode</name>
                        <value>${nameNode}</value>
                    </property>
                    <property>
                        <name>queueName</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
            </workflow>
        </action>
    </coordinator-app>
    =======================================
    vi oozie-apps/cron/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="one-op-wf">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>${EXEC}</exec>
                <file>/user/hadoop/oozie-apps/cron/${EXEC}#${EXEC}</file>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the cron directory to HDFS (the <file> element above expects it under /user/hadoop/oozie-apps):
    hadoop fs -put oozie-apps/cron /user/hadoop/oozie-apps
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/cron/job.properties -run
19. Configure OOZIE_URL
    $ export OOZIE_URL="http://localhost:11000/oozie"
    $ oozie job -info 14-20090525161321-oozie-tucu
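Note: a coordinator keeps materializing workflow runs until its end time, so remember to kill it when the test is done. With OOZIE_URL exported as above, the -oozie flag can be omitted. A sketch:
=======================================
# list coordinator jobs and find the coordinator id (it ends in -C)
oozie jobs -jobtype coordinator
# show the materialized actions of one coordinator
oozie job -info <coordJobId>
# stop it
oozie job -kill <coordJobId>
=======================================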
20. Java API
Configure the jar dependency (pom.xml):
=======================================
<dependency>
<groupId>org.apache.oozie</groupId>
<artifactId>oozie-client</artifactId>
<version>4.3.1</version>
</dependency>
=======================================
OozieDemo.java
=======================================
package com.kizzle.oozie;

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.apache.oozie.client.WorkflowJob;

public class OozieDemo {
    public static void main(String[] args) throws OozieClientException, InterruptedException {
        // client pointing at the Oozie server started above
        OozieClient wc = new OozieClient("http://hadoop01:11000/oozie");

        // this Properties object plays the role of job.properties
        Properties conf = wc.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://hadoop01:9000/user/hadoop/oozie-apps/map-reduce2");
        conf.setProperty("outputDir", "/output33");
        conf.setProperty("inputDir", "/input");
        conf.setProperty("user.name", "hadoop");
        conf.setProperty("jobTracker", "hadoop03:8032");
        conf.setProperty("mapreduce.job.user.name", "hadoop");
        conf.setProperty("nameNode", "hdfs://hadoop01:9000");
        conf.setProperty("queueName", "default");

        // submit and start the workflow; run() returns the job id
        String jobId = wc.run(conf);
        System.out.println("Workflow job submitted");

        // poll every 10 seconds until the workflow leaves the RUNNING state
        while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            System.out.println("Workflow job running ...");
            Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow job completed ...");
        System.out.println(wc.getJobInfo(jobId));
    }
}
=======================================
The Java API replaces job.properties: the configuration goes into the Properties object instead.
workflow.xml still picks these values up through EL expressions.
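To try the client from the Maven project directory, something along these lines should work (assumes the exec-maven-plugin can be resolved by its exec prefix; adjust the main class if your package name differs):
=======================================
mvn -q compile exec:java -Dexec.mainClass=com.kizzle.oozie.OozieDemo
=======================================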