Oozie notes
1. Download
   https://mirrors.tuna.tsinghua.edu.cn/apache/  oozie-4.3.1.tar.gz
   Note: I first downloaded version 5.1.0, but after installation the web service backend kept reporting errors, most likely a version problem.
   Hadoop here is Apache Hadoop 2.7.7.
2. Extract
   tar -zxf oozie-4.3.1.tar.gz
3. Install Maven
   https://maven.apache.org/download.cgi?Preferred=http%3A%2F%2Fmirrors.tuna.tsinghua.edu.cn%2Fapache%2F  apache-maven-3.6.3-bin.tar.gz
   Set the environment variables.
4. Build oozie
   cd oozie-4.3.1
   mvn clean test -X
   ./bin/mkdistro.sh -DskipTests -Puber -Dhadoop.version=2.7.7 -Dhadoop.auth.version=2.7.7 -X
   The build takes quite a while; when it finishes, oozie-4.3.1-distro.tar.gz is generated under oozie-4.3.1/distro/target/.
5. Configure hadoop
   ===> core-site.xml, add:
   <property>
       <name>hadoop.proxyuser.hadoop.hosts</name>
       <value>*</value>
   </property>
   <property>
       <name>hadoop.proxyuser.hadoop.groups</name>
       <value>*</value>
   </property>
   ===> mapred-site.xml, add:
   <property>
       <name>mapreduce.jobhistory.address</name>
       <value>hadoop01:10020</value>
   </property>
   <property>
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>hadoop02:19888</value>
   </property>
   ===> yarn-site.xml, add:
   <property>
       <name>yarn.log.server.url</name>
       <value>http://hadoop02:19888/jobhistory/logs/</value>
   </property>
   scp these configuration files to every node.
6. Commands to reload the configuration without restarting HDFS/YARN
   hdfs dfsadmin -refreshSuperUserGroupsConfiguration
   yarn rmadmin -refreshSuperUserGroupsConfiguration
   Start the jobhistory server:
   sbin/mr-jobhistory-daemon.sh start historyserver
7. Copy and unpack oozie-4.3.1-distro.tar.gz
   tar -zxf oozie-4.3.1-distro.tar.gz -C /home/hadoop
   cd oozie-4.3.1
   mkdir libext
   cd libext
   find /home/hadoop/hadoop-2.7.7/share/hadoop/ -name '*.jar' -exec cp {} . \;
   rm -rf jsp-api-2.1.jar
   Download http://archive.cloudera.com/gplextras/misc/ext-2.2.zip and put it in libext.
8. Configure oozie's MySQL database settings
   ===> oozie-4.3.1/conf/oozie-site.xml
   <property>
       <name>oozie.service.JPAService.jdbc.driver</name>
       <value>com.mysql.jdbc.Driver</value>
   </property>
   <property>
       <name>oozie.service.JPAService.jdbc.url</name>
       <value>jdbc:mysql://192.168.15.45:3307/oozie</value>
   </property>
   <property>
       <name>oozie.service.JPAService.jdbc.username</name>
       <value>root</value>
   </property>
   <property>
       <name>oozie.service.JPAService.jdbc.password</name>
       <value>root</value>
   </property>
   <property>
       <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
       <value>*=/home/hadoop/hadoop-2.7.7/etc/hadoop/</value>
   </property>
9. Initialize the database
   First create the oozie database in MySQL, then run, from the oozie-4.3.1 directory:
   bin/ooziedb.sh create -sqlfile oozie.sql -run
   An oozie.sql file is generated in the current directory.
10. Initialize the oozie sharelib
    Upload oozie-sharelib-4.3.1.tar.gz (in the oozie-4.3.1 directory) to HDFS:
    bin/oozie-setup.sh sharelib create -fs hdfs://hadoop01:9000 -locallib oozie-sharelib-4.3.1.tar.gz
11. Package the project and generate the war (this errors out if zip is not installed)
    bin/oozie-setup.sh prepare-war
12. Start
    bin/oozied.sh run   or   bin/oozied.sh start
    Stop
    bin/oozied.sh stop
13. Verify
    bin/oozie admin -oozie http://localhost:11000/oozie -status
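Note: besides the -status check, it is worth confirming that the sharelib from step 10 actually landed on HDFS and that the server answers job queries. A quick sketch (the share/lib path assumes the default layout under the hadoop user used above):
=======================================
# sharelib create uploads the jars under /user/<user>/share/lib/lib_<timestamp>/
hadoop fs -ls /user/hadoop/share/lib
# list workflow jobs known to the server (the list is empty right after installation)
bin/oozie jobs -oozie http://localhost:11000/oozie -jobtype wf
# the web console is served at http://hadoop01:11000/oozie (needs ext-2.2.zip from step 7)
=======================================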
14. Current processes in the 3 containers simulating the cluster
    hadoop01 (172.17.0.3)
    =================
    Dzookeeper
    Dproc_journalnode
    Dproc_zkfc
    Dproc_namenode
    Dproc_resourcemanager
    oozie
    Dproc_historyserver
    =================
    hadoop02 (172.17.0.5)
    =================
    Dzookeeper
    Dproc_journalnode
    Dproc_zkfc
    Dproc_datanode
    Dproc_namenode
    Dproc_nodemanager   <---- oozie jobs execute here
    =================
    hadoop03 (172.17.0.7)
    =================
    Dzookeeper
    Dproc_journalnode
    Dproc_datanode
    Dproc_nodemanager   <---- oozie jobs execute here
    =================
15. Case 1: run a shell command to create a directory
    In the oozie-4.3.1 directory:
    mkdir -p oozie-apps/shell
    vi oozie-apps/shell/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop01:8032
    queueName=default
    examplesRoot=oozie-apps
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
    =======================================
    vi oozie-apps/shell/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>mkdir</exec>
                <argument>/home/hadoop/xxxxxxyyyyyy</argument>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the oozie-apps directory to HDFS:
    hadoop fs -put oozie-apps/ /user/hadoop
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/shell/job.properties -run
16. Case 2: schedule and chain multiple jobs
    In the oozie-4.3.1 directory:
    mkdir oozie-apps/manyjob
    vi oozie-apps/manyjob/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop01:8032
    queueName=default
    examplesRoot=oozie-apps
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/manyjob
    =======================================
    vi oozie-apps/manyjob/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="p1-shell-node"/>
        <action name="p1-shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>mkdir</exec>
                <argument>/home/hadoop/xxxxxyyyyyzzzzz</argument>
                <capture-output/>
            </shell>
            <ok to="p2-shell-node"/>
            <error to="fail"/>
        </action>
        <action name="p2-shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>mkdir</exec>
                <argument>/home/hadoop/aaaaabbbbbccccc</argument>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the manyjob directory to HDFS:
    hadoop fs -put oozie-apps/manyjob /user/hadoop/oozie-apps
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/manyjob/job.properties -run
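Note: the -run command only prints the new job id. A few commands for tracking a submitted workflow, sketched against the same server URL:
=======================================
# replace <jobId> with the id printed by -run
bin/oozie job -oozie http://hadoop01:11000/oozie -info <jobId>
bin/oozie job -oozie http://hadoop01:11000/oozie -log <jobId>
# stop a misbehaving workflow
bin/oozie job -oozie http://hadoop01:11000/oozie -kill <jobId>
=======================================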
17. Case 3: Oozie schedules a MapReduce job
    In the oozie-4.3.1 directory:
    mkdir oozie-apps/map-reduce
    Create the directory that holds the jar:
    mkdir oozie-apps/map-reduce/lib
    Copy the jar, using hadoop's wordcount example:
    cp /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar oozie-apps/map-reduce/lib
    Create an /input/ directory on HDFS and prepare a someword.txt text file:
    hadoop fs -mkdir /input
    hadoop fs -put someword.txt /input/someword.txt
    vi oozie-apps/map-reduce/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop03:8032
    queueName=default
    examplesRoot=oozie-apps
    #hdfs://hadoop02:8020/user/admin/oozie-apps/map-reduce/workflow.xml
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
    outputDir=map-reduce
    =======================================
    vi oozie-apps/map-reduce/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/output/"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                    <property>
                        <name>mapred.mapper.new-api</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapred.reducer.new-api</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapreduce.job.output.key.class</name>
                        <value>org.apache.hadoop.io.Text</value>
                    </property>
                    <property>
                        <name>mapreduce.job.output.value.class</name>
                        <value>org.apache.hadoop.io.IntWritable</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/input/</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/output/</value>
                    </property>
                    <property>
                        <name>mapreduce.job.map.class</name>
                        <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                    </property>
                    <property>
                        <name>mapreduce.job.reduce.class</name>
                        <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                    </property>
                    <property>
                        <name>mapred.map.tasks</name>
                        <value>1</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the map-reduce directory to HDFS (under the path referenced by oozie.wf.application.path):
    hadoop fs -put oozie-apps/map-reduce /user/hadoop/oozie-apps
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/map-reduce/job.properties -run
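Note: once the workflow reaches SUCCEEDED, the wordcount result sits in the /output/ directory configured in workflow.xml. A quick check (the part-r-00000 file name is the usual default for a single reducer):
=======================================
bin/oozie job -oozie http://hadoop01:11000/oozie -info <jobId>
hadoop fs -ls /output
hadoop fs -cat /output/part-r-00000
=======================================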
18. Case 4: Oozie scheduled/recurring job
    In the oozie-4.3.1 directory:
    mkdir oozie-apps/cron
    Create a hello.sh script in oozie-apps/cron:
    =======================================
    #!/bin/bash
    echo "hello " >> /home/hadoop/yyyyy.txt
    =======================================
    vi oozie-apps/cron/job.properties
    =======================================
    nameNode=hdfs://hadoop01:9000
    jobTracker=hadoop03:8032
    queueName=default
    examplesRoot=oozie-apps
    oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
    start=2019-12-11T11:10+0800
    end=2019-12-12T10:30+0800
    workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron
    EXEC=hello.sh
    =======================================
    vi oozie-apps/cron/coordinator.xml
    =======================================
    <coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <app-path>${workflowAppUri}</app-path>
                <configuration>
                    <property>
                        <name>jobTracker</name>
                        <value>${jobTracker}</value>
                    </property>
                    <property>
                        <name>nameNode</name>
                        <value>${nameNode}</value>
                    </property>
                    <property>
                        <name>queueName</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
            </workflow>
        </action>
    </coordinator-app>
    =======================================
    vi oozie-apps/cron/workflow.xml
    =======================================
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="one-op-wf">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>${EXEC}</exec>
                <file>/user/hadoop/oozie-apps/cron/${EXEC}#${EXEC}</file>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    =======================================
    Upload the cron directory to HDFS (the <file> element above expects it under /user/hadoop/oozie-apps):
    hadoop fs -put oozie-apps/cron /user/hadoop/oozie-apps
    Run the job:
    bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/cron/job.properties -run
19. Configure OOZIE_URL
    $ export OOZIE_URL="http://localhost:11000/oozie"
    $ oozie job -info 14-20090525161321-oozie-tucu
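Note: a coordinator keeps materializing workflow runs until its end time, so remember to kill it when the test is done. With OOZIE_URL exported as above, the -oozie flag can be omitted. A sketch:
=======================================
# list coordinator jobs and find the coordinator id (it ends in -C)
oozie jobs -jobtype coordinator
# show the materialized actions of one coordinator
oozie job -info <coordJobId>
# stop it
oozie job -kill <coordJobId>
=======================================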
20. Java API
Configure the jar dependency (pom.xml):
=======================================
<dependency>
<groupId>org.apache.oozie</groupId>
<artifactId>oozie-client</artifactId>
<version>4.3.1</version>
</dependency>
=======================================
OozieDemo.java
=======================================
package com.kizzle.oozie;

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.apache.oozie.client.WorkflowJob;

public class OozieDemo {
    public static void main(String[] args) throws OozieClientException, InterruptedException {
        // client pointing at the Oozie server started above
        OozieClient wc = new OozieClient("http://hadoop01:11000/oozie");

        // this Properties object plays the role of job.properties
        Properties conf = wc.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://hadoop01:9000/user/hadoop/oozie-apps/map-reduce2");
        conf.setProperty("outputDir", "/output33");
        conf.setProperty("inputDir", "/input");
        conf.setProperty("user.name", "hadoop");
        conf.setProperty("jobTracker", "hadoop03:8032");
        conf.setProperty("mapreduce.job.user.name", "hadoop");
        conf.setProperty("nameNode", "hdfs://hadoop01:9000");
        conf.setProperty("queueName", "default");

        // submit and start the workflow; run() returns the job id
        String jobId = wc.run(conf);
        System.out.println("Workflow job submitted");

        // poll every 10 seconds until the workflow leaves the RUNNING state
        while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            System.out.println("Workflow job running ...");
            Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow job completed ...");
        System.out.println(wc.getJobInfo(jobId));
    }
}
=======================================
The Java API replaces job.properties: the configuration goes into the Properties object instead.
workflow.xml still picks these values up through EL expressions.
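To try the client from the Maven project directory, something along these lines should work (assumes the exec-maven-plugin can be resolved by its exec prefix; adjust the main class if your package name differs):
=======================================
mvn -q compile exec:java -Dexec.mainClass=com.kizzle.oozie.OozieDemo
=======================================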