Creating jobs in Azkaban

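A complete data analysis system usually consists of many task units, such as shell scripts, Java programs, MapReduce programs, and Hive scripts. These units must run in a particular order and depend on one another, so organizing such a complex execution plan requires a workflow scheduler to run them. Azkaban fills that role here.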

1. Simple job

#Create the job description file
vi command.job
#command.job
type=command
command=echo 'hello'

#Package the job file into a .zip archive (zip takes the archive name first, then the files to add)
zip command.zip command.job

#Create a project in the web UI and upload the zip; the job can then be run immediately or put on a schedule
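
The packaging steps above can be sketched as a short, non-interactive shell session. The directory name `simple-job` and the archive name `simple-job.zip` are illustrative assumptions, not names Azkaban requires, and the sketch assumes the `zip` utility is installed.

```shell
# Hypothetical working directory; any name works.
mkdir -p simple-job

# Write the job description file without opening an editor.
cat > simple-job/command.job <<'EOF'
type=command
command=echo 'hello'
EOF

# Put the job file at the root of the archive so it is easy to find after upload.
(cd simple-job && zip -q ../simple-job.zip command.job)
```

The resulting `simple-job.zip` is the file to upload from the project page in the web UI.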

2. Multiple jobs with dependencies

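#First job
# foo.job
type=command
command=echo foo

#Second job (runs only after foo succeeds; the dependency is the job file name without the .job extension)
# bar.job
type=command
dependencies=foo
command=echo bar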

#Package both job files into one zip, upload it, and run
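
A job can also wait on more than one upstream job by listing them comma-separated in `dependencies`. The following is a minimal sketch of such a fan-in flow; the job names `extract_a`, `extract_b`, and `report`, and the directory `fan-in-flow`, are hypothetical and chosen only for illustration.

```shell
# Hypothetical directory holding all job files of one flow.
mkdir -p fan-in-flow

# Two independent upstream jobs with no dependencies of their own.
cat > fan-in-flow/extract_a.job <<'EOF'
type=command
command=echo extract_a
EOF

cat > fan-in-flow/extract_b.job <<'EOF'
type=command
command=echo extract_b
EOF

# The downstream job lists both upstream jobs, comma-separated.
cat > fan-in-flow/report.job <<'EOF'
type=command
dependencies=extract_a,extract_b
command=echo report
EOF
```

All three files go into the same zip, in the same way as `foo.job` and `bar.job`.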

3. HDFS job

#Job description file
# fs.job
type=command
command=/root/hadoop/bin/hadoop fs -mkdir /test

#Package, upload, and run
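
Note that `hadoop fs -mkdir` reports an error when the directory already exists, so the job above fails on a second run. One possible variant, sketched here under the assumption that rerunning should be harmless, adds the `-p` flag; the directory name `fs-job` is illustrative.

```shell
# Hypothetical directory for this variant of fs.job.
mkdir -p fs-job

# -p creates missing parent directories and does not fail if /test already exists.
cat > fs-job/fs.job <<'EOF'
type=command
command=/root/hadoop/bin/hadoop fs -mkdir -p /test
EOF
```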

4. MapReduce job

#Job description file
# mrwc.job
type=command
command=/root/hadoop/bin/hadoop jar hadoop-mapreduce-examples-2.6.1.jar wordcount /wordcount/input /wordcount/output

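#When packaging, include hadoop-mapreduce-examples-2.6.1.jar in the zip together with the job description file, because the command refers to the jar by a relative path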
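
The wordcount example refuses to start when its output directory already exists, so the job above succeeds only once per output path. A minimal sketch of a rerunnable variant follows, using the `command.1` property of the command job type to run a second command after the first. It assumes that deleting `/wordcount/output` before each run is acceptable; the directory name `mrwc-job` is illustrative.

```shell
# Hypothetical directory for this variant of mrwc.job.
mkdir -p mrwc-job

# command runs first, then command.1.
# WARNING (assumption): this removes any previous results in /wordcount/output.
# -f keeps the delete from failing when the directory does not exist yet.
cat > mrwc-job/mrwc.job <<'EOF'
type=command
command=/root/hadoop/bin/hadoop fs -rm -r -f /wordcount/output
command.1=/root/hadoop/bin/hadoop jar hadoop-mapreduce-examples-2.6.1.jar wordcount /wordcount/input /wordcount/output
EOF
```

As with the original job, the examples jar still has to be packaged into the zip next to `mrwc.job`.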

5. Script job (Hive)

#Write the script file test.sql
use default;
drop table if exists aztest;
create table aztest(id int,name string) row format delimited fields terminated by ',';
load data inpath '/aztest/hiveinput' into table aztest;
create table azres as select * from aztest;
insert overwrite directory '/aztest/hiveoutput' select count(1) from aztest;

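#Write the job description file
# hivef.job
type=command
command=/home/apps/hive/bin/hive -f 'test.sql'

#Package test.sql together with hivef.job, because the command refers to the script by a relative path; then upload and run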

posted @ 2018-06-05 14:06 py小杰