Oozie
Introduction to Oozie
The name Oozie translates to "elephant trainer." It is an open-source, workflow-engine-based framework, contributed to Apache by Cloudera, that provides scheduling and coordination of Hadoop MapReduce and Pig jobs. Oozie must be deployed into a Java Servlet container to run. It is mainly used for time-based task scheduling, and multiple tasks can be scheduled in their logical execution order.
Oozie's Functional Modules
Modules
- Workflow: executes flow nodes in sequence; supports fork (splitting into multiple branches) and join (merging multiple branches back into one)
- Coordinator: triggers workflows on a schedule
- Bundle Job: bundles multiple Coordinators together
Common Node Types
Control Flow Nodes
Control flow nodes are generally defined at the start or end of a workflow, such as start, end, and kill. They also provide the workflow's path-control mechanisms, such as decision, fork, and join.
Action Nodes
Nodes that perform concrete work, for example copying files or running a shell script.
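As an illustration of the control-flow nodes above, a minimal workflow skeleton using fork and join might look like the following. This is a hedged sketch of the shape only (the node names are hypothetical, and the two parallel actions are elided):

```xml
<!-- Hypothetical skeleton: fork into two parallel actions, then join -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fork-join-demo">
    <start to="split"/>
    <fork name="split">
        <path start="taskA"/>
        <path start="taskB"/>
    </fork>
    <!-- taskA and taskB would be <action> nodes, each ending with <ok to="merge"/> -->
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The fork node names the starting points of each parallel branch, and the join node waits for all branches to finish before continuing.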
Oozie Deployment
Deployment
Extract Oozie
```
tar -zxvf /root/oozie-4.0.0-cdh5.3.6.tar.gz -C /usr/local/cdh
```
Reinstall a Hadoop version compatible with Oozie
Install Hadoop under /usr/local/cdh
```
tar -zxvf /root/hadoop-2.5.0-cdh5.3.6.tar.gz -C /usr/local/cdh
```
Configure Hadoop's basic environment and update the environment variables: edit with vi /etc/profile, then apply with source /etc/profile.
==After installing Hadoop and finishing the basic environment setup, you must restart the cluster before running the initialization command "hadoop namenode -format".==
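The environment-variable step might look like the following sketch. The install path matches this guide; the variable names are the conventional ones, not ones the original text specifies:

```shell
# Sketch of the /etc/profile additions for this guide's install path
export HADOOP_HOME=/usr/local/cdh/hadoop-2.5.0-cdh5.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After editing /etc/profile, apply the changes with `source /etc/profile`.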
Modify the Hadoop Configuration
core-site.xml
```
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
```
Note: replace the root user in properties such as hadoop.proxyuser.root.hosts with your own Hadoop user.
mapred-site.xml
```
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
</property>
```
Note: hadoop01 is the hostname mapped to the node's IP in the hosts file.
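The hosts mapping referred to here would look something like the following. The IP address below is a placeholder for illustration only; use your node's actual address:

```
# /etc/hosts (the IP is hypothetical)
192.168.1.101   hadoop01
```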
yarn-site.xml
```
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop01:19888/jobhistory/logs/</value>
</property>
```
Note: once the configuration is complete, distribute it to the other machines with scp.
Restart the Hadoop Cluster
```
[root@hadoop01 ~]
[root@hadoop01 ~]
```
Note: you also need to start the JobHistoryServer (`mr-jobhistory-daemon.sh start historyserver`, as used later in this guide), then run an MR job as a test.
WordCount test:
```
yarn jar wordcount.jar day03.mapReduce.WordCount4 /wc/input/wc.txt /wc/out
```
Result:

Extract hadooplibs in the Oozie root directory
```
tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../
```
After this, a hadooplibs directory appears under the Oozie directory.
Create a libext directory in the Oozie root directory
Copy the dependency jars
Copy the jars from hadooplibs into the libext directory:
```
cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
```
Copy the MySQL driver jar into the libext directory:
```
cp -a ~/softwares/installations/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar libext/
```
Copy ext-2.2.zip into the libext/ directory
ext is a JS framework used to render the Oozie web UI:
```
cp -a ~/softwares/installations/cdh/ext-2.2.zip libext/
```
Modify the Oozie Configuration File
oozie-site.xml
```
<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver class.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://hadoop01:3306/oozie</value>
    <description>JDBC URL.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
    <description>DB user name.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>123456</value>
    <description>DB user password.</description>
</property>
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/usr/local/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop</value>
    <description>Point Oozie at Hadoop's configuration files.</description>
</property>
```
Create the Oozie database in MySQL
```
mysql> create database oozie;
```
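Depending on your MySQL setup, the configured DB user may also need privileges on the new database. A hedged sketch for MySQL 5.x, with the user and password matching the oozie-site.xml values above:

```sql
mysql> grant all privileges on oozie.* to 'root'@'%' identified by '123456';
mysql> flush privileges;
```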
Initialize Oozie
Upload the yarn.tar.gz file from the Oozie directory to HDFS:
```
bin/oozie-setup.sh sharelib create -fs hdfs://hadoop01:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
```
After it succeeds, check the NameNode web UI (port 50070) to verify that files were generated in the corresponding directory.
The result looks like this:

Note: the yarn.tar.gz file extracts itself automatically.
Create the oozie.sql file
```
bin/oozie-setup.sh db create -run -sqlfile oozie.sql
```
Result:

Package the project and generate the war file
```
bin/oozie-setup.sh prepare-war
```

Start the Oozie service with `bin/oozied.sh start`.
Visit the web page
hadoop01:11000
The page looks like this:


Using Oozie
Scheduling Shell Scripts with Oozie / Scheduling Multiple Jobs in Logical Order
Goals:
- Use Oozie to schedule a shell script.
- Use Oozie to schedule multiple jobs.
Preparation
```
tar -zxvf oozie-examples.tar.gz
mkdir oozie-apps/
cp -r examples/apps/shell/ oozie-apps/
```
```
[root@hadoop01 oozie-4.0.0-cdh5.3.6] hadoop fs -put shell/* /user/root/oozie-apps/shell/
```
```
bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/shell/job.properties -run
```
Write the shell scripts
sh1.sh
```
#!/bin/bash
/sbin/ifconfig >> /root/p1.log
```
sh2.sh
```
#!/bin/bash
echo "hello oozie" >> /root/p1.log
```
Modify job.properties and workflow.xml
workflow.xml
```
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="s1"/>
    <action name="s1">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC1}</exec>
            <file>/user/root/oozie-apps/shell/${EXEC1}</file>
            <capture-output/>
        </shell>
        <ok to="s2"/>
        <error to="fail"/>
    </action>
    <action name="s2">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC2}</exec>
            <file>/user/root/oozie-apps/shell/${EXEC2}</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
job.properties
```
nameNode=hdfs://hadoop01:9000
jobTracker=hadoop01:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC1=sh1.sh
EXEC2=sh2.sh
```
Upload the task configuration
```
hadoop fs -mkdir -p /user/root/oozie-apps/shell
[root@hadoop01 oozie-4.0.0-cdh5.3.6]
```
Run the task
```
[root@hadoop01 oozie-4.0.0-cdh5.3.6]
[root@hadoop01 oozie-4.0.0-cdh5.3.6]
```
The result is shown below:
Oozie Web Console

```
mr-jobhistory-daemon.sh start historyserver
```

Scheduling MapReduce Tasks with Oozie
WordCount example:
Copy the official template into oozie-apps
```
cp -r examples/apps/map-reduce/ oozie-apps/
```
First test that WordCount runs on YARN
```
yarn jar wordcount.jar day03.mapReduce.WordCount4 /wc/input/wc.txt /wc/out
```
Configure the map-reduce task's job.properties and workflow.xml
job.properties
```
nameNode=hdfs://hadoop01:9000
jobTracker=hadoop01:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
inputDir=${nameNode}/wc/input/wc.txt
outputDir=${nameNode}/wc/output
```
workflow.xml
```
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>day03.mapReduce.WordCountMapTask</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>day03.mapReduce.WordCountReduceTask</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
Copy the jar to be executed into map-reduce's lib directory
```
cp /root/WordCount.jar oozie-apps/map-reduce/lib
```
Upload the configured app folder to HDFS
```
hadoop fs -put oozie-apps/map-reduce /user/root/oozie-apps/
```
Result:

Run the task
```
bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/map-reduce/job.properties -run
```
Result:



Oozie Scheduled / Recurring Tasks
Goal: use a Coordinator to schedule tasks periodically
Configure the Linux timezone and time server
Check the system's current timezone:

If the timezone offset is +0800, no adjustment is needed.
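The timezone check itself can be done with date; a minimal sketch:

```shell
# Print the current numeric UTC offset; +0800 indicates China Standard Time
date +%z
# Print the full date including the offset, for a human-readable check
date -R
```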
Configure oozie-site.xml
Add the following inside the configuration tag:
```
<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>
```
Modify the time-setting code in the JS framework
```
[root@hadoop01 oozie-4.0.0-cdh5.3.6]
```
Modify the getTimeZone() function
```
// Original function
function getTimeZone() {
    Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
    return Ext.state.Manager.get("TimezoneId","GMT");
}

// Modified function
function getTimeZone() {
    Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
    return Ext.state.Manager.get("TimezoneId","GMT+0800");
}
```
Restart the Oozie service and clear the browser cache
```
bin/oozied.sh stop
bin/oozied.sh start
```

Copy the official template to configure the scheduled task
```
cp -r examples/apps/cron/ oozie-apps/
```
Modify the template's job.properties, coordinator.xml, and workflow.xml
job.properties
```
nameNode=hdfs://hadoop01:9000
jobTracker=hadoop01:8032
queueName=default
examplesRoot=oozie-apps
oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
start=2020-07-29T18:25+0800
end=2020-07-29T18:45+0800
workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron
EXEC1=p1.sh
EXEC2=p2.sh
```
coordinator.xml
```
<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}"
                 timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```
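The frequency attribute is driven by Oozie's coordinator EL functions. Besides coord:minutes(5) used above, the standard functions include the following (attribute values only, shown for illustration):

```xml
<!-- Alternative frequency values using Oozie coordinator EL functions -->
frequency="${coord:hours(1)}"
frequency="${coord:days(1)}"
frequency="${coord:months(1)}"
```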
workflow.xml
```
<workflow-app xmlns="uri:oozie:workflow:0.5" name="one-op-wf">
    <start to="action1"/>
    <action name="action1">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC1}</exec>
            <file>/user/root/oozie-apps/cron/${EXEC1}#${EXEC1}</file>
            <capture-output/>
        </shell>
        <ok to="action2"/>
        <error to="fail"/>
    </action>
    <action name="action2">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC2}</exec>
            <file>/user/root/oozie-apps/cron/${EXEC2}#${EXEC2}</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
Upload the configuration
```
hadoop fs -put oozie-apps/cron/ /user/root/oozie-apps/
```
Run the task
```
bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/cron/job.properties -run
```
Before the start time:

At the start time:
Kill the task:
```
bin/oozie job -oozie http://hadoop01:11000/oozie -kill 0000000-200730140126921-oozie-root-W
```