Oozie Study Notes

Oozie

Introduction to Oozie

Oozie means "mahout" (elephant driver). It is an open-source framework built around a workflow engine, contributed to Apache by Cloudera, that provides scheduling and coordination for Hadoop MapReduce and Pig jobs. Oozie must be deployed into a Java Servlet container to run. It is mainly used for scheduled tasks, and multiple tasks can be scheduled in their logical execution order.

Oozie's Functional Modules

Modules

  1. Workflow

    Executes flow nodes in sequence; supports fork (split into multiple branches) and join (merge multiple branches back into one).

  2. Coordinator

    Triggers workflows on a schedule.

  3. Bundle Job

    Bundles multiple Coordinators together.

Common Node Types

  1. Control Flow Nodes

    Control flow nodes are generally defined at the start or end of a workflow, e.g. start, end, and kill. They also provide mechanisms for steering the execution path, such as decision, fork, and join.

  2. Action Nodes

    Nodes that perform concrete work, e.g. copying files or running a shell script.
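As a sketch of how these node types fit together, here is a minimal workflow definition combining control flow nodes (start, fork, join, kill, end) with two action nodes. All names and the fs/mkdir actions are illustrative placeholders, not part of this tutorial:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sketch-wf">
    <start to="fork-node"/>
    <!-- control flow: split into two parallel paths -->
    <fork name="fork-node">
        <path start="task-a"/>
        <path start="task-b"/>
    </fork>
    <!-- action node: create a directory on HDFS -->
    <action name="task-a">
        <fs><mkdir path="${nameNode}/tmp/sketch-a"/></fs>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>
    <action name="task-b">
        <fs><mkdir path="${nameNode}/tmp/sketch-b"/></fs>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>
    <!-- control flow: wait for both paths, then finish -->
    <join name="join-node" to="end"/>
    <kill name="fail">
        <message>sketch workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```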

Deploying Oozie

Deployment

Extract Oozie

tar -zxvf /root/oozie-4.0.0-cdh5.3.6.tar.gz -C /usr/local/cdh

Reinstall a Hadoop version compatible with Oozie

Install Hadoop under /usr/local/cdh

tar -zxvf /root/hadoop-2.5.0-cdh5.3.6.tar.gz -C /usr/local/cdh

Configure Hadoop's basic environment and update the environment variables: vi /etc/profile, then source /etc/profile.

==After installing Hadoop and finishing the basic environment setup, restart the cluster before running the initialization command "hadoop namenode -format".==
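The /etc/profile additions are not spelled out above; a minimal sketch, assuming the extract locations used earlier in this guide, might be:

```shell
# Hypothetical environment-variable setup for /etc/profile;
# adjust the paths to wherever you actually extracted the tarballs.
export HADOOP_HOME=/usr/local/cdh/hadoop-2.5.0-cdh5.3.6
export OOZIE_HOME=/usr/local/cdh/oozie-4.0.0-cdh5.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$OOZIE_HOME/bin

# sanity check
echo "HADOOP_HOME=$HADOOP_HOME"
```

After editing the file, run `source /etc/profile` so the current shell picks up the changes.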

Modify the Hadoop configuration

core-site.xml

<!-- Hosts from which the root user (which the Oozie server runs as) may impersonate other users -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>

<!-- User groups that may be proxied by Oozie -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>

Tip: replace root in properties such as hadoop.proxyuser.root.hosts with your own Hadoop user.

mapred-site.xml

<!-- MapReduce JobHistory Server address, default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>

<!-- MapReduce JobHistory Server web UI address, default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>

Tip: hadoop01 is the hostname mapped to the machine's IP in /etc/hosts.

yarn-site.xml

<!-- Job history log server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop01:19888/jobhistory/logs/</value>
</property>

Tip: after finishing the configuration, use scp to distribute the modified Hadoop configuration files to the other machines.

Restart the Hadoop cluster

[root@hadoop01 ~]# start-all.sh
# start the JobHistoryServer
[root@hadoop01 ~]# mr-jobhistory-daemon.sh start historyserver

Tip: the JobHistoryServer must also be started; then run an MR job as a test.

wordcount test:

yarn jar wordcount.jar day03.mapReduce.WordCount4 /wc/input/wc.txt /wc/out

Result: (screenshot)

Extract hadooplibs in the Oozie root directory

tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../

Afterwards a hadooplibs directory appears under the Oozie directory.

Create a libext directory in the Oozie root directory

mkdir libext/

Copy the required dependency jars

  1. Copy the jars from hadooplibs into the libext directory:

    cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
  2. Copy the MySQL driver jar into the libext directory:

    cp -a ~/softwares/installations/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar libext/

Copy ext-2.2.zip into the libext/ directory

ext is a JS framework used to render Oozie's web front end:

cp -a ~/softwares/installations/cdh/ext-2.2.zip libext/

Modify the Oozie configuration file

oozie-site.xml
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
<description>
JDBC driver class.
</description>
</property>

<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://hadoop01:3306/oozie</value>
<description>
JDBC URL.
</description>
</property>

<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>root</value>
<description>
DB user name.
</description>
</property>

<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>123456</value>
<description>
DB user password.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/usr/local/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop</value>
<description>
Lets Oozie reference Hadoop's configuration files.
</description>
</property>

Create the Oozie database in MySQL

mysql> create database oozie;

Initialize Oozie

Upload the sharelib archive (oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz) in the Oozie directory to HDFS:

bin/oozie-setup.sh sharelib create -fs hdfs://hadoop01:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz

After it succeeds, check on the NameNode web UI (port 50070) whether files were generated in the corresponding directory.

Result: (screenshot)

Tip: the yarn.tar.gz archive unpacks itself on HDFS.

Create the oozie.sql file and initialize the database schema

bin/oozie-setup.sh db create -run -sqlfile oozie.sql

Result: (screenshot)

Package the project into a WAR file

bin/oozie-setup.sh prepare-war

(screenshot)

Start the Oozie service

bin/oozied.sh start

Visit the web page

hadoop01:11000

The pages look like this: (screenshots)

Using Oozie

Scheduling Shell Scripts with Oozie || Scheduling Multiple Jobs in Logical Order

Goals:

  1. Use Oozie to schedule a shell script.
  2. Use Oozie to run multiple jobs as one scheduled flow.

Preparation

# extract the official example templates
tar -zxvf oozie-examples.tar.gz
# create a working directory
mkdir oozie-apps/
# copy the shell task template into oozie-apps/
cp -r examples/apps/shell/ oozie-apps/

Write the shell scripts

sh1.sh

#!/bin/bash
/sbin/ifconfig >> /root/p1.log

sh2.sh

#!/bin/bash

echo "hello oozie" >> /root/p1.log
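Before handing scripts like these to Oozie, it can save a debugging round trip to smoke-test the script logic locally. A minimal sketch, writing to a temp file instead of /root/p1.log:

```shell
# Local smoke test of the sh2.sh logic; OUT stands in for /root/p1.log.
OUT=$(mktemp)
printf '#!/bin/bash\necho "hello oozie" >> %s\n' "$OUT" > sh2_local.sh
bash sh2_local.sh
bash sh2_local.sh   # run twice to confirm output appends, as repeated Oozie runs would
grep -c "hello oozie" "$OUT"
```

If the count printed matches the number of runs, the append logic behaves as expected.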

Modify job.properties and workflow.xml

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="s1"/>
<action name="s1">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC1}</exec>
<file>/user/root/oozie-apps/shell/${EXEC1}#${EXEC1}</file>
<capture-output/>
</shell>
<ok to="s2"/>
<error to="fail"/>
</action>
<action name="s2">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC2}</exec>
<file>/user/root/oozie-apps/shell/${EXEC2}#${EXEC2}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>

<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

job.properties

nameNode=hdfs://hadoop01:9000
jobTracker=hadoop01:8032
queueName=default
examplesRoot=oozie-apps

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell

EXEC1=sh1.sh
EXEC2=sh2.sh

Upload the task configuration

# create the task directory on HDFS
hadoop fs -mkdir -p /user/root/oozie-apps/shell
# upload the task configuration into that directory
[root@hadoop01 oozie-4.0.0-cdh5.3.6]# hadoop fs -put oozie-apps/shell/* /user/root/oozie-apps/shell/

Run the job

[root@hadoop01 oozie-4.0.0-cdh5.3.6]# bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/shell/job.properties -run
# kill a job
[root@hadoop01 oozie-4.0.0-cdh5.3.6]# bin/oozie job -oozie http://hadoop01:11000/oozie -kill 0000012-200728061816337-oozie-root-W

The results, as seen in the Oozie Web Console and the JobHistoryServer web UI: (screenshots)

Scheduling MapReduce Jobs with Oozie

WordCount example:

Copy the official template into oozie-apps

cp -r examples/apps/map-reduce/ oozie-apps/

First verify that wordcount runs on YARN

yarn jar wordcount.jar day03.mapReduce.WordCount4 /wc/input/wc.txt /wc/out
Configure the map-reduce task's job.properties and workflow.xml

job.properties

nameNode=hdfs://hadoop01:9000
jobTracker=hadoop01:8032
queueName=default
examplesRoot=oozie-apps

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml

inputDir=${nameNode}/wc/input/wc.txt
outputDir=${nameNode}/wc/output

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<!-- Use the new MapReduce API when scheduling the MR job -->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>

<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>

<!-- Job output key class -->
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>

<!-- Job output value class -->
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>

<!-- Input path -->
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>

<!-- Output path -->
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>

<!-- Mapper class -->
<property>
<name>mapreduce.job.map.class</name>
<value>day03.mapReduce.WordCountMapTask</value>
</property>

<!-- Reducer class -->
<property>
<name>mapreduce.job.reduce.class</name>
<value>day03.mapReduce.WordCountReduceTask</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Copy the jar to be executed into map-reduce's lib directory

cp /root/WordCount.jar oozie-apps/map-reduce/lib

Upload the configured app folder to HDFS

hadoop fs -put oozie-apps/map-reduce /user/root/oozie-apps/

Result: (screenshot)

Run the job

bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/map-reduce/job.properties -run

Results: (screenshots)

Oozie Scheduled / Recurring Jobs

Goal: schedule jobs periodically with a Coordinator

Configure the Linux time zone and time server

Check the system's current time zone:

date -R

(screenshot)

If the offset is +0800, no adjustment is needed.
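A quick way to read just the numeric offset (the value that the GMT+0800 settings below assume) is:

```shell
# Print the machine's current UTC offset, e.g. +0800 for China Standard Time.
offset=$(date +%z)
echo "UTC offset: $offset"
```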

Configure oozie-site.xml

Add the following inside the configuration element:

<!-- set the time zone -->
<property>
<name>oozie.processing.timezone</name>
<value>GMT+0800</value>
</property>

Modify the time-zone code in the JS framework

[root@hadoop01 oozie-4.0.0-cdh5.3.6]# vi oozie-server/webapps/oozie/oozie-console.js

Modify the getTimeZone() function:

// original function
function getTimeZone() {
Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
return Ext.state.Manager.get("TimezoneId","GMT");
}
// after the change
function getTimeZone() {
Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
return Ext.state.Manager.get("TimezoneId","GMT+0800");
}

Restart the Oozie service and clear the browser cache

bin/oozied.sh stop
bin/oozied.sh start

(screenshot)

Copy the official template to configure the scheduled job

cp -r examples/apps/cron/ oozie-apps/

Modify the template's job.properties, coordinator.xml, and workflow.xml

job.properties

nameNode=hdfs://hadoop01:9000
jobTracker=hadoop01:8032
queueName=default
examplesRoot=oozie-apps

oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
start=2020-07-29T18:25+0800
end=2020-07-29T18:45+0800
workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron

EXEC1=p1.sh
EXEC2=p2.sh
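The start/end values must use minute precision with an explicit offset, matching the oozie.processing.timezone set earlier. One way to generate them (assuming GNU date, as on typical Linux hosts) is:

```shell
# Produce start/end timestamps in the format job.properties expects,
# e.g. 2020-07-29T18:25+0800 (minute precision, explicit UTC offset).
start=$(date +"%Y-%m-%dT%H:%M%z")
end=$(date -d "+20 minutes" +"%Y-%m-%dT%H:%M%z")
echo "start=$start"
echo "end=$end"
```

Note that `date -d` is a GNU extension; on BSD/macOS the flag differs.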

coordinator.xml

<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="GMT+0800"
xmlns="uri:oozie:coordinator:0.2">
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
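For reference, later Oozie releases (4.1 and up, so not the 4.0.0 build used in these notes) also accept cron syntax directly in the frequency attribute; the EL form above is the portable choice here. An illustrative cron-style equivalent of "every five minutes" would look like:

```xml
<!-- Illustrative only: cron-syntax frequency requires Oozie 4.1+ -->
<coordinator-app name="cron-coord" frequency="0/5 * * * *"
                 start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
    <!-- same action/workflow body as above -->
</coordinator-app>
```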

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="one-op-wf">
<start to="action1"/>
<action name="action1">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC1}</exec>
<file>/user/root/oozie-apps/cron/${EXEC1}#${EXEC1}</file>
<capture-output/>
</shell>
<ok to="action2"/>
<error to="fail"/>
</action>
<action name="action2">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC2}</exec>
<file>/user/root/oozie-apps/cron/${EXEC2}#${EXEC2}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

Upload the configuration

hadoop fs -put oozie-apps/cron/ /user/root/oozie-apps/

Run the job

bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie-apps/cron/job.properties -run

Before the start time: (screenshot)

Once the start time is reached, the workflow actions begin to run.

Kill the job:

bin/oozie job -oozie http://hadoop01:11000/oozie -kill 0000000-200730140126921-oozie-root-W