
Big Data Basics: Oozie (2) Common Problems


1 How do you view task logs in Oozie?

The Oozie job ID can be used to view the details of a workflow with the following command:

oozie job -info 0012077-180830142722522-oozie-hado-W

The workflow details are as follows:

Job ID : 0012077-180830142722522-oozie-hado-W

------------------------------------------------------------------------------------------------------------------------------------

Workflow Name : $workflow_name

App Path : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml

Status : KILLED

Run : 0

User : hadoop

Group : -

Created : 2018-09-25 02:51 GMT

Started : 2018-09-25 02:51 GMT

Last Modified : 2018-09-25 02:53 GMT

Ended : 2018-09-25 02:53 GMT

CoordAction ID: -

Actions

------------------------------------------------------------------------------------------------------------------------------------

ID Status Ext ID Ext Status Err Code

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@:start: OK - OK -

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@$action_name ERROR application_1537326594090_5663 FAILED/KILLED JA018

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@Kill OK - OK E0729

------------------------------------------------------------------------------------------------------------------------------------

The failed action is defined as follows:

<action name="$action_name">

<spark xmlns="uri:oozie:spark-action:0.1">

<job-tracker>${job_tracker}</job-tracker>

<name-node>${name_node}</name-node>

<master>${jobmaster}</master>

<mode>${jobmode}</mode>

<name>${jobname}</name>

<class>${jarclass}</class>

<jar>${jarpath}</jar>

<spark-opts>${sparkopts}</spark-opts>

</spark>

...

</action>

On YARN, the application corresponding to application_1537326594090_5663 looks like this:

application_1537326594090_5663 hadoop oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W Oozie Launcher

The log of application_1537326594090_5663 contains the following line:

2018-09-25 10:52:05,237 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537326594090_5664

On YARN, application_1537326594090_5664 corresponds to the following application:

application_1537326594090_5664 hadoop $app_name SPARK

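So application_1537326594090_5664 is the Spark job that actually corresponds to the action. Why is there an extra step in between?

In short, Oozie runs an action through an ActionExecutor. The most important subclass is JavaActionExecutor, and the executors for hive, spark, and similar actions are all subclasses of it. JavaActionExecutor first submits a LauncherMapper (a map task) to YARN, which runs a LauncherMain; each concrete action has its own subclass, such as JavaMain or SparkMain. A spark action runs SparkMain, and SparkMain calls org.apache.spark.deploy.SparkSubmit to submit the actual job.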
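The two-step lookup above (launcher application first, then the Spark application it submitted) can be sketched in shell. This is only an illustration: `extract_child_app_ids` is a hypothetical helper, not part of Oozie or YARN, and it assumes the launcher log contains a "Submitted application" line in the format quoted above.

```shell
# Sketch: find the YARN application that an Oozie launcher job submitted.
# extract_child_app_ids is a hypothetical helper, not an Oozie or YARN command.

# Reads launcher log text on stdin and prints each submitted application ID.
extract_child_app_ids() {
  grep -o 'Submitted application application_[0-9]*_[0-9]*' \
    | awk '{print $3}'
}

# Offline check using the launcher log line quoted above.
sample='2018-09-25 10:52:05,237 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537326594090_5664'
printf '%s\n' "$sample" | extract_child_app_ids
```

Against a live cluster with log aggregation enabled, the same helper could be fed from `yarn logs -applicationId application_1537326594090_5663`, where the ID is the Ext ID that `oozie job -info` reports for the failed action.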

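2 How do you add dependencies to a Spark job submitted by Oozie?

Ways to add dependencies to a Spark job:

When running in local mode, dependencies can be added with --jars;

When running on YARN, dependencies can be added with spark.yarn.jars;

Neither approach works under Oozie. First, Oozie cannot, and should not, run the job in local mode. Second, if you configure spark.yarn.jars you will find that it has no effect at all. Here is why.

Check the LauncherMapper log (see question 1 above for how to find it):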

Spark Version 2.1.1

Spark Action Main class : org.apache.spark.deploy.SparkSubmit

Oozie Spark action configuration

=================================================================

...

--conf

spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar

--conf

spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar

As shown, Oozie appends its own spark.yarn.jars setting. What does Spark do when the same key is supplied twice?

org.apache.spark.deploy.SparkSubmit

val appArgs = new SparkSubmitArguments(args)

org.apache.spark.launcher.SparkSubmitOptionParser

if (!handle(name, value)) {

org.apache.spark.deploy.SparkSubmitArguments

override protected def handle(opt: String, value: String): Boolean = {

...

case CONF =>

value.split("=", 2).toSeq match {

case Seq(k, v) => sparkProperties(k) = v

case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")

}
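The `case CONF` branch stores each pair with `sparkProperties(k) = v`, a plain map assignment, so a repeated key keeps only its last value. The shell sketch below mimics that behaviour for the two `--conf` entries from the launcher log. `effective_conf` is a hypothetical helper written for illustration; it is not part of Spark or Oozie and does not reproduce Spark's error handling.

```shell
# Sketch: emulate how repeated --conf entries resolve to a single value.
# effective_conf KEY ARG... prints the value that survives for KEY.
effective_conf() {
  key=$1
  shift
  value=
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--conf" ] && [ "$#" -ge 2 ]; then
      case $2 in
        *=*)
          # Split on the first '=' only, like value.split("=", 2).
          k=${2%%=*}
          v=${2#*=}
          if [ "$k" = "$key" ]; then
            value=$v   # a later occurrence overwrites the earlier one
          fi
          ;;
      esac
      shift 2
    else
      shift
    fi
  done
  printf '%s\n' "$value"
}

# The application's setting comes first, Oozie's appended setting comes last.
effective_conf spark.yarn.jars \
  --conf 'spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar' \
  --conf 'spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar'
```

Because Oozie's entry comes later in the argument list, the sketch prints the Oozie share lib path and the application's own value is lost.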

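As shown, the value is simply overwritten, so the last setting wins: Oozie's value rather than the one supplied by the application. The application therefore has to package its extra dependencies into its own jar. This can be done with Maven's maven-assembly-plugin by configuring <dependencySets><dependencySet><includes><include>. The full descriptor is as follows: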

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">

<!-- TODO: a jarjar format would be better -->

<id>jar-with-dependencies</id>

<formats>

<format>jar</format>

</formats>

<includeBaseDirectory>false</includeBaseDirectory>

<dependencySets>

<dependencySet>

<outputDirectory>/</outputDirectory>

<useProjectArtifact>true</useProjectArtifact>

<unpack>true</unpack>

<scope>runtime</scope>

<includes>

<include>redis.clients:jedis</include>

<include>org.apache.commons:commons-pool2</include>

</includes>

</dependencySet>

</dependencySets>

</assembly>

This simply copies the contents of the default jar-with-dependencies.xml and adds the includes configuration.
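One way to confirm that the assembly actually bundled the extra dependencies is to list the built jar and look for their package directories. The sketch below is an assumption-laden illustration: `jar_contains_package` is a hypothetical helper, the sample listing is made up, and it relies on the descriptor's `<unpack>true</unpack>` placing the Jedis classes under `redis/clients/jedis/`.

```shell
# Sketch: check a jar listing for a bundled package.
# jar_contains_package is a hypothetical helper, not a Maven or JDK tool.
# It reads a listing (one entry per line, as printed by `jar tf`) on stdin
# and succeeds if any entry lies under the package path given as $1.
jar_contains_package() {
  grep -q "^$1/"
}

# Offline demonstration with a made-up listing.
listing='META-INF/MANIFEST.MF
redis/clients/jedis/Jedis.class
org/apache/commons/pool2/ObjectPool.class'

if printf '%s\n' "$listing" | jar_contains_package redis/clients/jedis; then
  echo "jedis is bundled"
fi
```

With a real build, the listing would come from `jar tf` on the assembled jar; the jar's file name depends on the project and is not given in this article.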
