[Repost] Pitfalls encountered while deploying Azkaban (deployment part)

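Note: Azkaban previously tripped me up with a configuration default that requires more than 6 GB of free memory. After fixing that, today I hit another pitfall: a 3 GB free-memory minimum hard-coded in the source. Searching for the error message led me straight to this article. The author's home page, https://my.oschina.net/u/2988360, has several other articles about Azkaban, which I recommend.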

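1. Downloading the Azkaban source code

2. Installing and deploying Azkaban

After downloading the MyAzkaban project, you will find a deployment document inside it, “MyAzkaban-3.0.0使用文件.doc”. Follow that document for the installation.

Once installation is complete, open the following URL to access the web UI: https://ip:8443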

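3. Pitfalls you may hit during deployment

I ran into several pitfalls while deploying the project, and they took a long time to resolve. I am sharing them here so that you can avoid some detours in your own deployment.

3.1 The official project is not a Maven project

The officially provided source code is not a Maven project and cannot be compiled, packaged, or built with Maven. If you want to build with Maven, download the source from the first source link above.

3.2 Pitfall when starting up after installation

After installation, be sure to start the server from the directory one level above bin: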

./bin/start-web.sh

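Do not cd into the bin directory and start it from there, because the startup script relies on the current working directory.

3.3 Setting execute permission on the startup scripts

By default, the startup scripts do not have execute permission after being uploaded to the server, so you need to grant it: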

sudo chmod 755 xxx.sh

3.4 Handling whitespace differences between Windows and Linux

3.5 Executor host memory limits in the Multiple Executor Mode configuration

azkaban.use.multiple.executors=true
# executor host filter configuration
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus

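The MinimumFreeMemory filter checks whether the executor host's free memory is greater than 6 GB. If it is not, the web server will not dispatch jobs to that host. The relevant source code is: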

private static final int MINIMUM_FREE_MEMORY = 6 * 1024;


/**<pre>
   * function to register the static Minimum Reserved Memory filter.
   * NOTE : this is a static filter which means the filter will be filtering based on the system standard which is not
   *        Coming for the passed flow.
   *        This filter will filter out any executors that has the remaining  memory below 6G
   *</pre>
   * */
  private static FactorFilter<Executor, ExecutableFlow> getMinimumReservedMemoryFilter(){
    return FactorFilter.create(MINIMUMFREEMEMORY_FILTER_NAME, new FactorFilter.Filter<Executor, ExecutableFlow>() {
      private static final int MINIMUM_FREE_MEMORY = 6 * 1024;
      public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
        if (null == filteringTarget){
          logger.debug(String.format("%s : filtering out the target as it is null.", MINIMUMFREEMEMORY_FILTER_NAME));
          return false;
        }

        ExecutorInfo stats = filteringTarget.getExecutorInfo();
        if (null == stats) {
          logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
              MINIMUMFREEMEMORY_FILTER_NAME,
              filteringTarget.toString()));
          return false;
        }
        return stats.getRemainingMemoryInMB() > MINIMUM_FREE_MEMORY ;
       }
    });
  }

The CpuStatus filter checks whether the executor host's CPU usage has reached 95%. If it has, the web server will not dispatch jobs to that host either:

 /**
   * <pre>
   * function to register the static Minimum Reserved Memory filter.
   * NOTE :  this is a static filter which means the filter will be filtering based on the system standard which
   *        is not Coming for the passed flow.
   *        This filter will filter out any executors that the current CPU usage exceed 95%
   * </pre>
   * */
  private static FactorFilter<Executor, ExecutableFlow> getCpuStatusFilter(){
    return FactorFilter.create(CPUSTATUS_FILTER_NAME, new FactorFilter.Filter<Executor, ExecutableFlow>() {
      private static final int MAX_CPU_CURRENT_USAGE = 95;
      public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
        if (null == filteringTarget){
          logger.debug(String.format("%s : filtering out the target as it is null.", CPUSTATUS_FILTER_NAME));
          return false;
        }

        ExecutorInfo stats = filteringTarget.getExecutorInfo();
        if (null == stats) {
          logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
              MINIMUMFREEMEMORY_FILTER_NAME,
              filteringTarget.toString()));
          return false;
        }
        return stats.getCpuUsage() < MAX_CPU_CURRENT_USAGE ;
       }
    });
  }
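
To make the two thresholds concrete, here is a minimal sketch that reproduces only the comparisons from the two filters above. The class `ExecutorFilterSketch`, its method names, and the parameter types are hypothetical and not part of Azkaban; only the two constants and the strict `>` and `<` comparisons come from the excerpts.

```java
// Hypothetical stand-alone sketch; not Azkaban code. Only the two constants
// and the strict comparisons are taken from the excerpts above.
final class ExecutorFilterSketch {
  private static final int MINIMUM_FREE_MEMORY = 6 * 1024; // MB
  private static final int MAX_CPU_CURRENT_USAGE = 95;     // percent

  // Mirrors: stats.getRemainingMemoryInMB() > MINIMUM_FREE_MEMORY
  static boolean passesMemoryFilter(long remainingMemoryInMB) {
    return remainingMemoryInMB > MINIMUM_FREE_MEMORY;
  }

  // Mirrors: stats.getCpuUsage() < MAX_CPU_CURRENT_USAGE
  static boolean passesCpuFilter(double cpuUsagePercent) {
    return cpuUsagePercent < MAX_CPU_CURRENT_USAGE;
  }

  // An executor is a candidate only if it passes both filters.
  static boolean isEligible(long remainingMemoryInMB, double cpuUsagePercent) {
    return passesMemoryFilter(remainingMemoryInMB) && passesCpuFilter(cpuUsagePercent);
  }

  public static void main(String[] args) {
    System.out.println(isEligible(6144, 10.0)); // false: exactly 6 GB free is not enough
    System.out.println(isEligible(8192, 95.0)); // false: exactly 95% CPU is rejected
    System.out.println(isEligible(8192, 40.0)); // true
  }
}
```

Because both comparisons are strict, a host sitting exactly on a threshold is filtered out. On hosts with less memory, leaving MinimumFreeMemory out of azkaban.executorselector.filters should skip the memory check, though that is worth verifying against your Azkaban version.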

3.6 Job cannot obtain memory at run time

If a job fails with the following error:

14-09-2017 13:50:01 CST A INFO - Starting job A at 1505368201283
14-09-2017 13:50:01 CST A INFO - azkaban.webserver.url property was not set
14-09-2017 13:50:01 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=184 -Dazkaban.jobid=A
14-09-2017 13:50:01 CST A INFO - Building command job executor. 
14-09-2017 13:50:01 CST A ERROR - pluginLoadProps is null
14-09-2017 13:50:01 CST A ERROR - Job run failed!
java.lang.Exception: Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A
	at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:86)
	at azkaban.execapp.JobRunner.runJob(JobRunner.java:590)
	at azkaban.execapp.JobRunner.run(JobRunner.java:443)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
14-09-2017 13:50:01 CST A ERROR - Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A cause: null
14-09-2017 13:50:01 CST A INFO - Finishing job A attempt: 0 at 1505368201336 with status FAILED

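This is most likely because none of the executor hosts has enough free memory. The Azkaban source requires an executor host to have at least 3 GB of free memory left, after subtracting the job's Xmx, before it will run the job.

The corresponding Azkaban source code is: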

 private static final long LOW_MEM_THRESHOLD = 3L*1024L*1024L; //3 GB

/**
   * @param xms
   * @param xmx
   * @return System can satisfy the memory request or not
   * 
   * Given Xms/Xmx values (in kb) used by java process, determine if system can
   * satisfy the memory request
   */
  public synchronized static boolean canSystemGrantMemory(long xms, long xmx, long freeMemDecrAmt) {
    if (!memCheckEnabled) {
      return true;
    }

    //too small amount of memory left, reject
    if (freeMemAmount < LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb),  memory request declined.",
              freeMemAmount, LOW_MEM_THRESHOLD));
      return false;
    }

    //let's get newest mem info
    if (freeMemAmount >= LOW_MEM_THRESHOLD && freeMemAmount < 2 * LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount (%d kb) is less than 2x low mem threshold (%d kb),  re-read /proc/meminfo",
              freeMemAmount, LOW_MEM_THRESHOLD));
      readMemoryInfoFile();
    }

    //too small amount of memory left, reject
    if (freeMemAmount < LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb),  memory request declined.",
              freeMemAmount, LOW_MEM_THRESHOLD));
      return false;
    }

    if (freeMemAmount - xmx < LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount minus xmx (%d - %d kb) is less than low mem threshold (%d kb),  memory request declined.",
              freeMemAmount, xmx, LOW_MEM_THRESHOLD));
      return false;
    }

    if (freeMemDecrAmt > 0) {
      freeMemAmount -= freeMemDecrAmt;
      logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", freeMemDecrAmt, freeMemAmount));
    } else {
      freeMemAmount -= xms;
      logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", xms, freeMemAmount));
    }
    
    return true;
  }
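
The decision in the method above can be condensed into a small worked example. This is a simplified sketch, not the Azkaban implementation: the class `MemoryGrantSketch` and the method `canGrant` are hypothetical, and the sketch deliberately ignores the memCheckEnabled switch, the re-read of /proc/meminfo, and the bookkeeping that decrements the free amount. All values are in kB, as in the original.

```java
// Hypothetical simplification of canSystemGrantMemory; not Azkaban code.
final class MemoryGrantSketch {
  static final long LOW_MEM_THRESHOLD = 3L * 1024L * 1024L; // kB, i.e. 3 GB

  // Returns true only if at least 3 GB would remain free after reserving xmx.
  static boolean canGrant(long freeMemKb, long xmxKb) {
    if (freeMemKb < LOW_MEM_THRESHOLD) {
      return false; // too little memory left, regardless of the request
    }
    return freeMemKb - xmxKb >= LOW_MEM_THRESHOLD;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024L; // kB per GB
    System.out.println(canGrant(2 * gb, 0));      // false: the "Xms 0 kb, Xmx 0 kb" case
    System.out.println(canGrant(4 * gb, 2 * gb)); // false: only 2 GB would remain
    System.out.println(canGrant(4 * gb, 1 * gb)); // true: exactly 3 GB remains
  }
}
```

The first call shows why the error in the log appears even though the job asked for 0 kB: the host itself was already below the 3 GB floor.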

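3.7 Multiple Executor Mode deployment does not yet support configuring host and port mappings

Multiple Executor Mode deployment does not yet support configuring the mapping between executor hosts and ports, so you have to insert the rows into the database table manually with SQL: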

insert into executors(host,port) values("EXECUTOR_HOST",EXECUTOR_PORT);

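4. Building the source package directly on Windows (a local Git client is required)

  1. In a Windows command prompt, change to the target directory.
  2. git clone https://github.com/azkaban/azkaban
  3. After the download finishes, run gradlew build -x test to build (skipping the tests).
  4. After a successful build, look in the distributions directory under the build directory of the server and executor modules.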

5. Resolving errors reported in Azkaban 3.35

5.1 Fixing the Missing required property 'azkaban.native.lib' error

The error message is as follows:

16-09-2017 19:48:28 CST A INFO - Starting job A at 1505562508575
16-09-2017 19:48:28 CST A INFO - azkaban.webserver.url property was not set
16-09-2017 19:48:28 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=1 -Dazkaban.jobid=A
16-09-2017 19:48:28 CST A INFO - Building command job executor. 
16-09-2017 19:48:28 CST A INFO - Memory granted for job A
16-09-2017 19:48:28 CST A INFO - 2 commands to execute.
16-09-2017 19:48:28 CST A INFO - cwd=/app/azkaban/source_buit/azkaban-exec-server-3.35.0/executions/1
16-09-2017 19:48:28 CST A INFO - effective user is: azkaban
16-09-2017 19:48:28 CST A ERROR - Job run failed!
azkaban.utils.UndefinedPropertyException: Missing required property 'azkaban.native.lib'
	at azkaban.utils.Props.getString(Props.java:420)
	at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:234)
	at azkaban.execapp.JobRunner.runJob(JobRunner.java:748)
	at azkaban.execapp.JobRunner.doRun(JobRunner.java:591)
	at azkaban.execapp.JobRunner.run(JobRunner.java:552)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
16-09-2017 19:48:28 CST A ERROR - Missing required property 'azkaban.native.lib' cause: null
16-09-2017 19:48:28 CST A INFO - Finishing job A at 1505562508845 with status FAILED

Solution:

Configure commonprivate.properties
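
To show why a setting in commonprivate.properties can make this error go away, here is a simplified model of the property lookup that fails in the stack trace. It is a sketch under assumptions, not Azkaban code: the class `NativeLibCheckSketch` is hypothetical, and the property name `execute.as.user`, its default of true, and the binary name `execute-as-user` are my reading of the 3.x ProcessJob source, so verify them against your version.

```java
import java.util.Properties;

// Hypothetical stand-alone model; not Azkaban code.
final class NativeLibCheckSketch {

  // Returns the assumed path of the execute-as-user binary, or null when
  // jobs are not run as a different user.
  static String resolveExecuteAsUserBinary(Properties sysProps) {
    // Assumption: execute.as.user defaults to true when it is not set.
    boolean executeAsUser =
        Boolean.parseBoolean(sysProps.getProperty("execute.as.user", "true"));
    if (!executeAsUser) {
      return null; // the native library folder is never looked up
    }
    String nativeLibFolder = sysProps.getProperty("azkaban.native.lib");
    if (nativeLibFolder == null) {
      throw new IllegalStateException("Missing required property 'azkaban.native.lib'");
    }
    return nativeLibFolder + "/execute-as-user";
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    try {
      resolveExecuteAsUserBinary(props);
    } catch (IllegalStateException e) {
      System.out.println("empty config: " + e.getMessage());
    }
    props.setProperty("execute.as.user", "false");
    System.out.println("execute.as.user=false: " + resolveExecuteAsUserBinary(props));
  }
}
```

Under these assumptions, either setting execute.as.user=false or pointing azkaban.native.lib at the folder that holds the native binary in commonprivate.properties would avoid the exception.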

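5.2 Fixing the UI styling problem

After switching to the latest source (3.35.0) and packaging it, the deployed UI had styling problems.

The cause: the web folder I had copied into the web-server directory on the server came from the following directory:

That directory does not contain the azkaban.css stylesheet,

which is why the styling was broken.

Fix:

Upload the web folder from the install directory produced by the build to the server.

After this change, restart the server and the UI displays correctly: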

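Note:

Every job in Azkaban is a process, and Azkaban decides whether a job succeeded by whether that process completed successfully. During an MR or Spark job, however, if the code fails, the task running on the cluster stops and nothing is written to the target file, yet the process reported back to Azkaban has exited successfully, so the job node is marked as succeeded. This contradicts the actual result of the task.

For example, while running a jar, a NullPointerException was thrown and the MR job stopped, but the Process was still shown as successful, and the node's final status was also success:

To prevent downstream nodes from running after a node they depend on has failed, you need to judge whether a job ran correctly along a different dimension, for example by checking the log files of the MR or Spark task, or by checking whether the task on the cluster succeeded.

Summary: after a run finishes, check HDFS for the expected output file. If it exists, the run succeeded; if it does not, the run failed.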
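
Since Azkaban only looks at the process exit status, one way to apply this check is a small follow-up step that exits non-zero when the expected output is missing. The sketch below is a stand-in under assumptions: the class `OutputCheckSketch` is hypothetical, it checks a local directory with java.nio so that it stays self-contained, and it looks for the `_SUCCESS` marker that MapReduce's default output committer normally writes on success. Against a real cluster you would replace the local check with the Hadoop `FileSystem.exists` call on the HDFS path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the local filesystem stands in for HDFS here.
final class OutputCheckSketch {

  // Exit code to hand back to Azkaban: 0 only if the success marker exists.
  static int exitCodeFor(Path outputDir) {
    return Files.exists(outputDir.resolve("_SUCCESS")) ? 0 : 1;
  }

  public static void main(String[] args) {
    if (args.length != 1) {
      System.err.println("usage: OutputCheckSketch <output-dir>");
      return;
    }
    int code = exitCodeFor(Paths.get(args[0]));
    if (code != 0) {
      // A non-zero exit makes Azkaban mark this job as FAILED,
      // so dependent nodes do not run.
      System.exit(code);
    }
  }
}
```

Run as the last command of the job, or as a separate job that downstream nodes depend on, this turns a silently failed MR or Spark run into a failed node.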