Running Kettle Jobs and Transformations from the Command Line on Linux with Kitchen and Pan

1. Preparation

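You need one simple job and one simple transformation.

To keep things convenient and the results easy to see, both the job and the transformation generate files.

Transformation: reads the names of all files in the download directory and writes them to a file. (Tested successfully in the GUI; the target file was generated.)
Job: creates a file. (Ran successfully in the GUI.)
Delete the result files produced by the GUI test runs so they do not interfere with observing the command-line runs.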

2. Running the job and transformation from the command line on Linux

Pan is the PDI command-line tool for running transformations.
Kitchen is the PDI command-line tool for running jobs.

a. Pan command-line options and syntax
Syntax:

        pan.sh -option=value arg1 arg2

Command-line options:

| Switch | Purpose |
| --- | --- |
| rep | Enterprise or database repository name, if you are using one |
| user | Repository username |
| pass | Repository password |
| trans | The name of the transformation (as it appears in the repository) to launch |
| dir | The repository directory that contains the transformation, including the leading slash |
| file | If you are calling a local KTR file, this is the filename, including the path if it is not in the local directory |
| level | The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing) |
| logfile | A local filename to write log output to |
| listdir | Lists the directories in the specified repository |
| listtrans | Lists the transformations in the specified repository directory |
| listrep | Lists the available repositories |
| exprep | Exports all repository objects to one XML file |
| norep | Prevents Pan from logging into a repository. If you have set the KETTLE_REPOSITORY, KETTLE_USER, and KETTLE_PASSWORD environment variables, this option prevents Pan from logging into the specified repository so that you can execute a local KTR file instead. |
| safemode | Runs in safe mode, which enables extra checking |
| version | Shows the version, revision, and build date |
| param | Sets a named parameter in name=value format. For example: -param:FOO=bar |
| listparam | Lists information about the named parameters defined in the specified transformation |
| maxloglines | The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows (default) |
| maxlogtimeout | The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely (default) |

Example:

    sh pan.sh -rep=initech_pdi_repo -user=pgibbons -pass=lumburghsux -trans=TPS_reports_2011

Example of calling a local transformation:

    ./pan.sh -file=/home/hadoop/workplace/kettle/trans/test_cml.ktr -norep
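
When a call like this runs from a script, it helps to check whether Pan actually succeeded. The following is a minimal sketch, assuming Pan exits with status 0 on success and a non-zero status otherwise. The `run_trans` function and the `PAN_SH` variable are hypothetical names introduced here for illustration, not part of PDI.

```shell
#!/bin/sh
# Hypothetical wrapper: run a local .ktr file with Pan and report failures.
# PAN_SH is an assumed variable pointing at your pan.sh; adjust to your install.
PAN_SH="${PAN_SH:-./pan.sh}"

run_trans() {
    ktr_file="$1"
    shift
    # -norep keeps Pan from logging into a repository;
    # any extra options (e.g. -level=Basic) are passed through unchanged.
    "$PAN_SH" -file="$ktr_file" -norep "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "Pan failed with exit status $status for $ktr_file" >&2
    fi
    return "$status"
}
```

For example, `run_trans /home/hadoop/workplace/kettle/trans/test_cml.ktr -level=Basic` would run the transformation above with Basic logging and return Pan's exit status to the caller.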

b. Kitchen command-line options and syntax

The syntax is the same as Pan's; the options differ slightly.

| Switch | Purpose |
| --- | --- |
| rep | Enterprise or database repository name, if you are using one |
| user | Repository username |
| pass | Repository password |
| job | The name of the job (as it appears in the repository) to launch |
| dir | The repository directory that contains the job, including the leading slash |
| file | If you are calling a local KJB file, this is the filename, including the path if it is not in the local directory |
| level | The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing) |
| logfile | A local filename to write log output to |
| listdir | Lists the sub-directories within the specified repository directory |
| listjob | Lists the jobs in the specified repository directory |
| listrep | Lists the available repositories |
| export | Exports all linked resources of the specified job. The argument is the name of a ZIP file. |
| norep | Prevents Kitchen from logging into a repository. If you have set the KETTLE_REPOSITORY, KETTLE_USER, and KETTLE_PASSWORD environment variables, this option prevents Kitchen from logging into the specified repository so that you can execute a local KJB file instead. |
| version | Shows the version, revision, and build date |
| param | Sets a named parameter in name=value format. For example: -param:FOO=bar |
| listparam | Lists information about the named parameters defined in the specified job |
| maxloglines | The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows (default) |
| maxlogtimeout | The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely (default) |

Command line for running a local job:

    /home/kettle/data-integration/kitchen.sh -file=/home/kettle/transition/move.kjb -log=log.log

General form:

    $kitchen_path -file=$job_path -log=$log_path
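
The general form above can be wrapped in a small helper so that every run writes to its own log file. This is only a sketch: `run_job`, `KITCHEN_SH`, and `LOG_DIR` are hypothetical names, the default paths are assumptions based on the example command, and it uses the `-logfile` switch as listed in the option table.

```shell
#!/bin/sh
# Hypothetical helper: run a local .kjb file with Kitchen, one log file per run.
# KITCHEN_SH and LOG_DIR are assumed names; adjust both to your environment.
KITCHEN_SH="${KITCHEN_SH:-/home/kettle/data-integration/kitchen.sh}"
LOG_DIR="${LOG_DIR:-./logs}"

run_job() {
    kjb_file="$1"
    mkdir -p "$LOG_DIR" || return 1
    # Log name = job file name without .kjb, plus a timestamp,
    # so repeated runs do not overwrite each other's logs.
    log_file="$LOG_DIR/$(basename "$kjb_file" .kjb)_$(date +%Y%m%d_%H%M%S).log"
    "$KITCHEN_SH" -file="$kjb_file" -norep -logfile="$log_file"
}
```

Calling `run_job /home/kettle/transition/move.kjb` would then run the job from the example and leave a timestamped log under `LOG_DIR`.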

Result of calling Pan: [screenshot]
Result of calling Kitchen: [screenshot]

3. Command options I use most

In my current work environment I only run local job and transformation files, so the options I use most are:

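| Option | Description |
| --- | --- |
| -file | Path to the job or transformation file |
| -norep | Do not log into a repository; the file is a local one |
| -param | Sets a named parameter |
| -logfile | File name for log output |
| -level | Log level (Basic, Detailed, Debug, Rowlevel, Error, Nothing) |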
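
These options can be combined in one call. The sketch below picks Pan or Kitchen from the file extension and turns `NAME=value` arguments into `-param:NAME=value` switches. `run_local`, `PDI_HOME`, and `LOG_LEVEL` are hypothetical names chosen for illustration, and the default install path is an assumption.

```shell
#!/bin/sh
# Hypothetical helper combining -file, -norep, -param, -logfile and -level.
# PDI_HOME is an assumed variable for the data-integration directory.
PDI_HOME="${PDI_HOME:-/home/kettle/data-integration}"

run_local() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_local FILE LOGFILE [NAME=value ...]" >&2
        return 64
    fi
    target="$1"; log_file="$2"; shift 2
    # Transformations (.ktr) go to Pan, jobs (.kjb) go to Kitchen.
    case "$target" in
        *.ktr) tool="$PDI_HOME/pan.sh" ;;
        *.kjb) tool="$PDI_HOME/kitchen.sh" ;;
        *) echo "unsupported file type: $target" >&2; return 64 ;;
    esac
    # Rewrite each remaining NAME=value argument as -param:NAME=value.
    for kv in "$@"; do
        shift
        set -- "$@" "-param:$kv"
    done
    "$tool" -file="$target" -norep -level="${LOG_LEVEL:-Basic}" \
        -logfile="$log_file" "$@"
}
```

For instance, `run_local /home/kettle/transition/move.kjb move.log FOO=bar` would call Kitchen with `-param:FOO=bar` at the Basic log level, and setting `LOG_LEVEL=Debug` beforehand would raise the verbosity.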