
Hive: Running HQL via a Shell Script

Hive: submitting a Hive query in script form

Following on from the previous post, Mac: Standalone Hive Installation and Testing, which ended by loading a local file into a Hive table, this post cleans that table, submitting the cleaning HQL as a shell script.

Create the target table

For example, suppose the fields the cleaning job should produce are the following:

use hive_test;
CREATE EXTERNAL TABLE fact_XXX_log (
  time_stamp  string COMMENT 'timestamp',
  dim1        string COMMENT 'dim1',
  type        string COMMENT 'type',
  query_id    string COMMENT 'query_id',
  error_stack string COMMENT 'error stack',
  trace_log   string COMMENT 'trace_log',
  result_meta string COMMENT 'result meta',
  extend_info string COMMENT 'extended info'
)
COMMENT 'cleaned log fact table'
PARTITIONED BY (`dt` string COMMENT 'day')
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS textfile;
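Before writing the cleaning job, it is worth sanity-checking the DDL. A quick check, using the database and table names from above (at this point the partition list should be empty):

hive -e "USE hive_test; DESCRIBE FORMATTED fact_XXX_log; SHOW PARTITIONS fact_XXX_log;"

Note that the table is EXTERNAL, so dropping it later only removes the metadata; the files under its location are left in place.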

The cleaning SQL script

hive_test.sh

#!/bin/sh
echo "---- start exec hive sql"
hive -e "SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE hive_test.fact_XXX_log
PARTITION (dt)
    select
       xxxxxx -- select list omitted here
    from
       hive_test.system_log
;"
echo "---- end hive sql"
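The closing double quote matters: the whole HQL block is a single argument to hive -e. For longer jobs, an alternative is to keep the statements in a separate .hql file and submit it with hive -f; a minimal sketch, where the file name hive_test.hql is only illustrative:

#!/bin/sh
echo "---- start exec hive sql"
# hive -f executes the statements in the given file
hive -f hive_test.hql
echo "---- end hive sql"

Either way, SET hive.exec.dynamic.partition.mode = nonstrict is required here because PARTITION (dt) gives no static value, i.e. the partition is resolved dynamically per row, which strict mode rejects.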

Submitting the SQL script from the local command line

Run the script locally:

/shell$ ls
hive_test.sh
@localhost:~/Documents/book/shell$ sh hive_test.sh
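For a long-running job it can be handy to capture the output to a file instead of watching the terminal; a small variation, with the log file name being arbitrary:

# Hive prints its progress to stderr, so redirect both streams
sh hive_test.sh > hive_test.log 2>&1
tail -f hive_test.log    # follow the progress from another terminal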

Running it in the foreground, the execution log looks like this:

---- start exec hive sql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding 
....
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:
Query ID = xxxx_20180104111556_9381fc76-f0ba-4420-8207-bd4ca3cec77a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1514527790034_0002, Tracking URL = http://localhost:8088/proxy/application_1514527790034_0002/
Kill Command = xxxxxxxp job  -kill job_1514527790034_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2018-01-04 11:16:26,465 Stage-1 map = 0%,  reduce = 0%
2018-01-04 11:16:33,038 Stage-1 map = 50%,  reduce = 0%
2018-01-04 11:16:37,179 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1514527790034_0002
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1514527790034_0003, Tracking URL = http://localhost:8088/proxy/application_1514527790034_0003/
Kill Command = xxxxxxxxadoop-2.7.4/bin/hadoop job  -kill job_1514527790034_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-01-04 11:17:10,803 Stage-3 map = 0%,  reduce = 0%
2018-01-04 11:17:17,681 Stage-3 map = 100%,  reduce = 0%
Ended Job = job_1514527790034_0003
Loading data to table hive_test.fact_XXX_log partition (dt=null)
Loaded : 1/1 partitions.
     Time taken to load dynamic partitions: 0.2 seconds
     Time taken for adding to write entity : 0.001 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2   HDFS Read: 30906 HDFS Write: 6438 SUCCESS
Stage-Stage-3: Map: 1   HDFS Read: 10016 HDFS Write: 6240 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 82.54 seconds
---- end hive sql
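The line "Loaded : 1/1 partitions" confirms that exactly one dynamic partition was created. To verify from the Hive side, a quick check (the LIMIT is only illustrative):

hive -e "USE hive_test; SHOW PARTITIONS fact_XXX_log; SELECT * FROM fact_XXX_log LIMIT 5;"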

Hadoop web UI

[screenshot: Hadoop web UI showing the submitted jobs]