Hive: Two Ways to Use Variables in Development

A Xin • Published: 2018-12-16

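When writing data-analysis code in Hive, you often need to change runtime parameters. A typical case is the date value in a select statement's filter: at different times you want to look at data for different dates, so the date must be changeable without editing the SQL. When there is a lot of code and many parameters, replacing hard-coded literals with variables becomes essential. This article summarizes two ways to pass parameters into Hive SQL to meet this kind of need.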

Preparing the test table and test data

First, prepare a test table and test data for the tests that follow:

hive> create database test;
OK
Time taken: 2.606 seconds

Then run the SQL file that creates the table and loads the data:

[testHivePara]$ hive -f student.sql
Hive history file=/tmp/czt/hive_job_log_czt_201309131615_1720869864.txt
OK
Time taken: 2.131 seconds
OK
Time taken: 0.878 seconds
Copying data from file:/home/users/czt/testdata_student
Copying file: file:/home/users/czt/testdata_student
Loading data to table test.student
OK
Time taken: 1.76 seconds

The contents of student.sql:

use test;

-- Student information table
create table IF NOT EXISTS student(
    sno      bigint    comment 'student number',
    sname    string    comment 'name',
    sage     bigint    comment 'age',
    pdate    string    comment 'enrollment date'
)
COMMENT 'student information table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH
    '/home/users/czt/testdata_student'
INTO TABLE student;

The test data file testdata_student (fields separated by tabs):

1    name1    21    20130901
2    name2    22    20130901
3    name3    23    20130901
4    name4    24    20130901
5    name5    25    20130902
6    name6    26    20130902
7    name7    27    20130902
8    name8    28    20130902
9    name9    29    20130903
10    name10    30    20130903
11    name11    31    20130903
12    name12    32    20130904
13    name13    33    20130904
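If the sample file is not at hand, it can be regenerated with a short script. This is a minimal sketch, assuming bash and the row pattern visible above (sage = sno + 20, enrollment dates grouped 4/4/3/2). The output file name testdata_student in the current directory is an assumption and can be changed through the hypothetical OUT variable:

```shell
#!/bin/bash
# Regenerates the 13 sample rows above as a tab-separated file.
# OUT is a hypothetical override for the output path.
out="${OUT:-testdata_student}"
: > "$out"   # truncate (or create) the file

for sno in $(seq 1 13); do
  # Enrollment dates follow the grouping in the sample data.
  if   (( sno <= 4 ));  then pdate=20130901
  elif (( sno <= 8 ));  then pdate=20130902
  elif (( sno <= 11 )); then pdate=20130903
  else                       pdate=20130904
  fi
  # Columns: sno, sname, sage (= sno + 20), pdate
  printf '%d\tname%d\t%d\t%s\n' "$sno" "$sno" "$(( sno + 20 ))" "$pdate" >> "$out"
done
```

Copy the resulting file to the path used by LOAD DATA LOCAL INPATH in student.sql before running it.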

Method 1: Set variables in the shell and use them directly in hive -e

The test shell script, shellhive.sh:

#!/bin/bash
tablename="student"
limitcount="8"

hive -S -e "use test; select * from ${tablename} limit ${limitcount};"

Output:

[testHivePara]$ sh -x shellhive.sh
+ tablename=student
+ limitcount=8
+ hive -S -e 'use test; select * from student limit 8;'
1       name1    21      20130901
2       name2    22      20130901
3       name3    23      20130901
4       name4    24      20130901
5       name5    25      20130902
6       name6    26      20130902
7       name7    27      20130902
8       name8    28      20130902

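Because Hive's SQL-like language lacks the flexibility and procedural control of the shell, combining shell and Hive is a very common development pattern. Variables defined directly in the shell can be referenced directly in the statement passed to hive -e, because the shell expands them before Hive ever sees the SQL.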
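To make the date itself a run-time parameter, as the introduction motivates, the script can take it from the command line and build the SQL string from it. A minimal sketch; the function name build_query and its argument order are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: build the HiveQL string for method 1 from shell values.
# $1 = table name, $2 = enrollment date (yyyymmdd), $3 = row limit (default 8)
build_query() {
  local tablename="$1"
  local pdate="$2"
  local limitcount="${3:-8}"
  printf "use test; select * from %s where pdate='%s' limit %s;" \
    "$tablename" "$pdate" "$limitcount"
}

# The shell expands everything before hive sees the string, e.g.:
#   hive -S -e "$(build_query student "${1:-20130901}" 8)"
```

Because the values are pasted straight into the SQL text, they should come from trusted input.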

Note: a variable defined with -hiveconf cannot be referenced as-is inside a double-quoted hive -e string.

Modify the script above to define the date (and minimum age) parameters with -hiveconf:

#!/bin/bash
tablename="student"
limitcount="8"

hive -S \
    -hiveconf enter_school_date="20130902" \
    -hiveconf min_age="26" \
    -e \
    "    use test; \
        select * from ${tablename} \
        where \
            pdate='${hiveconf:enter_school_date}' \
            and \
            sage>'${hiveconf:min_age}' \
        limit ${limitcount};"

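This fails. The script runs in the shell, so the shell tries to expand ${hiveconf:enter_school_date} and ${hiveconf:min_age} itself (bash reads each one as a substring expansion of an undefined shell variable named hiveconf). Since no such shell variable is defined, each reference is replaced by an empty string.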

At run time, the command is expanded to:

+ hive -S -hiveconf enter_school_date=20130902 -hiveconf min_age=26 -e 'use test; explain select * from student where pdate='\'''\'' and sage>'\'''\'' limit 8;'
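One possible way around this, assuming Hive's variable substitution is enabled (the hive.variable.substitute setting, true by default), is to escape the dollar sign so the shell passes ${hiveconf:...} through literally and Hive substitutes it instead. The sketch below only builds the string; the hive call is left as a comment:

```shell
#!/bin/bash
tablename="student"
limitcount="8"

# ${tablename} and ${limitcount} are expanded by the shell;
# \${hiveconf:...} stays literal text for Hive to substitute.
query="use test;
select * from ${tablename}
where pdate='\${hiveconf:enter_school_date}'
  and sage > '\${hiveconf:min_age}'
limit ${limitcount};"

# Requires the hive CLI:
# hive -S -hiveconf enter_school_date=20130902 -hiveconf min_age=26 -e "$query"
```

A single-quoted string would also stop the shell from touching ${hiveconf:...}, but then ${tablename} and ${limitcount} would not be expanded either, which is why selective escaping is used here.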

Method 2: Define variables with -hiveconf and use them in a SQL file

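Because line breaks and quoting are awkward in hive -e, it is only suitable for small amounts of SQL. In practice you usually write the SQL in .hql files and run them with hive -f. In that case you can define variables with -hiveconf and reference them directly in the SQL.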

First, the calling shell script:

#!/bin/bash

hive -hiveconf enter_school_date="20130902" -hiveconf min_ag="26" -f testvar.sql
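The script above hardcodes the date. A sketch of a variant that takes it from the command line instead, with a basic format check; the function name run_testvar and the HIVE override are hypothetical and not part of the original setup:

```shell
#!/bin/bash
# Hypothetical wrapper: run_testvar [yyyymmdd] [min_age]
# HIVE may be overridden (e.g. HIVE=echo) to print the command instead of running it.
run_testvar() {
  local enter_school_date="${1:-20130902}"
  local min_age="${2:-26}"

  # Reject anything that is not an 8-digit date before it reaches the SQL.
  if [[ ! "$enter_school_date" =~ ^[0-9]{8}$ ]]; then
    echo "invalid date: $enter_school_date" >&2
    return 1
  fi

  # min_ag matches the variable name used in testvar.sql.
  "${HIVE:-hive}" -hiveconf enter_school_date="$enter_school_date" \
                  -hiveconf min_ag="$min_age" \
                  -f testvar.sql
}

# run_testvar "$@"
```

With the last line uncommented and the script run under bash, bash shellhive.sh 20130903 would query students enrolled on 20130903.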

The called file, testvar.sql:

use test; 

select * from student
where 
    pdate='${hiveconf:enter_school_date}' 
    and
    sage > '${hiveconf:min_ag}'
limit 8;

Execution:

[testHivePara]$ sh -x shellhive.sh
+ hive -hiveconf enter_school_date=20130902 -hiveconf min_ag=26 -f testvar.sql
Hive history file=/tmp/czt/hive_job_log_czt_201309131651_2035045625.txt
OK
Time taken: 2.143 seconds
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Kill Command = hadoop job -kill job_20130911213659_42303
2013-09-13 16:52:00,300 Stage-1 map = 0%,  reduce = 0%
2013-09-13 16:52:14,609 Stage-1 map = 28%,  reduce = 0%
2013-09-13 16:52:24,642 Stage-1 map = 71%,  reduce = 0%
2013-09-13 16:52:34,639 Stage-1 map = 98%,  reduce = 0%
Ended Job = job_20130911213659_42303
OK
7       name7   27      20130902
8       name8   28      20130902
Time taken: 54.268 seconds
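If you would rather keep the SQL inside the driver script than maintain a separate file, the same testvar.sql can be written from a heredoc. Quoting the delimiter ('EOF') disables shell expansion in the body, so ${hiveconf:...} reaches Hive unchanged. A minimal sketch, assuming a temporary file from mktemp is acceptable as the -f argument:

```shell
#!/bin/bash
# Write the method 2 SQL to a temporary file; remove it when the script exits.
sqlfile="$(mktemp)"
trap 'rm -f "$sqlfile"' EXIT

# The quoted 'EOF' delimiter keeps ${hiveconf:...} literal.
cat > "$sqlfile" <<'EOF'
use test;
select * from student
where
    pdate='${hiveconf:enter_school_date}'
    and
    sage > '${hiveconf:min_ag}'
limit 8;
EOF

# Requires the hive CLI:
# hive -hiveconf enter_school_date="20130902" -hiveconf min_ag="26" -f "$sqlfile"
```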

Summary

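This article described two ways to use variables in Hive. The first defines variables in the shell and references them as ${var_name} directly in the SQL passed to hive -e. The second sets variables with -hiveconf, in the form hive -hiveconf key=value -f run.sql, and references them in the SQL file as ${hiveconf:varname}. Together these two methods cover the usual need to pass parameters into Hive during development, and they make scripts easier to reuse and maintain.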
