
Common Hive Commands

View a table's DDL and HDFS location: show create table table_name
Create a table: create table tbname(var1 char_type1, var2 char_type2, ...)
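A minimal sketch tying the two together, assuming the use_data table that the later examples query (its columns are inferred from those examples):

hive (default)> create table use_data(id int, name string, score double);
hive (default)> show create table use_data;

The output of show create table is the full DDL, including a LOCATION clause that gives the table's path on HDFS.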
Load data into a table: load data [local] inpath '/path/to/file' into table tbname
View columns: desc tbname
Drop a table: drop table tbname

if(expr1, expr2, expr3)
expr1 is the condition: if it is true, expr2 is returned; if false, expr3.
eg: if(isnull(id), 0, id) as id: if id is not null, id stays id; if id is null, id becomes 0

coalesce(expr1, expr2, expr3, ...)
Returns the first non-null expression.
eg: coalesce(NULL, NULL, 0) returns 0
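Both are runnable against literals alone (a minimal sketch; assumes Hive 0.13+, where SELECT without FROM is allowed):

hive (default)> select if(isnull(NULL), 0, 1), coalesce(NULL, NULL, 0);
OK
0	0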
case expression
case when ... then ... when ... then ... else ... end
eg: select id, name, (case when score < 4.0 then '0' when score > 4.0 and score < 9.0 then '1' else '2' end) as score from use_data;
eg: select term_id, (case when term_id='2' or term_id='3' or term_id='4' then cast(term_id as int)-1 when term_id='1' then '4' else '0' end) as term_ids from mds_course_details_csrt limit 10;
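The first example, reduced to a table-free sketch (the literal 3.8 stands in for the score column; assumes Hive 0.13+):

hive (default)> select (case when 3.8 < 4.0 then '0' when 3.8 > 4.0 and 3.8 < 9.0 then '1' else '2' end);
OK
0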
lateral view
lateral view is usually used together with a UDTF such as split or explode: it splits one row of data into multiple rows, and the split rows can then be aggregated. lateral view first calls the UDTF for each row of the base table; the UDTF turns that row into one or more rows, and lateral view then joins the results back, producing a virtual table that supports an alias.
eg: select id, namePart, score from use_data lateral view explode(split(name, 'i')) tmp as namePart;
The source row:
2	lisi	9.0
is split into the following three rows:
2	l	9.0
2	s	9.0
2		9.0
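The same split, runnable without the table (the inline subquery stands in for use_data; assumes Hive 0.13+):

hive (default)> select namePart from (select 'lisi' as name) t lateral view explode(split(name, 'i')) tmp as namePart;
OK
l
s

The third row is the empty string left after the trailing 'i', which prints as a blank line.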
concat(str, str, ...)
Concatenates strings: select concat('11','22','33');  returns 112233

concat_ws(separator, str, str, ...)
Concatenates strings with a custom separator: select concat_ws(',','11','22','33');  returns 11,22,33

group by (grouping) & collect_set (dedup into a set)
eg: select collect_set(id), collect_set(name), score from use_data group by score;
[3,8,10,12]	["wangwu","zhuliu"]	3.8
[1,4,5,6]	["zhangsan"]	8.9
[2,7,9,11]	["lisi"]	9.0
(A self-contained sketch follows the substr example below.)

substr('string', start, length)
Takes the substring of the given length from the given start position; if the length is omitted, it runs from the start position to the end of the string.
hive (default)> select substr('abcde',2,3);
OK
bcd
Time taken: 0.287 seconds, Fetched: 1 row(s)
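The collect_set sketch referenced above, using inline rows instead of use_data (assumes Hive 0.13+; note the duplicate 'lisi' collapsing to a single entry, and that row order may vary):

hive (default)> select score, collect_set(name) from (select 'lisi' as name, 9.0 as score union all select 'lisi', 9.0 union all select 'zhangsan', 8.9) t group by score;
OK
8.9	["zhangsan"]
9.0	["lisi"]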
date_add('date', n_days)
eg: hive> select date_add('2018-03-20', -7);    returns 2018-03-13

rank: numbers rows within a group, ascending by default, descending with desc
row_number: numbers rows within a group, ascending by default, descending with desc
The difference: on equal values rank assigns tied (repeated) numbers, row_number does not.
select id, name, score, rank() over (partition by name order by id desc) rank from use_data;
Groups by the name column and sorts by id within each group. (A runnable sketch follows the unix_timestamp examples below.)

select distinct: deduplicates, returning only unique values

Check whether a column is null and substitute a value:
if(isnull(recruited_num), 0, recruited_num) as recruited_num

Random sampling:
SELECT * FROM use_data TABLESAMPLE(5 ROWS)

Run from the command line:
hive -e "set hive.cli.print.header=true;" > /data1/stu_subject_count/data/filetxt

cast type conversion
Convert a value out of scientific notation: cast(sum(online_time) as bigint)
Convert a string to double: cast(string as double)

Export Hive/Hadoop job logs:
yarn logs -applicationId application_1524544136087_714973

Add a column:
alter table detail_flow_test add columns(original_union_id string)

udf: add file + file_path
add file /data1/lvyunhe/test.py

Sorting:
distribute by: guarantees rows with the same key go to the same reducer, but not that they end up adjacent
sort by: sorts within a single reducer
order by: global sort

Set the number of reducers:
set mapred.reduce.tasks = 15;

Convert to a timestamp:
hive (default)> select unix_timestamp('2018-07-11 15:40','yyyy-MM-dd HH:mm');
OK
1531294800
Time taken: 0.28 seconds, Fetched: 1 row(s)
hive (default)> select unix_timestamp('20180711-15:40','yyyyMMdd-HH:mm');
OK
1531294800
Time taken: 0.424 seconds, Fetched: 1 row(s)
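The rank vs row_number sketch referenced above, on inline rows so no table is needed (assumes Hive 0.13+; the tie on id=2 makes rank repeat a number while row_number keeps counting):

hive (default)> select id, name, rank() over (partition by name order by id desc) rk, row_number() over (partition by name order by id desc) rn from (select 2 as id, 'lisi' as name union all select 2, 'lisi' union all select 1, 'lisi') t;
OK
2	lisi	1	1
2	lisi	1	2
1	lisi	3	3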
instr checks whether one string contains another, returning the position of the first occurrence (0 when there is no match):
hive (default)> select instr('abcd','e');
OK
0
Time taken: 0.261 seconds, Fetched: 1 row(s)
hive (default)> select instr('abcd','c');
OK
3
Time taken: 0.239 seconds, Fetched: 1 row(s)
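Because a miss returns 0, instr pairs naturally with the if function from earlier for a containment test (a minimal sketch):

hive (default)> select if(instr('abcd', 'c') > 0, 'contains', 'missing');
OK
contains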

length and size: length returns the length of a string; size returns the number of elements in an array (such as one produced by split)

hive (default)> select length('2018,2017,2016'),size(split('2018,2017,2016',',')),split('2018,2017,2016',',');
OK
14	3	["2018","2017","2016"]
Time taken: 0.281 seconds, Fetched: 1 row(s)