
Handling JSON in Hive

Use case: using Hive to query and break down log data, where each log record is stored as JSON:

{"logid":"5d40e1af-19f7-4aad-af8f-c7247e322e5c","souc":"4","devi":"OPPO R7sm","sys":"22,5.1.1","dname":"Dalvik/2.1.0 (Linux; U; 
Android 5.1.1; OPPO R7sm Build/LMY47V)","chan":"canary","vers":"638.1","mac":"dc:6d:cd:16:46:0f","imei":"869410022554506","ifa":"","city_id":"384",
"reso":"1800*1080","euid":"a341c57d6abcdd969b6dc4c7a564a15f","dpi":"3.0","host":"192.168.1.212:80","ip":"223.100.138.62","uid":"9587355",
"qtype":"view_caipu_detail","qtype_sub":"","obj":"861920","uri":"recipe/detail","qnum":"1","refer":"","lat":"39.68861","lon":"122.978162",
"srctype":"2800","agentid":"2066b417879f467164fd1fb91b9d04c0",
"ext":"{\"query\":{\"kw\":\"培根披薩\",\"src\":\"2801\",\"idx\":\"3\",\"type\":\"13\",\"id\":\"861920\"}}","page_num":"","android_id":"","pseudo_id":""}

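In this log record, the following part:

"ext":"{\"query\":{\"kw\":\"培根披薩\",\"src\":\"2801\",\"idx\":\"3\",\"type\":\"13\",\"id\":\"861920\"}}"

is JSON nested inside JSON, two levels deep: the value of ext is itself a JSON string, and its query field holds another object.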

Next, we use Hive to extract kw, src, idx, type and the other fields:

Approach 1:

-- The innermost query reads the ext string, the middle one extracts the query object,
-- and the outer one reads kw, src and idx from that object.
select tmp.kw, tmp.src, tmp.idx, count(*) as num 
from (
	select get_json_object(tt.query,'$.kw') as kw,
	       get_json_object(tt.query,'$.src') as src,
	       get_json_object(tt.query,'$.idx') as idx 
	from (
		select get_json_object(t.ext,'$.query') as query 
		from (
			select req["ext"] as ext
			from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' 
		) t 
	) tt 
) tmp group by tmp.kw, tmp.src, tmp.idx 

This approach uses Hive's built-in function get_json_object and peels the JSON apart one level at a time through nested subqueries.

Approach 2:
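
As a possible shortcut, the JSONPath syntax accepted by get_json_object also supports the child operator, so the nested subqueries can be collapsed into a single level. A minimal sketch, assuming the same dh_server_log table and req map column used above (this variant is not part of the original post):

```sql
-- Sketch: read kw, src and idx straight from ext using '$.query.<field>' paths.
-- Assumes req is a map<string,string> column, as the req["ext"] lookups above imply.
select get_json_object(req["ext"], '$.query.kw')  as kw,
       get_json_object(req["ext"], '$.query.src') as src,
       get_json_object(req["ext"], '$.query.idx') as idx,
       count(*) as num
from dh_server_log
where p_day = 20170213
  and req["qtype"] = 'view_caipu_detail'
  and req["ext"] <> ''
group by get_json_object(req["ext"], '$.query.kw'),
         get_json_object(req["ext"], '$.query.src'),
         get_json_object(req["ext"], '$.query.idx');
```

The trade-off is that every get_json_object call parses the ext string again, so this form is shorter to read but not necessarily cheaper to run.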

select json_tuple(get_json_object(req["ext"],"$.query"),"kw","id","idx","type","src") 
from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' 
limit 20

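This approach combines json_tuple with get_json_object: get_json_object retrieves the second-level JSON under "query", and json_tuple splits it into columns. However, a UDTF such as json_tuple cannot appear next to other expressions in the select list, so if you only need some of the fields "kw", "id", "idx", "type", "src", or want to aggregate on them, you need lateral view: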

-- group by is required because nt.a is selected alongside the count(nt.a) aggregate
select nt.a, count(nt.a) as num from (
	select req from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' limit 10 
) dsl lateral view json_tuple(get_json_object(req["ext"],"$.query"),"kw","id","idx","type","src") nt as a,b,c,d,e 
group by nt.a
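
If only a few of the keys are needed, one option is to pass just those keys to json_tuple and name the output columns after them. A sketch under the same table assumptions, here keeping only kw and src (the choice of columns is illustrative):

```sql
-- Sketch: extract only kw and src from the nested query object and count each pair.
-- Assumes dh_server_log has a map column req and a partition column p_day, as above.
select nt.kw, nt.src, count(*) as num
from dh_server_log dsl
lateral view json_tuple(get_json_object(dsl.req["ext"], '$.query'), 'kw', 'src') nt as kw, src
where dsl.p_day = 20170213
  and dsl.req["qtype"] = 'view_caipu_detail'
  and dsl.req["ext"] <> ''
group by nt.kw, nt.src;
```

Naming the lateral view columns after the JSON keys, rather than a, b, c, keeps the outer select readable when the key list changes.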

Reposted from: https://blog.csdn.net/qq_31573519/article/details/55104822