
[Hive_12] Hive User-Defined Functions



0. Overview

  UDF   // user-defined function
      // one row in, one row out, e.g. format_number(age, '000')

  UDTF   // user-defined table-generating function
       // one row in, multiple rows out, e.g. explode(array)

  UDAF   // user-defined aggregate function
       // multiple rows in, one row out, e.g. sum(xxx)
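  The three shapes above can be pictured in plain Java. This is only an analogy, assuming nothing about Hive's own classes: the class name FunctionShapes and the methods pad3, explode, and sum are hypothetical and exist purely to show how many rows go in and come out.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy of the three function shapes (no Hive classes involved).
class FunctionShapes {
    // UDF shape: one value in, one value out (zero-pads to three digits).
    static String pad3(int age) {
        return String.format("%03d", age);
    }

    // UDTF shape: one value in, many rows out (like explode on an array).
    static List<String> explode(String[] tags) {
        return Arrays.asList(tags);
    }

    // UDAF shape: many rows in, one value out (like sum).
    static long sum(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(pad3(7));                          // one row -> one row
        System.out.println(explode(new String[] {"a", "b"})); // one row -> two rows
        System.out.println(sum(Arrays.asList(1, 2, 3)));      // three rows -> one row
    }
}
```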

  The demo in section 1.3 uses a UDF to parse the temptags data in Hive.


1. UDF

  1.1 Code example

  Code
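  A classic Hive UDF is a Java class that extends org.apache.hadoop.hive.ql.exec.UDF and provides one or more methods named evaluate. The sketch below shows roughly what such a class could look like. It is not the author's MyUDF: the class name MyUdfSketch and both method bodies are hypothetical, and the tiny UDF class declared here is only a stand-in so the example compiles without Hive on the classpath.

```java
// Stand-in for org.apache.hadoop.hive.ql.exec.UDF so this sketch compiles
// without Hive on the classpath; a real UDF would import and extend that class.
abstract class UDF {}

// Hypothetical example; the source does not show what MyUDF computes.
class MyUdfSketch extends UDF {
    // Hive calls a method named evaluate once per input row.
    public String evaluate(String input) {
        if (input == null) {
            return null; // pass NULL through unchanged
        }
        return input.trim().toLowerCase();
    }

    // Overloads are allowed; this one adds two ints.
    public int evaluate(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        MyUdfSketch udf = new MyUdfSketch();
        System.out.println(udf.evaluate("  Hello "));
        System.out.println(udf.evaluate(1, 2));
    }
}
```

  In a real project the class would live in a package such as com.share.udf, so that the fully qualified class name matches the one passed to create function.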

  1.2 Using a user-defined function

  1. Package the Hive UDF as a jar and copy it to /soft/hive/lib
  2. Restart Hive
  3. Register the function

-- Permanent function
  create function myudf as 'com.share.udf.MyUDF';

-- Temporary function
  create temporary function myudf as 'com.share.udf.MyUDF';

  1.3 Demo

  Using a UDF to parse the temptags data in Hive

  0. Prepare the data

  1. Create the table

    create table temptags(id int, json string) row format delimited fields terminated by '\t';

  2. Load the data

    load data local inpath '/home/centos/files/temptags.txt' into table temptags;

  3. Write the code

  Code

  4. Package the jar

  5. Add fastjson-1.2.47.jar and myhive-1.0-SNAPSHOT.jar to /soft/hive/lib

  6. Restart Hive

  7. Register the temporary function

    create temporary function parsejson as 'com.share.udf.ParseJson';

  8. Test

select id ,parsejson(json) as tags from temptags;

-- Explode the tags so that each (id, tag) pair becomes its own row
select id,  tag from temptags lateral view explode(parsejson(json)) xx as tag;

-- Count how many times each tag appears for each merchant
select id, tag, count(*) as count
from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
group by id, tag;

-- Rank the tags within each merchant by count
select id, tag, count, row_number() over (partition by id order by count desc) as rank
from (select id, tag, count(*) as count
      from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
      group by id, tag) b;

-- Concatenate each tag with its count and keep the top 10 tags per merchant
select id, concat(tag, '_', count)
from (select id, tag, count, row_number() over (partition by id order by count desc) as rank
      from (select id, tag, count(*) as count
            from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
            group by id, tag) b) c
where rank <= 10;

-- Aggregate into one string per merchant
--   concat_ws(',', array<string>)  joins the array elements with a separator
--   collect_set(name)   gathers a column's values into an array, removing duplicates
--   collect_list(name)  gathers a column's values into an array, keeping duplicates
select id, concat_ws(',', collect_set(concat(tag, '_', count))) as tags
from (select id, tag, count, row_number() over (partition by id order by count desc) as rank
      from (select id, tag, count(*) as count
            from (select id, tag from temptags lateral view explode(parsejson(json)) xx as tag) a
            group by id, tag) b) c
where rank <= 10
group by id;
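  To make the logic of the test queries easier to follow, here is the same pipeline sketched in plain Java on a few in-memory rows: count per (id, tag), order by count descending, keep the top N, then join as tag_count. The class TopTagsSketch, the method topTags, and the sample rows are all hypothetical; this illustrates what the queries compute, not how Hive executes them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TopTagsSketch {
    // rows: each entry is {id, tag}, i.e. the shape produced by lateral view explode.
    static Map<Integer, String> topTags(List<String[]> rows, int limit) {
        // group by id, tag -> count(*)
        Map<Integer, Map<String, Long>> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(Integer.parseInt(row[0]), k -> new LinkedHashMap<>())
                  .merge(row[1], 1L, Long::sum);
        }

        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Long>> e : counts.entrySet()) {
            String joined = e.getValue().entrySet().stream()
                .sorted((x, y) -> Long.compare(y.getValue(), x.getValue())) // order by count desc
                .limit(limit)                                               // rank <= limit
                .map(t -> t.getKey() + "_" + t.getValue())                  // concat(tag, '_', count)
                .collect(Collectors.joining(","));                          // concat_ws(',', ...)
            result.put(e.getKey(), joined);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "tasty"},
            new String[] {"1", "clean"},
            new String[] {"1", "tasty"},
            new String[] {"2", "cheap"});
        System.out.println(topTags(rows, 10));
    }
}
```

  One difference worth noting: this sketch keeps the tags in rank order inside the joined string, whereas collect_set in Hive does not promise any particular element order.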

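  1.4 Virtual columns: lateral view

  A result row of the form id, tags looks like this (味道好 = "tastes good", 環境衛生 = "clean environment"):

  123456 味道好_10,環境衛生_9

  lateral view explode goes the other way and turns each array element into its own row:

  id   tags
  1   [味道好,環境衛生]   =>   1 味道好
                   1 環境衛生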

select name, workplace from employee lateral view explode(work_place) xx as workplace;


