Hive window functions: sum, avg, min, max

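Window functions come up often in Hive statistical analysis and are worth knowing well.
This post covers sum, avg, min, and max used as window functions in Hive; other common window functions will follow in later posts.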

First, create a mock call-record table with three columns: calling number, call time, and call duration in minutes.

> create table `call_test` (
    `pone_number` string,
    `createtime` string,   --day 
    `call_minute` int
    );
OK
Time taken: 0.369 seconds

Check the table structure:

> desc call_test;
OK
pone_number         	string              	                    
createtime          	string              	                    
call_minute         	int
Time taken: 0.149 seconds, Fetched: 3 row(s)

Insert mock data:

insert into call_test values('18600000000', '2018-12-10 13:00:00', 1);
insert into call_test values('18600000000', '2018-12-11 13:00:00', 6);
insert into call_test values('18600000000', '2018-12-12 13:00:00', 8);
insert into call_test values('18600000000', '2018-12-13 13:00:00', 4);
insert into call_test values('18600000000', '2018-12-14 13:00:00', 7);
insert into call_test values('18600000000', '2018-12-15 13:00:00', 1);
insert into call_test values('18600000000', '2018-12-16 13:00:00', 6);
insert into call_test values('18600000000', '2018-12-17 13:00:00', 8);
insert into call_test values('18600000000', '2018-12-18 13:00:00', 2);
insert into call_test values('18600000000', '2018-12-19 13:00:00', 4);
insert into call_test values('18600000000', '2018-12-20 13:00:00', 7);
insert into call_test values('18600000000', '2018-12-21 13:00:00', 1);
insert into call_test values('18600000000', '2018-12-22 13:00:00', 6);
insert into call_test values('18600000000', '2018-12-23 13:00:00', 8);
insert into call_test values('15600000000', '2018-12-10 13:00:00', 2);
insert into call_test values('15600000000', '2018-12-11 13:00:00', 4);
insert into call_test values('15600000000', '2018-12-12 13:00:00', 7);
insert into call_test values('15600000000', '2018-12-13 13:00:00', 1);
insert into call_test values('15600000000', '2018-12-14 13:00:00', 6);
insert into call_test values('15600000000', '2018-12-15 13:00:00', 8);
insert into call_test values('15600000000', '2018-12-16 13:00:00', 2);
insert into call_test values('15600000000', '2018-12-17 13:00:00', 4);
insert into call_test values('15600000000', '2018-12-18 13:00:00', 7);

SUM: note that the result depends on ORDER BY, which sorts in ascending order by default.

> select pone_number,
createtime,
call_minute,
sum(call_minute) OVER(partition by pone_number order by createtime) as call_minute1, -- default frame: from the start of the partition to the current row
sum(call_minute) OVER(partition by pone_number order by createtime rows between unbounded preceding and current row) as call_minute2, -- from the start of the partition to the current row; same result as call_minute1
sum(call_minute) OVER(partition by pone_number) as call_minute3, -- all rows in the partition
sum(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and current row) as call_minute4, -- current row + 3 preceding rows
sum(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and 1 following) as call_minute5, -- current row + 3 preceding rows + 1 following row
sum(call_minute) OVER(partition by pone_number order by createtime rows between current row and unbounded following) as call_minute6 -- current row + all following rows
FROM call_test;
Query ID = hdfs_20181211000153_8870b5b2-ecaf-46aa-90f2-49a73e9e4ddf
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1541064601030_38864)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.66 s     
----------------------------------------------------------------------------------------------
OK
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| pone_number  |      createtime      | call_minute  | call_minute1  | call_minute2  | call_minute3  | call_minute4  | call_minute5  | call_minute6  |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| 15600000000  | 2018-12-14 13:00:00  | 6            | 20            | 20            | 41            | 18            | 26            | 27            |
| 15600000000  | 2018-12-13 13:00:00  | 1            | 14            | 14            | 41            | 14            | 20            | 28            |
| 15600000000  | 2018-12-12 13:00:00  | 7            | 13            | 13            | 41            | 13            | 14            | 35            |
| 15600000000  | 2018-12-11 13:00:00  | 4            | 6             | 6             | 41            | 6             | 13            | 39            |
| 15600000000  | 2018-12-10 13:00:00  | 2            | 2             | 2             | 41            | 2             | 6             | 41            |
| 15600000000  | 2018-12-18 13:00:00  | 7            | 41            | 41            | 41            | 21            | 21            | 7             |
| 15600000000  | 2018-12-17 13:00:00  | 4            | 34            | 34            | 41            | 20            | 27            | 11            |
| 15600000000  | 2018-12-16 13:00:00  | 2            | 30            | 30            | 41            | 17            | 21            | 13            |
| 15600000000  | 2018-12-15 13:00:00  | 8            | 28            | 28            | 41            | 22            | 24            | 21            |
| 18600000000  | 2018-12-23 13:00:00  | 8            | 69            | 69            | 69            | 22            | 22            | 8             |
| 18600000000  | 2018-12-22 13:00:00  | 6            | 61            | 61            | 69            | 18            | 26            | 14            |
| 18600000000  | 2018-12-21 13:00:00  | 1            | 55            | 55            | 69            | 14            | 20            | 15            |
| 18600000000  | 2018-12-20 13:00:00  | 7            | 54            | 54            | 69            | 21            | 22            | 22            |
| 18600000000  | 2018-12-19 13:00:00  | 4            | 47            | 47            | 69            | 20            | 27            | 26            |
| 18600000000  | 2018-12-18 13:00:00  | 2            | 43            | 43            | 69            | 17            | 21            | 28            |
| 18600000000  | 2018-12-17 13:00:00  | 8            | 41            | 41            | 69            | 22            | 24            | 36            |
| 18600000000  | 2018-12-16 13:00:00  | 6            | 33            | 33            | 69            | 18            | 26            | 42            |
| 18600000000  | 2018-12-15 13:00:00  | 1            | 27            | 27            | 69            | 20            | 26            | 43            |
| 18600000000  | 2018-12-14 13:00:00  | 7            | 26            | 26            | 69            | 25            | 26            | 50            |
| 18600000000  | 2018-12-13 13:00:00  | 4            | 19            | 19            | 69            | 19            | 26            | 54            |
| 18600000000  | 2018-12-11 13:00:00  | 6            | 7             | 7             | 69            | 7             | 15            | 68            |
| 18600000000  | 2018-12-10 13:00:00  | 1            | 1             | 1             | 69            | 1             | 7             | 69            |
| 18600000000  | 2018-12-12 13:00:00  | 8            | 15            | 15            | 69            | 15            | 19            | 62            |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
Time taken: 1.14 seconds, Fetched: 23 row(s)

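Explanation:
call_minute1: running total of call_minute from the start of the partition to the current row. For example, call_minute1 on the 11th = call_minute on the 10th + the 11th, and on the 12th = 10th + 11th + 12th.
call_minute2: same as call_minute1.
call_minute3: sum of call_minute over all rows in the partition (partition by pone_number).
call_minute4: current row + 3 preceding rows within the partition. For example, 11th = 10th + 11th, 12th = 10th + 11th + 12th, 13th = 10th + 11th + 12th + 13th, 14th = 11th + 12th + 13th + 14th.
call_minute5: current row + 3 preceding rows + 1 following row within the partition. For example, 14th = 11th + 12th + 13th + 14th + 15th.
call_minute6: current row + all following rows within the partition. For example, for 18600000000, whose data ends on the 23rd, 21st = 21st + 22nd + 23rd and 22nd = 22nd + 23rd.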
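The frame arithmetic explained above can be checked by hand with a small Python sketch. This is not how Hive evaluates the query; it only emulates the ROWS BETWEEN frames on the 18600000000 partition with list slices. The helper name `frame_sum` and the use of `None` to stand for "unbounded" are assumptions made for this illustration.

```python
# call_minute values for partition 18600000000, ordered by createtime
# (2018-12-10 through 2018-12-23), copied from the insert statements.
minutes = [1, 6, 8, 4, 7, 1, 6, 8, 2, 4, 7, 1, 6, 8]


def frame_sum(values, i, preceding, following):
    """Sum the rows in the frame around row i (hypothetical helper).

    preceding / following are row counts relative to the current row;
    None stands for UNBOUNDED PRECEDING / UNBOUNDED FOLLOWING.
    """
    start = 0 if preceding is None else max(0, i - preceding)
    end = len(values) if following is None else min(len(values), i + following + 1)
    return sum(values[start:end])


rows = []
for i, m in enumerate(minutes):
    rows.append({
        "call_minute": m,
        "call_minute2": frame_sum(minutes, i, None, 0),     # unbounded preceding .. current row
        "call_minute3": frame_sum(minutes, i, None, None),  # whole partition
        "call_minute4": frame_sum(minutes, i, 3, 0),        # 3 preceding .. current row
        "call_minute5": frame_sum(minutes, i, 3, 1),        # 3 preceding .. 1 following
        "call_minute6": frame_sum(minutes, i, 0, None),     # current row .. unbounded following
    })

for day, row in zip(range(10, 24), rows):
    print(f"2018-12-{day}", row)
```

The row for 2018-12-14 comes out as 26, 69, 25, 26, 50 for call_minute2 through call_minute6, which matches the Hive output above. Frames near the partition edges are simply clipped, which is why the 10th has call_minute4 = 1.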

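If ROWS BETWEEN is omitted but ORDER BY is present, the frame defaults to the start of the partition through the current row;
if ORDER BY is omitted as well, all values in the partition are summed.
The key is understanding ROWS BETWEEN, also known as the window clause:
preceding: rows before the current row
following: rows after the current row
current row: the current row
unbounded: no limit; unbounded preceding means from the start of the partition, and unbounded following means to the end of the partition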
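To see why the result depends on ORDER BY, here is a rough Python sketch that emulates the default frame on the 15600000000 partition under ascending order, descending order, and no ORDER BY at all. The names `calls` and `running_sum` are made up for this illustration. One assumption worth stating: as far as I know, Hive's default frame with ORDER BY is range-based rather than row-based, but because every createtime in a partition is unique here, a plain running total gives the same numbers.

```python
from itertools import accumulate

# (createtime day, call_minute) rows for partition 15600000000,
# copied from the insert statements.
calls = [
    ("2018-12-10", 2), ("2018-12-11", 4), ("2018-12-12", 7),
    ("2018-12-13", 1), ("2018-12-14", 6), ("2018-12-15", 8),
    ("2018-12-16", 2), ("2018-12-17", 4), ("2018-12-18", 7),
]


def running_sum(rows, descending=False):
    """Emulate sum(...) over (order by createtime [desc]) with the default frame."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=descending)
    totals = accumulate(m for _, m in ordered)
    return {day: total for (day, _), total in zip(ordered, totals)}


asc = running_sum(calls)                    # order by createtime
desc = running_sum(calls, descending=True)  # order by createtime desc
total = sum(m for _, m in calls)
no_order = {day: total for day, _ in calls}  # no ORDER BY: whole partition

for day, _ in calls:
    print(day, asc[day], desc[day], no_order[day])
```

With ascending order the 14th accumulates the 10th through the 14th; with descending order it accumulates the 18th back to the 14th, which is the same value call_minute6 shows for that row. Without ORDER BY every row gets the partition total.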

avg, min, and max are used the same way as sum.
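As a quick illustration of that point, the sketch below applies avg, min, and max to the same kinds of frames, using the 15600000000 partition. It is plain Python that mimics the frames with list slices, not Hive code; `frame_rows` and `summarize` are hypothetical helper names, and the average is rounded to 2 decimals to mirror the round(..., 2) used in the AVG query.

```python
# call_minute values for partition 15600000000, ordered by createtime
# (2018-12-10 through 2018-12-18), copied from the insert statements.
minutes = [2, 4, 7, 1, 6, 8, 2, 4, 7]


def frame_rows(values, i, preceding, following):
    """Return the rows in the frame around row i; None stands for UNBOUNDED."""
    start = 0 if preceding is None else max(0, i - preceding)
    end = len(values) if following is None else min(len(values), i + following + 1)
    return values[start:end]


def summarize(values, i, preceding, following):
    """Compute (avg rounded to 2 decimals, min, max) over one frame."""
    window = frame_rows(values, i, preceding, following)
    return round(sum(window) / len(window), 2), min(window), max(window)


i = 4  # the row for 2018-12-14
frames = {
    "unbounded preceding .. current row": (None, 0),
    "whole partition": (None, None),
    "3 preceding .. current row": (3, 0),
    "3 preceding .. 1 following": (3, 1),
    "current row .. unbounded following": (0, None),
}
for label, (preceding, following) in frames.items():
    print(label, summarize(minutes, i, preceding, following))
```

Only the aggregate applied to the frame changes; the frame itself is chosen exactly as for sum.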

AVG

> select pone_number,
createtime,
call_minute,
round(avg(call_minute) OVER(partition by pone_number order by createtime), 2) as call_minute1, -- default frame: from the start of the partition to the current row
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between unbounded preceding and current row), 2) as call_minute2, -- from the start of the partition to the current row; same result as call_minute1
round(avg(call_minute) OVER(partition by pone_number), 2) as call_minute3, -- all rows in the partition
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and current row), 2) as call_minute4, -- current row + 3 preceding rows
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between 3 preceding and 1 following), 2) as call_minute5, -- current row + 3 preceding rows + 1 following row
round(avg(call_minute) OVER(partition by pone_number order by createtime rows between current row and unbounded following), 2) as call_minute6 -- current row + all following rows
FROM call_test;
Query ID = hdfs_20181211000203_53ab6fb6-628c-4ac8-81aa-244c73b701f0
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1541064601030_38864)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 4.04 s     
----------------------------------------------------------------------------------------------
OK
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| pone_number  |      createtime      | call_minute  | call_minute1  | call_minute2  | call_minute3  | call_minute4  | call_minute5  | call_minute6  |
+--------------+----------------------+--------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
| 15600000000  | 2018-12-14 13:00:00  | 6            | 4.0           | 4.0           | 4.56          | 4.5           | 5.2           | 5.4           |
| 15600000000  | 2018-12-13 13:00:00  | 1            | 3.5           | 3.5           | 4.56          | 3.5           | 4.0           | 4.67          |
| 15600000000  | 2018-12-12 13:00:00  | 7            | 4.33          | 4.33          | 4.56          | 4.33          | 3.5           | 5.0           |
| 15600000000  | 2018-12-11 13:00:00  | 4