Hive Analytic Window Functions: SUM, AVG, MIN, and MAX

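Hive provides a large set of analytic functions for carrying out complex statistical analysis.

This article starts with four of them: SUM, AVG, MIN, and MAX.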


Environment:

Hive version: apache-hive-0.14.0-bin
Hadoop version: hadoop-2.6.0
Tez version: tez-0.7.0


Sample data:

P088888888888,2016-02-10,1
P088888888888,2016-02-11,3
P088888888888,2016-02-12,1
P088888888888,2016-02-13,9
P088888888888,2016-02-14,3
P088888888888,2016-02-15,12
P088888888888,2016-02-16,3

Create the table:

hive (hiveinaction)> create table windows_func
                   > (
                   >   polno string,
                   >   createtime string,
                   >   pnum int
                   > )
                   > ROW FORMAT DELIMITED
                   > FIELDS TERMINATED BY ','
                   > stored as textfile;

Load the data into the table:

load data local inpath '/home/hadoop/testhivedata/windows_func.txt' into table windows_func;

Test query:

SELECT polno,
       createtime,
       pnum,
       SUM(pnum) OVER(PARTITION BY polno ORDER BY createtime) AS pnum1, -- default frame: from the start of the partition to the current row
       SUM(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pnum2, -- from the start of the partition to the current row
       SUM(pnum) OVER(PARTITION BY polno) AS pnum3, -- all rows in the partition
       SUM(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pnum4, -- current row plus the 3 preceding rows
       SUM(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pnum5, -- current row plus the 3 preceding rows and the 1 following row
       SUM(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pnum6 -- current row plus all following rows
FROM windows_func;

Result:

polno          createtime  pnum  pnum1  pnum2  pnum3  pnum4  pnum5  pnum6
P088888888888  2016/2/10   1     1      1      32     1      4      32
P088888888888  2016/2/11   3     4      4      32     4      5      31
P088888888888  2016/2/12   1     5      5      32     5      14     28
P088888888888  2016/2/13   9     14     14     32     14     17     27
P088888888888  2016/2/14   3     17     17     32     16     28     18
P088888888888  2016/2/15   12    29     29     32     25     28     15
P088888888888  2016/2/16   3     32     32     32     27     27     3
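The windowed SUM columns can be recomputed by hand as a sanity check. The following is a minimal Python sketch, not Hive code: `frame_sum` is a hypothetical helper introduced only for illustration, and it assumes the seven sample rows are already sorted by createtime within the single polno partition. `None` stands in for UNBOUNDED.

```python
# pnum values of the sample rows, ordered by createtime (one polno partition).
pnum = [1, 3, 1, 9, 3, 12, 3]


def frame_sum(values, i, preceding, following):
    """Sum the rows in the frame around row i (0-based).

    preceding / following are row counts relative to the current row;
    None means UNBOUNDED. The frame is clipped at the partition edges.
    """
    last = len(values) - 1
    start = 0 if preceding is None else max(0, i - preceding)
    end = last if following is None else min(last, i + following)
    return sum(values[start:end + 1])


n = len(pnum)
pnum2 = [frame_sum(pnum, i, None, 0) for i in range(n)]  # UNBOUNDED PRECEDING .. CURRENT ROW
pnum3 = [sum(pnum)] * n                                  # no ORDER BY: the whole partition
pnum4 = [frame_sum(pnum, i, 3, 0) for i in range(n)]     # 3 PRECEDING .. CURRENT ROW
pnum5 = [frame_sum(pnum, i, 3, 1) for i in range(n)]     # 3 PRECEDING .. 1 FOLLOWING
pnum6 = [frame_sum(pnum, i, 0, None) for i in range(n)]  # CURRENT ROW .. UNBOUNDED FOLLOWING

# One tab-separated line per row: pnum, pnum2, pnum3, pnum4, pnum5, pnum6.
for row in zip(pnum, pnum2, pnum3, pnum4, pnum5, pnum6):
    print(*row, sep="\t")
```

Each printed line should match the corresponding row of the result table, with pnum1 equal to pnum2 for this data.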

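Notes:

1. If ROWS BETWEEN is not specified but ORDER BY is, the frame defaults to everything from the start of the partition to the current row. (Hive's default in this case is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it gives the same result as the ROWS form here because every createtime value is distinct.)

2. If ORDER BY is not specified, all values in the partition are summed.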

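Understanding ROWS BETWEEN, which is also called the WINDOW clause:

PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: the partition boundary. UNBOUNDED PRECEDING means starting from the first row of the partition; UNBOUNDED FOLLOWING means running to the last row of the partition.

AVG, MIN, and MAX are used in exactly the same way as SUM.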
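To illustrate that only the aggregate changes while the frame logic stays the same, here is a small Python sketch (again an illustration, not Hive output). `window_agg` is a hypothetical helper; it takes the aggregate as a parameter and applies it to the same kind of frames used in the SUM query, with `None` standing for UNBOUNDED.

```python
# pnum values of the sample rows, ordered by createtime (one polno partition).
pnum = [1, 3, 1, 9, 3, 12, 3]


def window_agg(values, agg, preceding, following):
    """Apply agg to the frame of every row and return one result per row.

    preceding / following are row counts; None means UNBOUNDED.
    """
    last = len(values) - 1
    results = []
    for i in range(len(values)):
        start = 0 if preceding is None else max(0, i - preceding)
        end = last if following is None else min(last, i + following)
        results.append(agg(values[start:end + 1]))
    return results


# MAX(pnum) ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
running_max = window_agg(pnum, max, None, 0)
# MIN(pnum) ... ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
min_last4 = window_agg(pnum, min, 3, 0)
# MAX(pnum) ... ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
max_rest = window_agg(pnum, max, 0, None)

print("running_max:", running_max)
print("min_last4:  ", min_last4)
print("max_rest:   ", max_rest)
```

Passing `sum` as the aggregate gives the SUM columns again, which is the sense in which the four functions share one usage pattern.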

AVG example:

SELECT polno,
       createtime,
       pnum,
       AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime) AS pnum1, -- default frame: from the start of the partition to the current row
       AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pnum2, -- from the start of the partition to the current row
       AVG(pnum) OVER(PARTITION BY polno) AS pnum3, -- all rows in the partition
       AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pnum4, -- current row plus the 3 preceding rows
       AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pnum5, -- current row plus the 3 preceding rows and the 1 following row
       AVG(pnum) OVER(PARTITION BY polno ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pnum6 -- current row plus all following rows
FROM windows_func;

Result:

polno          createtime  pnum  pnum1    pnum2   pnum3       pnum4     pnum5     pnum6
P088888888888  2016/2/10   1     1        1       4.57142857  1         2         4.5714286
P088888888888  2016/2/11   3     2        2       4.57142857  2         1.666667  5.1666667
P088888888888  2016/2/12   1     1.66667  1.6667  4.57142857  1.666667  3.5       5.6
P088888888888  2016/2/13   9     3.5      3.5     4.57142857  3.5       3.4       6.75
P088888888888  2016/2/14   3     3.4      3.4     4.57142857  4         5.6       6
P088888888888  2016/2/15   12    4.83333  4.8333  4.57142857  6.25      5.6       7.5
P088888888888  2016/2/16   3     4.57143  4.5714  4.57142857  6.75      6.75      3
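One detail worth noticing in the AVG results: near the partition edges the frame contains fewer rows, and AVG divides by the number of rows actually in the frame, not by the nominal frame width. The sketch below recomputes the pnum5 column (3 PRECEDING to 1 FOLLOWING) in plain Python to make that visible. `frame_rows` is a hypothetical helper, and the rows are assumed to be sorted by createtime already.

```python
# pnum values of the sample rows, ordered by createtime (one polno partition).
pnum = [1, 3, 1, 9, 3, 12, 3]


def frame_rows(values, i, preceding, following):
    """Return the rows in the frame around row i, clipped to the partition."""
    start = max(0, i - preceding)
    end = min(len(values) - 1, i + following)
    return values[start:end + 1]


pnum5_avg = []
for i in range(len(pnum)):
    rows = frame_rows(pnum, i, 3, 1)   # ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING
    avg = sum(rows) / len(rows)        # divide by the actual frame size
    pnum5_avg.append(round(avg, 4))
    print(f"row {i + 1}: frame={rows} count={len(rows)} avg={avg:.4f}")
```

For the first row the frame holds only two values (1 and 3), so the average is 4 / 2 = 2 rather than 4 / 5.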

The remaining functions follow the same pattern, so no separate Hive queries are shown for them.
