Hive Analytic Window Functions (1): SUM, AVG, MIN, MAX

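Hive provides a growing number of analytic functions for complex statistical analysis. I plan to work through all of the analytic window functions and will publish the notes as a series.

Today we start with the basic ones: SUM, AVG, MIN, and MAX.

They compute statistics either over all rows in a group or cumulatively over a running range of rows.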

Data preparation

  CREATE EXTERNAL TABLE lxw1234 (
    cookieid   string,
    createtime string,  -- day
    pv         INT
  ) ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE LOCATION '/tmp/lxw11/';

  DESC lxw1234;
  cookieid    STRING
  createtime  STRING
  pv          INT

  hive> select * from lxw1234;
  OK
  cookie1 2015-04-10 1
  cookie1 2015-04-11 5
  cookie1 2015-04-12 7
  cookie1 2015-04-13 3
  cookie1 2015-04-14 2
  cookie1 2015-04-15 4
  cookie1 2015-04-16 4

SUM: note that the result depends on ORDER BY, which sorts in ascending order by default.

  SELECT cookieid,
         createtime,
         pv,
         SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition through the current row
         SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row through the current row; same result as pv1
         SUM(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
         SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
         SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
         SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
  FROM lxw1234;

  cookieid  createtime  pv  pv1  pv2  pv3  pv4  pv5  pv6
  --------  ----------  --  ---  ---  ---  ---  ---  ---
  cookie1   2015-04-10   1    1    1   26    1    6   26
  cookie1   2015-04-11   5    6    6   26    6   13   25
  cookie1   2015-04-12   7   13   13   26   13   16   20
  cookie1   2015-04-13   3   16   16   26   16   18   13
  cookie1   2015-04-14   2   18   18   26   17   21   10
  cookie1   2015-04-15   4   22   22   26   16   20    8
  cookie1   2015-04-16   4   26   26   26   13   13    4

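pv1: running total of pv from the first row of the partition to the current row. For example, pv1 on the 11th = pv on the 10th + pv on the 11th, and pv1 on the 12th = 10th + 11th + 12th.
pv2: same as pv1.
pv3: sum of all pv values in the partition (cookie1).
pv4: the current row plus the 3 preceding rows in the partition. For example, 11th = 10th + 11th; 12th = 10th + 11th + 12th; 13th = 10th + 11th + 12th + 13th; 14th = 11th + 12th + 13th + 14th.
pv5: the current row plus the 3 preceding rows and 1 following row. For example, 14th = 11th + 12th + 13th + 14th + 15th = 5 + 7 + 3 + 2 + 4 = 21.
pv6: the current row plus all following rows. For example, 13th = 13th + 14th + 15th + 16th = 3 + 2 + 4 + 4 = 13; 14th = 14th + 15th + 16th = 2 + 4 + 4 = 10.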
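
To make the frame arithmetic concrete, here is a small Python sketch that mimics ROWS BETWEEN frames over the cookie1 partition and recomputes the SUM columns. It is only a simulation, assuming the rows are already sorted by createtime; it is not how Hive evaluates windows internally, and the helper name `rows_frame_sum` and its None/0 encoding of UNBOUNDED/CURRENT ROW are hypothetical. pv2 is left out because its explicit frame is the same as pv1's.

```python
from typing import List, Optional

# pv for cookie1, already sorted by createtime (2015-04-10 .. 2015-04-16)
PV = [1, 5, 7, 3, 2, 4, 4]


def rows_frame_sum(values: List[int], preceding: Optional[int], following: Optional[int]) -> List[int]:
    """SUM over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    None stands for UNBOUNDED and 0 for CURRENT ROW. The frame is clamped to
    the partition, so rows near either edge simply see fewer rows.
    """
    n = len(values)
    result = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n - 1 if following is None else min(n - 1, i + following)
        result.append(sum(values[start:end + 1]))
    return result


pv1 = rows_frame_sum(PV, None, 0)     # UNBOUNDED PRECEDING AND CURRENT ROW (also pv2)
pv3 = rows_frame_sum(PV, None, None)  # whole partition (no ORDER BY)
pv4 = rows_frame_sum(PV, 3, 0)        # 3 PRECEDING AND CURRENT ROW
pv5 = rows_frame_sum(PV, 3, 1)        # 3 PRECEDING AND 1 FOLLOWING
pv6 = rows_frame_sum(PV, 0, None)     # CURRENT ROW AND UNBOUNDED FOLLOWING

for name, col in [("pv1", pv1), ("pv3", pv3), ("pv4", pv4), ("pv5", pv5), ("pv6", pv6)]:
    print(name, col)
```

Each printed list should match the corresponding column of the Hive output above, e.g. pv5 prints [6, 13, 16, 18, 21, 20, 13].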

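If ORDER BY is given without ROWS BETWEEN, the frame defaults to running from the first row of the partition to the current row. (Strictly, Hive's default in this case is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which also includes rows that tie with the current row on the ORDER BY key; because each createtime is unique here, pv1 and pv2 come out identical.)
If ORDER BY is omitted, the aggregate covers every row in the partition.
The key is to understand ROWS BETWEEN, also called the WINDOW clause:
PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: no limit; UNBOUNDED PRECEDING means starting from the first row of the partition, and UNBOUNDED FOLLOWING means extending to the last row of the partition.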
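
One way to read the window clause is as two row positions relative to the current row, clamped to the edges of the partition. The Python sketch below illustrates that reading for ROWS frames only; the `Bound` tuple encoding and the helper names `preceding`, `following`, `bound_to_position`, and `frame_positions` are hypothetical, chosen for this illustration, and are not part of Hive.

```python
from typing import Optional, Tuple

# Hypothetical encoding of one frame bound as (keyword, row count or None).
Bound = Tuple[str, Optional[int]]

UNBOUNDED_PRECEDING: Bound = ("UNBOUNDED PRECEDING", None)
CURRENT_ROW: Bound = ("CURRENT ROW", None)
UNBOUNDED_FOLLOWING: Bound = ("UNBOUNDED FOLLOWING", None)


def preceding(k: int) -> Bound:
    return ("PRECEDING", k)


def following(k: int) -> Bound:
    return ("FOLLOWING", k)


def bound_to_position(bound: Bound, i: int, n: int) -> int:
    """0-based position one bound points at, for current row i in a partition of n rows."""
    kind, k = bound
    if kind == "UNBOUNDED PRECEDING":
        return 0  # first row of the partition
    if kind == "CURRENT ROW":
        return i
    if kind == "UNBOUNDED FOLLOWING":
        return n - 1  # last row of the partition
    if kind in ("PRECEDING", "FOLLOWING"):
        if k is None:
            raise ValueError(f"{kind} needs a row count")
        # k rows before or after the current row
        return i - k if kind == "PRECEDING" else i + k
    raise ValueError(f"unknown frame bound: {kind}")


def frame_positions(start: Bound, end: Bound, i: int, n: int) -> Tuple[int, int]:
    """Inclusive (first, last) positions of ROWS BETWEEN start AND end, clamped to the partition."""
    first = max(0, bound_to_position(start, i, n))
    last = min(n - 1, bound_to_position(end, i, n))
    return first, last


# cookie1 has n = 7 rows: position 0 is 2015-04-10, position 6 is 2015-04-16.
print(frame_positions(preceding(3), following(1), 0, 7))        # (0, 1): 10th..11th
print(frame_positions(preceding(3), following(1), 4, 7))        # (1, 5): 11th..15th
print(frame_positions(CURRENT_ROW, UNBOUNDED_FOLLOWING, 3, 7))  # (3, 6): 13th..16th
print(frame_positions(UNBOUNDED_PRECEDING, CURRENT_ROW, 2, 7))  # (0, 2): 10th..12th
```

The clamping in `frame_positions` is why, for example, pv5 on the 10th only sums two rows: "3 PRECEDING" would reach before the start of the partition.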

AVG, MIN, and MAX are used in exactly the same way as SUM.

  -- AVG
  SELECT cookieid,
         createtime,
         pv,
         AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition through the current row
         AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row through the current row; same result as pv1
         AVG(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
         AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
         AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
         AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
  FROM lxw1234;

  cookieid  createtime  pv  pv1                 pv2                 pv3                 pv4                 pv5                 pv6
  --------  ----------  --  ------------------  ------------------  ------------------  ------------------  ------------------  ------------------
  cookie1   2015-04-10   1  1.0                 1.0                 3.7142857142857144  1.0                 3.0                 3.7142857142857144
  cookie1   2015-04-11   5  3.0                 3.0                 3.7142857142857144  3.0                 4.333333333333333   4.166666666666667
  cookie1   2015-04-12   7  4.333333333333333   4.333333333333333   3.7142857142857144  4.333333333333333   4.0                 4.0
  cookie1   2015-04-13   3  4.0                 4.0                 3.7142857142857144  4.0                 3.6                 3.25
  cookie1   2015-04-14   2  3.6                 3.6                 3.7142857142857144  4.25                4.2                 3.3333333333333335
  cookie1   2015-04-15   4  3.6666666666666665  3.6666666666666665  3.7142857142857144  4.0                 4.0                 4.0
  cookie1   2015-04-16   4  3.7142857142857144  3.7142857142857144  3.7142857142857144  3.25                3.25                4.0
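
One detail worth noticing in the AVG output: near the edges of the partition the frame holds fewer rows, and the average is taken over the rows actually in the frame. For example, pv5 on 2015-04-10 is (1 + 5) / 2 = 3.0, not 6 / 5. The Python sketch below reproduces the pv4 to pv6 columns this way; it is a simulation that assumes rows sorted by createtime, and `rows_frame_avg` is a hypothetical name.

```python
from typing import List, Optional

PV = [1, 5, 7, 3, 2, 4, 4]  # cookie1, sorted by createtime


def rows_frame_avg(values: List[int], preceding: Optional[int], following: Optional[int]) -> List[float]:
    """AVG over ROWS BETWEEN frames; None = UNBOUNDED, 0 = CURRENT ROW."""
    n = len(values)
    result = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n - 1 if following is None else min(n - 1, i + following)
        frame = values[start:end + 1]
        # Divide by the number of rows actually in the (possibly clamped) frame.
        result.append(sum(frame) / len(frame))
    return result


pv4 = rows_frame_avg(PV, 3, 0)     # 3 PRECEDING AND CURRENT ROW
pv5 = rows_frame_avg(PV, 3, 1)     # 3 PRECEDING AND 1 FOLLOWING
pv6 = rows_frame_avg(PV, 0, None)  # CURRENT ROW AND UNBOUNDED FOLLOWING

print([round(x, 4) for x in pv5])  # [3.0, 4.3333, 4.0, 3.6, 4.2, 4.0, 3.25]
```
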

  -- MIN
  SELECT cookieid,
         createtime,
         pv,
         MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition through the current row
         MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row through the current row; same result as pv1
         MIN(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
         MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
         MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
         MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
  FROM lxw1234;

  cookieid  createtime  pv  pv1  pv2  pv3  pv4  pv5  pv6
  --------  ----------  --  ---  ---  ---  ---  ---  ---
  cookie1   2015-04-10   1    1    1    1    1    1    1
  cookie1   2015-04-11   5    1    1    1    1    1    2
  cookie1   2015-04-12   7    1    1    1    1    1    2
  cookie1   2015-04-13   3    1    1    1    1    1    2
  cookie1   2015-04-14   2    1    1    1    2    2    2
  cookie1   2015-04-15   4    1    1    1    2    2    4
  cookie1   2015-04-16   4    1    1    1    2    2    4

  -- MAX
  SELECT cookieid,
         createtime,
         pv,
         MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition through the current row
         MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row through the current row; same result as pv1
         MAX(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
         MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
         MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
         MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
  FROM lxw1234;

  cookieid  createtime  pv  pv1  pv2  pv3  pv4  pv5  pv6
  --------  ----------  --  ---  ---  ---  ---  ---  ---
  cookie1   2015-04-10   1    1    1    7    1    5    7
  cookie1   2015-04-11   5    5    5    7    5    7    7
  cookie1   2015-04-12   7    7    7    7    7    7    7
  cookie1   2015-04-13   3    7    7    7    7    7    4
  cookie1   2015-04-14   2    7    7    7    7    7    4
  cookie1   2015-04-15   4    7    7    7    7    7    4
  cookie1   2015-04-16   4    7    7    7    4    4    4
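
Since MIN and MAX differ from SUM only in the aggregate applied to each frame, a single sketch can cover both. The Python below takes the aggregate as a parameter and reproduces the pv4 to pv6 columns of the MIN and MAX outputs; it is a simulation assuming rows sorted by createtime, and `rows_frame_agg` is a hypothetical name rather than anything Hive exposes.

```python
from typing import Callable, List, Optional, Sequence

PV = [1, 5, 7, 3, 2, 4, 4]  # cookie1, sorted by createtime


def rows_frame_agg(values: Sequence[int], agg: Callable[[Sequence[int]], int],
                   preceding: Optional[int], following: Optional[int]) -> List[int]:
    """Apply agg (e.g. min or max) to each ROWS BETWEEN frame; None = UNBOUNDED, 0 = CURRENT ROW."""
    n = len(values)
    result = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n - 1 if following is None else min(n - 1, i + following)
        result.append(agg(values[start:end + 1]))
    return result


for name, agg in (("MIN", min), ("MAX", max)):
    print(name, "pv4", rows_frame_agg(PV, agg, 3, 0))     # 3 PRECEDING AND CURRENT ROW
    print(name, "pv5", rows_frame_agg(PV, agg, 3, 1))     # 3 PRECEDING AND 1 FOLLOWING
    print(name, "pv6", rows_frame_agg(PV, agg, 0, None))  # CURRENT ROW AND UNBOUNDED FOLLOWING
```

Passing `sum` as the aggregate gives back the SUM columns, which is the sense in which all four functions share one usage pattern.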

Other functions will be covered in upcoming posts.
