PostgreSQL Aggregate Functions, Part 2: Aggregate, Analytic, and Window Functions

Using window functions in PostgreSQL
Structure and data of the tables used in this document:
1. Table emp_detail:
create table emp_detail(
 empno integer,
 ename varchar(10),
 sal numeric,
 dept_no integer,
 time_stamp date
 );
 
 -- This row supplies only three values, so dept_no and time_stamp are NULL.
 insert into emp_detail values(7369,'SMITH',100);
 
 insert into emp_detail values(7369,'SMITH',100,20,'2015-04-01');
 insert into emp_detail values(7369,'SMITH',105,20,'2015-04-02');
 insert into emp_detail values(7369,'SMITH',120,20,'2015-04-03');
 insert into emp_detail values(7369,'SMITH',150,20,'2015-04-04');
 insert into emp_detail values(7369,'SMITH',200,20,'2015-04-05');
 insert into emp_detail values(7369,'SMITH',400,20,'2015-04-06');
 insert into emp_detail values(7369,'SMITH',180,20,'2015-04-07');


2. Table student:
create table student(
  id int,
  stu_name varchar(50),
  chinese numeric,
  english numeric,
  math    numeric
  );
  
  insert into student values(1001,'小明',80,75,90);
  insert into student values(1002,'小紅',70,75,85);
  insert into student values(1003,'小強',80,90,100);


3. Table emp:
CREATE TABLE public.emp (
  empno INTEGER,
  ename VARCHAR(10),
  job VARCHAR(9),
  mgr INTEGER,
  hiredate TIMESTAMP(6) WITHOUT TIME ZONE,
  sal DOUBLE PRECISION,
  comm DOUBLE PRECISION,
  dept_no INTEGER
);




INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7369, E'SMITH', E'CLERK', 7902, E'1980-12-17 00:00:00', 800, NULL, 20);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7499, E'ALLEN', E'SALESMAN', 7698, E'1981-02-20 00:00:00', 1600, 306, 30);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7521, E'WARD', E'SALESMAN', 7698, E'1981-02-22 00:00:00', 1250, 506, 30);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7566, E'JONES', E'MANAGER', 7839, E'1981-04-02 00:00:00', 2975, NULL, 20);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7654, E'MARTIN', E'SALESMAN', 7698, E'1981-09-28 00:00:00', 1250, 1406, 30);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7698, E'BLAKE', E'MANAGER', 7839, E'1981-05-01 00:00:00', 2850, NULL, 30);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7782, E'CLARK', E'MANAGER', 7839, E'1981-06-09 00:00:00', 2450, NULL, 10);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7788, E'SCOTT', E'ANALYST', 7566, E'1987-04-19 00:00:00', 3000, NULL, 20);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7839, E'KING', E'PRESIDENT', NULL, E'1981-11-17 00:00:00', 5000, NULL, 10);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7844, E'TURNER', E'SALESMAN', 7698, E'1981-09-08 00:00:00', 1500, 6, 30);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7876, E'ADAMS', E'CLERK', 7788, E'1987-05-23 00:00:00', 1100, NULL, 20);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7900, E'JAMES', E'CLERK', 7698, E'1981-12-03 00:00:00', 950, NULL, 30);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7902, E'FORD', E'ANALYST', 7566, E'1981-12-03 00:00:00', 3000, NULL, 20);


INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7934, E'MILLER', E'CLERK', 7782, E'1982-01-23 00:00:00', 1300, NULL, 10);
I. Window function syntax
Basic structure of a window function call:
function_name ([expression [, expression ... ]]) OVER window_name
function_name ([expression [, expression ... ]]) OVER ( window_definition )
function_name ( * ) OVER window_name
function_name ( * ) OVER ( window_definition )
window_definition is defined as:
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]
The frame_clause option can be one of:
[ RANGE | ROWS ] frame_start
[ RANGE | ROWS ] BETWEEN frame_start AND frame_end
The frame start (frame_start) and frame end (frame_end) can each be one of:
UNBOUNDED PRECEDING  
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING


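Here:
expression is any value expression that does not itself contain a window function call.
window_name refers to a window that is named and defined in the query's WINDOW clause. A window can also be defined anonymously, inline in parentheses after OVER.
PARTITION BY is similar to GROUP BY in that both divide the rows into groups. Unlike GROUP BY, it does not collapse each group into a single row, so the select list can still show any column, not only the grouping columns and aggregates. If PARTITION BY is omitted, all rows of the query form a single partition.
ORDER BY sets the order of the rows within each partition. It accepts the same options as an ordinary ORDER BY, such as ASC, DESC, and NULLS FIRST or NULLS LAST. If ORDER BY is omitted, the rows are processed in an unspecified order.
frame_clause defines the window frame: which rows of the partition, relative to the current row, the function sees, and therefore how the window moves as the current row advances. When no frame_clause is given, the default is RANGE UNBOUNDED PRECEDING, which runs from the start of the partition through the current row and its ORDER BY peers.
frame_start and frame_end are the two boundaries that determine the size of the frame. If frame_end is omitted, it defaults to CURRENT ROW.
    frame_start cannot be UNBOUNDED FOLLOWING, and frame_end cannot be UNBOUNDED PRECEDING.
UNBOUNDED PRECEDING means the frame starts at the first row of the partition; it can only be used as frame_start.
UNBOUNDED FOLLOWING is the opposite of UNBOUNDED PRECEDING: the frame ends at the last row of the partition; it can only be used as frame_end.
In the PostgreSQL releases this syntax summary describes (before version 11), value PRECEDING and value FOLLOWING can only be used in ROWS mode, not in RANGE mode; PostgreSQL 11 and later also accept them in RANGE mode. value is an integer or an integer expression that contains no variables, aggregate functions, or window functions. It must not be null or negative, but it can be 0, in which case it selects the current row.
Used as frame_start, value PRECEDING says how many rows before the current row the frame starts; used as frame_end, value FOLLOWING says how many rows after the current row the frame ends.
With value = 1, value PRECEDING starts the frame at the row just before the current row, and value FOLLOWING ends it at the row just after the current row. As the rows are scanned, a frame of this size slides along with the current row and the analytic function is evaluated over it.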
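
As a rough illustration of the pieces described above, the following sketch defines a named window in the WINDOW clause and reuses it with different frames. It assumes the emp table from the setup section; the window name w and the column aliases are made up for this example.

```sql
-- w is a named window: partition by department, order by employee number.
select ename,
       dept_no,
       sal,
       -- No frame_clause: the default frame runs from the start of the
       -- partition to the current row, which gives a running total here.
       sum(sal) over w as running_total,
       -- ROWS frame: the previous row, the current row, and the next row.
       sum(sal) over (w rows between 1 preceding and 1 following) as neighbour_total,
       -- Whole partition, regardless of where the current row is.
       sum(sal) over (w rows between unbounded preceding
                              and unbounded following) as dept_total
from emp
window w as (partition by dept_no order by empno)
order by dept_no, empno;
```

An OVER clause that copies a named window, as in over (w rows ...), can add its own frame only because w itself does not declare one.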


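II. Window function examples:
1. From the employee table (emp), query each employee's details together with the total salary of the whole company.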


select ename,sal,
sum(sal)over(order by empno range between unbounded preceding and unbounded following)
 from emp;
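
Because the frame in this query spans the entire result set, the ORDER BY and the frame clause do not change the sum. A shorter form with an empty OVER clause should give the same total. This is a sketch against the same emp table; total_sal is an alias chosen for the example.

```sql
-- An empty OVER () treats all rows of the query as one partition and one frame.
select ename,
       sal,
       sum(sal) over () as total_sal
from emp;
-- With the sample data above, total_sal is 29025 on every row.
```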




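2. From the employee detail table (emp_detail), query the total salary an employee received over a three-day window: the previous day, the current day, and the next day.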


select empno,
        ename,
        sal,
        dept_no,
        sum(sal) over(
 order by empno, time_stamp rows between 1 preceding and 1 following)
 from emp_detail;
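
To see how the frame slides, it may help to look at the dated rows only. The sketch below assumes the emp_detail data from the setup section, skips the one row inserted without a date, and uses sal_3_days as an illustrative alias. Note that ROWS counts rows rather than calendar days, so this is a three-day window only because there is exactly one row per day.

```sql
select time_stamp,
       sal,
       sum(sal) over (partition by empno
                      order by time_stamp
                      rows between 1 preceding and 1 following) as sal_3_days
from emp_detail
where time_stamp is not null   -- skip the row inserted without a date
order by time_stamp;
-- With the sample data:
--  2015-04-01 | 100 | 205   (100 + 105; there is no previous row)
--  2015-04-02 | 105 | 325   (100 + 105 + 120)
--  2015-04-03 | 120 | 375
--  2015-04-04 | 150 | 470
--  2015-04-05 | 200 | 750
--  2015-04-06 | 400 | 780
--  2015-04-07 | 180 | 580   (400 + 180; there is no next row)
```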










3. From the employee table (emp), query each employee's details together with the total salary of that employee's department.


select ename,
        sal,
        sum(sal) over(partition by dept_no
 order by empno range between unbounded preceding and unbounded following)
 from emp;
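
The difference between GROUP BY and PARTITION BY is easiest to see side by side. The two queries below are a sketch against the same emp table, with dept_total as an illustrative alias: the first returns one row per department, the second keeps every employee row and repeats the department total on each.

```sql
-- GROUP BY: one row per department.
select dept_no,
       sum(sal) as dept_total
from emp
group by dept_no
order by dept_no;
-- With the sample data: 10 -> 8750, 20 -> 10875, 30 -> 9400

-- PARTITION BY: every employee row is kept, with the department total alongside.
select ename,
       dept_no,
       sal,
       sum(sal) over (partition by dept_no) as dept_total
from emp
order by dept_no, ename;
```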


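III. Introduction to analytic functions
row_number(): returns the number of the current row within its partition, counting from 1.
rank(), dense_rank(): rank the current row within its partition. Rows that tie receive the same rank. rank() leaves gaps after ties (1, 1, 3), while dense_rank() numbers consecutively without gaps (1, 1, 2).
lag(value any), lead(value any): return value evaluated at the row before (lag) or after (lead) the current row within the partition, which makes it possible to compare the current row with its neighbour. NULL is returned when there is no such row.
first_value(value any), last_value(value any): return value evaluated at the first and the last row of the window frame, in the window's sort order.
Aggregate functions such as sum(), avg(), max(), and min() can also be called with an OVER clause and used as analytic functions.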
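
The functions listed above can be compared in a single query. The following sketch runs against department 20 of the emp table, where two employees tie at 3000; the window name w and all aliases are made up for this example.

```sql
select ename,
       sal,
       row_number() over w as row_num,
       rank()       over w as rnk,
       dense_rank() over w as dense_rnk,
       lag(sal)     over w as prev_sal,   -- NULL on the first row
       lead(sal)    over w as next_sal,   -- NULL on the last row
       first_value(sal) over w as top_sal,
       -- last_value needs a frame that reaches the end of the partition;
       -- with the default frame it would stop at the current row's peers.
       last_value(sal) over (w rows between unbounded preceding
                                     and unbounded following) as bottom_sal
from emp
where dept_no = 20
window w as (order by sal desc);
-- With the sample data (the two rows at 3000 may appear in either order):
--  sal  | rnk | dense_rnk
--  3000 |  1  |  1
--  3000 |  1  |  1
--  2975 |  3  |  2
--  1100 |  4  |  3
--   800 |  5  |  4
```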


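IV. Examples combining analytic functions and window functions
1. From the employee table (emp), number the employees in order of hire date, earliest first.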
Select row_number() over(
 order by hiredate asc),
          ename,
          empno,
          hiredate
 from emp;
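
In the sample data, JAMES and FORD share the hire date 1981-12-03, so ordering by hiredate alone leaves their relative numbering unspecified. One possible way to make the numbering repeatable is to add a tie-breaker column; the sketch below uses empno for that, and hire_order is an alias chosen for the example.

```sql
select row_number() over (order by hiredate asc, empno asc) as hire_order,
       ename,
       empno,
       hiredate
from emp
order by hire_order;
```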
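2. From the employee table, rank salaries within each department and give the highest-paid employee in each department a bonus of 10% of salary.
update emp
   set comm = coalesce(comm, 0) + sal * 0.1   -- comm is NULL for most rows; treat NULL as 0
 where empno in (
                  select empno                -- IN needs a single-column subquery
                  from (
                         select empno,
                                dense_rank() over(partition by dept_no
                                                  order by sal desc) as level_
                         from emp
                       ) t
                  where level_ = 1
       );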
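
Window functions cannot appear directly in a WHERE clause, which is why the ranking is computed in a subquery and filtered outside it. Before running an update of this kind, it can help to preview which rows the ranking selects. This is a sketch against the emp sample data, reusing the level_ alias from the example above.

```sql
select ename,
       empno,
       sal,
       dept_no
from (
       select ename,
              empno,
              sal,
              dept_no,
              dense_rank() over (partition by dept_no
                                 order by sal desc) as level_
       from emp
     ) t
where level_ = 1
order by dept_no, ename;
-- With the sample data: KING (dept 10), FORD and SCOTT (dept 20, tied at 3000),
-- and BLAKE (dept 30).
```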
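3. From the employee table, rank salaries within each department and give the employee ranked third in each department a bonus of 20% of salary.
update emp
   set comm = coalesce(comm, 0) + sal * 0.2   -- comm is NULL for most rows; treat NULL as 0
 where empno in (
                  select empno                -- IN needs a single-column subquery
                  from (
                         select empno,
                                rank() over(partition by dept_no
                                            order by sal desc) as level_
                         from emp
                       ) t
                  where level_ = 3            -- third place; with ties, rank() can skip 3
       );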
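4. From the employee table, sort the employees of each department by salary in ascending order and compute the salary difference between each employee and the one ranked just before.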
select ename,
 empno,
 sal,
 dept_no,
 lag(sal)over(partition by dept_no order by sal) as lag_end,
 sal - lag(sal)over(partition by dept_no order by sal)
  from emp order by dept_no,sal asc;
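  The result contains NULLs because the first row of each partition has no preceding row, so lag() returns NULL for it.
  Suppose the first row should instead be compared with the last row of its partition. One way to do that: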
   select ename,
 empno,
 sal,
 dept_no,
 CASE when lag(sal)over(partition by dept_no order by sal) is null then max(sal)OVER(partition by dept_no order by sal desc)
 else lag(sal)over(partition by dept_no order by sal)
 end lag_end
  from emp order by dept_no,sal asc;
Alternatively:
select ename,
 empno,
 sal,
 dept_no,
 CASE when lag(sal)over(partition by dept_no order by sal) is null then first_value(sal)OVER(partition by dept_no order by sal desc)
 else lag(sal)over(partition by dept_no order by sal)
 end lag_end
  from emp order by dept_no,sal asc;
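
The CASE expressions above can probably be shortened with coalesce, which returns its first non-NULL argument. The sketch below is one possible rewrite against the same emp table; the named window w is an assumption of this example, and max(sal) over the whole partition stands in for the last row in ascending salary order.

```sql
select ename,
       empno,
       sal,
       dept_no,
       -- Use the previous salary if there is one; otherwise fall back to the
       -- highest salary in the department (the last row in ascending order).
       coalesce(lag(sal) over w,
                max(sal) over (partition by dept_no)) as lag_end
from emp
window w as (partition by dept_no order by sal)
order by dept_no, sal asc;
```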
5. From the student table, list each student's subjects ordered by score.
  select id,stu_name,course,point_,first_value(point_)over(partition by id order by point_ desc ) from (
  with temp as (
  select id,stu_name,chinese,english,math from student
  ),t1 as (select id,stu_name,chinese as point_, '語文'::text as course from temp
  ),t2 as (select id,stu_name,english as point_, '英語'::text as course from temp
  ),t3 as (select id,stu_name,math as point_, '數學'::text as course from temp)
  select * from t1 
  union all (select * from t2)
  union all (select * from t3)