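PostgreSQL Aggregate Functions, Part 2: Aggregate Functions, Analytic Functions, and Window Functions
阿新 • Published: 2019-01-07
Using window functions in PostgreSQL
Table structures and data used in this document:
1. Table emp_detail: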
create table emp_detail(
empno integer,
ename varchar(10),
sal numeric,
dept_no integer,
time_stamp date
);
insert into emp_detail values(7369,'SMITH',100);
insert into emp_detail values(7369,'SMITH',100,20,'2015-04-01');
insert into emp_detail values(7369,'SMITH',105,20,'2015-04-02');
insert into emp_detail values(7369,'SMITH',120,20,'2015-04-03');
insert into emp_detail values(7369,'SMITH',150,20,'2015-04-04');
insert into emp_detail values(7369,'SMITH',200,20,'2015-04-05');
insert into emp_detail values(7369,'SMITH',400,20,'2015-04-06');
insert into emp_detail values(7369,'SMITH',180,20,'2015-04-07');
2. Table student:
create table student(
id int,
stu_name varchar(50),
chinese numeric,
english numeric,
math numeric
);
insert into student values(1001,'小明',80,75,90);
insert into student values(1002,'小紅',70,75,85);
insert into student values(1003,'小強',80,90,100);
3. Table emp:
CREATE TABLE public.emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate TIMESTAMP(6) WITHOUT TIME ZONE,
sal DOUBLE PRECISION,
comm DOUBLE PRECISION,
dept_no INTEGER
);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7369, E'SMITH', E'CLERK', 7902, E'1980-12-17 00:00:00', 800, NULL, 20);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7499, E'ALLEN', E'SALESMAN', 7698, E'1981-02-20 00:00:00', 1600, 306, 30);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7521, E'WARD', E'SALESMAN', 7698, E'1981-02-22 00:00:00', 1250, 506, 30);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7566, E'JONES', E'MANAGER', 7839, E'1981-04-02 00:00:00', 2975, NULL, 20);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7654, E'MARTIN', E'SALESMAN', 7698, E'1981-09-28 00:00:00', 1250, 1406, 30);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7698, E'BLAKE', E'MANAGER', 7839, E'1981-05-01 00:00:00', 2850, NULL, 30);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7782, E'CLARK', E'MANAGER', 7839, E'1981-06-09 00:00:00', 2450, NULL, 10);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7788, E'SCOTT', E'ANALYST', 7566, E'1987-04-19 00:00:00', 3000, NULL, 20);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7839, E'KING', E'PRESIDENT', NULL, E'1981-11-17 00:00:00', 5000, NULL, 10);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7844, E'TURNER', E'SALESMAN', 7698, E'1981-09-08 00:00:00', 1500, 6, 30);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7876, E'ADAMS', E'CLERK', 7788, E'1987-05-23 00:00:00', 1100, NULL, 20);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7900, E'JAMES', E'CLERK', 7698, E'1981-12-03 00:00:00', 950, NULL, 30);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7902, E'FORD', E'ANALYST', 7566, E'1981-12-03 00:00:00', 3000, NULL, 20);
INSERT INTO public.emp ("empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "dept_no")
VALUES (7934, E'MILLER', E'CLERK', 7782, E'1982-01-23 00:00:00', 1300, NULL, 10);
I. Window function syntax
The basic forms of a window function call:
function_name ([expression [, expression ... ]]) OVER window_name
function_name ([expression [, expression ... ]]) OVER ( window_definition )
function_name ( * ) OVER window_name
function_name ( * ) OVER ( window_definition )
where window_definition has the syntax:
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]
The optional frame_clause is one of:
{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end
frame_start and frame_end can each be one of:
UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING
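Here:
expression is any value expression that does not itself contain a window function call.
window_name refers to a window that is named and defined in the query's WINDOW clause. A window can also be anonymous, in which case its full definition is written in parentheses after OVER.
PARTITION BY resembles GROUP BY in that both divide rows into groups. The difference is that a window function does not collapse each group into a single row, so columns other than the partitioning columns can still appear in the select list without being wrapped in an aggregate. If PARTITION BY is omitted, all rows of the query form a single partition.
ORDER BY sets the order of the rows within each partition. It accepts the same options as an ordinary ORDER BY, such as ASC, DESC, and NULLS FIRST or NULLS LAST. If ORDER BY is omitted, the rows of a partition are processed in an unspecified order.
frame_clause defines the window frame: the subset of the current partition, positioned relative to the current row, that the function operates on. If it is omitted, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
frame_start and frame_end are the two boundaries of the frame. If frame_end is omitted, it defaults to CURRENT ROW.
frame_start cannot be UNBOUNDED FOLLOWING, and frame_end cannot be UNBOUNDED PRECEDING.
UNBOUNDED PRECEDING means the frame starts at the first row of the partition. It can be used only as frame_start.
UNBOUNDED FOLLOWING is the opposite of UNBOUNDED PRECEDING: the frame ends at the last row of the partition. It can be used only as frame_end.
In PostgreSQL versions before 11, value PRECEDING and value FOLLOWING are allowed only in ROWS mode, not in RANGE mode; PostgreSQL 11 and later also accept them in RANGE mode. value must be an integer or an integer expression that contains no variables, aggregate functions, or window functions. It must not be null or negative, but it can be 0, which selects the current row itself.
Used as frame_start, value PRECEDING says how many rows before the current row the frame begins. Used as frame_end, value FOLLOWING says how many rows after the current row the frame ends.
With value = 1, 1 PRECEDING starts the frame at the row immediately before the current row, and 1 FOLLOWING ends it at the row immediately after the current row. As the rows are scanned, a frame of this size slides along with the current row, and the function is evaluated over it each time.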
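The frame rules above are easier to see side by side. The following is a minimal sketch, not taken from the original article: the daily table and its columns are hypothetical sample data built inline with VALUES, so the query runs on its own.

```sql
-- Hypothetical sample data: five consecutive days with one amount each.
WITH daily(day, amount) AS (
    VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50)
)
SELECT day,
       amount,
       -- Sliding frame: previous row + current row + next row.
       sum(amount) OVER (ORDER BY day
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_3_rows,
       -- No frame clause: with ORDER BY, the default frame runs from the
       -- start of the partition to the current row (a running total).
       sum(amount) OVER (ORDER BY day) AS running_sum,
       -- Explicit frame covering the whole partition.
       sum(amount) OVER (ORDER BY day
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND UNBOUNDED FOLLOWING) AS total
FROM daily
ORDER BY day;

-- day | amount | sum_3_rows | running_sum | total
--   1 |     10 |         30 |          10 |   150
--   2 |     20 |         60 |          30 |   150
--   3 |     30 |         90 |          60 |   150
--   4 |     40 |        120 |         100 |   150
--   5 |     50 |         90 |         150 |   150
```

At the edges of the partition the sliding frame simply contains fewer rows, which is why day 1 and day 5 sum only two amounts.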
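II. Window function examples:
1. From the employee table (emp), query each employee's information together with the total salary of the whole company.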
select ename,sal,
sum(sal)over(order by empno range between unbounded preceding and unbounded following)
from emp;
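2. From the employee detail table (emp_detail), compute for each row the sum of an employee's pay over a three-day window: the previous day, the current day, and the next day.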
select empno,
ename,
sal,
dept_no,
sum(sal) over(
order by empno, time_stamp rows between 1 preceding and 1 following)
from emp_detail;
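The query above orders by empno and time_stamp without partitioning, which works here because emp_detail holds a single employee. With several employees, the frame would reach across employee boundaries. A hedged variant, using a hypothetical inline pay table rather than the article's data, partitions by employee first:

```sql
-- Hypothetical sample data: two employees, one row per day.
WITH pay(empno, pay_date, sal) AS (
    VALUES (1, DATE '2015-04-01', 100),
           (1, DATE '2015-04-02', 105),
           (1, DATE '2015-04-03', 120),
           (2, DATE '2015-04-01', 300),
           (2, DATE '2015-04-02', 310)
)
SELECT empno,
       pay_date,
       sal,
       -- The frame never leaves the current employee's partition.
       sum(sal) OVER (PARTITION BY empno
                      ORDER BY pay_date
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_3_days
FROM pay
ORDER BY empno, pay_date;

-- empno |  pay_date  | sal | sum_3_days
--     1 | 2015-04-01 | 100 |        205
--     1 | 2015-04-02 | 105 |        325
--     1 | 2015-04-03 | 120 |        225
--     2 | 2015-04-01 | 300 |        610
--     2 | 2015-04-02 | 310 |        610
```

Note that ROWS counts rows, not calendar days, so this equals a three-day window only under the assumption of exactly one row per employee per day with no gaps.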
3. From the employee table (emp), query each employee's information together with the total salary of that employee's department.
select ename,
sal,
sum(sal) over(partition by dept_no
order by empno range between unbounded preceding and unbounded following)
from emp;
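The examples in this section write the window definition inline after OVER. When several functions share one window, the WINDOW clause mentioned in the syntax section lets the definition be named once and reused. A small sketch with a hypothetical inline staff table (the names staff, w, dept_total, and dept_max are illustrative only):

```sql
-- Hypothetical sample data.
WITH staff(ename, dept_no, sal) AS (
    VALUES ('A', 10, 1000), ('B', 10, 3000), ('C', 20, 2000)
)
SELECT ename,
       dept_no,
       sal,
       sum(sal) OVER w AS dept_total,  -- both functions reuse window w
       max(sal) OVER w AS dept_max
FROM staff
WINDOW w AS (PARTITION BY dept_no)
ORDER BY dept_no, ename;

-- ename | dept_no | sal  | dept_total | dept_max
-- A     |      10 | 1000 |       4000 |     3000
-- B     |      10 | 3000 |       4000 |     3000
-- C     |      20 | 2000 |       2000 |     2000
```

Because window w has no ORDER BY, all rows of a partition are peers and the default frame already covers the whole partition, so no explicit frame clause is needed to get a per-department total.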
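III. Introduction to analytic functions
row_number(): returns the number of the current row within its partition, counting from 1.
rank(), dense_rank(): return the rank of the current row within its partition. Tied rows receive the same rank in both; the functions differ in what follows a tie. rank() leaves gaps (1, 1, 3), whereas dense_rank() numbers consecutively (1, 1, 2).
lag(value any), lead(value any): return the value of the given expression from the row before (lag) or the row after (lead) the current row within the partition, which makes it possible to compare a row with its neighbor. If there is no such row, the result is NULL.
first_value(value any), last_value(value any): return the value at the first row and at the last row of the window frame, in the window's sort order.
Aggregate functions such as sum(), avg(), max(), and min() can also be used with a window and act as analytic functions.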
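To make the differences between these functions concrete, here is a small sketch over hypothetical inline data with one tie (the scores table and the column aliases are illustrative, not from the original article):

```sql
-- Hypothetical sample data: 'b' and 'c' are tied at 90.
WITH scores(name, score) AS (
    VALUES ('a', 100), ('b', 90), ('c', 90), ('d', 80)
)
SELECT name,
       score,
       -- name is added as a tie-breaker so the numbering is deterministic
       row_number() OVER (ORDER BY score DESC, name) AS rn,
       rank()       OVER (ORDER BY score DESC)       AS rnk,        -- gaps after ties
       dense_rank() OVER (ORDER BY score DESC)       AS dense_rnk,  -- no gaps
       lag(score)   OVER (ORDER BY score DESC, name) AS prev_score,
       lead(score)  OVER (ORDER BY score DESC, name) AS next_score
FROM scores
ORDER BY score DESC, name;

-- name | score | rn | rnk | dense_rnk | prev_score | next_score
-- a    |   100 |  1 |   1 |         1 |     (null) |         90
-- b    |    90 |  2 |   2 |         2 |        100 |         90
-- c    |    90 |  3 |   2 |         2 |         90 |         80
-- d    |    80 |  4 |   4 |         3 |         90 |     (null)
```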
IV. Examples combining analytic functions and windows
1. From the employee table (emp), number the employees in order of hire date, earliest first.
Select row_number() over(
order by hiredate asc),
ename,
empno,
hiredate
from emp;
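2. From the employee table, rank salaries within each department and give the highest-paid employee in each department a bonus of 10% of salary.
update emp
-- coalesce: comm is NULL for many rows, and NULL + x stays NULL
set comm = coalesce(comm, 0) + sal * 0.10
where empno in (
-- IN needs a single-column subquery, so select only empno
select empno
from (
select ename,
empno,
sal,
dept_no,
dense_rank() over(partition by dept_no
order by sal desc) as level_
from emp
) t
where level_ = 1
);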
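3. From the employee table, rank salaries within each department and give the employee ranked third in each department a bonus of 20% of salary.
update emp
-- coalesce: comm is NULL for many rows, and NULL + x stays NULL
set comm = coalesce(comm, 0) + sal * 0.20
where empno in (
-- IN needs a single-column subquery, so select only empno
select empno
from (
select ename,
empno,
sal,
dept_no,
rank() over(partition by dept_no
order by sal desc) as level_
from emp
) t
where level_ = 3
);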
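4. From the employee table, sort the employees of each department by salary (ascending in the query below) and compute the salary difference between each employee and the one ranked just before.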
select ename,
empno,
sal,
dept_no,
lag(sal)over(partition by dept_no order by sal) as lag_end,
sal - lag(sal)over(partition by dept_no order by sal)
from emp order by dept_no,sal asc;
The result contains NULLs because the first row of each partition has no preceding row, so lag() returns NULL for it.
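If a NULL is not wanted there, lag() also accepts an offset and a default value as optional second and third arguments. The sketch below uses a hypothetical inline staff table and passes the current row's own sal as the default, so the difference for the first row of each partition comes out as zero instead of NULL. Whether zero is the right substitute depends on the report; this is only one possible choice.

```sql
-- Hypothetical sample data.
WITH staff(ename, dept_no, sal) AS (
    VALUES ('A', 10, 1000.0), ('B', 10, 1500.0), ('C', 20, 2000.0)
)
SELECT ename,
       dept_no,
       sal,
       -- lag(value, offset, default): when no previous row exists,
       -- the default (here the row's own sal) is returned instead of NULL.
       sal - lag(sal, 1, sal) OVER (PARTITION BY dept_no
                                    ORDER BY sal) AS diff_from_prev
FROM staff
ORDER BY dept_no, sal;
```

Here A and C are the first rows of their departments and get a difference of zero, while B gets 500.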
Now suppose the first row should instead be compared with the last row of its partition. How can that be done?
select ename,
empno,
sal,
dept_no,
CASE when lag(sal)over(partition by dept_no order by sal) is null then max(sal)OVER(partition by dept_no order by sal desc)
else lag(sal)over(partition by dept_no order by sal)
end lag_end
from emp order by dept_no,sal asc;
Alternatively:
select ename,
empno,
sal,
dept_no,
CASE when lag(sal)over(partition by dept_no order by sal) is null then first_value(sal)OVER(partition by dept_no order by sal desc)
else lag(sal)over(partition by dept_no order by sal)
end lag_end
from emp order by dept_no,sal asc;
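Both variants reach the highest salary by sorting in descending order and taking the first value. Reaching for last_value() with an ascending sort looks equivalent but is a common pitfall: under the default frame, the frame ends at the current row, so last_value() just returns the current row's value. A minimal sketch with hypothetical inline data shows the difference:

```sql
-- Hypothetical sample data.
WITH staff(ename, sal) AS (
    VALUES ('A', 1000), ('B', 2000), ('C', 3000)
)
SELECT ename,
       sal,
       -- Default frame ends at the current row: returns the row's own sal.
       last_value(sal) OVER (ORDER BY sal) AS last_default_frame,
       -- Frame widened to the whole partition: returns the true last value.
       last_value(sal) OVER (ORDER BY sal
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM staff
ORDER BY sal;

-- ename | sal  | last_default_frame | last_full_frame
-- A     | 1000 |               1000 |            3000
-- B     | 2000 |               2000 |            3000
-- C     | 3000 |               3000 |            3000
```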
5. From the student table, list each student's score in every subject, sorted by score, together with that student's highest score.
select id,stu_name,course,point_,first_value(point_)over(partition by id order by point_ desc ) from (
with temp as (
select id,stu_name,chinese,english,math from student
),t1 as (select id,stu_name,chinese as point_, 'Chinese'::text as course from temp
),t2 as (select id,stu_name,english as point_, 'English'::text as course from temp
),t3 as (select id,stu_name,math as point_, 'Math'::text as course from temp)
select * from t1
union all (select * from t2)
union all (select * from t3)
) t order by id, point_ desc;
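The query above turns the three score columns into rows with one CTE per subject. As an alternative sketch, not the original author's approach, the same reshaping can be written with a LATERAL VALUES list and then ranked per student. The inline student rows below are hypothetical stand-ins with the same column names as the article's student table:

```sql
-- Hypothetical sample data with the same columns as the student table.
WITH student(id, stu_name, chinese, english, math) AS (
    VALUES (1001, 'S1', 80, 75, 90),
           (1002, 'S2', 70, 75, 85)
)
SELECT s.id,
       s.stu_name,
       v.course,
       v.point_,
       -- rank each subject within one student, best score first
       rank() OVER (PARTITION BY s.id ORDER BY v.point_ DESC) AS course_rank
FROM student s
-- one output row per (student, subject) pair
CROSS JOIN LATERAL (VALUES ('Chinese', s.chinese),
                           ('English', s.english),
                           ('Math',    s.math)) AS v(course, point_)
ORDER BY s.id, course_rank;
```

This form scans student once and keeps the subject list in a single place, which may be easier to extend when more subjects are added.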