New in MySQL 8.0: Window Functions
Starting with version 8.0, MySQL supports window functions, as Oracle does. Let's run a query first to get a feel for what they can do.
mysql> create table numbers(val int);
mysql> insert into numbers values(1),(1),(2),(3),(3),(3),(4),(4),(5);
mysql> SELECT
    -> val,
    -> ROW_NUMBER() OVER w AS 'row_number',
    -> RANK() OVER w AS 'rank',
    -> DENSE_RANK() OVER w AS 'dense_rank',
    -> CUME_DIST() OVER w AS 'cume_dist',
    -> PERCENT_RANK() OVER w AS 'percent_rank'
    -> FROM numbers
    -> WINDOW w AS (ORDER BY val);
+------+------------+------+------------+--------------------+--------------+
| val  | row_number | rank | dense_rank | cume_dist          | percent_rank |
+------+------------+------+------------+--------------------+--------------+
|    1 |          1 |    1 |          1 | 0.2222222222222222 |            0 |
|    1 |          2 |    1 |          1 | 0.2222222222222222 |            0 |
|    2 |          3 |    3 |          2 | 0.3333333333333333 |         0.25 |
|    3 |          4 |    4 |          3 | 0.6666666666666666 |        0.375 |
|    3 |          5 |    4 |          3 | 0.6666666666666666 |        0.375 |
|    3 |          6 |    4 |          3 | 0.6666666666666666 |        0.375 |
|    4 |          7 |    7 |          4 | 0.8888888888888888 |         0.75 |
|    4 |          8 |    7 |          4 | 0.8888888888888888 |         0.75 |
|    5 |          9 |    9 |          5 |                  1 |            1 |
+------+------------+------+------------+--------------------+--------------+
9 rows in set (0.00 sec)

With window functions, SQL that used to be complicated can now be written quite simply. The sections below introduce several commonly used window functions.
row_number()
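Used to pick a unique row within each group. It differs from GROUP BY: GROUP BY returns only one row per group, whereas row_number() keeps every row and numbers the rows within each group.
Example: for each department (dept_no), find the row of the employee with the largest employee number (emp_no).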
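The article does not show how t_group is defined. To follow along, a minimal sketch like the one below should be enough. The column types are an assumption; the rows are copied from the query output shown in this section.

```sql
-- Hypothetical definition of t_group: the column types are assumed,
-- the ten rows are taken from the query output in this article.
create table t_group (
  emp_no    int     not null,
  dept_no   char(4) not null,
  from_date date    not null,
  to_date   date    not null
);

insert into t_group (emp_no, dept_no, from_date, to_date) values
  (31112, 'd002', '1986-12-01', '1993-12-10'),
  (10004, 'd004', '1986-12-01', '9999-01-01'),
  (50449, 'd005', '1986-12-01', '9999-01-01'),
  (40983, 'd005', '1986-12-01', '9999-01-01'),
  (30970, 'd005', '1986-12-01', '2017-03-29'),
  (24007, 'd005', '1986-12-01', '9999-01-01'),
  (22744, 'd006', '1986-12-01', '9999-01-01'),
  (49667, 'd007', '1986-12-01', '9999-01-01'),
  (48317, 'd008', '1986-12-01', '1989-01-11'),
  (46554, 'd008', '1986-12-01', '1992-05-27');
```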
(1) Method 1: using a window function
Step 1:
mysql> select t.*,row_number() over(partition by t.dept_no order by t.emp_no desc) rn from t_group t;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date  | to_date    | rn |
+--------+---------+------------+------------+----+
|  31112 | d002    | 1986-12-01 | 1993-12-10 |  1 |
|  10004 | d004    | 1986-12-01 | 9999-01-01 |  1 |
|  50449 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  40983 | d005    | 1986-12-01 | 9999-01-01 |  2 |
|  30970 | d005    | 1986-12-01 | 2017-03-29 |  3 |
|  24007 | d005    | 1986-12-01 | 9999-01-01 |  4 |
|  22744 | d006    | 1986-12-01 | 9999-01-01 |  1 |
|  49667 | d007    | 1986-12-01 | 9999-01-01 |  1 |
|  48317 | d008    | 1986-12-01 | 1989-01-11 |  1 |
|  46554 | d008    | 1986-12-01 | 1992-05-27 |  2 |
+--------+---------+------------+------------+----+
10 rows in set (0.01 sec)

Step 2:
mysql> select a.* from (select t.*,row_number() over(partition by t.dept_no order by t.emp_no desc) rn from t_group t)a where a.rn=1;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date  | to_date    | rn |
+--------+---------+------------+------------+----+
|  31112 | d002    | 1986-12-01 | 1993-12-10 |  1 |
|  10004 | d004    | 1986-12-01 | 9999-01-01 |  1 |
|  50449 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  22744 | d006    | 1986-12-01 | 9999-01-01 |  1 |
|  49667 | d007    | 1986-12-01 | 9999-01-01 |  1 |
|  48317 | d008    | 1986-12-01 | 1989-01-11 |  1 |
+--------+---------+------------+------------+----+
6 rows in set (0.00 sec)
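MySQL 8.0 also supports common table expressions, so the derived table in the second step can be written with WITH instead. This is only an equivalent sketch; the CTE name ranked is made up for illustration.

```sql
-- Same top-1-per-department query, written with a CTE.
-- "ranked" is a hypothetical name chosen for this example.
with ranked as (
  select t.*,
         row_number() over (partition by t.dept_no
                            order by t.emp_no desc) as rn
  from t_group t
)
select emp_no, dept_no, from_date, to_date
from ranked
where rn = 1;
```

Either way, the numbering has to happen in an inner query: window functions are only allowed in the select list and in ORDER BY, so rn cannot be filtered in the WHERE clause of the same query block that computes it.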
(2) Method 2: the traditional user-variable (@) approach
Step 1:
mysql> select t.*, if(@dept_no=t.dept_no, @rn:=@rn+1, @rn:=1)as rn, @dept_no:=t.dept_no as calc_dept_no from (select * from t_group t order by t.dept_no,t.emp_no desc)t, (select @rn:=0 rn, @dept_no:='')b;
+--------+---------+------------+------------+------+--------------+
| emp_no | dept_no | from_date  | to_date    | rn   | calc_dept_no |
+--------+---------+------------+------------+------+--------------+
|  31112 | d002    | 1986-12-01 | 1993-12-10 |    1 | d002         |
|  10004 | d004    | 1986-12-01 | 9999-01-01 |    1 | d004         |
|  50449 | d005    | 1986-12-01 | 9999-01-01 |    1 | d005         |
|  40983 | d005    | 1986-12-01 | 9999-01-01 |    2 | d005         |
|  30970 | d005    | 1986-12-01 | 2017-03-29 |    3 | d005         |
|  24007 | d005    | 1986-12-01 | 9999-01-01 |    4 | d005         |
|  22744 | d006    | 1986-12-01 | 9999-01-01 |    1 | d006         |
|  49667 | d007    | 1986-12-01 | 9999-01-01 |    1 | d007         |
|  48317 | d008    | 1986-12-01 | 1989-01-11 |    1 | d008         |
|  46554 | d008    | 1986-12-01 | 1992-05-27 |    2 | d008         |
+--------+---------+------------+------------+------+--------------+
10 rows in set, 5 warnings (0.00 sec)

Step 2:
mysql> select * from (
    -> select t.*, if(@dept_no=t.dept_no, @rn:=@rn+1, @rn:=1)as rn, @dept_no:=t.dept_no as calc_dept_no from (select * from t_group t order by t.dept_no,t.emp_no desc)t, (select @rn:=0 rn, @dept_no:='')b)c where c.rn=1;
+--------+---------+------------+------------+------+--------------+
| emp_no | dept_no | from_date  | to_date    | rn   | calc_dept_no |
+--------+---------+------------+------------+------+--------------+
|  31112 | d002    | 1986-12-01 | 1993-12-10 |    1 | d002         |
|  10004 | d004    | 1986-12-01 | 9999-01-01 |    1 | d004         |
|  50449 | d005    | 1986-12-01 | 9999-01-01 |    1 | d005         |
|  22744 | d006    | 1986-12-01 | 9999-01-01 |    1 | d006         |
|  49667 | d007    | 1986-12-01 | 9999-01-01 |    1 | d007         |
|  48317 | d008    | 1986-12-01 | 1989-01-11 |    1 | d008         |
+--------+---------+------------+------------+------+--------------+
6 rows in set, 5 warnings (0.00 sec)
rank()
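Within a partition, rows with the same value in the ordering column get the same number;
when the values differ, the numbering jumps: if two rows tie for 1st, the next rank is 3.
In the query below, rows are partitioned by dept_no. In dept_no='d005', three rows with to_date='9999-01-01' tie at rank 1, so the row with to_date='2017-03-29' jumps to rank 4.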
mysql> select t.*, rank() over(partition by t.dept_no order by t.to_date desc) rn from t_group t;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date  | to_date    | rn |
+--------+---------+------------+------------+----+
|  31112 | d002    | 1986-12-01 | 1993-12-10 |  1 |
|  10004 | d004    | 1986-12-01 | 9999-01-01 |  1 |
|  24007 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  40983 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  50449 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  30970 | d005    | 1986-12-01 | 2017-03-29 |  4 |
|  22744 | d006    | 1986-12-01 | 9999-01-01 |  1 |
|  49667 | d007    | 1986-12-01 | 9999-01-01 |  1 |
|  46554 | d008    | 1986-12-01 | 1992-05-27 |  1 |
|  48317 | d008    | 1986-12-01 | 1989-01-11 |  2 |
+--------+---------+------------+------------+----+
10 rows in set (0.00 sec)
dense_rank()
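Within a partition, rows with the same value in the ordering column get the same number;
when the values differ, the numbering continues without gaps: if two rows tie for 1st, the next rank is 2.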
mysql> select t.*, dense_rank() over(partition by t.dept_no order by t.to_date desc) rn from t_group t;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date  | to_date    | rn |
+--------+---------+------------+------------+----+
|  31112 | d002    | 1986-12-01 | 1993-12-10 |  1 |
|  10004 | d004    | 1986-12-01 | 9999-01-01 |  1 |
|  24007 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  40983 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  50449 | d005    | 1986-12-01 | 9999-01-01 |  1 |
|  30970 | d005    | 1986-12-01 | 2017-03-29 |  2 |
|  22744 | d006    | 1986-12-01 | 9999-01-01 |  1 |
|  49667 | d007    | 1986-12-01 | 9999-01-01 |  1 |
|  46554 | d008    | 1986-12-01 | 1992-05-27 |  1 |
|  48317 | d008    | 1986-12-01 | 1989-01-11 |  2 |
+--------+---------+------------+------------+----+
10 rows in set (0.00 sec)
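To see the three numbering functions side by side, they can share one named window, as in the opening example. The sketch below runs against the same t_group table; the aliases rn, rk, and drk are arbitrary names picked for this example (rank itself is a reserved word in MySQL 8.0, so it is avoided as an unquoted alias).

```sql
-- Compare row_number(), rank() and dense_rank() over one named window.
select t.emp_no, t.dept_no, t.to_date,
       row_number() over w as rn,   -- always unique within the partition
       rank()       over w as rk,   -- ties share a number, then a gap
       dense_rank() over w as drk   -- ties share a number, no gap
from t_group t
window w as (partition by t.dept_no order by t.to_date desc);
```

For dept_no='d005' this should give rk values 1, 1, 1, 4 and drk values 1, 1, 1, 2, matching the two result sets above. The rn values run from 1 to 4, but which of the three tied rows receives 1, 2, or 3 is not determined by this ORDER BY.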
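lag(expr,offset,default) over() returns the value of expr from the row that is offset rows before the current row in its partition (the previous row when offset is 1), or default if there is no such row.
lead(expr,offset,default) over() returns the value of expr from the row that is offset rows after the current row in its partition (the next row when offset is 1), or default if there is no such row.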
mysql> select s.*, lag(s.salary,1,null) over(partition by s.emp_no order by s.to_date asc) rn from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+
| emp_no | salary | from_date  | to_date    | rn    |
+--------+--------+------------+------------+-------+
|  10001 |  60117 | 1986-06-26 | 1987-06-26 |  NULL |
|  10001 |  62102 | 1987-06-26 | 1988-06-25 | 60117 |
|  10001 |  66074 | 1988-06-25 | 1989-06-25 | 62102 |
|  10001 |  66596 | 1989-06-25 | 1990-06-25 | 66074 |
|  10001 |  66961 | 1990-06-25 | 1991-06-25 | 66596 |
|  10001 |  71046 | 1991-06-25 | 1992-06-24 | 66961 |
|  10001 |  74333 | 1992-06-24 | 1993-06-24 | 71046 |
|  10001 |  75286 | 1993-06-24 | 1994-06-24 | 74333 |
|  10001 |  75994 | 1994-06-24 | 1995-06-24 | 75286 |
|  10001 |  76884 | 1995-06-24 | 1996-06-23 | 75994 |
|  10001 |  80013 | 1996-06-23 | 1997-06-23 | 76884 |
|  10001 |  81025 | 1997-06-23 | 1998-06-23 | 80013 |
|  10001 |  81097 | 1998-06-23 | 1999-06-23 | 81025 |
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 81097 |
|  10001 |  85112 | 2000-06-22 | 2001-06-22 | 84917 |
|  10001 |  85097 | 2001-06-22 | 2002-06-22 | 85112 |
|  10001 |  88958 | 2002-06-22 | 9999-01-01 | 85097 |
+--------+--------+------------+------------+-------+
17 rows in set (0.00 sec)

mysql> select s.*, lead(s.salary,1,0) over(partition by s.emp_no order by s.to_date asc) rn from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+
| emp_no | salary | from_date  | to_date    | rn    |
+--------+--------+------------+------------+-------+
|  10001 |  60117 | 1986-06-26 | 1987-06-26 | 62102 |
|  10001 |  62102 | 1987-06-26 | 1988-06-25 | 66074 |
|  10001 |  66074 | 1988-06-25 | 1989-06-25 | 66596 |
|  10001 |  66596 | 1989-06-25 | 1990-06-25 | 66961 |
|  10001 |  66961 | 1990-06-25 | 1991-06-25 | 71046 |
|  10001 |  71046 | 1991-06-25 | 1992-06-24 | 74333 |
|  10001 |  74333 | 1992-06-24 | 1993-06-24 | 75286 |
|  10001 |  75286 | 1993-06-24 | 1994-06-24 | 75994 |
|  10001 |  75994 | 1994-06-24 | 1995-06-24 | 76884 |
|  10001 |  76884 | 1995-06-24 | 1996-06-23 | 80013 |
|  10001 |  80013 | 1996-06-23 | 1997-06-23 | 81025 |
|  10001 |  81025 | 1997-06-23 | 1998-06-23 | 81097 |
|  10001 |  81097 | 1998-06-23 | 1999-06-23 | 84917 |
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 85112 |
|  10001 |  85112 | 2000-06-22 | 2001-06-22 | 85097 |
|  10001 |  85097 | 2001-06-22 | 2002-06-22 | 88958 |
|  10001 |  88958 | 2002-06-22 | 9999-01-01 |     0 |
+--------+--------+------------+------------+-------+
17 rows in set (0.00 sec)
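A common use of lag() is comparing each row with the one before it. As a sketch on the same salaries table, the query below computes how much each salary changed from the previous period; the alias change_vs_prev and the choice to order by from_date are assumptions made for this example.

```sql
-- Salary change relative to the previous period for one employee.
-- lag() with no third argument yields NULL when there is no previous row.
select s.emp_no, s.from_date, s.salary,
       s.salary - lag(s.salary, 1) over w as change_vs_prev
from salaries s
where s.emp_no = 10001
window w as (partition by s.emp_no order by s.from_date);
```

With the data shown above, the first row has no predecessor, so its change is NULL; the second row works out to 62102 - 60117 = 1985.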
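Enhancements to sum(), min(), and max()
rows between unbounded preceding and unbounded following: the frame runs from the first row of the partition to the last row
rows between unbounded preceding and current row: the frame runs from the first row of the partition to the current row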
mysql> select s.*,
    -> sum(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and unbounded following) s1,
    -> sum(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and current row) s2
    -> from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+---------+---------+
| emp_no | salary | from_date  | to_date    | s1      | s2      |
+--------+--------+------------+------------+---------+---------+
|  10001 |  60117 | 1986-06-26 | 1987-06-26 | 1281612 |   60117 |
|  10001 |  62102 | 1987-06-26 | 1988-06-25 | 1281612 |  122219 |
|  10001 |  66074 | 1988-06-25 | 1989-06-25 | 1281612 |  188293 |
|  10001 |  66596 | 1989-06-25 | 1990-06-25 | 1281612 |  254889 |
|  10001 |  66961 | 1990-06-25 | 1991-06-25 | 1281612 |  321850 |
|  10001 |  71046 | 1991-06-25 | 1992-06-24 | 1281612 |  392896 |
|  10001 |  74333 | 1992-06-24 | 1993-06-24 | 1281612 |  467229 |
|  10001 |  75286 | 1993-06-24 | 1994-06-24 | 1281612 |  542515 |
|  10001 |  75994 | 1994-06-24 | 1995-06-24 | 1281612 |  618509 |
|  10001 |  76884 | 1995-06-24 | 1996-06-23 | 1281612 |  695393 |
|  10001 |  80013 | 1996-06-23 | 1997-06-23 | 1281612 |  775406 |
|  10001 |  81025 | 1997-06-23 | 1998-06-23 | 1281612 |  856431 |
|  10001 |  81097 | 1998-06-23 | 1999-06-23 | 1281612 |  937528 |
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 1281612 | 1022445 |
|  10001 |  85112 | 2000-06-22 | 2001-06-22 | 1281612 | 1107557 |
|  10001 |  85097 | 2001-06-22 | 2002-06-22 | 1281612 | 1192654 |
|  10001 |  88958 | 2002-06-22 | 9999-01-01 | 1281612 | 1281612 |
+--------+--------+------------+------------+---------+---------+
17 rows in set (0.00 sec)

mysql> select s.*,
    -> min(s.salary) over(partition by s.emp_no order by s.to_date desc rows between unbounded preceding and unbounded following) m1,
    -> min(s.salary) over(partition by s.emp_no order by s.to_date desc rows between unbounded preceding and current row) m2,
    -> min(s.salary) over(partition by s.emp_no) m3
    -> from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+-------+-------+
| emp_no | salary | from_date  | to_date    | m1    | m2    | m3    |
+--------+--------+------------+------------+-------+-------+-------+
|  10001 |  88958 | 2002-06-22 | 9999-01-01 | 60117 | 88958 | 60117 |
|  10001 |  85097 | 2001-06-22 | 2002-06-22 | 60117 | 85097 | 60117 |
|  10001 |  85112 | 2000-06-22 | 2001-06-22 | 60117 | 85097 | 60117 |
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 60117 | 84917 | 60117 |
|  10001 |  81097 | 1998-06-23 | 1999-06-23 | 60117 | 81097 | 60117 |
|  10001 |  81025 | 1997-06-23 | 1998-06-23 | 60117 | 81025 | 60117 |
|  10001 |  80013 | 1996-06-23 | 1997-06-23 | 60117 | 80013 | 60117 |
|  10001 |  76884 | 1995-06-24 | 1996-06-23 | 60117 | 76884 | 60117 |
|  10001 |  75994 | 1994-06-24 | 1995-06-24 | 60117 | 75994 | 60117 |
|  10001 |  75286 | 1993-06-24 | 1994-06-24 | 60117 | 75286 | 60117 |
|  10001 |  74333 | 1992-06-24 | 1993-06-24 | 60117 | 74333 | 60117 |
|  10001 |  71046 | 1991-06-25 | 1992-06-24 | 60117 | 71046 | 60117 |
|  10001 |  66961 | 1990-06-25 | 1991-06-25 | 60117 | 66961 | 60117 |
|  10001 |  66596 | 1989-06-25 | 1990-06-25 | 60117 | 66596 | 60117 |
|  10001 |  66074 | 1988-06-25 | 1989-06-25 | 60117 | 66074 | 60117 |
|  10001 |  62102 | 1987-06-26 | 1988-06-25 | 60117 | 62102 | 60117 |
|  10001 |  60117 | 1986-06-26 | 1987-06-26 | 60117 | 60117 | 60117 |
+--------+--------+------------+------------+-------+-------+-------+
17 rows in set (0.00 sec)

mysql> select s.*,
    -> max(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and unbounded following) m1,
    -> max(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and current row) m2,
    -> max(s.salary) over(partition by s.emp_no) m3
    -> from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+-------+-------+
| emp_no | salary | from_date  | to_date    | m1    | m2    | m3    |
+--------+--------+------------+------------+-------+-------+-------+
|  10001 |  60117 | 1986-06-26 | 1987-06-26 | 88958 | 60117 | 88958 |
|  10001 |  62102 | 1987-06-26 | 1988-06-25 | 88958 | 62102 | 88958 |
|  10001 |  66074 | 1988-06-25 | 1989-06-25 | 88958 | 66074 | 88958 |
|  10001 |  66596 | 1989-06-25 | 1990-06-25 | 88958 | 66596 | 88958 |
|  10001 |  66961 | 1990-06-25 | 1991-06-25 | 88958 | 66961 | 88958 |
|  10001 |  71046 | 1991-06-25 | 1992-06-24 | 88958 | 71046 | 88958 |
|  10001 |  74333 | 1992-06-24 | 1993-06-24 | 88958 | 74333 | 88958 |
|  10001 |  75286 | 1993-06-24 | 1994-06-24 | 88958 | 75286 | 88958 |
|  10001 |  75994 | 1994-06-24 | 1995-06-24 | 88958 | 75994 | 88958 |
|  10001 |  76884 | 1995-06-24 | 1996-06-23 | 88958 | 76884 | 88958 |
|  10001 |  80013 | 1996-06-23 | 1997-06-23 | 88958 | 80013 | 88958 |
|  10001 |  81025 | 1997-06-23 | 1998-06-23 | 88958 | 81025 | 88958 |
|  10001 |  81097 | 1998-06-23 | 1999-06-23 | 88958 | 81097 | 88958 |
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 88958 | 84917 | 88958 |
|  10001 |  85112 | 2000-06-22 | 2001-06-22 | 88958 | 85112 | 88958 |
|  10001 |  85097 | 2001-06-22 | 2002-06-22 | 88958 | 85112 | 88958 |
|  10001 |  88958 | 2002-06-22 | 9999-01-01 | 88958 | 88958 | 88958 |
+--------+--------+------------+------------+-------+-------+-------+
17 rows in set (0.00 sec)
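The m3 columns above hint at what happens when the frame clause is left out. As far as the MySQL manual describes it, with no ORDER BY the frame is the whole partition, and with ORDER BY but no frame clause the default frame is range between unbounded preceding and current row. The sketch below puts the three variants next to each other; the column aliases are made up for this example.

```sql
-- Default frames versus an explicit ROWS frame for sum().
select s.emp_no, s.to_date, s.salary,
       -- ORDER BY without a frame clause: default RANGE frame, a running total
       sum(s.salary) over (partition by s.emp_no
                           order by s.to_date) as running_default,
       -- the explicit ROWS frame used earlier in this section
       sum(s.salary) over (partition by s.emp_no
                           order by s.to_date
                           rows between unbounded preceding
                                    and current row) as running_rows,
       -- no ORDER BY: the frame is the whole partition
       sum(s.salary) over (partition by s.emp_no) as total
from salaries s
where s.emp_no = 10001;
```

For this employee every to_date is distinct, so running_default and running_rows should agree. They can differ when several rows share the same ordering value, because a RANGE frame includes all peers of the current row while a ROWS frame stops at the current row itself.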
There are many more window functions that are not covered here; see the manual for details.
CUME_DIST()
FIRST_VALUE()
LAST_VALUE()
NTH_VALUE()
NTILE()
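As a quick taste of a few of these, the sketch below runs against the numbers table from the opening example. The aliases and the window name w_all are made up for illustration. The explicit frame matters for LAST_VALUE() and NTH_VALUE(): with the default frame they would only see rows up to the current row and its peers.

```sql
-- NTILE, FIRST_VALUE, LAST_VALUE and NTH_VALUE on the numbers table.
select val,
       ntile(3)          over (order by val) as bucket,    -- split rows into 3 groups
       first_value(val)  over w_all          as first_val, -- first row of the frame
       last_value(val)   over w_all          as last_val,  -- last row of the frame
       nth_value(val, 3) over w_all          as third_val  -- third row of the frame
from numbers
window w_all as (order by val
                 rows between unbounded preceding
                          and unbounded following);
```

With the nine values 1, 1, 2, 3, 3, 3, 4, 4, 5, bucket should be 1 for the values 1, 1, 2, then 2 for the three 3s, and 3 for 4, 4, 5; first_val is 1, last_val is 5, and third_val is 2 on every row.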
References
12.21.1 Window Function Descriptions