
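How to Help MySQL Implement Oracle's Advanced Analytic Functions

Oracle supports a number of distinctive syntax constructs and functions that tend to cause trouble for programmers when porting to MySQL. Below we take several of these Oracle-specific usages as examples and explain how to achieve the same result with esProc. These methods are not limited to MySQL; they work with any other database as well.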

 

1. Recursive statements

a)     select employee_id,first_name,last_name,manager_id

from hr.employees  

start with employee_id=102  

connect by prior employee_id = manager_id


      A
1     =connect("orcl")
2     =A1.query@x("select employee_id, first_name, last_name, manager_id from hr.employees")
3     =A2.keys(EMPLOYEE_ID)
4     =A2.select@1(EMPLOYEE_ID==102)
5     =A2.switch(MANAGER_ID, A2)
6     =A2.nodes(MANAGER_ID, A4)
7     =(A4|A6).new(EMPLOYEE_ID, FIRST_NAME, LAST_NAME, MANAGER_ID.EMPLOYEE_ID:MANAGER_ID)

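(1)   A3 sets the key of the table sequence A2

(2)   A4 selects the starting employee

(3)   A5 converts the MANAGER_ID values in A2 into records, so that the hierarchy can be traversed recursively

(4)   A6 retrieves all child nodes (descendants) of the starting employee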
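For readers who do not use esProc, the steps above can be sketched in plain Python. This is only an illustration of the idea (index the rows by key, then walk from the starting row down to its children); the sample rows and the function name `descendants` are assumptions made for this sketch, and it assumes the hierarchy contains no cycles. It walks breadth-first, so the row order may differ from Oracle's depth-first output.

```python
from collections import defaultdict, deque

# Hypothetical sample rows; the real query reads hr.employees.
EMPLOYEES = [
    {"employee_id": 100, "last_name": "King", "manager_id": None},
    {"employee_id": 101, "last_name": "Kochhar", "manager_id": 100},
    {"employee_id": 102, "last_name": "De Haan", "manager_id": 100},
    {"employee_id": 103, "last_name": "Hunold", "manager_id": 102},
    {"employee_id": 104, "last_name": "Ernst", "manager_id": 103},
    {"employee_id": 105, "last_name": "Austin", "manager_id": 103},
]

def descendants(rows, start_id):
    """Return the start row followed by every row beneath it, parents first."""
    by_id = {r["employee_id"]: r for r in rows}
    # Map each manager id to the rows that report to it.
    children = defaultdict(list)
    for r in rows:
        children[r["manager_id"]].append(r)
    result = []
    queue = deque([by_id[start_id]])
    while queue:
        node = queue.popleft()
        result.append(node)
        queue.extend(children.get(node["employee_id"], []))
    return result

print([r["employee_id"] for r in descendants(EMPLOYEES, 102)])
```

Because every parent is emitted before its children, the result can be post-processed row by row in a single forward pass.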


b)    select employee_id, first_name,last_name,manager_id  

from hr.employees  

start with employee_id=104  

connect by prior manager_id = employee_id


      A
1     =connect("orcl")
2     =A1.query@x("select employee_id, first_name, last_name, manager_id from hr.employees")
3     =A2.keys(EMPLOYEE_ID)
4     =A2.switch(MANAGER_ID, A2)
5     =A2.select@1(EMPLOYEE_ID==104)
6     =A5.prior(MANAGER_ID)
7     =A6.new(EMPLOYEE_ID, FIRST_NAME, LAST_NAME, MANAGER_ID.EMPLOYEE_ID:MANAGER_ID)

(1)   A6 retrieves all parent nodes (ancestors) of the starting employee
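Walking upward is simpler than walking downward, because each row has at most one parent. A minimal Python sketch of that idea follows; the sample rows and the name `ancestors` are assumptions for illustration, not part of esProc, and the loop assumes the manager chain has no cycles.

```python
# Hypothetical sample rows; the real query reads hr.employees.
EMPLOYEES = [
    {"employee_id": 100, "last_name": "King", "manager_id": None},
    {"employee_id": 102, "last_name": "De Haan", "manager_id": 100},
    {"employee_id": 103, "last_name": "Hunold", "manager_id": 102},
    {"employee_id": 104, "last_name": "Ernst", "manager_id": 103},
]

def ancestors(rows, start_id):
    """Return the start row, then its manager, then that manager's manager, and so on."""
    by_id = {r["employee_id"]: r for r in rows}
    chain = []
    node = by_id.get(start_id)
    while node is not None:
        chain.append(node)
        # A missing or null manager_id ends the walk.
        node = by_id.get(node["manager_id"])
    return chain

print([r["employee_id"] for r in ancestors(EMPLOYEES, 104)])
```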


c)     select employee_id,last_name,manager_id,sys_connect_by_path(last_name,'/') path from hr.employees  

start with employee_id=102

connect by prior employee_id = manager_id


      A
1     =connect("orcl")
2     =A1.query@x("select employee_id, last_name, manager_id, null path from hr.employees")
3     =A2.keys(EMPLOYEE_ID)
4     =A2.select@1(EMPLOYEE_ID==102)
5     =A2.switch(MANAGER_ID, A2)
6     =A2.nodes(MANAGER_ID, A4)
7     =A4|A6
8     =A7.run(PATH=if(EMPLOYEE_ID==102, "/"+LAST_NAME, MANAGER_ID.PATH+"/"+LAST_NAME))
9     =A7.new(EMPLOYEE_ID, LAST_NAME, MANAGER_ID.EMPLOYEE_ID:MANAGER_ID, PATH)

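(1)   Because the parent node of every record in A7 appears before the record itself, A8 can update the PATH value of each record in A7 in order, from first to last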
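The single forward pass described above can be sketched in Python as follows. The sketch assumes its input is already ordered parents-first with the starting row at the front (as A7 is); the sample rows and the name `with_paths` are hypothetical.

```python
# Hypothetical rows, already ordered so that each parent precedes its children.
ORDERED = [
    {"employee_id": 102, "last_name": "De Haan", "manager_id": 100},
    {"employee_id": 103, "last_name": "Hunold", "manager_id": 102},
    {"employee_id": 104, "last_name": "Ernst", "manager_id": 103},
]

def with_paths(ordered_rows, sep="/"):
    """Attach a sys_connect_by_path-style string to each row in one pass."""
    paths = {}  # employee_id -> path computed so far
    out = []
    for r in ordered_rows:
        # The start row's manager is not in `paths`, so its prefix is empty.
        prefix = paths.get(r["manager_id"], "")
        paths[r["employee_id"]] = prefix + sep + r["last_name"]
        out.append({**r, "path": paths[r["employee_id"]]})
    return out

for row in with_paths(ORDERED):
    print(row["employee_id"], row["path"])
```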


 

2. Nested aggregate functions

select avg(max(salary)) avg_max, avg(min(salary)) avg_min

from hr.employees

group by department_id


      A
1     =connect("orcl")
2     =A1.query@x("select * from hr.employees")
3     =A2.groups(DEPARTMENT_ID; max(SALARY):m1, min(SALARY):m2)
4     =A3.group(; ~.avg(m1):avg_max, ~.avg(m2):avg_min)

(1)   In A2, A1.query can also be replaced with A1.cursor
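The nested aggregate is really two aggregations in sequence: first per department, then over the per-department results. A small Python sketch under that reading is shown here; the sample data and the name `avg_of_group_extremes` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (department_id, salary) pairs.
ROWS = [(10, 3000), (20, 9000), (20, 6000), (30, 12000), (30, 3000)]

def avg_of_group_extremes(rows):
    """Return (avg of per-group max, avg of per-group min)."""
    by_dept = defaultdict(list)
    for dept, salary in rows:
        by_dept[dept].append(salary)
    # Step 1: one max and one min per department.
    maxes = [max(v) for v in by_dept.values()]
    mins = [min(v) for v in by_dept.values()]
    # Step 2: average those per-department values.
    return mean(maxes), mean(mins)

print(avg_of_group_extremes(ROWS))
```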


 

3. The aggregate analytic functions FIRST and LAST

SELECT department_id,

MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct) worst,

MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct) best

FROM hr.employees

GROUP BY department_id

ORDER BY department_id


      A
1     =connect("orcl")
2     =A1.query@x("select * from hr.employees order by department_id, commission_pct")
3     =A2.group@o(DEPARTMENT_ID)
4     =A3.new(DEPARTMENT_ID, ~.minp@a(ifn(COMMISSION_PCT,2)).min(SALARY):worst, ~.maxp@a(ifn(COMMISSION_PCT,2)).max(SALARY):best)
5     =A4.sort(ifn(DEPARTMENT_ID,power(2,32)))

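(1)   A2 is already sorted by DEPARTMENT_ID, so A3 can group with A2.group@o

(2)   FIRST/LAST take the first/last group after sorting. Oracle places nulls last when sorting in ascending order, so the last group that LAST picks is the one containing the null values. maxp/minp exclude nulls when finding all rows that hold the maximum/minimum value, so A4 uses ifn(COMMISSION_PCT,2) to make null count as the largest value

(3)   In A5, when DEPARTMENT_ID is null, power(2,32), which is larger than every DEPARTMENT_ID, is used instead so that this row sorts last

If the data volume is large, a cursor can be used instead.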


      A
1     =connect("orcl")
2     =A1.cursor@x("select * from hr.employees")
3     =A2.groups(DEPARTMENT_ID; min([ifn(COMMISSION_PCT,2),SALARY]):m1, max([ifn(COMMISSION_PCT,2),SALARY]):m2)
4     =A3.new(DEPARTMENT_ID, m1(2):worst, m2(2):best)
5     =A4.sort(ifn(DEPARTMENT_ID,power(2,32)))

(1)    In A3, min([ifn(COMMISSION_PCT,2), SALARY]) finds the smallest SALARY among the rows with the smallest COMMISSION_PCT, that is, the minimum SALARY where COMMISSION_PCT ranks first; max works the same way
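The pair-comparison trick maps directly onto tuple comparison in Python. The sketch below is illustrative only: it assumes commission_pct is a fraction below 1 (so 2 is a safe stand-in for null, as in the worksheet), that salary is never null, and it uses made-up rows and a hypothetical function name `first_last`.

```python
from collections import defaultdict

NULL_HIGH = 2  # stands in for a null commission_pct so that nulls sort last

# Hypothetical (department_id, salary, commission_pct) rows.
ROWS = [
    (80, 14000, 0.4),
    (80, 10000, 0.1),
    (80, 9000, 0.1),
    (80, 8000, None),
    (80, 7000, None),
    (90, 24000, None),
    (90, 17000, None),
]

def first_last(rows):
    """Per department: (MIN(salary) KEEP FIRST, MAX(salary) KEEP LAST) by commission_pct."""
    groups = defaultdict(list)
    for dept, salary, pct in rows:
        key = NULL_HIGH if pct is None else pct
        groups[dept].append((key, salary))
    result = {}
    for dept, pairs in groups.items():
        worst = min(pairs)[1]  # lowest pct first, then lowest salary within it
        best = max(pairs)[1]   # highest pct (nulls) first, then highest salary within it
        result[dept] = (worst, best)
    return result

print(first_last(ROWS))
```

Since only a running minimum and maximum pair is kept per group, the same logic works on a stream of rows without holding the whole table in memory.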


 

4. The ratio function RATIO_TO_REPORT

a)      SELECT last_name, salary, RATIO_TO_REPORT(salary) OVER () AS rr

FROM hr.employees

WHERE job_id = 'PU_CLERK'

ORDER BY last_name


      A
1     =connect("orcl")
2     =A1.query@x("select last_name, salary from hr.employees where job_id='PU_CLERK' order by last_name")
3     =A2.sum(SALARY)
4     =A2.new(LAST_NAME, SALARY, SALARY/A3:RR)


b)      SELECT department_id,last_name, salary, RATIO_TO_REPORT(salary) OVER (partition by department_id) AS rr

FROM hr.employees

WHERE department_id in (20,60)

ORDER BY department_id,last_name


      A
1     =connect("orcl")
2     =A1.query@x("select department_id, last_name, salary from hr.employees where department_id in (20,60) order by department_id, last_name")
3     =A2.groups@o(DEPARTMENT_ID; sum(SALARY):sum)
4     =A2.switch(DEPARTMENT_ID, A3)
5     =A2.new(DEPARTMENT_ID.DEPARTMENT_ID:DEPARTMENT_ID, LAST_NAME, SALARY, SALARY/DEPARTMENT_ID.sum:RR)

(1)    A2 is already sorted by DEPARTMENT_ID, so A3 can use A2.groups@o for grouped aggregation
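The partitioned ratio boils down to computing one total per partition and then dividing each row's value by its partition total. A minimal Python sketch of that two-step approach follows, with made-up rows and a hypothetical function name `ratio_to_report`; it assumes no partition total is zero.

```python
from collections import defaultdict

# Hypothetical (department_id, last_name, salary) rows.
ROWS = [
    (20, "Fay", 6000),
    (20, "Hartstein", 18000),
    (60, "Ernst", 5000),
    (60, "Hunold", 5000),
]

def ratio_to_report(rows):
    """Append salary / (sum of salary within the same department) to each row."""
    totals = defaultdict(int)
    for dept, _, salary in rows:
        totals[dept] += salary
    return [(dept, name, salary, salary / totals[dept]) for dept, name, salary in rows]

for row in ratio_to_report(ROWS):
    print(row)
```

The unpartitioned OVER () form is the special case in which every row shares a single total.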


 

5. Multiple grouping (GROUPING SETS)

SELECT department_id, job_id, sum(salary) total

FROM hr.employees

WHERE department_id in (30, 50)

GROUP BY grouping sets((department_id, job_id), department_id)


      A
1     =connect("orcl")
2     =A1.query@x("select department_id, job_id, salary from hr.employees where department_id in (30,50) order by department_id, job_id")
3     =A2.groups@o(DEPARTMENT_ID, JOB_ID; sum(SALARY):TOTAL)
4     =A3.group@o(DEPARTMENT_ID, null:JOB_ID; ~.sum(TOTAL):TOTAL)
5     =[A3,A4].merge(DEPARTMENT_ID, ifn(JOB_ID,fill("z", 10)))

(1)    Because A3 and A4 are both ordered by DEPARTMENT_ID, A5 can merge them; ifn(JOB_ID,fill("z",10)) ensures that rows whose JOB_ID is null sort after the others

A cursor can also be used.
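The same two-level result can be sketched in Python: aggregate at the detailed level, roll those totals up to the department level, then combine the two sets with null job ids ordered last. Everything named here (`grouping_sets`, the sample rows) is an assumption for illustration; the sketch sorts the combined list rather than merging two ordered inputs.

```python
from collections import defaultdict

# Hypothetical (department_id, job_id, salary) rows.
ROWS = [
    (30, "PU_CLERK", 2500),
    (30, "PU_CLERK", 2800),
    (30, "PU_MAN", 11000),
    (50, "SH_CLERK", 3000),
    (50, "ST_MAN", 8000),
]

def grouping_sets(rows):
    """Emulate GROUP BY GROUPING SETS ((department_id, job_id), department_id)."""
    detail = defaultdict(int)
    for dept, job, salary in rows:
        detail[(dept, job)] += salary
    # Roll the detailed totals up instead of rescanning the raw rows.
    subtotal = defaultdict(int)
    for (dept, _), total in detail.items():
        subtotal[dept] += total
    out = [(d, j, t) for (d, j), t in detail.items()]
    out += [(d, None, t) for d, t in subtotal.items()]
    # Within a department, rows with a null job_id go last.
    out.sort(key=lambda r: (r[0], r[1] is None, r[1] or ""))
    return out

for row in grouping_sets(ROWS):
    print(row)
```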


      A
1     =connect("orcl")
2     =A1.cursor@x("select department_id, job_id, sum(salary) total from hr.employees where department_id in (30,50) group by department_id, job_id order by department_id, job_id")
3     =A2.group(DEPARTMENT_ID)
4     =A3.(~.insert(0, ~.groups@o(DEPARTMENT_ID, null:JOB_ID; sum(TOTAL):TOTAL)))
5     =A4.fetch()
6     =A5.conj()

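(1)    A2.group in A3 requires A2 to be ordered by DEPARTMENT_ID

(2)    A4 computes the sum for each group in A3 and inserts the result at the end of that group

A channel can also be used.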


      A
1     =connect("orcl")
2     =A1.cursor@x("select department_id, job_id, salary from hr.employees where department_id in (30,50) order by department_id, job_id")
3     =channel().groups@o(DEPARTMENT_ID, JOB_ID; sum(SALARY):TOTAL)
4     >A2.push(A3)
5     =channel().groups@o(DEPARTMENT_ID, null:JOB_ID; sum(TOTAL):TOTAL)
6     >A3.push(A5)
7     =A3.fetch()
8     for A2,1000
9     =A3.result()|A5.result()
10    =A9.sort(DEPARTMENT_ID)

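(1)   A3 creates a channel and attaches a grouped sum to it

(2)   A4 pushes the data in A2 to A3; note that this only takes effect when data is actually fetched from A2

(3)   A5 creates a channel and attaches a grouped sum to it

(4)   A6 pushes the result of A3 to A5; the data in A2 could be pushed to A5 directly instead, but that would increase the time complexity

(5)   A7 retains the data of A3

(6)   A8 reads A2 in a loop, fetching only 1000 rows at a time to reduce memory usage

(7)   A10 sorts the data from A3 and A5; because the sort is stable, the rows whose JOB_ID is null come last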
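The batch-wise reading in step (6) can be approximated in Python with an iterator that is consumed in fixed-size chunks while running totals are updated. This is a loose sketch and not a model of esProc channels: the name `stream_totals` and the sample rows are hypothetical, and for brevity both totals are updated from the raw rows rather than chaining the second aggregation onto the first as the worksheet does.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical (department_id, job_id, salary) rows.
ROWS = [
    (30, "PU_CLERK", 2500),
    (30, "PU_CLERK", 2800),
    (30, "PU_MAN", 11000),
    (50, "SH_CLERK", 3000),
    (50, "ST_MAN", 8000),
]

def stream_totals(rows, batch_size=1000):
    """Consume rows in batches, keeping only the running totals in memory."""
    detail = defaultdict(int)
    subtotal = defaultdict(int)
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for dept, job, salary in batch:
            detail[(dept, job)] += salary
            subtotal[dept] += salary
    return dict(detail), dict(subtotal)

print(stream_totals(iter(ROWS), batch_size=2))
```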
