1. 程式人生 > >oracle之優化一用group by或exists優化distinct

oracle之優化一用group by或exists優化distinct

今天mentor給了一個sql語句優化的任務。(環境是sql developer)有一個語句執行很慢,查詢出來的結果有17544條記錄,但需970秒,速度很慢。語句是這樣的:

    SELECT  DISTINCT    'AMEND_NEW',
                  reporttitle,
                  reportsubtitle,
                  cab_cab_transactions.branchcode,
                  cab_cab_transactions.prtfo_cd,
                  cab_cab_transactions.
sstm_scrty_id, cab_cab_transactions.sstm_trx_id, cab_cab_transactions.trde_dttm, cab_cab_transactions.efcte_dttm, cab_cab_transactions.due_stlmnt_dt, cab_cab_transactions.cncl_efcte_dttm, cab_cab_transactions.
trde_sstm_id, cab_cab_transactions.trx_type_cd, cab_cab_transactions.trx_type_dscrn, cab_cab_transactions.trx_subtype_cd, cab_cab_transactions.trde_stat_flg, cab_cab_transactions.csh_cr_dr_indcr, cab_cab_transactions.
long_shrt_indcr, cab_cab_transactions.lcl_crncy, cab_cab_transactions.stlmt_crncy, cab_cab_transactions.nomin_qty, cab_cab_transactions.price, cab_cab_transactions.lcl_cst, cab_cab_transactions.prtfo_cst, cab_cab_transactions.lcl_book_cst, cab_cab_transactions.prtfo_book_cst, cab_cab_transactions.lcl_sell_prcds, cab_cab_transactions.prtfo_sell_prcds, cab_cab_transactions.lcl_gnls, cab_cab_transactions.prtfo_gnls, cab_cab_transactions.lcl_acrd_intrt, cab_cab_transactions.prtfo_acrd_intrt, cab_cab_transactions.stlmt_crncy_stlmt_amt, cab_cab_transactions.lcl_net_amt, cab_cab_transactions.prtfo_net_amt, cab_cab_transactions.fx_bght_amt, cab_cab_transactions.fx_sold_amt, cab_cab_transactions.prtfo_crncy_stlmt_amt, cab_cab_transactions.prtfo_net_incme, cab_cab_transactions.dvnd_crncy_net_incme, cab_cab_transactions.dvnd_type_cd, cab_cab_transactions.lcl_intrt_pd_rec, cab_cab_transactions.prtfo_intrt_pd_rec, cab_cab_transactions.lcl_dvdnd_pd_rec, cab_cab_transactions.prtfo_dvdnd_pd_rec, cab_cab_transactions.lcl_sundry_inc_pd_rec, cab_cab_transactions.prtfo_sundry_inc_pd_rec, cab_cab_transactions.bnk_csh_cptl_secid, cab_cab_transactions.bnk_csh_inc_secid, cab_cab_transactions.reportdate, cab_cab_transactions.filename, sysdate, 'e483448' FROM cab_cfg_trx_type_mapping RIGHT JOIN(cab_cab_tran_adjustments INNER JOIN cab_cab_transactions ON(cab_cab_transactions.branchcode = cab_cab_tran_adjustments.branchcode ) AND(cab_cab_tran_adjustments.sstm_trx_id = cab_cab_transactions.sstm_trx_id)) ON(cab_cfg_trx_type_mapping.cab_trx_type_cd = cab_cab_transactions.trx_type_cd) AND(nvl(cab_cfg_trx_type_mapping.cab_trx_subtype_cd,' ') = nvl(cab_cab_transactions.trx_subtype_cd,' ') AND (cab_cfg_trx_type_mapping.branchcode=cab_cab_transactions.branchcode)) WHERE cab_cab_transactions.prtfo_cd IN (SELECT DISTINCT prtfo_cd FROM cab_cab_valuations_working WHERE created_by = 'e483448' AND branchcode='ISA') AND cab_cab_tran_adjustments.efcte_dttm > '2011-07-31' AND cab_cab_tran_adjustments.efcte_dttm <= '2011-08-31' AND eff_trde_stat_flg <> 'X' AND cab_cab_transactions.branchcode = 'ISA' AND cab_cab_tran_adjustments.branchcode = 'ISA' AND(cab_cfg_trx_type_mapping.cab_reportgroup = 'CABValuation' OR cab_cfg_trx_type_mapping.cab_reportgroup IS NULL)

問題在distinct上面,它會導致對全表掃描,而且會導致排序,然後刪除重複的記錄,所以速度很慢,因此需要優化distinct。查了不少資料,並逐一嘗試,最後發現了一個非常可觀的優化結果,用group by。語句如下:

    SELECT      'AMEND_NEW',
                  reporttitle,
                  reportsubtitle,
                  cab_cab_transactions.branchcode,
                  cab_cab_transactions.prtfo_cd,
                  cab_cab_transactions.sstm_scrty_id,
                  cab_cab_transactions.sstm_trx_id,
                  cab_cab_transactions.trde_dttm,
                  cab_cab_transactions.efcte_dttm,
                  cab_cab_transactions.due_stlmnt_dt,
                  cab_cab_transactions.cncl_efcte_dttm,
                  cab_cab_transactions.trde_sstm_id,
                  cab_cab_transactions.trx_type_cd,
                  cab_cab_transactions.trx_type_dscrn,
                  cab_cab_transactions.trx_subtype_cd,
                  cab_cab_transactions.trde_stat_flg,
                  cab_cab_transactions.csh_cr_dr_indcr,
                  cab_cab_transactions.long_shrt_indcr,
                  cab_cab_transactions.lcl_crncy,
                  cab_cab_transactions.stlmt_crncy,
                  cab_cab_transactions.nomin_qty,
                  cab_cab_transactions.price,
                  cab_cab_transactions.lcl_cst,
                  cab_cab_transactions.prtfo_cst,
                  cab_cab_transactions.lcl_book_cst,
                  cab_cab_transactions.prtfo_book_cst,
                  cab_cab_transactions.lcl_sell_prcds,
                  cab_cab_transactions.prtfo_sell_prcds,
                  cab_cab_transactions.lcl_gnls,
                  cab_cab_transactions.prtfo_gnls,
                  cab_cab_transactions.lcl_acrd_intrt,
                  cab_cab_transactions.prtfo_acrd_intrt,
                  cab_cab_transactions.stlmt_crncy_stlmt_amt,
                  cab_cab_transactions.lcl_net_amt,
                  cab_cab_transactions.prtfo_net_amt,
                  cab_cab_transactions.fx_bght_amt,
                  cab_cab_transactions.fx_sold_amt,
                  cab_cab_transactions.prtfo_crncy_stlmt_amt,
                  cab_cab_transactions.prtfo_net_incme,
                  cab_cab_transactions.dvnd_crncy_net_incme,
                  cab_cab_transactions.dvnd_type_cd,
                  cab_cab_transactions.lcl_intrt_pd_rec,
                  cab_cab_transactions.prtfo_intrt_pd_rec,
                  cab_cab_transactions.lcl_dvdnd_pd_rec,
                  cab_cab_transactions.prtfo_dvdnd_pd_rec,
                  cab_cab_transactions.lcl_sundry_inc_pd_rec,
                  cab_cab_transactions.prtfo_sundry_inc_pd_rec,
                  cab_cab_transactions.bnk_csh_cptl_secid,
                  cab_cab_transactions.bnk_csh_inc_secid,
                  cab_cab_transactions.reportdate,
                  cab_cab_transactions.filename,
                   sysdate,
                  'e483448'
          FROM cab_cfg_trx_type_mapping RIGHT JOIN(cab_cab_tran_adjustments
                INNER JOIN cab_cab_transactions ON(cab_cab_transactions.branchcode = cab_cab_tran_adjustments.branchcode )
                 AND(cab_cab_tran_adjustments.sstm_trx_id = cab_cab_transactions.sstm_trx_id)) ON(cab_cfg_trx_type_mapping.cab_trx_type_cd = cab_cab_transactions.trx_type_cd)
                 AND(nvl(cab_cfg_trx_type_mapping.cab_trx_subtype_cd,' ') = nvl(cab_cab_transactions.trx_subtype_cd,' ')
                 AND (cab_cfg_trx_type_mapping.branchcode=cab_cab_transactions.branchcode))
                WHERE cab_cab_transactions.prtfo_cd IN
                  (SELECT DISTINCT prtfo_cd
                   FROM cab_cab_valuations_working
                   WHERE created_by = 'e483448'
                        AND branchcode='ISA')
                 AND cab_cab_tran_adjustments.efcte_dttm > '2011-07-31'
                 AND cab_cab_tran_adjustments.efcte_dttm <= '2011-08-31'
                 AND eff_trde_stat_flg <> 'X'
                 AND cab_cab_transactions.branchcode = 'ISA'
                 AND cab_cab_tran_adjustments.branchcode = 'ISA'
                 AND(cab_cfg_trx_type_mapping.cab_reportgroup = 'CABValuation' OR cab_cfg_trx_type_mapping.cab_reportgroup IS NULL)
    			 GROUP BY   reporttitle,
                  reportsubtitle,
                  cab_cab_transactions.branchcode,
                  cab_cab_transactions.prtfo_cd,
                  cab_cab_transactions.sstm_scrty_id,
                  cab_cab_transactions.sstm_trx_id,
                  cab_cab_transactions.trde_dttm,
                  cab_cab_transactions.efcte_dttm,
                  cab_cab_transactions.due_stlmnt_dt,
                  cab_cab_transactions.cncl_efcte_dttm,
                  cab_cab_transactions.trde_sstm_id,
                  cab_cab_transactions.trx_type_cd,
                  cab_cab_transactions.trx_type_dscrn,
                  cab_cab_transactions.trx_subtype_cd,
                  cab_cab_transactions.trde_stat_flg,
                  cab_cab_transactions.csh_cr_dr_indcr,
                  cab_cab_transactions.long_shrt_indcr,
                  cab_cab_transactions.lcl_crncy,
                  cab_cab_transactions.stlmt_crncy,
                  cab_cab_transactions.nomin_qty,
                  cab_cab_transactions.price,
                  cab_cab_transactions.lcl_cst,
                  cab_cab_transactions.prtfo_cst,
                  cab_cab_transactions.lcl_book_cst,
                  cab_cab_transactions.prtfo_book_cst,
                  cab_cab_transactions.lcl_sell_prcds,
                  cab_cab_transactions.prtfo_sell_prcds,
                  cab_cab_transactions.lcl_gnls,
                  cab_cab_transactions.prtfo_gnls,
                  cab_cab_transactions.lcl_acrd_intrt,
                  cab_cab_transactions.prtfo_acrd_intrt,
                  cab_cab_transactions.stlmt_crncy_stlmt_amt,
                  cab_cab_transactions.lcl_net_amt,
                  cab_cab_transactions.prtfo_net_amt,
                  cab_cab_transactions.fx_bght_amt,
                  cab_cab_transactions.fx_sold_amt,
                  cab_cab_transactions.prtfo_crncy_stlmt_amt,
                  cab_cab_transactions.prtfo_net_incme,
                  cab_cab_transactions.dvnd_crncy_net_incme,
                  cab_cab_transactions.dvnd_type_cd,
                  cab_cab_transactions.lcl_intrt_pd_rec,
                  cab_cab_transactions.prtfo_intrt_pd_rec,
                  cab_cab_transactions.lcl_dvdnd_pd_rec,
                  cab_cab_transactions.prtfo_dvdnd_pd_rec,
                  cab_cab_transactions.lcl_sundry_inc_pd_rec,
                  cab_cab_transactions.prtfo_sundry_inc_pd_rec,
                  cab_cab_transactions.bnk_csh_cptl_secid,
                  cab_cab_transactions.bnk_csh_inc_secid,
                  cab_cab_transactions.reportdate,
                  cab_cab_transactions.filename

最後執行時間只有15.1秒,快了60多倍,不得不說這優化效果還是很可觀的。不過查了很多資料,仍然沒有發現合理地解釋:為什麼distinctgroup by的效率會有這麼大差別。查的很多資料,講的基本都是兩者相差不大,實現也差不多。有待解決。


關於distinctgroup by的去重邏輯淺析

在資料庫操作中,我們常常遇到需要將資料去重計數的工作。例如:

表A,列col

A

C

A

B

C

D

A

B

結果就是一共出現4個不同的字母A、B、C、D

即結果為4

大體上我們可以選擇count(distinct col)的方法和group+count的方法。

分別為:

select count(distinct col) from A;

select count(1) from (select 1 from A group by col) alias;

兩中方法實現有什麼不同呢?

其實上述兩中方法分別是在運算和儲存上的權衡。

distinct需要將col列中的全部內容都儲存在一個記憶體中,可以理解為一個hash結構,keycol的值,最後計算hash結構中有多少個key即可得到結果。

很明顯,需要將所有不同的值都存起來。記憶體消耗可能較大。

group by的方式是先將col排序。而資料庫中的group一般使用sort的方法,即資料庫會先對col進行排序。而排序的基本理論是,時間複雜為nlogn,空間為1.,然後只要單純的計數就可以了。優點是空間複雜度小,缺點是要進行一次排序,執行時間會較長。

兩中方法各有優劣,在使用的時候,我們需要根據實際情況進行取捨。

具體情況可參考如下法則

資料分佈 去重方式 原因
離散 group distinct空間佔用較大,在時間複雜度允許的情況下,group 可以發揮空間複雜度優勢
集中 distinct distinct空間佔用較小,可以發揮時間複雜度優勢

兩個極端:

1.資料列的所有資料都一樣,即去重計數的結果為1時,用distinct最佳

2.如果資料列唯一,沒有相同數值,用group 最好

當然,在group by時,某些資料庫產品會根據資料列的情況智慧地選擇是使用排序去重還是hash去重,例如postgresql。當然,我們可以根據實際情況對執行計劃進行人工的干預,而這不是這裡要討論的話題了。


使用EXISTS替換DISTINCT

當查詢中包含的表之間有一對多的關係時,避免在SELECT子句中使用DISTICT,可以使用EXISTS替換。

--查詢emp表中目前所有員工都在哪些部門工作(包括部門編號和部門名稱)
--使用DISTINCT(低效)
SELECT DISTINCT d.deptno, d.dname FROM dept d, emp e WHERE d.deptno = e.deptno;

在這裡插入圖片描述

--使用EXISTS(高效)

select d.deptno, d.dname from dept d
where exists (select 1 from emp e where e.deptno = d.deptno);

在這裡插入圖片描述


使用exists+使用exists代替in+使用exists代替distinct

使用exists代替in

  1. exists只檢查行的存在性,in 檢查實際的值,所以exists的效能比in
    驗證
select * from emp 
where deptno in(select  deptno from dept where   loc='NEW YORK');

select * from emp e
where  exists(select 1 from dept d where d.deptno=e.deptno and loc='NEW YORK');

在這裡插入圖片描述

使用exists代替distinct

  1. exists只檢查行的存在性,distinct用於禁止重複行的顯示,而且distinct在禁止重複行的顯示前需要排序檢索的行,所以exists的效能比distinct

驗證

select distinct e.deptno,d.dname  from emp e,dept d
where e.deptno=d.deptno;

select  d.deptno,d.dname  from dept d
where exists(select 1 from emp e where e.deptno=d.deptno);

這裡寫圖片描述