A Brief Look at the Execution Efficiency of EXISTS and IN in Oracle

阿新 • Published: 2018-11-13

Original · Oracle · Author: 迷倪小魏 · Time: 2017-11-29 13:43:30

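A common simplified picture is that IN performs a hash join between the outer and inner tables, while EXISTS loops over the outer table and queries the inner table once per iteration. Many people therefore assume that EXISTS is always more efficient than IN. That is not accurate; it depends on the situation.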

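EXISTS loops over the outer table row by row and evaluates its subquery for each row. If the subquery returns any rows (one is enough, however many there are), the condition is true and the current outer row is returned. If the subquery returns no rows, the current outer row is discarded. The EXISTS condition therefore behaves like a boolean: true when the subquery yields a result, false when it does not.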

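For example:

select * from user where exists (select 1 from dual);

Each row of the user table is taken in turn. Because the subquery select 1 from dual always returns a row, every row of user goes into the result set, so the statement is equivalent to select * from user;. (USER is a reserved word in Oracle, so a real table would need a different name or a quoted identifier; the name is used here only for illustration.)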

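Another example:

select * from user where exists (select * from user where userId = 0);

While looping over the user table, the condition (select * from user where userId = 0) is checked for each row. Assuming no row has userId = 0, the subquery always returns an empty set, the condition is always false, and every row of user is discarded.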

NOT EXISTS is the opposite of EXISTS: when the subquery returns rows, the current outer row is discarded; otherwise it is added to the result set.
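The row-by-row behaviour described above can be checked with a small script. This is only a minimal sketch: it uses Python's built-in sqlite3 module rather than Oracle (SQLite accepts select 1 without from dual), and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the table name is quoted because USER is reserved in some databases.
conn.execute('create table "user" (userId integer primary key, name text)')
conn.executemany('insert into "user" values (?, ?)',
                 [(1, "ann"), (2, "bob"), (3, "cat")])

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# The subquery always returns a row, so every outer row is kept.
always_true = ids('select userId from "user" where exists (select 1)')

# No row has userId = 0, so the subquery is empty and every outer row is dropped.
always_false = ids(
    'select userId from "user" u '
    'where exists (select * from "user" where userId = 0)')

# NOT EXISTS inverts the test: an empty subquery keeps the row.
negated = ids(
    'select userId from "user" u '
    'where not exists (select * from "user" where userId = 0)')

print(always_true, always_false, negated)
```

The three lists show the three cases from the text: everything kept, everything discarded, and the inverted test.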

In short, if table A has n rows, an EXISTS query takes those n rows one at a time and evaluates the EXISTS condition n times.

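An IN query with a value list is equivalent to a chain of OR conditions, which is easy to see in a query such as

select * from user where userId in (1, 2, 3);

which is equivalent to

select * from user where userId = 1 or userId = 2 or userId = 3;

NOT IN is the opposite of IN:

select * from user where userId not in (1, 2, 3);

is equivalent to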

select * from user where userId != 1 and userId != 2 and userId != 3;
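The two equivalences can be verified mechanically. The following is a minimal sketch using Python's sqlite3 module with made-up data (userId values 1 to 6); it stands in for Oracle only to illustrate the logic, not the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table "user" (userId integer primary key)')
conn.executemany('insert into "user" values (?)', [(i,) for i in range(1, 7)])

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# IN list versus the equivalent OR chain.
in_list = ids('select userId from "user" where userId in (1, 2, 3)')
or_chain = ids('select userId from "user" '
               'where userId = 1 or userId = 2 or userId = 3')

# NOT IN list versus the equivalent chain of != joined by AND.
not_in_list = ids('select userId from "user" where userId not in (1, 2, 3)')
and_chain = ids('select userId from "user" '
                'where userId != 1 and userId != 2 and userId != 3')

print(in_list == or_chain, not_in_list == and_chain)
```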

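In short, an IN query first retrieves all rows of the subquery (call the result set B, with m rows), then breaks that result into m values and, in effect, performs m lookups.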

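Note that the number of columns returned by the IN subquery must match the left-hand side. With a single column on the left, the subquery must return exactly one column, for example

select * from user where userId in (select id from B);

and not

select * from user where userId in (select id, age from B);

Comparing several columns requires a parenthesized list on the left, as in (userId, age) in (select id, age from B). EXISTS has no such restriction, because only the presence of rows matters, not their columns.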
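The column-count rule can be illustrated as follows. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module (version 3.15 or later for the multi-column form), the data are invented, and Oracle's exact error message for the mismatched case will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table "user" (userId integer, age integer)')
conn.execute("create table B (id integer, age integer)")
conn.executemany('insert into "user" values (?, ?)', [(1, 20), (2, 30), (3, 40)])
conn.executemany("insert into B values (?, ?)", [(1, 20), (2, 99)])

# One column on each side: accepted.
single = sorted(r[0] for r in conn.execute(
    'select userId from "user" where userId in (select id from B)'))

# One column on the left, two in the subquery: rejected by the database.
try:
    conn.execute(
        'select userId from "user" where userId in (select id, age from B)'
    ).fetchall()
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

# Two columns on each side: accepted, and both values must match.
pair = sorted(r[0] for r in conn.execute(
    'select userId from "user" where (userId, age) in (select id, age from B)'))

print(single, mismatch_rejected, pair)
```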

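Now consider the performance of EXISTS and IN:

In the simplified model above, IN compares values in memory while EXISTS has to query the database, so when table B is large, EXISTS tends to be more efficient than IN.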

Consider the following SQL statements:

select * from A where exists (select * from B where B.id = A.id);

select * from A where A.id in (select id from B);

1. select * from A where exists (select * from B where B.id = A.id);

EXISTS is evaluated A.length times. It does not cache the subquery's result set, because the contents do not matter; what matters is whether the result set is empty (false) or non-empty (true).
The process is similar to the following:

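$result = [];
for ($i = 0; $i < count(A); $i++) {
    $a = get_record(A, $i);        // fetch the rows of A one at a time
    if (exists_in_B($a['id'])) {   // hypothetical helper: true if some row of B has B.id = $a['id']
        $result[] = $a;            // keep the row when the subquery is non-empty
    }
}
return $result;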

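EXISTS is a good fit when table B is larger than table A, because it avoids traversing B in full: each row of A needs only one more lookup.
For example, if A has 10,000 rows and B has 1,000,000 rows, EXISTS runs 10,000 times, checking whether each id in A has a match in B.
If A has 10,000 rows and B has 100,000,000 rows, EXISTS still runs 10,000 times, because it runs only A.length times. The larger B is, the more EXISTS pays off.
Conversely, if A has 10,000 rows and B has 100 rows, EXISTS still runs 10,000 times, and IN may do better even with up to 10000*100 comparisons, because in this model IN compares in memory while EXISTS queries the database, and a database lookup costs far more than an in-memory comparison.

Conclusion: EXISTS suits the case where table B is larger than table A.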

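2. select * from A where id in (select id from B);

IN runs the subquery only once: it fetches all id values from B and caches them. It then checks each id in A against the cached ids and adds the row of A to the result set when there is a match, until every row of A has been visited.

The process is similar to the following: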

Array A = (select * from A);    // fetch all rows of A
Array B = (select id from B);   // fetch and cache every id in B
for (int i = 0; i < A.length; i++) {
    for (int j = 0; j < B.length; j++) {
        if (A[i].id == B[j].id) {    // match found
            resultSet.add(A[i]);
            break;                   // stop scanning B for this row of A
        }
    }
}
return resultSet;

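As this shows, IN is a poor fit when table B is large, because all of B may be traversed for each row of A.
For example, if A has 10,000 rows and B has 1,000,000 rows, up to 10000*1000000 comparisons may be needed, which is very inefficient.
If A has 10,000 rows and B has 100 rows, at most 10000*100 comparisons are needed, far fewer, and efficiency improves accordingly.

Conclusion: IN suits the case where table B is smaller than table A.

When A and B are about the same size, IN and EXISTS perform about the same, and either can be used.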
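To make the counting argument concrete, the two models can be written out and instrumented. This is a minimal sketch of the simplified cost model in the text, not of what Oracle actually executes: a Python set lookup stands in for one indexed probe into B, the table sizes are scaled down, and all names are illustrative.

```python
def exists_style(a_ids, b_index):
    """EXISTS model: one probe into B per row of A."""
    probes = 0
    result = []
    for a in a_ids:
        probes += 1
        if a in b_index:              # set lookup stands in for an index probe
            result.append(a)
    return result, probes

def in_style(a_ids, b_ids):
    """IN model: cache B's ids, then compare each row of A against the list."""
    comparisons = 0
    result = []
    for a in a_ids:
        for b in b_ids:
            comparisons += 1
            if a == b:
                result.append(a)
                break                 # stop scanning B once a match is found
    return result, comparisons

A = list(range(100))                  # outer table: ids 0..99
small_B = list(range(0, 100, 10))     # inner table with 10 ids
large_B = list(range(1000))           # inner table with 1000 ids

rows_exists, probes = exists_style(A, set(large_B))
rows_in_large, cmp_large = in_style(A, large_B)
rows_in_small, cmp_small = in_style(A, small_B)

print(probes, cmp_large, cmp_small)
```

The probe count of the EXISTS model depends only on the size of A, while the comparison count of the IN model grows with B. The sketch counts operations only; the text's further point is that one database probe costs much more than one in-memory comparison, which is why IN can still win when B is small.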

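Before inserting a record, it is often necessary to check whether it already exists and to insert only when it does not. A NOT EXISTS condition can be used to prevent duplicate inserts. Note that id has to be inserted along with the other columns; otherwise the check on A.id has nothing to match against on a later run.
insert into A (id, name, age) select id, name, age from B where not exists (select 1 from A where A.id = B.id);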
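A guarded insert of this kind can be run twice without creating duplicates. The sketch below shows that with Python's sqlite3 module and invented rows; it copies id along with name and age so that the existence check has something to match on, and it is an illustration on SQLite rather than a test against Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table A (id integer, name text, age integer)")
conn.execute("create table B (id integer, name text, age integer)")
conn.execute("insert into A values (1, 'ann', 30)")
conn.executemany("insert into B values (?, ?, ?)",
                 [(1, "ann", 30), (2, "bob", 41)])

guarded_insert = """
insert into A (id, name, age)
select id, name, age from B
where not exists (select 1 from A where A.id = B.id)
"""

conn.execute(guarded_insert)   # first run copies only id 2; id 1 is already in A
conn.execute(guarded_insert)   # second run copies nothing

rows = conn.execute("select id, name, age from A order by id").fetchall()
print(rows)
```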

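On the efficiency of EXISTS versus IN: it is often said that EXISTS is usually faster than IN because IN does not use indexes, but that is not reliable as a general rule, since IN can use indexes too. The choice should follow the actual situation. As a rule of thumb, IN suits a large outer table with a small inner table, and EXISTS suits a small outer table with a large inner table.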

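Next, NOT EXISTS and NOT IN:

1. select * from A where not exists (select * from B where B.id = A.id);

2. select * from A where A.id not in (select id from B);

Query 1, as before, can use the index on B. Query 2 can be viewed as the following statement:

select * from A where A.id != 1 and A.id != 2 and A.id != 3;

Seen this way, NOT IN is a range-style predicate built from != comparisons, which cannot be served by an ordinary index lookup, so every row of A has to be checked against all of B to see whether a match exists.

NOT IN versus NOT EXISTS: in this model, a query written with NOT IN full-scans both the inner and the outer table without using indexes, while the subquery of NOT EXISTS can still use the index on its table. NOT EXISTS is therefore generally the faster choice whichever table is larger. A cost-based optimizer can often rewrite both forms as the same anti-join, in which case the difference disappears.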
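One caveat beyond performance: NOT IN and NOT EXISTS are only interchangeable when the subquery column contains no NULLs. If B.id holds a NULL, A.id not in (...) can never evaluate to true, so the query returns no rows, whereas NOT EXISTS is unaffected. The following minimal sketch demonstrates this standard SQL behaviour with Python's sqlite3 module and invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table A (id integer)")
conn.execute("create table B (id integer)")
conn.executemany("insert into A values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into B values (?)", [(1,), (None,)])   # B contains a NULL

# NOT EXISTS: rows of A with no matching row in B.
not_exists = sorted(r[0] for r in conn.execute(
    "select id from A where not exists (select 1 from B where B.id = A.id)"))

# NOT IN: comparing against the NULL yields unknown, so no row qualifies.
not_in = sorted(r[0] for r in conn.execute(
    "select id from A where A.id not in (select id from B)"))

print(not_exists, not_in)
```

When the subquery column may contain NULLs, NOT EXISTS (or an explicit is not null filter inside the NOT IN subquery) avoids the surprise.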

The difference between IN and =

select name from student where name in ('zhang','wang','li','zhao');

and

select name from student where name='zhang' or name='li' or name='wang' or name='zhao';

return the same result.

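The conventional view is that using EXISTS (or NOT EXISTS) usually improves query efficiency, so replacing IN with EXISTS is commonly recommended. But is that really the case? Let's look at real examples under the two optimizer modes: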

SQL> create table wjq1 as select * from dba_objects;

Table created.

SQL> create table wjq2 as select * from dba_tables;

Table created.

SQL> create index idx_object_name on wjq1(object_name);

Index created.

SQL> create index idx_table_name on wjq2(table_name);

Index created.

SQL> select count(*) from wjq1;

  COUNT(*)
----------
     86976

SQL> select count(*) from wjq2;

  COUNT(*)
----------
      2868

I. When the inner query's result set is small and the outer query is large

1. Under CBO:

SQL> select * from wjq1 where object_name in (select table_name from wjq2 where table_name like 'M%');

815 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1638414738

---------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1238 | 270K| 354 (1)| 00:00:05 |

|* 1 | HASH JOIN RIGHT SEMI| | 1238 | 270K| 354 (1)| 00:00:05 |

|* 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 772 | 13124 | 7 (0)| 00:00:01 |

|* 3 | TABLE ACCESS FULL | WJQ1 | 5503 | 1112K| 347 (1)| 00:00:05 |

---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("OBJECT_NAME"="TABLE_NAME")

2 - access("TABLE_NAME" LIKE 'M%')

filter("TABLE_NAME" LIKE 'M%')

3 - filter("OBJECT_NAME" LIKE 'M%')

Note

-----

- dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

17 recursive calls

0 db block gets

1462 consistent gets

1256 physical reads

0 redo size

46140 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

815 rows processed

SQL> select * from wjq1 where exists (select 1 from wjq2 where wjq1.object_name=wjq2.table_name and wjq2.table_name like 'M%');

815 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1638414738

---------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1238 | 270K| 354 (1)| 00:00:05 |

|* 1 | HASH JOIN RIGHT SEMI| | 1238 | 270K| 354 (1)| 00:00:05 |

|* 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 772 | 13124 | 7 (0)| 00:00:01 |

|* 3 | TABLE ACCESS FULL | WJQ1 | 5503 | 1112K| 347 (1)| 00:00:05 |

---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("WJQ1"."OBJECT_NAME"="WJQ2"."TABLE_NAME")

2 - access("WJQ2"."TABLE_NAME" LIKE 'M%')

filter("WJQ2"."TABLE_NAME" LIKE 'M%')

3 - filter("WJQ1"."OBJECT_NAME" LIKE 'M%')

Note

-----

- dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

13 recursive calls

0 db block gets

1462 consistent gets

1242 physical reads

0 redo size

46140 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

815 rows processed

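Comparing the two execution plans above:
Under CBO the two statements get exactly the same execution plan (same plan hash value, HASH JOIN RIGHT SEMI), and the statistics are essentially the same: consistent gets are identical at 1462, with only small differences in recursive calls (17 vs 13) and physical reads (1256 vs 1242).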
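The HASH JOIN RIGHT SEMI step that appears in both plans can be pictured as follows: build a hash table from the small subquery result, then probe it once per row of the large outer table, returning each outer row at most once. This is a simplified sketch of the general technique, not Oracle's implementation, and the sample rows are invented.

```python
def hash_semi_join(outer_rows, inner_keys, key):
    """Return each outer row whose key occurs in inner_keys, at most once."""
    build = set(inner_keys)                  # build phase: hash the small input
    return [row for row in outer_rows        # probe phase: one lookup per outer row
            if key(row) in build]

# Hypothetical stand-ins for wjq1 (object_name, object_type) and wjq2.table_name.
objects = [
    ("MGMT_JOBS", "TABLE"),
    ("MGMT_JOBS", "SYNONYM"),
    ("EMP", "TABLE"),
    ("MVIEW_LOG", "TABLE"),
]
table_names = ["MGMT_JOBS", "MGMT_JOBS", "DEPT"]   # duplicate key on purpose

# Mirrors: table_name like 'M%'
m_tables = [name for name in table_names if name.startswith("M")]

matched = hash_semi_join(objects, m_tables, key=lambda row: row[0])
print(matched)
```

Because the probe only asks whether the key is present, duplicate keys on the inner side do not duplicate outer rows, which is what distinguishes a semi-join from an ordinary join. Both the IN form and the EXISTS form express exactly this question, so it is unsurprising that the CBO plans them identically.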

Now let's look at RBO mode, where things are somewhat more complicated.

2. Under RBO:

SQL> select /*+ rule*/ * from wjq1 where object_name in (select table_name from wjq2 where table_name like 'M%');

815 rows selected.

Elapsed: 00:00:00.01

Execution Plan

----------------------------------------------------------

Plan hash value: 144941173

--------------------------------------------------------

| Id | Operation | Name |

--------------------------------------------------------

| 0 | SELECT STATEMENT | |

| 1 | NESTED LOOPS | |

| 2 | NESTED LOOPS | |

| 3 | VIEW | VW_NSO_1 |

| 4 | SORT UNIQUE | |

|* 5 | INDEX RANGE SCAN | IDX_TABLE_NAME |

|* 6 | INDEX RANGE SCAN | IDX_OBJECT_NAME |

| 7 | TABLE ACCESS BY INDEX ROWID| WJQ1 |

--------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

5 - access("TABLE_NAME" LIKE 'M%')

filter("TABLE_NAME" LIKE 'M%')

6 - access("OBJECT_NAME"="TABLE_NAME")

Note

-----

- rule based optimizer used (consider using cbo)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

698 consistent gets

0 physical reads

0 redo size

55187 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

815 rows processed

SQL> select /*+ rule*/ * from wjq1 where exists (select 1 from wjq2 where wjq1.object_name=wjq2.table_name and wjq2.table_name like 'M%');

815 rows selected.

Elapsed: 00:00:00.15

Execution Plan

----------------------------------------------------------

Plan hash value: 3545670754

---------------------------------------------

| Id | Operation | Name |

---------------------------------------------

| 0 | SELECT STATEMENT | |

|* 1 | FILTER | |

| 2 | TABLE ACCESS FULL| WJQ1 |

|* 3 | INDEX RANGE SCAN | IDX_TABLE_NAME |

---------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter( EXISTS (SELECT 0 FROM "WJQ2" "WJQ2" WHERE

"WJQ2"."TABLE_NAME"=:B1 AND "WJQ2"."TABLE_NAME" LIKE 'M%'))

3 - access("WJQ2"."TABLE_NAME"=:B1)

filter("WJQ2"."TABLE_NAME" LIKE 'M%')

Note

-----

- rule based optimizer used (consider using cbo)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

91002 consistent gets

1242 physical reads

0 redo size

46140 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

815 rows processed

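Comparing the two execution plans above:
  Here IN is actually more efficient than EXISTS: 698 consistent gets versus 91002. This can be understood as follows:
  For IN, the RBO uses the result of the inner query as the driving table of a nested loops join, so IN is quite efficient when the inner query's result set is small.
  For EXISTS, the RBO full-scans the outer table and uses each of its rows to filter against the inner query, so it is relatively inefficient when the outer table is large.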
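The difference between the two RBO plans can be sketched by counting lookups. The model below is a rough illustration only: Python dicts and sets stand in for the two indexes, it ignores block-level I/O and any caching the database may do, and the data are invented (1000 outer rows, 3 distinct inner keys).

```python
from collections import defaultdict

def in_plan(outer_rows, inner_keys, key):
    """SORT UNIQUE the inner keys, then one index lookup into the outer table per key."""
    outer_index = defaultdict(list)          # stands in for IDX_OBJECT_NAME
    for row in outer_rows:
        outer_index[key(row)].append(row)
    lookups = 0
    result = []
    for k in sorted(set(inner_keys)):        # distinct inner keys drive the join
        lookups += 1
        result.extend(outer_index.get(k, []))
    return result, lookups

def exists_plan(outer_rows, inner_keys, key):
    """Full-scan the outer table and probe the inner index once per outer row."""
    inner_index = set(inner_keys)            # stands in for IDX_TABLE_NAME
    probes = 0
    result = []
    for row in outer_rows:                   # TABLE ACCESS FULL on the outer table
        probes += 1
        if key(row) in inner_index:
            result.append(row)
    return result, probes

outer_rows = [(f"OBJ{i}",) for i in range(997)] + [("M0",), ("M1",), ("M1",)]
inner_keys = ["M1", "M0", "M1", "M9"]

rows_in, lookups = in_plan(outer_rows, inner_keys, key=lambda r: r[0])
rows_exists, probes = exists_plan(outer_rows, inner_keys, key=lambda r: r[0])

print(lookups, probes, sorted(rows_in) == sorted(rows_exists))
```

Both strategies return the same rows, but the IN plan does work proportional to the number of distinct inner keys, while the EXISTS plan does work proportional to the size of the outer table. That matches the direction of the gap in consistent gets reported above.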

II. When the inner query's result set is large and the outer query is small

1. Under CBO:

SQL> select * from wjq2 where table_name in (select object_name from wjq1 where object_name like 'S%');

278 rows selected.

Elapsed: 00:00:00.03

Execution Plan

----------------------------------------------------------

Plan hash value: 1807911610

--------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 278 | 164K| 55 (0)| 00:00:01 |

|* 1 | HASH JOIN SEMI | | 278 | 164K| 55 (0)| 00:00:01 |

|* 2 | TABLE ACCESS FULL| WJQ2 | 278 | 146K| 31 (0)| 00:00:01 |

|* 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 4435 | 285K| 24 (0)| 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("TABLE_NAME"="OBJECT_NAME")

2 - filter("TABLE_NAME" LIKE 'S%')

3 - access("OBJECT_NAME" LIKE 'S%')

filter("OBJECT_NAME" LIKE 'S%')

Note

-----

- dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

67 recursive calls

0 db block gets

403 consistent gets

446 physical reads

0 redo size

22852 bytes sent via SQL*Net to client

721 bytes received via SQL*Net from client

20 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

278 rows processed

SQL> select * from wjq2 where exists (select 1 from wjq1 where wjq1.object_name=wjq2.table_name and wjq1.object_name like 'S%');

278 rows selected.

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

Plan hash value: 1807911610

--------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 278 | 164K| 55 (0)| 00:00:01 |

|* 1 | HASH JOIN SEMI | | 278 | 164K| 55 (0)| 00:00:01 |

|* 2 | TABLE ACCESS FULL| WJQ2 | 278 | 146K| 31 (0)| 00:00:01 |

|* 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 4435 | 285K| 24 (0)| 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("WJQ1"."OBJECT_NAME"="WJQ2"."TABLE_NAME")

2 - filter("WJQ2"."TABLE_NAME" LIKE 'S%')

3 - access("WJQ1"."OBJECT_NAME" LIKE 'S%')

filter("WJQ1"."OBJECT_NAME" LIKE 'S%')

Note

-----

- dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

13 recursive calls

0 db block gets

295 consistent gets

2 physical reads

0 redo size

22852 bytes sent via SQL*Net to client

721 bytes received via SQL*Net from client

20 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

278 rows processed

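Comparing the two execution plans above:
Although the execution plans are identical, in this run EXISTS showed fewer logical reads (295 vs 403 consistent gets) and far fewer physical reads (2 vs 446) than IN, so EXISTS came out somewhat more efficient here. Since the plans are the same, part of that gap probably comes from the first statement's extra recursive calls (67 vs 13) and from blocks that were already cached when the second statement ran, so the numbers should be read with some caution.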

2. Under RBO:

Source: "ITPUB Blog", link: http://blog.itpub.net/31015730/viewspace-2147932/. If you repost, please credit the source; otherwise legal liability will be pursued.
