Oracle Optimizer, Optimizer Modes, and Table Join Methods (Hash Join, Nested Loop, Sort Merge Join)

Query Optimizer

Oracle's query optimizer (QO) comes in two forms:
1. RBO: Rule-Based Optimization, the rule-based optimizer;
2. CBO: Cost-Based Optimization, the cost-based optimizer.

Starting with Oracle 10g, Oracle has dropped support for the RBO, but it can still be enabled for backward compatibility.

Optimizer Modes

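The optimizer modes are:
FIRST_ROWS: return the first few rows as quickly as possible;
FIRST_ROWS_n: covers FIRST_ROWS_1000, FIRST_ROWS_100, FIRST_ROWS_10, and FIRST_ROWS_1; similar to the above, but with a specific number of rows;
ALL_ROWS: return all rows as quickly as possible; this is the default optimizer mode.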

SQL> show parameter optimizer_mode;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode                       string      ALL_ROWS
SQL>           

Table Join Methods

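Hash Join

The smaller table is loaded into memory and a hash table is built on it; each row of the larger table is then used to probe that hash table. The join columns of the two tables do not need indexes, and a hash lookup is generally faster than an index lookup.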

A join in which the database uses the smaller of two tables or data sources to build a hash table in memory. The database scans the larger table, probing the hash table for the addresses of the matching rows in the smaller table.
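The build and probe phases described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how Oracle implements the join: the function name `hash_join` and the sample rows are hypothetical, with table and column names loosely modelled on the amy_emp / amy_dept query used later in this article.

```python
from collections import defaultdict


def hash_join(small_rows, large_rows, small_key, large_key):
    """Equijoin two lists of dict rows using a build phase and a probe phase."""
    # Build phase: hash the smaller input on its join key.
    buckets = defaultdict(list)
    for row in small_rows:
        buckets[row[small_key]].append(row)

    # Probe phase: scan the larger input once, looking up each join key.
    joined = []
    for row in large_rows:
        for match in buckets.get(row[large_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data (not taken from a real schema).
depts = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
]
emps = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 20},
    {"empno": 3, "deptno": 20},
    {"empno": 4, "deptno": 30},  # no matching department, so it is dropped
]

result = hash_join(depts, emps, "deptno", "deptno")
print(len(result))  # 3
```

Note that neither input needs to be sorted or indexed; the only requirement is that the join condition is an equality, because the lookup is by exact key.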

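For good performance, hash_area_size must be large enough. If the hash table needs more memory than hash_area_size allows, it is paged out to the temporary tablespace, which costs some performance.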

When does the optimizer choose a hash join?

1. Chosen automatically by the optimizer
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions is true:
1) A large amount of data must be joined.
2) A large fraction of a small table must be joined.

2. Specified manually
A hash join can be forced with the use_hash hint.

SQL> select /*+ use_hash(amy_emp amy_dept) */ count(*) from amy_emp, amy_dept where amy_emp.deptno = amy_dept.deptno;

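a) This method was introduced late in the Oracle 7 line and uses a more advanced join technique. It is generally more efficient than the other two join methods, but it is available only under the CBO and needs a suitably sized hash_area_size to perform well.
b) It is relatively efficient for joining two large row sources, and more efficient still when one of the row sources is small.
c) It can be used only for equijoins.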

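Nested Loop

The outer table drives the inner table: every row of the outer table is matched against the inner table. Unlike a hash join, no hash table is built on the inner table, so the inner table should ideally have an index on the join column.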

It is important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table’s access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.

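a) This method works well when the driving row source (the outer table) is small and the inner row source (the inner table) has a unique index, or a highly selective non-unique index, on the join column.
b) One advantage nested loops have over the other join methods is that already-joined rows can be returned right away, without waiting for the whole join to finish, which gives a fast response time.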
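Both points can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: a plain dictionary (`emp_index`, a hypothetical name) stands in for the index on the inner table, and a generator stands in for the way a nested loop can hand back joined rows as soon as they are found.

```python
def nested_loop_join(outer_rows, inner_index, key):
    """Yield joined rows one at a time.

    inner_index maps a join-key value to the list of inner rows with that
    value; it plays the role of an index on the inner table.
    """
    for outer in outer_rows:  # driving (outer) row source
        # Indexed lookup on the inner side, driven by the current outer row.
        for inner in inner_index.get(outer[key], []):
            yield {**outer, **inner}


# Hypothetical sample data (not taken from a real schema).
depts = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
]
emps = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 20},
    {"empno": 3, "deptno": 20},
]

# Build the stand-in index on the inner table's join column.
emp_index = {}
for emp in emps:
    emp_index.setdefault(emp["deptno"], []).append(emp)

rows = nested_loop_join(depts, emp_index, "deptno")
first = next(rows)  # the first joined row is available immediately
rest = list(rows)   # the remaining rows are produced only when requested
print(first["empno"], len(rest))  # 1 2
```

Without the index, the inner loop would have to scan all inner rows for every outer row, which is why a small driving row source and a selective index on the inner side matter so much for this method.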

Sort Merge Join

Sort merge joins can join rows from two independent sources. Hash joins generally perform better than sort merge joins. However, sort merge joins can perform better than hash joins if both of the following conditions exist:
1. The row sources are sorted already.
2. A sort operation does not have to be done.

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.

Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=.
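To see why sorted inputs help with an inequality condition, consider the following Python sketch of a join on `left.a < right.b`. It is a simplified illustration, not Oracle's implementation; the function name `sort_merge_less_than` and the keys `a` and `b` are hypothetical. Once both inputs are sorted, the matches for each left row form a contiguous tail of the right input, and the start of that tail only ever moves forward.

```python
def sort_merge_less_than(left_rows, right_rows, left_key, right_key):
    """Return (left, right) pairs where left[left_key] < right[right_key]."""
    left = sorted(left_rows, key=lambda r: r[left_key])
    right = sorted(right_rows, key=lambda r: r[right_key])

    joined = []
    start = 0  # first right row whose key exceeds the current left key
    for l_row in left:
        # Because left is sorted, start never has to move backwards.
        while start < len(right) and right[start][right_key] <= l_row[left_key]:
            start += 1
        # Every right row from start onward satisfies the inequality.
        for r_row in right[start:]:
            joined.append((l_row, r_row))
    return joined


# Hypothetical sample data.
left_rows = [{"a": 5}, {"a": 1}]
right_rows = [{"b": 1}, {"b": 3}, {"b": 5}, {"b": 7}]

pairs = sort_merge_less_than(left_rows, right_rows, "a", "b")
print([(l["a"], r["b"]) for l, r in pairs])  # [(1, 3), (1, 5), (1, 7), (5, 7)]
```

A hash table cannot answer "which keys are greater than this one", which is why the hash join is not an option for such conditions.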

Sort merge joins perform better than nested loop joins for large data sets.

You cannot use hash joins unless there is an equality condition.

In a merge join, there is no concept of a driving table. The join consists of two steps:
1. Sort join operation: Both the inputs are sorted on the join key.
2. Merge join operation: The sorted lists are merged together.

If the input is sorted by the join column, then a sort join operation is not performed for that row source.
However, a sort merge join always creates a positionable sort buffer for the right side of the join so that it can seek back to the last match in the case where duplicate join key values come out of the left side of the join.
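The two steps, and the need to revisit the right side when the left side has duplicate keys, can be sketched in Python as follows. This is a minimal in-memory model under stated assumptions: the function name `sort_merge_join` and the sample rows are hypothetical, and re-reading the slice `right[j:j_end]` stands in for seeking back in the sort buffer.

```python
def sort_merge_join(left_rows, right_rows, key):
    """Equijoin two lists of dict rows by sorting both and merging them."""
    # Step 1, sort join: sort both inputs on the join key.
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])

    # Step 2, merge join: walk both sorted lists in step.
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        l_key, r_key = left[i][key], right[j][key]
        if l_key < r_key:
            i += 1
        elif l_key > r_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == l_key:
                j_end += 1
            # Each left row with the same key re-reads that whole run.
            while i < len(left) and left[i][key] == l_key:
                for r_row in right[j:j_end]:
                    joined.append({**left[i], **r_row})
                i += 1
            j = j_end
    return joined


# Hypothetical sample data with duplicate keys on both sides.
left_rows = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
right_rows = [{"k": 2, "b": 1}, {"k": 2, "b": 2}, {"k": 3, "b": 3}]

out = sort_merge_join(left_rows, right_rows, "k")
print([(r["a"], r["b"]) for r in out])  # [('x', 1), ('x', 2), ('z', 1), ('z', 2)]
```

If an input already arrives in join-key order, for example from an index scan, the corresponding `sorted` call is the work that can be skipped.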

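a) For non-equijoins, this join method is relatively efficient.
b) It works even better when both join columns are indexed.
c) For joining two large row sources, it performs somewhat better than a nested loop (NL) join.
d) However, if the row source returned by the sort merge is too large, looking up the table data by that many rowids causes excessive I/O, and database performance degrades.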

Differences Among the Three Join Methods and How to Choose

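  1. A hash join is not necessarily faster than the other two methods, and it can be used only for equijoins;
  2. Hints such as FIRST_ROWS push the CBO strongly toward choosing NESTED LOOP.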