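MySQL -- Index Selection
- A key responsibility of the optimizer: choosing an index
  - The goal is to find the optimal execution plan
  - Most of the time, the optimizer picks the right index
- Factors that determine execution cost in the database
  - Number of rows scanned (the focus of this article)
  - Whether a temporary table is used
  - Whether a sort is required
- Before it actually executes a statement, MySQL cannot know exactly how many records satisfy the condition
  - It can only estimate the record count from statistics (the selectivity of each index)
  - The higher the cardinality (the more distinct values), the better the selectivity of the index
  - The index cardinality recorded in the statistics is not exact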
mysql> SHOW INDEX FROM t;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| t     |          0 | PRIMARY  |            1 | id          | A         |      100256 |     NULL |   NULL |      | BTREE      |         |               | YES     |
| t     |          1 | a        |            1 | a           | A         |      100512 |     NULL |   NULL | YES  | BTREE      |         |               | YES     |
| t     |          1 | b        |            1 | b           | A         |      100512 |     NULL |   NULL | YES  | BTREE      |         |               | YES     |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
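The Cardinality column is an estimate: it reports 100256 and 100512 here, while the table t defined later in this article holds exactly 100000 distinct values in each column. One way to see the gap is to compare the estimate with an exact count. This is a minimal sketch, assuming the session's default database is the one that contains t; the column aliases are illustrative only.

```sql
-- Exact number of distinct values per column.
-- This reads the whole table, so use it with care on large tables.
SELECT COUNT(DISTINCT id) AS exact_id,
       COUNT(DISTINCT a)  AS exact_a,
       COUNT(DISTINCT b)  AS exact_b
FROM t;

-- Estimated cardinality as reported by the server for each index.
SELECT INDEX_NAME, COLUMN_NAME, CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't';
```

For the data loaded in this article, the exact counts should all be 100000. The estimates will probably differ slightly from run to run because they come from sampling.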
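Cardinality statistics
- Method: sampling
  - Cardinality: by default InnoDB picks N data pages, counts the distinct values on each page, takes the average, and multiplies it by the number of pages in the index
  - When the number of changed rows exceeds 1/M of the table, the index statistics are resampled automatically
- How index statistics are stored is controlled by the parameter innodb_stats_persistent
  - ON: statistics are persisted to disk; N=20, M=10
  - OFF: statistics are kept in memory only; N=8, M=16
- To trigger resampling of the index statistics manually: ANALYZE TABLE t;
  - Use case: when the rows value estimated by EXPLAIN differs greatly from the actual row count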
mysql> SHOW VARIABLES LIKE '%innodb_stats_persistent%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent              | ON    |
| innodb_stats_persistent_sample_pages | 20    |
+--------------------------------------+-------+
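The sampling described above can also be tuned per table and inspected after the fact. The following is a sketch rather than a recommendation: the value 64 is an arbitrary assumption chosen for illustration, and reading the mysql.innodb_index_stats and mysql.innodb_table_stats tables requires sufficient privileges.

```sql
-- Global settings related to the sample size and automatic recalculation.
SHOW VARIABLES LIKE 'innodb_stats_%';

-- Assumption for illustration: sample 64 pages instead of the default 20, for table t only.
ALTER TABLE t STATS_PERSISTENT = 1, STATS_SAMPLE_PAGES = 64;

-- Resample with the new setting.
ANALYZE TABLE t;

-- With persistent statistics, the sampled values are stored in these system tables.
SELECT index_name, stat_name, stat_value, sample_size, stat_description
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 't';

SELECT n_rows, clustered_index_size, sum_of_other_index_sizes, last_update
FROM mysql.innodb_table_stats
WHERE database_name = DATABASE()
  AND table_name = 't';
```

A larger sample generally gives a more accurate estimate at the cost of a slower ANALYZE TABLE, so the right value depends on the table.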
Table initialization
Create the table
CREATE TABLE `t` (
  `id` INT(11) NOT NULL,
  `a` INT(11) DEFAULT NULL,
  `b` INT(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`)
) ENGINE=InnoDB;
Populate the table
# Stored procedure: inserts the rows (1,1,1) through (100000,100000,100000)
DELIMITER //
CREATE PROCEDURE idata()
BEGIN
  DECLARE i INT;
  SET i=1;
  WHILE (i <= 100000) DO
    INSERT INTO t VALUES (i, i, i);
    SET i=i+1;
  END WHILE;
END//
DELIMITER ;

# Call the stored procedure
CALL idata();
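With autocommit enabled, each of the 100000 inserts is committed separately, which can make the load slow. A possible variant, shown here as a sketch to use instead of the plain CALL (running both would fail with duplicate-key errors), wraps the call in one transaction and then checks the result. The alias row_count is illustrative only.

```sql
-- Optional: run the load inside a single transaction so the inserts are committed once.
START TRANSACTION;
CALL idata();
COMMIT;

-- Sanity check: expect 100000 rows, with a and b both ranging from 1 to 100000.
SELECT COUNT(*) AS row_count, MIN(a), MAX(a), MIN(b), MAX(b) FROM t;

-- Refresh the index statistics after the bulk load.
ANALYZE TABLE t;
```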
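Index trees
Each row is (i, i, i), so the primary key tree and the secondary index trees a and b all hold the values 1 through 100000 in ascending order.
Queries
Regular query
The optimizer chooses index a, with an estimated 10001 rows to scan.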
mysql> EXPLAIN SELECT * FROM t WHERE a BETWEEN 10000 AND 20000;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | t     | NULL       | range | a             | a    | 5       | NULL | 10001 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+
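In this case the estimate happens to match reality, which can be checked directly. A small sketch: the first statement counts the matching rows, and the second relies on EXPLAIN ANALYZE, which is only available from MySQL 8.0.18 onward and therefore may not exist on the server used for the outputs in this article.

```sql
-- Column a takes each value from 1 to 100000 exactly once,
-- so the inclusive range 10000..20000 matches 10001 rows.
SELECT COUNT(*) FROM t WHERE a BETWEEN 10000 AND 20000;

-- MySQL 8.0.18 or later: executes the query and reports estimated versus actual rows.
EXPLAIN ANALYZE SELECT * FROM t WHERE a BETWEEN 10000 AND 20000;
```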
Abnormal index choice
# Returns an empty set
mysql> EXPLAIN SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | a,b           | b    | 5       | NULL | 50128 |     1.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+

mysql> SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
Empty set (0.07 sec)

# Time: 2019-01-30T11:32:31.335272Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.046896  Lock_time: 0.000141 Rows_sent: 0  Rows_examined: 50001
SET timestamp=1548847951;
SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
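- If index a is used for the query
  - Scan the first 1000 values of index a, fetch the corresponding ids, look up each row in the clustered index, then filter on column b; 1000 rows are scanned
- If index b is used for the query
  - Scan the last 50001 values of index b and follow the same process; 50001 rows are scanned
- The optimizer made an abnormal choice, and its estimated row count (50128) is still inaccurate
- The optimizer picked index b because it judged that using b avoids a sort, so it considered the cost lower even though more rows are scanned
  - Extra contains no Using filesort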
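The row counts and the optimizer's reasoning discussed above can be observed from a session. The sketch below uses the session handler counters and the optimizer trace; FLUSH STATUS needs the appropriate privilege, and the exact contents of the trace vary between MySQL versions.

```sql
-- 1) Count the rows actually read, using the session handler counters.
FLUSH STATUS;
SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler_read%';

-- 2) Record the optimizer's decision process for the same statement.
SET optimizer_trace = 'enabled=on';
SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
```

In the trace, the row estimates for indexes a and b and any later step that revisits the access path because of ORDER BY are the parts worth reading. They show where the cheaper scan on a lost out to the sort-free scan on b.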
force index
Drawback: inelegant, since the index name is hard-coded in the statement
mysql> EXPLAIN SELECT * FROM t FORCE INDEX(a) WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | a             | a    | 5       | NULL | 1000 |    11.11 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+

mysql> SELECT * FROM t FORCE INDEX(a) WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
Empty set (0.00 sec)

# Time: 2019-01-30T11:32:45.938128Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.001304  Lock_time: 0.000148 Rows_sent: 0  Rows_examined: 1000
SET timestamp=1548847965;
SELECT * FROM t FORCE INDEX(a) WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
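FORCE INDEX is not the only index hint MySQL accepts. As a sketch of two related forms: USE INDEX restricts the candidates to the named index without treating a table scan as very expensive, and IGNORE INDEX excludes an index without naming the one to use. Whether the optimizer then settles on index a depends on the server version and the current statistics, so the EXPLAIN output is not shown.

```sql
-- Softer hint: consider only index a, but a full table scan is still allowed.
EXPLAIN SELECT * FROM t USE INDEX (a)
WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000)
ORDER BY b LIMIT 1;

-- Exclude index b and let the optimizer choose among what remains.
EXPLAIN SELECT * FROM t IGNORE INDEX (b)
WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000)
ORDER BY b LIMIT 1;
```

Both forms share the drawback of FORCE INDEX: an index name is embedded in the statement and has to be kept in sync with the schema.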
order by b,a
Not a general solution
mysql> EXPLAIN SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b,a LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | a,b           | a    | 5       | NULL | 1000 |    50.00 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+

mysql> SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b,a LIMIT 1;
Empty set (0.01 sec)

# Time: 2019-01-30T13:53:18.233163Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.000609  Lock_time: 0.000191 Rows_sent: 1  Rows_examined: 0
SET timestamp=1548856398;
EXPLAIN SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b,a LIMIT 1;
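- order by b,a requires sorting by b and then a. Neither index can return rows already in that order, so both plans need a sort, and the number of rows scanned becomes the main factor in the optimizer's decision. It therefore picks index a, which needs to scan only 1000 rows
- This is not a general optimization technique. It works here only because order by b limit 1 and order by b,a limit 1 both return a row with the smallest b, so the two statements happen to be semantically equivalent
- The slow-log entry above was recorded for the EXPLAIN statement, so its Rows_examined value of 0 does not describe the SELECT itself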
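Because the query in this article returns an empty set, the equivalence of the two ORDER BY forms is easier to see on a range that does match rows. In the sketch below the lower bound on b is changed to 500; this is an assumption made purely for illustration and is not part of the original test.

```sql
-- With the (i, i, i) data, the matching rows are those with a = b between 500 and 1000,
-- so both statements should return the row with id = 500.
SELECT * FROM t
WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 500 AND 100000)
ORDER BY b LIMIT 1;

SELECT * FROM t
WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 500 AND 100000)
ORDER BY b, a LIMIT 1;
```

If several rows shared the smallest b, the first form could return any of them while the second would return the one with the smallest a. The rewrite is safe only when the caller does not care which of the tied rows comes back.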
limit 100
Not a general solution
mysql> EXPLAIN SELECT * FROM (SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 100) alias LIMIT 1;
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL          | NULL | NULL    | NULL |  100 |   100.00 | NULL                                               |
|  2 | DERIVED     | t          | NULL       | range | a,b           | a    | 5       | NULL | 1000 |    50.00 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
- limit 100: uses the characteristics of the data to steer the optimizer into recognizing that index b is expensive. It is likewise not a general solution
Other approaches
- Create a new, more suitable index
- Drop the misused index
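Both approaches can be tried with plain DDL. The sketch below is hypothetical: the composite index name ab and its column order are assumptions chosen for this query, not something the article prescribes, and the invisible-index step requires MySQL 8.0. Whether the optimizer actually changes its plan should be confirmed with EXPLAIN on the target server.

```sql
-- Hypothetical composite index: the range on a selects the entries,
-- and the condition on b can be checked inside the index.
ALTER TABLE t ADD INDEX ab (a, b);
EXPLAIN SELECT * FROM t
WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000)
ORDER BY b LIMIT 1;

-- MySQL 8.0: hide index b from the optimizer first to observe the effect.
ALTER TABLE t ALTER INDEX b INVISIBLE;
EXPLAIN SELECT * FROM t
WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000)
ORDER BY b LIMIT 1;

-- Undo the experiment, or drop the index for good if no other query needs it.
ALTER TABLE t ALTER INDEX b VISIBLE;
-- ALTER TABLE t DROP INDEX b;
```

Making the index invisible before dropping it keeps the change cheap to reverse, since the index is still maintained and can be made visible again without a rebuild.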
References
《MySQL實戰45講》 (MySQL in Practice: 45 Lectures)
Please credit the source when reposting: http://zhongmingmao.me/2019/01/30/mysql-index-select/