
How Data Volume Affects Index Use in WHERE IN Queries


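A question that comes up all the time on forums and in interviews: in MySQL, can a WHERE IN query use an index?

To settle the question I ran some tests, and found that the number of rows in the table affects whether the index is used. Let's take a look.

The tests use MySQL 5.7 with the default InnoDB storage engine and the default B+ tree index type. EXPLAIN is used to check whether each query hits an index.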
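Every plan below ends with "1 warning". In MySQL 5.7 that is a note EXPLAIN always attaches, holding the statement as the optimizer rewrote it; SHOW WARNINGS issued right after the EXPLAIN displays it. EXPLAIN FORMAT=JSON prints the same plan together with the optimizer's cost estimates, which helps when asking why a full scan won. A minimal sketch, using the staffs table created below; I have not reproduced its output here:

```sql
-- Show the plan; MySQL 5.7 attaches a note (the "1 warning") to every EXPLAIN
EXPLAIN SELECT * FROM staffs WHERE name IN ('z3', '2000');

-- Read that note: it contains the statement as rewritten by the optimizer
SHOW WARNINGS;

-- Same plan, plus the optimizer's cost estimates (query_cost, cost_info)
EXPLAIN FORMAT=JSON SELECT * FROM staffs WHERE name IN ('z3', '2000');
```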

First, create a table:

create table staffs(
    id int primary key auto_increment,
    name varchar(24) not null default '' comment 'name',
    age int not null default 0 comment 'age',
    pos varchar(20) not null default '' comment 'position',
    add_time timestamp not null default current_timestamp comment 'hire time'
)charset utf8 comment 'staff records';

1. Case one: a small amount of data

Insert three rows:

insert into staffs(name,age,pos,add_time) values('z3',22,'manager',now());
insert into staffs(name,age,pos,add_time) values('July',23,'dev',now());
insert into staffs(name,age,pos,add_time) values('2000',23,'dev',now());

1.1 Effect on a single-column index, using name as the example

alter table staffs add index idx_staffs_name(name);
mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys   | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_name | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

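As you can see, the index is not used. rows is 3 and filtered is 66.67%: the optimizer estimates that the storage engine returns 3 rows and that 66.67% of them remain after the server layer filters them, i.e. the server layer discards 1 row and keeps 2. (For details on EXPLAIN, see my earlier post: https://my.oschina.net/u/3412738/blog/2244825)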
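Put as arithmetic, the two columns combine like this. The MySQL manual describes rows × filtered as the number of rows passed on to the next table in a join; both inputs are optimizer estimates, not measured counts:

```latex
\text{rows}_{\text{after filter}} \approx \text{rows} \times \frac{\text{filtered}}{100}
  = 3 \times \frac{66.67}{100} \approx 2
```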

1.2 Effect on a composite index

Set up the index:

alter table staffs drop index idx_staffs_name;
alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

1.2.1 IN on the leftmost column of the composite index

mysql> explain select * from staffs where name = 'z3';
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.04 sec)

As you can see, with = the index is used because the query satisfies the leftmost-prefix rule, while with IN it is not.
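One way to check that the index is usable and merely not chosen is to force it and compare the plans. According to the MySQL manual, FORCE INDEX makes the optimizer treat a table scan as very expensive, so it scans the table only if the named index cannot be used at all. A sketch, assuming the three-row table and idx_staffs_nameAgePos from above; I have not reproduced its output:

```sql
-- The optimizer's own choice on the 3-row table (a full scan in the test above)
EXPLAIN SELECT * FROM staffs WHERE name IN ('z3', '2000');

-- Same query with the composite index forced; because a table scan is now
-- treated as very expensive, the plan should switch to the index if the
-- IN list can be served by it
EXPLAIN SELECT * FROM staffs FORCE INDEX (idx_staffs_nameAgePos)
WHERE name IN ('z3', '2000');
```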

1.2.2 IN on a middle column of the composite index

mysql> explain select * from staffs where name = 'z3' and age = 22;
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from staffs where name = 'z3' and age in (22, 23);
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

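Likewise, with = on both columns the composite index is used column by column, in order. But when the second column uses IN, even the first column is dragged down and no index is used at all.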
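The key_len column shows how much of the composite index a plan uses. Assuming MySQL 5.7's usual accounting for the utf8 character set (3 bytes per character, 2 extra bytes for a VARCHAR length prefix, no extra byte because every column is NOT NULL), the values 74 and 78 seen above work out as follows; the third line is derived, not observed in these tests:

```latex
\begin{aligned}
\text{key\_len}(\texttt{name})                    &= 24 \times 3 + 2 = 74 \\
\text{key\_len}(\texttt{name}, \texttt{age})      &= 74 + 4 = 78 \\
\text{key\_len}(\texttt{name}, \texttt{age}, \texttt{pos}) &= 78 + (20 \times 3 + 2) = 140
\end{aligned}
```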

 

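2. Case two: a large amount of data

To insert a lot of data quickly and then build the indexes, drop the original table and recreate an identical one without any indexes, so no time is spent maintaining indexes during the inserts. The rows are inserted with a stored procedure.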

DELIMITER $$
    CREATE PROCEDURE test_insert()
    BEGIN
        declare i int;
        set i = 1 ;
        WHILE (i < 10000) DO
            -- random age from 20 to 100 inclusive (RAND() returns a value in [0, 1))
            INSERT INTO staffs(`name`,`age`,`pos`) VALUES(CONCAT('a', i), FLOOR(20 + RAND() * (100 - 20 + 1)),'dev');
            set i = i + 1;
        END WHILE;
        commit;
END$$
DELIMITER ;

CALL test_insert();
Query OK, 0 rows affected (8 min 7.84 sec)

Inserting 9,999 rows took more than 8 minutes, which is rather slow.
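Much of that time is probably spent committing: with autocommit on (the default), each INSERT in the loop is its own transaction, so the trailing COMMIT does little. A sketch of the same procedure that wraps the whole loop in one explicit transaction, run instead of test_insert on the freshly recreated table; the name test_insert_tx is hypothetical, it generates age as a random integer from 20 to 100, and I have not timed it:

```sql
DROP PROCEDURE IF EXISTS test_insert_tx;

DELIMITER $$
CREATE PROCEDURE test_insert_tx()
BEGIN
    DECLARE i INT DEFAULT 1;
    -- One transaction for the whole loop instead of one commit per INSERT
    START TRANSACTION;
    WHILE i < 10000 DO
        INSERT INTO staffs(`name`, `age`, `pos`)
        VALUES (CONCAT('a', i), FLOOR(20 + RAND() * (100 - 20 + 1)), 'dev');
        SET i = i + 1;
    END WHILE;
    COMMIT;
END$$
DELIMITER ;

CALL test_insert_tx();
```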

 

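2.1 Effect on a single-column index, using name as the example

As before, create the index (the command is the same as above and is omitted to save space; the same applies below), then run the query.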

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys   | key             | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_name | idx_staffs_name | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

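The index is used: type is range, the estimated row count is 2, and filtered is 100%.

2.2 Effect on a composite index

As before, drop the single-column index and create the composite index.

2.2.1 IN on the leftmost column of the composite index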

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The index is used.

mysql> explain select * from staffs where name in ('a1', 'a2000') and age = 23;
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

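Adding a condition on the column that follows the IN column still uses the index, and key_len rises from 74 to 78, so both name and age are taken from the index.

2.2.2 IN on a middle column of the composite index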

mysql> explain select * from staffs where name = 'a1' and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)
mysql> explain select * from staffs where name in ('a1', 'a2000') and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    4 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

IN on the middle column makes no difference either: the index is still used, again with key_len 78.
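One way to read the last plan, offered as an interpretation rather than something these tests prove: two IN lists on the first two index columns describe 2 × 2 = 4 (name, age) pairs, each a point lookup on the (name, age) prefix of the index, which is consistent with the estimate of 4 rows. The two queries below are logically equivalent rewrites of that predicate; MySQL 5.7 can also apply range optimization to the row-constructor form. I have not run them here:

```sql
-- Explicit form: 2 names x 2 ages = 4 equality pairs on (name, age)
EXPLAIN SELECT * FROM staffs
WHERE (name = 'a1'    AND age = 22) OR (name = 'a1'    AND age = 23)
   OR (name = 'a2000' AND age = 22) OR (name = 'a2000' AND age = 23);

-- Row-constructor form of the same 4 pairs
EXPLAIN SELECT * FROM staffs
WHERE (name, age) IN (('a1', 22), ('a1', 23), ('a2000', 22), ('a2000', 23));
```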

 

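3. Summary

3.1 With little data, equality conditions use the composite index column by column, in order, but the IN queries used neither the single-column index nor the composite index. A likely reason is that MySQL considers the table too small: it estimates a full table scan to be cheaper and simply reads the whole table.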
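The "full scan is cheaper" explanation is a guess, but it can be checked: the optimizer trace (available since 5.6) records the cost the optimizer assigned to each candidate plan. A sketch, to be run on the three-row table from case one:

```sql
-- Record the optimizer's reasoning for statements in this session
SET optimizer_trace = 'enabled=on';

SELECT * FROM staffs WHERE name IN ('z3', '2000');

-- In the TRACE JSON, compare the "table_scan" cost under "rows_estimation"
-- with the cost of the index entry under "range_scan_alternatives"
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

If the explanation holds, the range alternative on the index should show a higher cost than the table scan and be marked as not chosen.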

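3.2 With more data, the single-column index was used in these tests, and the composite index was also used column by column, in order, even with IN on the leftmost or a middle column.

3.3 Note that the IN lists here are short. If the list is very long, so that the matching rows make up a large share of the table, MySQL will most likely skip the index and fall back to a full table scan as well.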
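Two MySQL 5.7 server variables bear on long IN lists. eq_range_index_dive_limit (200 by default in 5.7) sets how many equality ranges the optimizer will estimate with per-value index dives before switching to coarser index statistics. range_optimizer_max_mem_size caps the memory used for range analysis; if a huge IN list exceeds it, the optimizer abandons range optimization for that query, issues a warning, and may fall back to a full scan. A sketch for inspecting them:

```sql
-- Above this many equality ranges, row estimates come from index
-- statistics instead of diving into the index for each IN value
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';

-- Memory budget for range analysis; exceeding it disables range
-- optimization for the query that exceeded it
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';
```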

3.4 I have not yet tested the case where the IN condition contains a subquery, nor any version earlier than 5.7. Here is a passage quoted from another blogger:
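The untested subquery case can be examined with the same tools as the rest of this post. A sketch using the tables a and b from the quoted example that follows (both hypothetical here); on 5.6 and later, the rewritten statement shown by SHOW WARNINGS should reveal whether the optimizer turned the IN subquery into a semi-join:

```sql
-- Hypothetical tables a and b, as in the quoted example
EXPLAIN SELECT * FROM a WHERE id IN (SELECT id FROM b);

-- The note holds the rewritten statement, e.g. whether a semi join was used
SHOW WARNINGS;

-- semijoin=on enables the semi-join strategies (on by default since 5.6)
SELECT @@optimizer_switch;
```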

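In versions before 5.5 it really does not use an index. Starting with 5.5, MySQL optimized this: in the 5.5 release of 2010, the optimizer can optimize the IN operator automatically, using the index on indexed columns, while columns without an index still get a full table scan.

For example, in versions before 5.5 (everything below refers to pre-5.5 versions), the execution plan of select * from a where id in (select id from b); does not first fetch all ids from table b and then compare them with the ids in table a. MySQL converts the IN subquery into a correlated EXISTS subquery, so the statement is effectively equivalent to: select * from a where exists(select * from b where b.id=a.id);

A correlated EXISTS subquery works like this: it loops over every row of table a and compares it against table b on the condition a.id=b.id, checking whether the id of each row in table a exists in table b; if it does, it returns that record of table a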