Looking at SQL optimization through SELECT COUNT(*) over multi-table joins

A friend asked me: can the following SQL be rewritten directly as select count(*) from a?

SELECT COUNT(*)
FROM a
     LEFT JOIN b ON a.a1 = b.b1
     LEFT JOIN c ON b.b1 = c.c1

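Enough talk, let's go straight to the experiment.

1. Preparing the data

Create test tables a, b, c and d and insert data. Table a contains duplicate values (and NULLs), b contains unique values, c contains unique values (apart from two NULLs), and d contains duplicate values.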

1) Create table a
create table a (a1 int);
insert into a select 1;
insert into a select 2;
insert into a select 3;
insert into a select 1;
insert into a select 2;
insert into a select 3;
insert into a values(null);
insert into a values(null);
insert into a values(null);
insert into a values(null);

2) Create table b
create table b (b1 int);
insert into b select 1;
insert into b select 2;
insert into b select 3;
insert into b select 4;
insert into b select 5;


3) Create table c
create table c (c1 int);
insert into c select 7;
insert into c select 8;
insert into c select 9;
insert into c values(null);
insert into c values(null);


4) Create table d
create table d (d1 int);
insert into d select 1;
insert into d select 1;
insert into d select 1;
insert into d select 1;
insert into d select 1;
insert into d select 1;

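2. Viewing the data

Each column below lists the rows of one table. The tables are independent of each other; a blank cell means that table has no more rows.

+------+------+------+------+
| a.a1 | b.b1 | c.c1 | d.d1 |
+------+------+------+------+
|    1 |    1 |    7 |    1 |
|    2 |    2 |    8 |    1 |
|    3 |    3 |    9 |    1 |
|    1 |    4 | NULL |    1 |
|    2 |    5 | NULL |    1 |
|    3 |      |      |    1 |
| NULL |      |      |      |
| NULL |      |      |      |
| NULL |      |      |      |
| NULL |      |      |      |
+------+------+------+------+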

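3. SQL examples

3.1 Table a joined to b, then to c (N:1:1 relationship)

The join column of a contains duplicate values; the join columns of b and c contain only unique values.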

SELECT COUNT(*)
FROM a
     LEFT JOIN b ON a.a1 = b.b1
     LEFT JOIN c ON b.b1 = c.c1

+----------+
| COUNT(*) |
+----------+
|       10 |
+----------+
1 row in set (0.00 sec)

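The query returns a count of 10.

Here the join returns exactly the rows of a, because each row of a matches at most one row in b and at most one row in c. In this case the SQL can be rewritten as: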

mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.00 sec)
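
The rewrite in 3.1 depends on b.b1 and c.c1 having no duplicate non-NULL values. The following is a minimal sketch of how that precondition could be checked before dropping the joins. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made so the example is self-contained), and the helper name has_duplicate_keys is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for name, col, values in [
    ("a", "a1", [1, 2, 3, 1, 2, 3, None, None, None, None]),
    ("b", "b1", [1, 2, 3, 4, 5]),
    ("c", "c1", [7, 8, 9, None, None]),
]:
    cur.execute(f"CREATE TABLE {name} ({col} INT)")
    cur.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])


def has_duplicate_keys(table, column):
    """Return True if any non-NULL value of `column` occurs more than once."""
    row = cur.execute(
        f"SELECT 1 FROM {table} WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1 LIMIT 1"
    ).fetchone()
    return row is not None


joined_count = cur.execute(
    "SELECT COUNT(*) FROM a "
    "LEFT JOIN b ON a.a1 = b.b1 "
    "LEFT JOIN c ON b.b1 = c.c1"
).fetchone()[0]
plain_count = cur.execute("SELECT COUNT(*) FROM a").fetchone()[0]

# Neither join column has duplicates, so the two counts agree.
print(has_duplicate_keys("b", "b1"), has_duplicate_keys("c", "c1"))
print(joined_count, plain_count)
```

If either check returned True, the joined count could exceed the row count of a and the rewrite would not be safe.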

3.2 Table b joined to a, then to c (1:N:1 relationship)

SELECT count(*)
FROM b
     LEFT JOIN a ON b.b1 = a.a1
     LEFT JOIN c ON a.a1 = c.c1

+----------+
| count(*) |
+----------+
|        8 |
+----------+
1 row in set (0.00 sec)

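Table b originally has 5 rows, but after the LEFT JOIN there are 8, so the query can no longer be rewritten as a plain select count(*) on the driving table. Let's look at the actual data: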

+------+------+------+
| b1   | a1   | c1   |
+------+------+------+
|    1 |    1 | NULL |
|    2 |    2 | NULL |
|    3 |    3 | NULL |
|    1 |    1 | NULL |
|    2 |    2 | NULL |
|    3 |    3 | NULL |
|    4 | NULL | NULL |
|    5 | NULL | NULL |
+------+------+------+
8 rows in set (0.00 sec)

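The duplicate values in a cause the matching rows of b to appear more than once. The join between c and a finds no equal values (NULL is not equal to NULL), so the c1 column is NULL in every row.

Because the join to c adds no rows, the SQL is equivalent to the following: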

SELECT count(*)
FROM b
     LEFT JOIN a ON b.b1 = a.a1;

+----------+
| count(*) |
+----------+
|        8 |
+----------+
1 row in set (0.00 sec)
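
The 8 rows in 3.2 can be explained row by row: each row of b contributes one output row per match in a, or a single row when there is no match. A minimal sketch of that bookkeeping follows, again assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; the variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (a1 INT)")
cur.execute("CREATE TABLE b (b1 INT)")
cur.executemany(
    "INSERT INTO a VALUES (?)",
    [(v,) for v in [1, 2, 3, 1, 2, 3, None, None, None, None]],
)
cur.executemany("INSERT INTO b VALUES (?)", [(v,) for v in [1, 2, 3, 4, 5]])

# NULL = NULL evaluates to NULL (not true), so NULL keys never match.
null_equals_null = cur.execute("SELECT NULL = NULL").fetchone()[0]

# For each row of b, count how many rows of a it matches.
# COUNT(a.a1) ignores the NULL produced for unmatched rows.
fanout = cur.execute(
    "SELECT b.b1, COUNT(a.a1) FROM b "
    "LEFT JOIN a ON b.b1 = a.a1 "
    "GROUP BY b.b1 ORDER BY b.b1"
).fetchall()

# An unmatched row of b still yields one output row, hence max(1, n).
predicted_rows = sum(max(1, n) for _, n in fanout)
actual_rows = cur.execute(
    "SELECT COUNT(*) FROM b LEFT JOIN a ON b.b1 = a.a1"
).fetchone()[0]

print(null_equals_null)
print(fanout)
print(predicted_rows, actual_rows)
```

The rows for b1 = 1, 2 and 3 each match twice, and the rows for 4 and 5 match nothing, which gives 2 + 2 + 2 + 1 + 1 = 8.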

3.3 Table a joined to d (N:N relationship)

SELECT *
FROM a
     LEFT JOIN d ON a.a1 = d.d1;

+------+------+
| a1   | d1   |
+------+------+
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    1 |    1 |
|    2 | NULL |
|    3 | NULL |
|    2 | NULL |
|    3 | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
+------+------+
20 rows in set (0.00 sec)

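The result breaks down as follows: the 2 rows of a with a1 = 1 each match the 6 rows of d with d1 = 1, giving 2 * 6 = 12 rows. The other 8 rows of a (the values 2 and 3, plus the NULLs) have no match in d and appear once each, for a total of 20 rows.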
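
The same arithmetic can be written down without a database. The sketch below predicts the row count of a single-column equality LEFT JOIN from the two value lists alone; the function name left_join_row_count is hypothetical, and None plays the role of NULL.

```python
from collections import Counter

# Join-column values of tables a and d from the experiment (None = NULL).
a_values = [1, 2, 3, 1, 2, 3, None, None, None, None]
d_values = [1, 1, 1, 1, 1, 1]


def left_join_row_count(left, right):
    """Predict the row count of `left LEFT JOIN right` on one equality column."""
    # NULL never equals anything, so NULLs on the right side are ignored.
    right_counts = Counter(v for v in right if v is not None)
    total = 0
    for v in left:
        matches = right_counts[v] if v is not None else 0
        # A left row with no match is still kept once by LEFT JOIN.
        total += max(1, matches)
    return total


print(left_join_row_count(a_values, d_values))
```

With the values above, the two 1s in a contribute 6 rows each and the remaining eight rows contribute one each, so the function returns 20.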

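4. Summary

Extending these experiments: when the join columns have very low cardinality, a LEFT JOIN behaves much like a Cartesian product.

So when optimizing SQL, pay particular attention to the cardinality of the join columns and to the relationships between the tables.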
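
One way to act on this advice is to profile a join column before relying on it. The sketch below compares the total rows, the non-NULL rows and the distinct values of each join column from the experiment. It assumes SQLite (via Python's sqlite3) in place of MySQL, and the helper name join_column_profile is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
tables = {
    ("a", "a1"): [1, 2, 3, 1, 2, 3, None, None, None, None],
    ("b", "b1"): [1, 2, 3, 4, 5],
    ("c", "c1"): [7, 8, 9, None, None],
    ("d", "d1"): [1, 1, 1, 1, 1, 1],
}
for (name, col), values in tables.items():
    cur.execute(f"CREATE TABLE {name} ({col} INT)")
    cur.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])


def join_column_profile(table, column):
    """Return (total rows, non-NULL rows, distinct non-NULL values)."""
    return cur.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) "
        f"FROM {table}"
    ).fetchone()


for table, column in tables:
    total, non_null, distinct = join_column_profile(table, column)
    # Fewer distinct values than non-NULL rows means the column can fan out.
    print(table, column, total, non_null, distinct, distinct < non_null)
```

For this data the final flag is True for a and d, whose join columns contain duplicates, and False for b and c.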
