MySQL Performance Optimization: filesort

0. Problem

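A project requirement called for a GROUP BY plus JOIN query, but the table holds so much data that the SQL ran extremely slowly: roughly 100 minutes when invoked from Azkaban.
The main part of the SQL is as follows: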

select
t_1.*
,dd.department_type as department_type
,'${DATE_KEY}' as date_key
,current_timestamp as ETL_Created_Date
,current_timestamp as ETL_Modified_Date
,'${PROCESS_ID}' as process_id
,'mnt_fact' as rec_source
,dd.department_name as bns_area_name
from (
    select
    0 as is_qr
    ,5 as qualifier_id
    ···
    ,sum(mf.bare_order_qt) as bare_order_qt
    ,sum(mf.bare_logic_bns_amount) as bare_logic_bns_amount
    ,sum(mf.bare_sell_devices) as bare_sell_devices
    ,count(mf.device_code) as device_qt
    ,count(distinct mf.shop_id) as shop_qt
    from mnt_fact mf
    where date_key <= '${DATE_KEY}'
      and date_key >= date_format(date_sub('${DATE_KEY}', interval day('${DATE_KEY}')-1 day),'%Y%m%d')
    group by mf.bns_area1_id
    ,mf.device_type_class
) t_1
inner join datamart.vw_dim_department_cur dd
on t_1.bns_area_id = dd.department_id
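
The lower bound in the WHERE clause computes the first day of the month that contains `${DATE_KEY}`, so the query always scans from the start of the month up to the given day. A small worked example, assuming a sample value of `20181128` for `${DATE_KEY}` (the value is an assumption chosen for illustration):

```sql
-- DAY('20181128') is 28, so the expression subtracts 27 days
-- and formats the result back into the yyyymmdd form used by date_key.
SELECT DATE_FORMAT(
         DATE_SUB('20181128', INTERVAL DAY('20181128') - 1 DAY),
         '%Y%m%d'
       ) AS month_start;
-- month_start: 20181101
```

With that sample value the filter becomes `date_key >= '20181101' and date_key <= '20181128'`, a range of 28 daily values.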

Running EXPLAIN on the statement gives the following execution plan:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1564460
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: dim_department
   partitions: NULL
         type: ref
possible_keys: idx_department_id
          key: idx_department_id
      key_len: 8
          ref: t_1.bns_area_id
         rows: 5
     filtered: 10.00
        Extra: Using index condition
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: mf
   partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 14082963
     filtered: 11.11
        Extra: Using where; Using filesort
3 rows in set, 1 warning (0.00 sec)

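The third row of the plan shows Using filesort: the GROUP BY forces MySQL to sort the rows because no suitable index is available. The sections below look at filesort in more detail.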
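
One way to gauge how expensive the sort is, sketched here as a suggestion rather than as the method used in the original investigation, is to compare the session's sort counters before and after running the query. `Sort_merge_passes`, `Sort_rows`, `Sort_scan` and `Sort_range` are standard MySQL status variables, and `sort_buffer_size` is the standard system variable for the in-memory sort buffer:

```sql
-- Record the counters before running the query under test.
SHOW SESSION STATUS LIKE 'Sort%';

-- ... run the aggregation query from section 0 here ...

-- Compare afterwards. A growing Sort_merge_passes means the sort did not fit
-- in the sort buffer and had to merge temporary files.
SHOW SESSION STATUS LIKE 'Sort%';

-- Current size of the per-session sort buffer, in bytes.
SHOW VARIABLES LIKE 'sort_buffer_size';
```

If `Sort_merge_passes` grows significantly, the sort is spilling out of memory, which is consistent with a long-running GROUP BY over millions of rows.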

1. What is filesort?

2. Cause

3. Solution

Create an index so that filesort is avoided.
The existing indexes on the original table are:

mysql> show index from mnt_fact;
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name                                | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| mnt_fact |          0 | PRIMARY                                 |            1 | ID                | A         |    15902350 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          0 | PRIMARY                                 |            2 | date_key          | A         |    15978838 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          1 | idx_device                              |            1 | device_code       | A         |    15914067 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_dept_key                            |            1 | department_key    | A         |       60584 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_emp_shop_devtype                    |            1 | employee_code     | A         |       66114 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          1 | idx_emp_shop_devtype                    |            2 | shop_id           | A         |     8483195 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_emp_shop_devtype                    |            3 | device_type_class | A         |     7341482 |     NULL | NULL   | YES  | BTREE      |         |               |

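There is no index on mf.bns_area1_id, mf.device_type_class. An index on these columns would have very low cardinality, but it still matters a great deal for the GROUP BY: with it in place, MySQL can group by reading rows in index order when it executes the SQL above.
Add the following index for the GROUP BY columns: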

| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            1 | bns_area6_id      | A         |       21778 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            2 | device_type_class | A         |       66270 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            3 | employee_code     | A         |      176908 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            4 | shop_id           | A         |     7485426 |     NULL | NULL   | YES  | BTREE      |         |               |
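
The statement used to create the index is not shown. A minimal sketch of DDL that would produce an index with the name and column order listed above, followed by a second, purely hypothetical index whose leading columns match the GROUP BY of the query in section 0 (the name `idx_bns1_deviceTypeClass` is an assumption, not from the source):

```sql
-- Assumed DDL matching the SHOW INDEX rows above; the original statement is not given.
ALTER TABLE mnt_fact
  ADD INDEX idx_bns6_deviceTypeClass_empCode_shopID
    (bns_area6_id, device_type_class, employee_code, shop_id);

-- Hypothetical variant: lead with the exact GROUP BY columns of the query
-- (bns_area1_id, device_type_class). The index name is illustrative only.
ALTER TABLE mnt_fact
  ADD INDEX idx_bns1_deviceTypeClass (bns_area1_id, device_type_class);
```

Building a secondary index on a table of this size can take a long time, so it may be worth trying on a copy or during a quiet window first.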

The execution plan now looks like this:

*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: mf
   partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
         type: index
possible_keys: idx_device_type_class_bns4,idx_bns1_bns2_bns3_bns4_bns5_bns6_deviceTypeClass
          key: idx_device_type_class_bns4
      key_len: 214
          ref: NULL
         rows: 14103728
     filtered: 11.11
        Extra: Using where
3 rows in set, 1 warning (0.00 sec)

This time the plan no longer contains Using filesort, and the run time of the SQL drops from the original 100 minutes to 10 minutes.