
Analysis of the Performance Impact of DROP TABLE on MySQL

【Problem Description】

We recently ran into a MySQL instance whose service briefly hung: concurrent threads shot up momentarily and the number of connections surged.
While investigating, the page_cleaner timeout messages in the Error Log caught our attention:

2019-08-24T23:47:09.361836+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 24915ms. The settings might not be optimal. (flushed=182 and evicted=0, during the time.)
2019-08-24T23:47:16.211740+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4849ms. The settings might not be optimal. (flushed=240 and evicted=0, during the time.)
2019-08-24T23:47:23.362286+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6151ms. The settings might not be optimal. (flushed=215 and evicted=0, during the time.)

【Problem Analysis】

1. How the page_cleaner messages in the error log are produced

From the source file storage/innobase/buf/buf0flu.cc we can see that this message is written to the error log when curr_time > next_loop_time + 3000, i.e. when one loop iteration takes more than 4 seconds. next_loop_time is the previous iteration's time plus 1000ms, meaning the page flush loop is supposed to run once per second.

 if (ret_sleep == OS_SYNC_TIME_EXCEEDED) {
                     ulint curr_time = ut_time_ms();

                     if (curr_time > next_loop_time + 3000) {
                            if (warn_count == 0) {
                                   ib::info() << "page_cleaner: 1000ms"
                                          " intended loop took "
                                          << 1000 + curr_time
                                             - next_loop_time
                                          << "ms. The settings might not"
                                          " be optimal. (flushed="
                                          << n_flushed_last
                                          << " and evicted="
                                          << n_evicted
                                          << ", during the time.)";
                                   if (warn_interval > 300) {
                                          warn_interval = 600;
                                   } else {
                                          warn_interval *= 2;
                                   }

                                   warn_count = warn_interval;
                            } else {
                                   --warn_count;
                            }
                     } else {
                            /* reset counter */
                            warn_interval = 1;
                            warn_count = 0;
                     }

                     next_loop_time = curr_time + 1000;
                     n_flushed_last = n_evicted = 0;
              }

The second half of the message, (flushed=182 and evicted=0, during the time.), corresponds to the variables n_flushed_last and n_evicted, which are accumulated from the two variables below.

 n_evicted += n_flushed_lru;
n_flushed_last += n_flushed_list;

As the source comments explain, n_flushed_lru is the number of pages flushed from the tail of the LRU list, which is the value reported as evicted=0 in the log.
n_flushed_list is the number of pages flushed from the flush_list (the dirty-page list), which is the flushed=182 value in the log.

/**
Wait until all flush requests are finished.
@param n_flushed_lru    number of pages flushed from the end of the LRU list.
@param n_flushed_list   number of pages flushed from the end of the
            flush_list.
@return         true if all flush_list flushing batch were success. */
static
bool
pc_wait_finished(
    ulint*  n_flushed_lru,
    ulint*  n_flushed_list)

The pc_wait_finished function shows that the page_cleaner thread performs both LRU list flushing and flush_list flushing, and has to wait for both parts to complete.

2. MySQL 5.7.4 introduced the multi-threaded page cleaner, but because LRU list flushing and dirty-page (flush_list) flushing are still coupled, performance problems remain under heavy load or when a buffer pool instance holds hot data:

1) LRU list flushing runs first and flush_list flushing second, and the two are mutually exclusive: while flush_list flushing is in progress, LRU list flushing cannot proceed and has to wait for the next loop iteration.
2) In addition, the page cleaner coordinator waits for all page cleaner worker threads to finish before it responds to the next flush request, so if one buffer pool instance is particularly hot, the page cleaner cannot respond in time.
Percona Server has made many optimizations to LRU list flushing.
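For reference, the page cleaner and buffer pool instance settings involved here can be checked with the statements below (a quick sketch; these are standard MySQL 5.7 system variables):

-- Number of page cleaner threads (multi-threaded page cleaner, MySQL 5.7.4+);
-- it is capped at the number of buffer pool instances.
SHOW GLOBAL VARIABLES LIKE 'innodb_page_cleaners';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances';
-- How far down the LRU tail each instance is scanned per iteration
SHOW GLOBAL VARIABLES LIKE 'innodb_lru_scan_depth';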

3. Analysing the binlog of the affected instance shows that nothing was logged between 2019-08-24 23:46:15 and 2019-08-24 23:47:25, which means the MySQL service could not process requests normally during that window and briefly hung:

mysqlbinlog -vvv binlog --start-datetime='2019-08-24 23:46:15' --stop-datetime='2019-08-24 23:47:25'|less

The counters show that concurrent threads piled up and QPS dropped during that window; when the MySQL service recovered shortly afterwards, the backlog of requests was released all at once, pushing concurrent connections even higher.
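A simple way to watch this pile-up while it happens is to poll the standard thread counters (shown here only as an illustration):

-- Threads currently executing statements vs. total client connections
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Threads_connected';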

4. Judging by Innodb_buffer_pool_pages_misc and Innodb_buffer_pool_pages_free, the buffer pool released about 16 KB × (546893 − 310868) = 3,776,400 KB, roughly 3.7 GB, within one minute during the problem window.
It is likely that the LRU list mutex was held during this period, blocking the page cleaner thread in its LRU list flushing and making the page_cleaner loop appear to take far too long.

Innodb_buffer_pool_pages_misc is related to the adaptive hash index (AHI); the official documentation describes it as follows:

• Innodb_buffer_pool_pages_misc
The number of pages in the InnoDB buffer pool that are busy because they have been allocated for administrative overhead, such as row locks or the adaptive hash index. This value can also be calculated as Innodb_buffer_pool_pages_total − Innodb_buffer_pool_pages_free − Innodb_buffer_pool_pages_data. When using compressed tables, Innodb_buffer_pool_pages_misc may report an out-of-bounds value (Bug #59550).
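These counters can be queried directly, and the formula quoted above can be recomputed with something like the following sketch (it assumes performance_schema.global_status, available in 5.7, and the default 16 KB page size):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';

-- Recompute misc pages as total - free - data, and estimate the memory they
-- occupy in GB assuming the default innodb_page_size of 16 KB.
SELECT MAX(IF(variable_name = 'Innodb_buffer_pool_pages_total', variable_value, NULL))
     - MAX(IF(variable_name = 'Innodb_buffer_pool_pages_free',  variable_value, NULL))
     - MAX(IF(variable_name = 'Innodb_buffer_pool_pages_data',  variable_value, NULL)) AS misc_pages,
       MAX(IF(variable_name = 'Innodb_buffer_pool_pages_misc',  variable_value, NULL)) * 16 / 1024 / 1024 AS misc_gb
  FROM performance_schema.global_status
 WHERE variable_name LIKE 'Innodb_buffer_pool_pages%';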

5. Why would the AHI release so much memory in such a short time? From the slow query log we found a DROP TABLE executed during that window. The dropped table was only about 50 GB, which is not particularly large, so why would DROP TABLE hang the MySQL service, and how badly can it hurt server performance? We ran a reproduction test to find out.
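As a starting point, whether the AHI is enabled (and how it is partitioned) can be checked with the statements below; the AHI hash table sizes themselves appear in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of SHOW ENGINE INNODB STATUS:

-- AHI on/off switch plus innodb_adaptive_hash_index_parts (5.7.8+)
SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index%';
-- Look at the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of the output
SHOW ENGINE INNODB STATUS\G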

【Test Reproduction】

To verify this further, we simulated the impact of DROP TABLE on a highly concurrent MySQL workload in a test environment.
1. Use sysbench for the load test: first create 8 tables of 200 million rows each in the test environment, about 48 GB per table.
2. With innodb_adaptive_hash_index enabled, run the olap template for one hour to populate the AHI memory for the 8 target tables.
3. Start the load threads again, this time accessing only sbtest1, and observe how MySQL behaves.
4. In a new session execute drop table test.sbtest2; dropping this 48 GB table took 64.84 seconds.

5. Use a custom script to sample Innodb_buffer_pool_pages_misc and Innodb_buffer_pool_pages_free every second (a sampling sketch is shown after this list). During the DROP TABLE, Innodb_buffer_pool_pages_misc dropped sharply while Innodb_buffer_pool_pages_free grew at the same time, and the amount released matched the amount gained almost exactly, as shown in the figure below.

6. During the DROP TABLE, MySQL was effectively hung and QPS stayed at 0 for an extended period.

7. The page_cleaner messages were also reproduced in the error log.

This reproduces the problem we had seen in production.
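The per-second sampling mentioned in step 5 can be done by any external script; purely as a self-contained sketch, the stored procedure below (the procedure name and log table are made up for illustration) records the two counters once a second into a scratch table:

DROP TABLE IF EXISTS test.bp_pages_log;
CREATE TABLE test.bp_pages_log (
  ts          DATETIME,
  pages_misc  BIGINT,
  pages_free  BIGINT
);

DELIMITER //
CREATE PROCEDURE test.sample_bp_pages(IN n_samples INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n_samples DO
    -- One sample per second: misc (AHI etc.) pages and free pages
    INSERT INTO test.bp_pages_log
    SELECT NOW(),
           MAX(IF(variable_name = 'Innodb_buffer_pool_pages_misc', variable_value, NULL)),
           MAX(IF(variable_name = 'Innodb_buffer_pool_pages_free', variable_value, NULL))
      FROM performance_schema.global_status
     WHERE variable_name IN ('Innodb_buffer_pool_pages_misc', 'Innodb_buffer_pool_pages_free');
    DO SLEEP(1);
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

-- Sample for two minutes while the DROP TABLE runs in another session
CALL test.sample_bp_pages(120);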

【Why MySQL Briefly Hangs】

1. During the load test we captured pstack output, SHOW ENGINE INNODB STATUS, and the relevant mutex rows from performance_schema.events_waits_summary_global_by_event_name.
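For example, the dict-related synchronization waits can be pulled with a query along these lines (the exact instrument names vary between versions, hence the LIKE pattern):

-- Aggregated wait counts and total wait time (picoseconds -> seconds)
-- for InnoDB dictionary synchronization objects
SELECT event_name,
       count_star,
       sum_timer_wait / 1e12 AS total_wait_seconds
  FROM performance_schema.events_waits_summary_global_by_event_name
 WHERE event_name LIKE 'wait/synch/%/innobase/dict%'
 ORDER BY sum_timer_wait DESC;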
2. The SEMAPHORES section shows that during the hang a large number of threads requesting an S-lock were blocked by a lock held by the same thread, 140037411514112, and the wait lasted 64 seconds:

--Thread 140037475792640 has waited at row0purge.cc line 862 for 64.00 seconds the semaphore:
S-lock on RW-latch at 0x966f6e38 created in file dict0dict.cc line 1183
a writer (thread id 140037411514112) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /mysql-5.7.23/storage/innobase/row/row0mysql.cc line 4253
--Thread 140037563102976 has waited at srv0srv.cc line 1982 for 57.00 seconds the semaphore:
X-lock on RW-latch at 0x966f6e38 created in file dict0dict.cc line 1183
a writer (thread id 140037411514112) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /mysql-5.7.23/storage/innobase/row/row0mysql.cc line 4253

3. The ROW OPERATIONS section shows that MySQL's Main Thread was blocked by the same thread 140037411514112, stuck in the state "enforcing dict cache limit":

3 queries inside InnoDB, 0 queries in queue
17 read views open inside InnoDB
Process ID=39257, Main thread ID=140037563102976, state: enforcing dict cache limit
Number of rows inserted 1870023456, updated 44052478, deleted 22022445, read 9301843315
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

4. Thread 140037411514112 was executing the drop table test.sbtest2 statement, which confirms that the lock held during the DROP TABLE blocked the Main Thread as well as other threads:

---TRANSACTION 44734025, ACTIVE 64 sec dropping table
10 lock struct(s), heap size 1136, 7 row lock(s), undo log entries 1
MySQL thread id 440836, OS thread handle 140037411514112, query id 475074428 localhost root checking permissions
drop table test.sbtest2

5. Below is the simplified call stack captured for the DROP TABLE. It shows that throughout those 64 seconds the DROP TABLE was executing btr_search_drop_page_hash_index, removing the table's entries from the AHI:

Thread 32 (Thread 0x7f5d002b2700 (LWP 8156)):
#0 ha_remove_all_nodes_to_page
#1 btr_search_drop_page_hash_index
#2 btr_search_drop_page_hash_when_freed
#3 fseg_free_extent
#4 fseg_free_step
#5 btr_free_but_not_root
#6 btr_free_if_exists
#7 dict_drop_index_tree
#8 row_upd_clust_step
#9 row_upd
#10 row_upd_step
#11 que_thr_step
#12 que_run_threads_low
#13 que_run_threads
#14 que_eval_sql
#15 row_drop_table_for_mysql
#16 ha_innobase::delete_table
#17 ha_delete_table
#18 mysql_rm_table_no_locks
#19 mysql_rm_table
#20 mysql_execute_command
#21 mysql_parse
#22 dispatch_command
#23 do_command
#24 handle_connection
#25 pfs_spawn_thread
#26 start_thread ()
#27 clone ()

6. In the code we can see that when DROP TABLE calls row_drop_table_for_mysql, it takes an exclusive lock on the data dictionary at row_mysql_lock_data_dictionary(trx):

row_drop_table_for_mysql(
        const char*     name,
        trx_t*          trx,
        bool            drop_db,
        bool            nonatomic,
        dict_table_t*   handler)
{
        dberr_t         err;
        dict_foreign_t* foreign;
        dict_table_t*   table = NULL;
        char*           filepath = NULL;
        char*           tablename = NULL;
        bool            locked_dictionary = false;
        pars_info_t*    info = NULL;
        mem_heap_t*     heap = NULL;
        bool            is_intrinsic_temp_table = false;

        DBUG_ENTER("row_drop_table_for_mysql");
        DBUG_PRINT("row_drop_table_for_mysql", ("table: '%s'", name));

        ut_a(name != NULL);

        /* Serialize data dictionary operations with dictionary mutex:
        no deadlocks can occur then in these operations */

        trx->op_info = "dropping table";

        if (handler != NULL && dict_table_is_intrinsic(handler)) {
                table = handler;
                is_intrinsic_temp_table = true;
        }

        if (table == NULL) {
                if (trx->dict_operation_lock_mode != RW_X_LATCH) {
                        /* Prevent foreign key checks etc. while we are
                        dropping the table */

                        row_mysql_lock_data_dictionary(trx);

                        locked_dictionary = true;
                        nonatomic = true;
                }

7. Take the Main Thread as an example: it was blocked while trying to acquire the X-lock on the RW-latch inside srv_master_evict_from_table_cache:

/********************************************************************//**
Make room in the table cache by evicting an unused table.
@return number of tables evicted. */
static
ulint
srv_master_evict_from_table_cache(
/*==============================*/
        ulint   pct_check)      /*!< in: max percent to check */
{
        ulint   n_tables_evicted = 0;

        rw_lock_x_lock(dict_operation_lock);

        dict_mutex_enter_for_mysql();

        n_tables_evicted = dict_make_room_in_cache(
                innobase_get_table_cache_size(), pct_check);

        dict_mutex_exit_for_mysql();

        rw_lock_x_unlock(dict_operation_lock);

        return(n_tables_evicted);
}

8. The comment on dict_operation_lock explains why: CREATE TABLE / DROP TABLE operations take it in X mode, and some background threads, such as the Main Thread when it trims the dict cache, also need to acquire the X lock on dict_operation_lock, so they get blocked:

/** @brief the data dictionary rw-latch protecting dict_sys
table create, drop, etc. reserve this in X-mode; implicit or
backround operations purge, rollback, foreign key checks reserve this
in S-mode; we cannot trust that MySQL protects implicit or background
operations a table drop since MySQL does not know of them; therefore
we need this; NOTE: a transaction which reserves this must keep book
on the mode in trx_t::dict_operation_lock_mode */
rw_lock_t*      dict_operation_lock;

9. Meanwhile, user threads were suspended because they could not obtain the lock. When a lock cannot be acquired immediately, row_mysql_handle_errors is called to pass the error code up to the upper layer for handling.

/****************************************************************//**
Handles user errors and lock waits detected by the database engine.
@return true if it was a lock wait and we should continue running the
query thread and in that case the thr is ALREADY in the running state. */
bool
row_mysql_handle_errors

Below is the simplified call stack of one of the user threads:

Thread 29 (Thread 0x7f5d001ef700 (LWP 8159)):
#0  pthread_cond_wait@@GLIBC_2.3.2
#1  wait
#2  os_event::wait_low
#3  os_event_wait_low
#4  lock_wait_suspend_thread
#5  row_mysql_handle_errors
#6  row_search_mvcc
#7  ha_innobase::index_read
#8  handler::ha_index_read_map
#9  handler::read_range_first
#10 handler::multi_range_read_next
#11 QUICK_RANGE_SELECT::get_next
#12 rr_quick
#13 mysql_update
#14 Sql_cmd_update::try_single_table_update
#15 Sql_cmd_update::execute
#16 mysql_execute_command
#17 mysql_parse
#18 dispatch_command
#19 do_command
#20 handle_connection
#21 pfs_spawn_thread
#22 start_thread
#23 clone

10. No obvious blocking relationship was captured for the page_cleaner background threads; we only saw the following normal call stacks:

Thread 55 (Thread 0x7f5c7fe15700 (LWP 39287)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 os_event::timed_wait
#2 os_event::wait_time_low
#3 os_event_wait_time_low
#4 pc_sleep_if_needed
#5 buf_flush_page_cleaner_coordinator
#6 start_thread
#7 clone
Thread 54 (Thread 0x7f5c7f614700 (LWP 39288)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
#1 wait
#2 os_event::wait_low
#3 os_event_wait_low
#4 buf_flush_page_cleaner_worker
#5 start_thread
#6 clone

【Conclusion & Workaround】

The brief MySQL hang caused by DROP TABLE occurs because dropping a table that occupies a large amount of AHI memory triggers a cleanup of the corresponding AHI entries. This cleanup takes a long time, and while it runs the X lock on dict_operation_lock is held, blocking other background threads as well as user threads.
Once the DROP TABLE finishes and the lock is released, the backlog of user threads executes all at once, which is what produced the momentary spike in concurrent threads and connections.
To avoid the problem, consider disabling the AHI before dropping large tables.
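A minimal sketch of that workaround with standard statements (turning the AHI back on afterwards lets it repopulate gradually):

-- Empty the AHI up front so DROP TABLE does not have to clean it up
-- while holding dict_operation_lock in exclusive mode
SET GLOBAL innodb_adaptive_hash_index = OFF;
DROP TABLE test.sbtest2;
-- Re-enable the AHI after the drop has finished
SET GLOBAL innodb_adaptive_hash_index = ON;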