
A MySQL server that went down for no apparent reason, repaired with innodb_force_recovery


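I recently ran into a rather odd problem. While everyone was taking a midday nap, my phone suddenly went off. Not wanting to wake anyone, I picked it up and checked the monitoring alerts: the database was down. This was a database server that had been running for a long time. I logged in and tried to restart MySQL, but it failed with this error: Starting MySQL..... ERROR! The server quit without updating PID file (/usr/local/mysql/data/BigData_ZT_PY_92.pid). I started going through the error log and other checks, and in the middle of that another alert arrived saying that host xxx "has just been restarted". I tried to ping the host and got no reply. I was stunned: the server had rebooted by itself for no apparent reason, and it went on to reboot several more times. In the end I contacted the data center staff and asked them to connect a monitor and see what was going on.

After a good deal of effort the machine finally came back up, and we started investigating. The error log showed: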


InnoDB: End of page dump

2018-05-23 21:10:08 7f6786710700 InnoDB: uncompressed page, stored checksum in field1 2222046951, calculated checksums for field1: crc32 2624418990, innodb 1255280539, none 3735928559, stored checksum in field2 1914065653, calculated checksums for field2: crc32 2624418990, innodb 3045085343, none 3735928559, page LSN 555 2748030571, low 4 bytes of LSN at page end 2748030571, page number (if stored to page already) 84692, space id (if created with >= MySQL-4.1.1 and stored already) 2618

InnoDB: Page may be an index page where index id is 8005

InnoDB: Database page corruption on disk or a failed

InnoDB: file read of page 84692.

InnoDB: You may have to recover from a backup.

InnoDB: It is also possible that your operating

InnoDB: system has corrupted its own file cache

InnoDB: and rebooting your computer removes the

InnoDB: error.

InnoDB: If the corrupt page is an index page

InnoDB: you can also try to fix the corruption

InnoDB: by dumping, dropping, and reimporting

InnoDB: the corrupt table. You can use CHECK

InnoDB: TABLE to scan your table for corruption.

InnoDB: See also http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

InnoDB: Ending processing because of a corrupt database page.

2018-05-23 21:10:08 7f6786710700 InnoDB: Assertion failure in thread 140082613913344 in file buf0buf.cc line 4201

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

13:10:08 UTC - mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.


key_buffer_size=8388608

read_buffer_size=131072

max_used_connections=0

max_threads=1024

thread_count=0

connection_count=0

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 415416 K bytes of memory

Hope that's ok; if not, decrease some variables in the equation.


Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0 thread_stack 0x40000

/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0x8f339c]

/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x364)[0x66e3e4]

/lib64/libpthread.so.0(+0xf5e0)[0x7f6b9c5b45e0]

/lib64/libc.so.6(gsignal+0x37)[0x7f6b9b3ba1f7]

/lib64/libc.so.6(abort+0x148)[0x7f6b9b3bb8e8]

/usr/local/mysql/bin/mysqld[0xa9c5c5]

/usr/local/mysql/bin/mysqld[0xadecd6]

/usr/local/mysql/bin/mysqld[0xa400c8]

/lib64/libpthread.so.0(+0x7e25)[0x7f6b9c5ace25]

/lib64/libc.so.6(clone+0x6d)[0x7f6b9b47d34d]

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

180523 21:10:09 mysqld_safe mysqld from pid file /usr/local/mysql/data/BigData_ZT_PY_92.pid ended

180523 21:44:59 mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data

2018-05-23 21:44:59 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).


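The useful information in the log above is that InnoDB hit a corrupt page (page 84692 in space id 2618, which failed its checksum) and aborted. It looked to me as though this happened while transactions were being rolled back during startup. After some reading, my guess was that a binary file on disk had been damaged.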
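To make it easier to see which object is affected, the identifiers InnoDB prints around a corruption report can be pulled out of the error log. The following is a minimal sketch, not part of the original investigation: the function name `summarize_corruption` and the regular expressions are my own assumptions, written against the message wording shown in the log above.

```python
import re

# Patterns for the three identifiers InnoDB prints when it hits a bad page.
# They are based only on the message wording in the log excerpt above.
PAGE_RE = re.compile(r"file read of page (\d+)")
SPACE_RE = re.compile(r"space id \([^)]*\) (\d+)")
INDEX_RE = re.compile(r"index id is (\d+)")


def summarize_corruption(log_text):
    """Collect page numbers, space ids and index ids mentioned in the log."""
    def ids(pattern):
        return sorted({int(m) for m in pattern.findall(log_text)})

    return {
        "pages": ids(PAGE_RE),
        "space_ids": ids(SPACE_RE),
        "index_ids": ids(INDEX_RE),
    }


sample = (
    "... page number (if stored to page already) 84692, "
    "space id (if created with >= MySQL-4.1.1 and stored already) 2618\n"
    "InnoDB: Page may be an index page where index id is 8005\n"
    "InnoDB: Database page corruption on disk or a failed\n"
    "InnoDB: file read of page 84692.\n"
)
print(summarize_corruption(sample))
```

Once the server is up, and assuming it is a MySQL 5.6 instance as the manual links in the log suggest, the space id can usually be mapped to a table with a query such as SELECT NAME FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE SPACE = 2618;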

I then decided to use forced InnoDB recovery.

Here is how it is used:

[mysqld]


innodb_force_recovery = 1


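Warning

Set innodb_force_recovery to a value greater than 0 only in an emergency, so that you can start InnoDB and dump your tables. Before doing so, make sure you have a backup copy of the database in case you need to recreate it. Values of 4 or greater can permanently corrupt data files. Use a setting of 4 or greater on a production server instance only after you have successfully tested it on a separate physical copy of the database. When forcing InnoDB recovery, always start with innodb_force_recovery=1 and increase the value only as necessary.

innodb_force_recovery is 0 by default (normal startup without forced recovery). The permissible nonzero values are 1 to 6. A larger value includes the functionality of all smaller values; for example, a value of 3 includes everything that values 1 and 2 do.


If you can dump your tables with innodb_force_recovery set to 3 or lower, you are relatively safe: only some data on the corrupt individual pages is lost. A value of 4 or greater is considered dangerous because data files can be permanently corrupted. A value of 6 is considered drastic because database pages are left in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.

As a safety measure, InnoDB prevents INSERT, UPDATE, and DELETE operations when innodb_force_recovery is greater than 0. As of MySQL 5.6.15, setting innodb_force_recovery to 4 or greater puts InnoDB in read-only mode.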

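1 (SRV_FORCE_IGNORE_CORRUPT)

Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name skip corrupt index records and pages, which helps in dumping tables.


2 (SRV_FORCE_NO_BACKGROUND)

Prevents the master thread and any purge threads from running. If a crash would occur during a purge operation, this value prevents it.


3 (SRV_FORCE_NO_TRX_UNDO)

Does not run transaction rollbacks after crash recovery.


4 (SRV_FORCE_NO_IBUF_MERGE)

Prevents insert buffer merge operations; if they would cause a crash, they are not performed. Does not calculate table statistics. This value can permanently corrupt data files. After using it, be prepared to drop and recreate all secondary indexes. As of MySQL 5.6.15, sets InnoDB to read-only.


5 (SRV_FORCE_NO_UNDO_LOG_SCAN)

Does not look at the undo logs when starting the database: InnoDB treats even incomplete transactions as committed. This value can permanently corrupt data files. As of MySQL 5.6.15, sets InnoDB to read-only.


6 (SRV_FORCE_NO_LOG_REDO)

Does not roll the redo log forward during recovery. This value can permanently corrupt data files. Database pages are left in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures. As of MySQL 5.6.15, sets InnoDB to read-only.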
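The six values and the "start at 1, raise only when needed" rule can be written down as a small lookup table and an escalation helper. This is only a sketch: the constant names come from the list above, while `LEVELS`, `is_risky`, `next_level` and the `allow_risky` flag are hypothetical names introduced for illustration.

```python
# value -> (constant name from the list above, short summary of the effect)
LEVELS = {
    1: ("SRV_FORCE_IGNORE_CORRUPT", "keep running despite corrupt pages"),
    2: ("SRV_FORCE_NO_BACKGROUND", "no master or purge threads"),
    3: ("SRV_FORCE_NO_TRX_UNDO", "no transaction rollback after recovery"),
    4: ("SRV_FORCE_NO_IBUF_MERGE", "no insert buffer merges, no table statistics"),
    5: ("SRV_FORCE_NO_UNDO_LOG_SCAN", "undo logs ignored at startup"),
    6: ("SRV_FORCE_NO_LOG_REDO", "no redo log roll-forward"),
}


def is_risky(value):
    """Values of 4 and above can permanently corrupt data files."""
    return value >= 4


def next_level(current, allow_risky=False):
    """Return the next value to try, or None when escalation should stop.

    Without allow_risky the helper never goes past 3, the highest value
    described above as relatively safe.
    """
    ceiling = 6 if allow_risky else 3
    return current + 1 if current < ceiling else None


# Walk the default escalation path: 1, 2, 3, then stop.
level = 0
while (level := next_level(level)) is not None:
    name, effect = LEVELS[level]
    print(f"{level} {name}: {effect}" + ("  [risky]" if is_risky(level) else ""))
```

Keeping the ceiling at 3 by default mirrors the warning above: going to 4 or higher should be a deliberate decision, ideally rehearsed on a separate copy of the data first.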


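You can SELECT from tables to dump them. With an innodb_force_recovery value of 3 or lower, you can also DROP or CREATE tables. As of MySQL 5.6.27, DROP TABLE is also supported with innodb_force_recovery values greater than 3.


If you know that a given table is causing a crash during rollback, you can drop it. If you run into a runaway rollback caused by a failed mass import or ALTER TABLE, you can kill the mysqld process and set innodb_force_recovery to 3 so that the database starts without the rollback, then DROP the table that is causing the runaway rollback.


If corruption in the table data prevents you from dumping the entire table, a query with an ORDER BY primary_key DESC clause may be able to dump the part of the table that comes after the corrupted section.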
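To make the ORDER BY primary_key DESC idea concrete, here is a small sketch that builds the forward and reverse statements for one table. The helper `salvage_queries` and the table and column names `orders` and `id` are hypothetical; how many rows each statement actually returns depends on where the damage sits.

```python
import re

# Accept only plain identifiers so the names can be quoted with backticks safely.
IDENT_RE = re.compile(r"^[A-Za-z0-9_$]+$")


def salvage_queries(table, primary_key):
    """Build a forward and a reverse full-table SELECT for a damaged table."""
    for name in (table, primary_key):
        if not IDENT_RE.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    base = f"SELECT * FROM `{table}` ORDER BY `{primary_key}`"
    # The ASC scan reads up to the damaged area, the DESC scan reads
    # from the other end of the primary key back toward it.
    return [base + " ASC", base + " DESC"]


for statement in salvage_queries("orders", "id"):
    print(statement + ";")
```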


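If a high innodb_force_recovery value is needed to start InnoDB, some data structures may be corrupted in ways that cause complex queries (queries containing WHERE, ORDER BY, or other clauses) to fail. In that case you may only be able to run basic SELECT * FROM t queries.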




Then start the database:

[root@databases ~]# /etc/init.d/mysql start


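Once the database was up, I logged in and ran show slave status\G; and saw that replication on the slave had not started. I then commented out innodb_force_recovery = 1 in /etc/my.cnf and restarted the database, after which everything was fine.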
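Adding the setting and taking it out again is easy to get wrong by hand during an incident, so the edit can be sketched as a function over the text of the option file. This is an illustration under assumptions, not the procedure used above: `set_force_recovery` is a hypothetical helper, it works on a string rather than on /etc/my.cnf directly, and it removes the line for level 0 instead of commenting it out.

```python
import re

# Matches an active innodb_force_recovery line (underscore or hyphen spelling).
SETTING_RE = re.compile(r"^\s*innodb[_-]force[_-]recovery\s*=")


def set_force_recovery(cnf_text, level):
    """Return cnf_text with innodb_force_recovery set to `level` in [mysqld].

    Level 0 removes the setting, which restores a normal startup.
    """
    if not 0 <= level <= 6:
        raise ValueError("innodb_force_recovery must be between 0 and 6")
    out, in_mysqld, done = [], False, False
    for line in cnf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_mysqld = stripped.lower() == "[mysqld]"
            out.append(line)
            if in_mysqld and level > 0 and not done:
                out.append(f"innodb_force_recovery = {level}")
                done = True
            continue
        if in_mysqld and SETTING_RE.match(line):
            continue  # drop the previous value; the new one is already written
        out.append(line)
    if level > 0 and not done:
        raise ValueError("no [mysqld] section found")
    return "\n".join(out) + "\n"


original = "[client]\nport = 3306\n[mysqld]\ndatadir = /usr/local/mysql/data\n"
forced = set_force_recovery(original, 1)
print(forced)
print(set_force_recovery(forced, 0) == original)
```

Writing the result back to the real file and restarting mysqld are left out on purpose; both steps deserve a manual check while the server is in this state.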


Later investigation suggested that a hardware fault on the server had stopped the database, and it may also have damaged the binary file.

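In addition, /etc/my.cnf on this host contains:

innodb_flush_log_at_trx_commit = 2 # master: 1 (switch to 2 when I/O is overloaded); slave: 2

With a value of 1, I/O performance on this host is very poor, so it has to run with 2. The trade-off is that with 2 the log is written at every commit but flushed to disk only about once per second, so an operating system crash or power loss can lose up to roughly a second of committed transactions.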
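The trade-off behind innodb_flush_log_at_trx_commit can be summarized in a few lines. The sketch below encodes the documented behavior of values 0, 1 and 2 as I understand it; `FLUSH_POLICY`, `can_lose_committed` and the failure labels are hypothetical names, and the model ignores other factors such as disks that acknowledge a flush before the data is durable.

```python
# When the redo log is written to the OS and when it is flushed to disk.
FLUSH_POLICY = {
    0: {"write": "about once per second", "fsync": "about once per second"},
    1: {"write": "at every commit", "fsync": "at every commit"},
    2: {"write": "at every commit", "fsync": "about once per second"},
}


def can_lose_committed(value, failure):
    """Return True if committed transactions may be lost.

    failure is 'mysqld_crash' (only the server process dies) or
    'host_crash' (operating system crash or power loss).
    """
    if value not in FLUSH_POLICY:
        raise ValueError("value must be 0, 1 or 2")
    if failure == "mysqld_crash":
        # Only value 0 keeps committed data in mysqld's own memory.
        return value == 0
    if failure == "host_crash":
        # Anything not yet flushed to disk is gone.
        return value in (0, 2)
    raise ValueError("failure must be 'mysqld_crash' or 'host_crash'")


for value in (0, 1, 2):
    print(value, FLUSH_POLICY[value], can_lose_committed(value, "host_crash"))
```

For a host that rebooted on its own, as this one did, the 'host_crash' row is the relevant one: with a value of 2, about a second of recent commits may be missing after the restart.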

