A Nerve-Racking MySQL Data Recovery

阿新 • Published: 2019-02-10

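Some background first. All the services for one of our company's projects are deployed in the customer's server room, which is small and has no UPS. One of the MySQL instances there (standalone, no replication, Windows Server 2008, MySQL 5.6.19) stores a large volume of log data: tens of gigabytes per day, purged periodically so that roughly four months are kept. Because disk space was short, there were no regular backups. Then the server room lost power without warning. I started the MySQL server afterwards and did not check the error log at first, but the server crashed as soon as one particular table was accessed. Only then did I realize that the power loss might have left the database unable to start cleanly, so I looked at the error log: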

2017-10-12 18:05:22 bd0 InnoDB: Error: page 756 log sequence number 786184012016
InnoDB: is in the future! Current system log sequence number 786183991367.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.

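As the message says, a data page carries an LSN larger than the redo log's current LSN: when InnoDB tried to use the redo log to repair the data pages, it found a page that was already ahead of the log and reported an error. Normally a data page's LSN never exceeds the redo log LSN. When a transaction commits (or the log is flushed as configured by innodb_flush_log_at_trx_commit), the changes are first written sequentially to the redo log. Only later do background threads flush the dirty pages, either when the dirty-page ratio reaches innodb_max_dirty_pages_pct or during the periodic flush that runs about every 10 seconds. So under normal conditions the LSN of a data page is always behind the redo log LSN.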
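To put a number on the mismatch, the two LSNs can be pulled straight out of the log message. The following is a minimal shell sketch, not part of the original troubleshooting; the function name lsn_gap is made up for illustration, and it assumes the message wording shown above.

```shell
# Read InnoDB "is in the future" messages on stdin and print how far
# the page LSN is ahead of the current system LSN.
# lsn_gap is a hypothetical helper name, not a MySQL tool.
lsn_gap() {
    log=$(cat)
    page_lsn=$(printf '%s\n' "$log" \
        | sed -n 's/.*page [0-9]* log sequence number \([0-9]*\).*/\1/p' \
        | head -n 1)
    sys_lsn=$(printf '%s\n' "$log" \
        | sed -n 's/.*Current system log sequence number \([0-9]*\).*/\1/p' \
        | head -n 1)
    if [ -z "$page_lsn" ] || [ -z "$sys_lsn" ]; then
        echo "no LSN pair found" >&2
        return 1
    fi
    echo $((page_lsn - sys_lsn))
}

# The two log lines from this incident.
lsn_gap <<'EOF'
2017-10-12 18:05:22 bd0 InnoDB: Error: page 756 log sequence number 786184012016
InnoDB: is in the future! Current system log sequence number 786183991367.
EOF
```

For the excerpt above this prints 20649. Since an LSN is a byte offset into the redo stream, page 756 is roughly 20 KB of redo ahead of what the log files contain.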

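The plan, therefore, was to set innodb_force_recovery to a value greater than 0, restart the database service, export the important data, and rebuild the database.
First, here is how each innodb_force_recovery value affects server startup. A larger value includes the effects of all smaller ones.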
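- 1 (SRV_FORCE_IGNORE_CORRUPT): lets the server run even when it detects a corrupt page.
- 2 (SRV_FORCE_NO_BACKGROUND): prevents the master thread from running, which avoids a crash that would otherwise occur during a full purge operation.
- 3 (SRV_FORCE_NO_TRX_UNDO): does not run transaction rollbacks.
- 4 (SRV_FORCE_NO_IBUF_MERGE): does not run insert buffer merge operations.
- 5 (SRV_FORCE_NO_UNDO_LOG_SCAN): does not look at the undo logs; InnoDB treats uncommitted transactions as committed.
- 6 (SRV_FORCE_NO_LOG_REDO): does not roll the redo log forward.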

With the value set above 0, you can still SELECT from tables and CREATE or DROP them, but INSERT, UPDATE, and DELETE are not allowed.
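The setting goes in the [mysqld] section of the server's option file (my.ini on a Windows install like this one; the file name is an assumption here). The MySQL manual advises starting at 1 and raising the value only as far as needed, because values of 4 and above can permanently corrupt data files. The helper below, force_recovery_fragment, is a hypothetical name; it only prints the fragment so it can be pasted into the option file.

```shell
# Print a minimal [mysqld] fragment for the requested recovery level.
# Anything other than a single digit from 1 to 6 is rejected.
force_recovery_fragment() {
    case "$1" in
        [1-6]) printf '[mysqld]\ninnodb_force_recovery = %s\n' "$1" ;;
        *) echo "innodb_force_recovery level must be 1-6" >&2; return 1 ;;
    esac
}

# Start with the mildest level.
force_recovery_fragment 1
```

Remove the line again once the data has been exported, since the server refuses writes while it is set.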

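I set innodb_force_recovery to each value from 1 to 6 in turn, started the service, and tried to back up the database with mysqldump. Every attempt failed with the same error: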

mysqldump -uuser -ppasswd --skip-extended_insert --hex-blob -y -n -t --routines --events --triggers --databases db_name >> "d:/db_name.sql"
Warning: Using a password on the command line interface can be insecure.
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `tb_name` at row: 50548

Then I ran:

select id from tb_name limit 50548,1;

which failed in the same way:

ERROR 2013 (HY000): Lost connection to MySQL server during query

The row just before it, however, could be read normally:

select id from tb_name limit 50547,1;

The error log indicated that part of the data on the overflow pages, where BLOB values longer than 768 bytes are stored, had been corrupted:

End of page dump
2017-10-12 18:16:41 258 InnoDB: uncompressed page, stored checksum in field1 3939709550, calculated checksums for field1: crc32 3646189668, innodb 3963718570, none 25210765039, stored checksum in field2 0, calculated checksums for field2: crc32 3646189668, innodb 1246618578, none 3735928559, page LSN 0 1201607135, low 4 bytes of LSN at page end 0, page number (if stored to page already) 125076, space id (if created with >= MySQL-4.1.1 and stored already) 77
InnoDB: Page may be a BLOB page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 125076.

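At this point a normal logical backup through the MySQL server was no longer possible. The only option left was a tool that parses the .ibd files directly and extracts the records.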

Below is how I recovered the table data with undrop-for-innodb (it runs only on 64-bit Linux and only recovers data from MySQL 5.6).
Download:
https://github.com/twindb/undrop-for-innodb

TwinDB Data Recovery Toolkit is a set of tools that operate with MySQL files at low level and allow to recover InnoDB databases after different failure scenarios.
The toolkit is also known as UnDrop for InnoDB, which is more accurate name because the toolkit works with InnoDB tables.
The tool recovers data when backups are not available. It supports recovery from following failures:

 - A table or database was dropped.
 - InnoDB table space corruption.
 - Hard disk failure.
 - File system corruption.
 - Records were deleted from a table.
 - A table was truncated.
 - InnoDB files were accidentally deleted.
 - A table was dropped and created empty one.

Note: Percona's open-source percona-data-recovery-tool-for-innodb is used in almost exactly the same way as undrop-for-innodb.

Installing undrop-for-innodb

Unzip master.zip.
Change into undrop-for-innodb-master and run make. The build produces two tools: c_parser and stream_parser.

stream_parser options
[root@localhost undrop-for-innodb-master]# ./stream_parser -h
Usage: ./stream_parser -f <innodb_datafile> [-T N:M] [-s size] [-t size] [-V|-g]
  Where:
    -h         - Print this help
    -V or -g   - Print debug information
    -s size    - Amount of memory used for disk cache (allowed examples 1G 10M). Default 100M
    -T         - retrieves only pages with index id = NM (N - high word, M - low word of id)
    -t size    - Size of InnoDB tablespace to scan. Use it only if the parser can't determine it by himself.

c_parser options
[root@localhost undrop-for-innodb-master]# ./c_parser -h
Error: Usage: ./c_parser -4|-5|-6 [-dDV] -f <InnoDB page or dir> -t table.sql [-T N:M] [-b <external pages directory>]
  Where
    -f <InnoDB page(s)> -- InnoDB page or directory with pages(all pages should have same index_id)
    -t <table.sql> -- CREATE statement of a table
    -o <file> -- Save dump in this file. Otherwise print to stdout
    -l <file> -- Save SQL statements in this file. Otherwise print to stderr
    -h  -- Print this help
    -d  -- Process only those pages which potentially could have deleted records (default = NO)
    -D  -- Recover deleted rows only (default = NO)
    -U  -- Recover UNdeleted rows only (default = YES)
    -V  -- Verbose mode (lots of debug information)
    -4  -- innodb_datafile is in REDUNDANT format
    -5  -- innodb_datafile is in COMPACT format
    -6  -- innodb_datafile is in MySQL 5.6 format
    -T  -- retrieves only pages with index id = NM (N - high word, M - low word of id)
    -b <dir> -- Directory where external pages can be found. Usually it is pages-XXX/FIL_PAGE_TYPE_BLOB/
    -i <file> -- Read external pages at their offsets from <file>.
    -p prefix -- Use prefix for a directory name in LOAD DATA INFILE command

Recovery

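Create frs_person_type.sql and put the CREATE TABLE definition of the table to be recovered in it.
First split the shared tablespace file ibdata1 (this is needed to look up the primary key index ID of each table):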

[root@localhost undrop-for-innodb-master]# ./stream_parser -f /path/mysql/data/ibdata1

This creates a pages-ibdata1 directory in the current directory, which contains the index information for all tables, among other things.

Then split the .ibd file of the table itself (frs_person_type.ibd here):

[root@localhost undrop-for-innodb-master]# ./stream_parser -f /path/mysql/data/test_db/frs_person_type.ibd

This creates a pages-frs_person_type.ibd directory in the current directory.

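Recovering the InnoDB dictionary
We need the index_id of the PRIMARY index of table test_db.frs_person_type. That mapping lives in the InnoDB dictionary tables SYS_TABLES and SYS_INDEXES, so we look up the table there first and then its indexes: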

[root@localhost undrop-for-innodb-master]# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page -t dictionary/SYS_TABLES.sql |grep type
00000000060C    D8000001B30110  SYS_TABLES  "test_db/frs\_card\_type"   35  2   1   0   80  ""  21
00000000060F    DB000001780110  SYS_TABLES  " test_db/frs\_person\_type"    36  7   1   0   80  ""  22
00000000060C    D8000001B30110  SYS_TABLES  " test_db/frs\_card\_type"  35  2   1   0   80  ""  21

[root@localhost undrop-for-innodb-master]# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page -t dictionary/SYS_INDEXES.sql |grep 36
00000000060F    DB0000017801D9  SYS_INDEXES 36  64  "PRIMARY"   1   3   22  3
000000000615    610000017C01F9  SYS_INDEXES 36  65  "c\_id" 1   0   22  4
00000000061D    670000018001F9  SYS_INDEXES 36  66  "type"  1   0   22  5

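SYS_TABLES gives test_db/frs_person_type the table ID 36, and the SYS_INDEXES row for table 36 named PRIMARY has ID 64. So the index_id of the PRIMARY index of test_db.frs_person_type is 64.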
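Judging by the file names in this article, stream_parser names each page file after the index ID, zero-padded to 16 decimal digits, which is how index 64 maps to 0000000000000064.page. A small sketch of that mapping, inferred from those names (index_page_file is a hypothetical helper, not part of the toolkit):

```shell
# Turn a decimal index_id into the page file name used under
# pages-<tablespace>/FIL_PAGE_INDEX/ (naming inferred from this article).
index_page_file() {
    printf '%016d.page\n' "$1"
}

index_page_file 64   # prints 0000000000000064.page
```

Pass the ID without leading zeros, since printf would otherwise read it as octal.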

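Recovering records from the table's PRIMARY index
c_parser reads InnoDB pages, matches them against the given table structure, and dumps the records as tab-separated values. Unlike InnoDB, when c_parser hits a corrupted area it skips it and keeps reading pages. We read the records from index_id 64, which the dictionary says is the PRIMARY index.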
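```
[root@localhost undrop-for-innodb-master]# time ./c_parser -5f pages-frs_person_type.ibd/FIL_PAGE_INDEX/0000000000000064.page -t frs_person_type.sql > frs_person_type 2> frs_person_type.sql
```

Here frs_person_type holds the recovered records, and the file written from stderr, frs_person_type.sql, holds the SQL statements that load those records. Note that the shell truncates the 2> target before c_parser starts, so in practice the table definition passed to -t must be kept in a different file from the one that receives stderr.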

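 - Recovering tables with BLOB columns
 If the table has BLOB, TEXT, or similar large columns, some values may be stored in external pages. In the stream_parser output, external pages normally end up in the FIL_PAGE_TYPE_BLOB directory: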
```
[root@localhost undrop-for-innodb-master]# ll pages-frs_grab.ibd/
total 0
drwxr-xr-x 2 root root 42 Oct 20 14:01 FIL_PAGE_INDEX
drwxr-xr-x 2 root root 10 Oct 20 14:01 FIL_PAGE_TYPE_BLOB
```

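If you recover such a table with the command from the previous step, the generated .sql file fills up with messages about BLOB pages that cannot be found: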

-- #####CannotOpen_./0000000000002021.page;
-- print_field_value_with_external(): open(): No such file or directory

In that case, use the -b option to point c_parser at the overflow pages:

time ./c_parser -5f pages-frs_grab.ibd/FIL_PAGE_INDEX/0000000000002067.page -b pages-frs_grab.ibd/FIL_PAGE_ -t frs_grab.sql > dfrs_grab 2> frs_grab.sql

In rare cases pages-frs_grab.ibd/FIL_PAGE_TYPE_BLOB/ is empty, because on some MySQL versions external pages are written with type FIL_PAGE_INDEX. This is unexpected behavior, but there is a workaround: the -i option reads external pages from a file (e.g. ibdata1):

-i <file> -- Read external pages at their offsets from <file>.
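Which of the two options applies can be checked before running c_parser. A minimal sketch, assuming the directory layout shown earlier; has_blob_pages is a hypothetical helper name, not part of the toolkit:

```shell
# Succeed only if the directory exists and holds at least one .page file.
has_blob_pages() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f -name '*.page' | head -n 1)" ]
}

blob_dir="pages-frs_grab.ibd/FIL_PAGE_TYPE_BLOB"
if has_blob_pages "$blob_dir"; then
    echo "use: -b $blob_dir/"
else
    echo "no BLOB pages found; consider -i with the original data file"
fi
```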

Restore

Create the table first, then run the following in the database:

source frs_person_type.sql

If the frs_person_type data file has been moved, update its path in frs_person_type.sql accordingly.
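The c_parser help above mentions a LOAD DATA INFILE command, so the path to fix is the quoted file name that follows INFILE. A hedged sketch using sed: the sample statement is an assumed shape, not copied from real c_parser output, and rewrite_infile_path is a hypothetical helper. It assumes the new path contains no | or & characters.

```shell
# Print the load script with the quoted path after INFILE replaced.
# $1 = new path of the data file, $2 = load script to read.
rewrite_infile_path() {
    sed "s|INFILE '[^']*'|INFILE '$1'|" "$2"
}

# Demo on a throwaway file containing an assumed statement shape.
demo=$(mktemp)
printf '%s\n' "LOAD DATA LOCAL INFILE '/old/dir/frs_person_type' REPLACE INTO TABLE frs_person_type;" > "$demo"
rewrite_infile_path /new/dir/frs_person_type "$demo"
rm -f "$demo"
```

The function writes to stdout instead of editing in place, so the original script stays untouched until the result has been checked.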

Done.
One last thing: back up. Back up. Back up.
