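Background: we recently purchased a batch of new servers whose underlying storage devices have a default physical sector size of 4K instead of the previous 512B.
After installing the system, xtrabackup reported the error below while restoring a physical database backup. The same backup restores without problems on the old systems with a 512B sector size.
The error is as follows: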
InnoDB: Error: tried to read 2048 bytes at offset 0 0.
InnoDB: Was only able to read 0.
140304 18:48:12 InnoDB: Operating system error number 22 in a file operation.
InnoDB: Error number 22 means 'Invalid argument'.
InnoDB: Some operating system error numbers are described at
InnoDB:http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
InnoDB: File operation call: 'read'.
InnoDB: Cannot continue operation.
innobackupex-1.5.1: Error:
innobackupex-1.5.1: ibbackup failed at /usr/bin/innobackupex-1.5.1 line 386.
Leaving the cause aside for a moment, here is the fix first: http://bazaar.launchpad.net/~akopytov/percona-xtrabackup/bug1190779-2.0/revision/561#src/xtrabackup.cc
Upgrading xtrabackup to 2.0.7 or later resolves the problem.
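1. What is a block (sector)
Why does the same program behave differently on a 512B block size and on a 4K block size?
Let us first look at what the device block (sector) size is. A block (also called a sector) is the smallest unit a block device can read or write. On a device with a 512B block size, even if the caller only needs 10 bytes, 512B are read from the device, the excess is discarded, and the 10 bytes are returned to the caller.
Above the device block size sits the filesystem block size. For a filesystem, a block is likewise the smallest unit of reading and writing, so a file that contains a single byte still occupies one full block on the underlying device.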
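For a fuller explanation of block sizes, see the linuxintro.org article on blocks, block devices and block sizes listed in section 4.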
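To see which block sizes a given machine reports, the values can be read from sysfs. The following is a minimal sketch, assuming a Linux system that exposes `/sys/block/<device>/queue/logical_block_size` and `physical_block_size`. The function name `read_block_sizes` and the `sysfs_root` parameter are illustrative and do not come from xtrabackup or any library.

```python
import os


def read_block_sizes(device, sysfs_root="/sys/block"):
    """Return the logical and physical block size (bytes) of a block device.

    `device` is a kernel device name such as "sda" (a hypothetical example).
    `sysfs_root` is configurable only so the function can be pointed at a
    test directory instead of the real sysfs tree.
    """
    queue_dir = os.path.join(sysfs_root, device, "queue")
    sizes = {}
    for name in ("logical_block_size", "physical_block_size"):
        # Each sysfs file holds a single decimal number followed by a newline.
        with open(os.path.join(queue_dir, name)) as f:
            sizes[name] = int(f.read().strip())
    return sizes
```

On a real machine the call would look like `read_block_sizes("sda")`, with the device name replaced by whatever the system actually uses.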
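2. What is aligned IO
Once there is a block size, the notion of alignment follows naturally. An IO request is aligned when its boundaries coincide with the boundaries of the underlying blocks, that is, when both the starting offset and the length of the request are integer multiples of the lower-level device's block size. For the same 512B read, an aligned request needs only one IO on the underlying device, whereas an unaligned request needs two IOs plus trimming of the surplus data before and after the requested range. For this reason aligned IO performs considerably better than unaligned IO.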
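The one-IO versus two-IO difference can be checked with a little arithmetic. This sketch is illustrative only: the helper names `is_aligned` and `blocks_touched` are made up for this example, and the code simply counts how many device blocks a byte range overlaps.

```python
def is_aligned(offset, length, block_size):
    """True when both the start offset and the length are multiples of block_size."""
    return offset % block_size == 0 and length % block_size == 0


def blocks_touched(offset, length, block_size):
    """Number of device blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return last_block - first_block + 1


# A 512B read on a 512B-block device:
aligned_blocks = blocks_touched(0, 512, 512)      # starts on a block boundary
unaligned_blocks = blocks_touched(100, 512, 512)  # starts 100 bytes into a block
```

With these inputs the aligned request overlaps one block and the unaligned one overlaps two, which matches the description above.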
Figure: an example of strict alignment at every layer, from top to bottom (from the DB down to the disk).
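However, neither the Linux operating system nor MySQL strictly requires IO to be aligned. Unaligned IO should only make IO requests somewhat slower; it should not cause access errors.
So what caused xtrabackup to fail on a device with a 4K sector size?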
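3. O_DIRECT and unaligned IO
The Linux documentation shows that files opened in O_DIRECT mode are subject to IO alignment restrictions. xtrabackup opened the file with O_DIRECT and then issued an unaligned IO request, and in that situation the filesystem rejects the request.
The relevant excerpt from the documentation: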
Users must always take care to use properly aligned and sized IO. This
is especially important for Direct I/O access. Direct I/O should be
aligned on a 'logical_block_size' boundary and in multiples of the
'logical_block_size'. With native 4K devices (logical_block_size is 4K)
it is now critical that applications perform Direct I/O that is a
multiple of the device's 'logical_block_size'. This means that
applications that do not perform 4K aligned I/O, but 512-byte aligned
I/O, will break with native 4K devices. Applications may consult a
device's "I/O Limits" to ensure they are using properly aligned and
sized I/O. The "I/O Limits" are exposed through both sysfs and block
device ioctl interfaces (also see: libblkid).
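Applied to the error log at the top, the rule in this excerpt can be checked directly: the failing request was a 2048-byte read at offset 0. The sketch below only checks offset and length against the logical block size (O_DIRECT also has buffer alignment requirements, which are not modelled here). The helper names `direct_io_compatible` and `round_up_to_block` are hypothetical and are not taken from xtrabackup.

```python
def direct_io_compatible(offset, length, logical_block_size):
    """True when offset and length are both multiples of the logical block size."""
    return offset % logical_block_size == 0 and length % logical_block_size == 0


def round_up_to_block(length, logical_block_size):
    """Smallest multiple of logical_block_size that is >= length."""
    return -(-length // logical_block_size) * logical_block_size


# The request from the error log: 2048 bytes at offset 0.
ok_on_512 = direct_io_compatible(0, 2048, 512)    # 2048 = 4 * 512
ok_on_4k = direct_io_compatible(0, 2048, 4096)    # 2048 is only half of 4096
padded_length = round_up_to_block(2048, 4096)     # length an aligned read would need
```

The request passes the check for a 512B logical block size and fails it for 4K, which is consistent with the same backup restoring on the old servers and failing with 'Invalid argument' on the new ones.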
The xtrabackup 2.0.7 description of this bug likewise shows that the fix simply removes the O_DIRECT flag when the file is opened. The relevant change log excerpt:
The problem was in an length-unaligned I/O request issued while
manipulating xtrabackup_logfile with O_DIRECT enabled.
We don't actually need O_DIRECT in those cases, so the fix was to
disable O_DIRECT. The patch also removes userspace buffer alignment
code and implements other minor cleanups.
4. Related documents
http://www.orczhou.com/index.php/2009/08/innodb_flush_method-file-io/
https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1055547
https://bugs.launchpad.net/percona-xtrabackup/+bug/902567
https://bugs.launchpad.net/percona-server/+bug/1033051
http://www.linuxintro.org/wiki/Blocks,_block_devices_and_block_sizes
http://www.mysqlperformanceblog.com/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/
http://people.redhat.com/msnitzer/docs/io-limits.txt