
How One Storage Link Flap Played Out Differently on AIX and HP-UX Because of Different I/O Timeouts (Repost)


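This is a failure case from last year. Enough time has passed without a recurrence that I can finally write a short summary. It happened around the midday nap hour: alerts suddenly reported that 8 instances across 4 databases were unavailable at the same time. A failure this widespread usually has a common cause. The database alert logs all showed I/O errors and failed writes, and we later confirmed that all 8 instances used storage-level synchronous disaster-recovery replication on arrays from the same vendor, Hitachi.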

2017-01-22 13:02:14.213000 +08:00
KCF: read, write or open error, block=0x1ad85 online=1    
        file=443 '/dev/anbob_oravg01/ranbob_lv15_062'
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 32768'
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_dbw7_17700.trc:
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_lgwr_17702.trc:
ORA-00345: redo log write error block 95667 count 10
ORA-00312: online log 4 thread 1: '/dev/anbob_oravg02/ranbob_redo04'
ORA-27063: number of bytes read/written is incorrect
HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 10240
KCF: read, write or open error, block=0x5c699 online=1
KCF: read, write or open error, block=0x168297 online=1
        file=29 '/dev/anbob_oravg01/ranbob_lv15_024'
        file=142 '/dev/anbob_oravg04/ranbob_lv30_273'
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: -1
Additional information: 8192'
Additional information: 8192'
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_dbw1_17688.trc:

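Looking back at the environment: these databases use synchronous remote disaster-recovery replication, meaning every I/O from the application above the storage layer is written in two places, and it completes only when both the local and the remote write succeed. The application here is the Oracle database. This has long been a common technique in DR environments; storage replication usually also offers an asynchronous mode, which requires a more expensive license. In these environments, jitter on the remote link caused I/O writes to fail, which in turn caused the databases on the HP-UX platform to restart.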

Interestingly, other databases with the same remote DR setup did not restart, as shown below:

OS      Storage   IS_Restart
AIX     EMC       NO
AIX     HDS       NO
HPUX    EMC       NO
HPUX    HDS       YES

Note:
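Only the combination of HPUX and HDS restarted the database. On the storage side, the EMC engineer said at the time that the logs showed errors and a path switch. The HDS engineer said no error logs were found, but pointed out that when a link has a problem, the Hitachi array's timeout before switching paths is 30+10 seconds. Back at the OS layer, the I/O timeout on the HPUX hosts is 30 seconds and on the AIX hosts it is 60 seconds. So the HPUX hosts could hit their I/O timeout and return an I/O failure before the Hitachi array switched paths. The outage probably also lasted longer than 30 but less than 60 seconds, so the storage was back to normal before the AIX timeout expired, and AIX carried on without a restart.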
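The timing argument in the note above can be sketched as a small model. This is only an illustration under simplifying assumptions: a single I/O is outstanding when the link fails, the host fails the I/O exactly at its timeout, and retries and multipathing are ignored. The function names are hypothetical; the 30s and 60s host timeouts and the 30+10s path-switch time come from the note.

```python
# Host I/O timeouts in seconds, as stated in the note above.
HOST_TIMEOUTS = {"HPUX": 30, "AIX": 60}

# Hitachi path-switch timeout quoted by the HDS engineer (30 + 10 seconds).
HDS_PATH_SWITCH_SECS = 30 + 10


def io_survives(outage_secs, host_timeout_secs):
    """True if the storage recovers before the host gives up on the I/O."""
    return outage_secs < host_timeout_secs


def surviving_hosts(outage_secs):
    """Map each OS to whether an in-flight I/O outlasts the outage."""
    return {
        os_name: io_survives(outage_secs, timeout)
        for os_name, timeout in HOST_TIMEOUTS.items()
    }


# An outage as long as the path-switch time falls between the two timeouts.
print(surviving_hosts(HDS_PATH_SWITCH_SECS))
```

With an outage of about 40 seconds, the model reports a failed I/O on HPUX and a surviving I/O on AIX, which is consistent with the table above for the HDS arrays.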

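Assuming all of the above holds, the question is what to do when the link becomes unavailable: bring the database down quickly to protect consistency, or allow more time for retries so the storage has a chance to recover. Deciding which is more appropriate means weighing the time involved. A similar trade-off appears in how PL/SQL handles commit.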

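The D in ACID stands for durability: once a transaction has been committed it must be persisted and cannot be lost. A COMMIT issued from SQL therefore forces the redo log to be flushed to disk before the session can continue, as described here: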

when a session issues a commit, it generates the redo describing how to update its transaction table slot in the undo segment header block, puts this redo into the log buffer, applies it to the undo segment header block, calls the log writer to flush the log buffer to disk, and then goes into a log file sync wait until the log writer lets it know that its entry in the log buffer has been copied to disk.
This commit mechanism is what makes transactions durable (the D of ACID).

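A commit inside PL/SQL, however, is optimized. To reduce the cost of committing inside a loop, the commit only sends a commit message to LGWR and does not wait for LGWR to finish writing to disk before moving on to the next transaction. This differs from the usual understanding of transactions in SQL. A short PL/SQL block can be used to test it.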

[oracle@anbob ~]$ sqlplus anbob/anbob@anbob/pdbanbob.com

SQL*Plus: Release 12.2.0.0.0 Beta on Tue Feb 7 15:00:27 2017
Copyright (c) 1982, 2015, Oracle.  All rights reserved.
Last Successful login time: Tue Feb 07 2017 14:57:44 +08:00
Connected to:
Oracle Database 12c EE Extreme Perf Release 12.2.0.1.0 - 64bit Production

SQL> create table anbob.t(id int,a date);
Table created.

SQL> @statn commit
     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
         6 6             48 user commits                                                              1 
       219 DB          1752 commit cleanouts                                                          3
       220 DC          1760 commit cleanouts successfully completed                                   3
       647 287         5176 IMU commits                                                               1
...
45 rows selected.

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       338 152         2704 redo synch time                                                           9
...
       346 15A         2768 redo synch writes                                                         2
...

17 rows selected.

declare
 i int:=0;
begin
  while i<100 loop
  insert into t values(i,sysdate);
  commit;
  dbms_lock.sleep(1);
  i:=i+1;
  end loop;
end; 
  /  

PL/SQL procedure successfully completed.

SQL>@statn commit

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
         6 6             48 user commits                                                            101
       201 C9          1608 BPS commit wait                                                           0
...
       219 DB          1752 commit cleanouts                                                        103
       220 DC          1760 commit cleanouts successfully completed                                 103
       647 287         5176 IMU commits                                                             101

45 rows selected.

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       338 152         2704 redo synch time                                                           9
 ...
       346 15A         2768 redo synch writes                                                         3

17 rows selected.

Note:
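The user commits value matches the number of COMMITs in the PL/SQL block, but redo synch writes increased by only one. Note that this no longer holds if the PL/SQL uses a database link. A COMMIT issued from SQL behaves as follows: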

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       338 152         2704 redo synch time                                                           9
       346 15A         2768 redo synch writes                                                         4
17 rows selected.

SQL> insert into t values(200,sysdate);
1 row created.

SQL> commit;
Commit complete.

SQL> insert into t values(200,sysdate);
1 row created.

SQL> commit;
Commit complete.

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       338 152         2704 redo synch time                                                          10
       346 15A         2768 redo synch writes                                                         6

Note:
Every commit triggers a redo synch write.
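To make the comparison concrete, the deltas from the two sessions above can be worked out as follows. This is just arithmetic on the statistic values shown in the output; the helper name is hypothetical.

```python
def synch_writes_per_commit(commits, synch_before, synch_after):
    """Average number of redo synch writes incurred per commit."""
    return (synch_after - synch_before) / commits


# PL/SQL loop: user commits went 1 -> 101, redo synch writes went 2 -> 3.
plsql_loop = synch_writes_per_commit(commits=101 - 1, synch_before=2, synch_after=3)

# Two plain SQL commits: redo synch writes went 4 -> 6.
sql_commits = synch_writes_per_commit(commits=2, synch_before=4, synch_after=6)

print(plsql_loop)   # 0.01
print(sql_commits)  # 1.0
```

In other words, the loop waited for LGWR once per 100 commits, while each standalone SQL commit waited once.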

the statistic redo synch writes counts the number of times a session has sent a message (statistic messages sent) to lgwr on a commit. This is an approximation; in fact, “sending a message” may not involve a real message.

Clearly the user session is not behaving as expected—it has posted lgwr to write a few times, but it
has only incremented redo synch writes once, which suggests it didn’t stop and wait for lgwr to wake it
up again. The user’s session is breaching the durability requirement; if the instance crashed somewhere
in the middle of this loop, it’s entirely possible that a transaction that had been committed would not be
recovered.

If we saw this output we could interpret it as 25 cycles of the following sequence:
- User session issues a commit
- User session posts lgwr and increments redo synch writes
- User session goes into a wait (log file sync) waiting to be posted by lgwr
- Lgwr gets woken up
- Lgwr writes the log buffer to disk, waiting a short time on each write

This strategy does not get used if the code is doing updates across database links, so there have been occasions in the
past where I’ve used a totally redundant loopback database link to ensure that some PL/SQL code would wait for a
log file sync on every commit.

So if PL/SQL commits inside a loop as above and the storage never recovers, committed transactions can be lost.

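To keep HP-UX and AIX consistent, all the database environments use asynchronous I/O (AIO) and shared storage on raw devices. For a database I/O request, the database's part ends once the request has been handed to the OS layer, so the timeout is determined mostly by the OS and the storage layer. On HP-UX, the kernel parameters related to I/O timeout are mainly PV timeout, LV timeout, ESD_SECS, and asyncdsk_io_timeout.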

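The following changes were made on HPUX:

1. PV timeout defaults to 30s. As far as we know the AIX value is 60s, so this parameter was changed to 60s.

2. LV timeout depends on PV timeout by default. The recommended value is: LV timeout value = (# of paths * PV Timeout) + 10 seconds

3. ESD_SECS=120. The esd_secs attribute determines the timeout of I/O operations to block devices. The default is 30s. A MOS case adjusted this parameter, but because raw devices are used here it was not changed this time.

4. asyncdsk_io_timeout defaults to 30s and was changed to 120s.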
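As a worked example of the LV timeout formula in item 2, here is a minimal sketch. The function name is hypothetical, and the path count of 2 is an assumption for illustration only, since the article does not say how many paths each volume has.

```python
def lv_timeout_secs(num_paths, pv_timeout_secs, margin_secs=10):
    """LV timeout = (# of paths * PV timeout) + 10 seconds."""
    return num_paths * pv_timeout_secs + margin_secs


# Assumed: 2 paths per volume, with the PV timeout raised to 60s as in item 1.
print(lv_timeout_secs(num_paths=2, pv_timeout_secs=60))  # 130
```

The formula gives every path one full PV timeout before the logical volume reports the I/O as failed, plus a 10-second margin.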

After these changes, we manually cut the remote link and restored it before 120s had elapsed; the database did not crash.

Note: this case is for reference only. Consult your OS and storage vendors before making specific changes.
