
HP-UX 11.31 Dual Root Disk Failure: A Case Study


Environment:

One Superdome SX2000 server running HP-UX 11.23, attached to an external MSA60 enclosure that holds the two root disks. Their device files are c2t3d0 and c3t3d0 (PV Link), and c2t4d0 and c3t4d0 (PV Link).

Problem:

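One of the root disks, c2t3d0, reported media errors in the event log. After it was replaced using the normal procedure, the lvlnboot information could not be updated. With incorrect lvlnboot information, the machine may fail to boot after a reboot or a crash.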

Analysis:

Before the root disk was replaced, vgdisplay -v vg00 reported the following:

hostname#[/]vgdisplay -v vg00
--- Volume groups ---
VG Name                     /dev/vg00
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      10
Open LV                     10
Max PV                      16
Cur PV                      2
Act PV                      2
Max PE per PV               4356
VGDA                        4
PE Size (Mbytes)            32
Total PE                    8712
Alloc PE                    4087
Free PE                     4625
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0

(LV details omitted)

--- Physical volumes ---
PV Name                     /dev/dsk/c2t3d0s2
PV Name                     /dev/dsk/c3t3d0s2  Alternate Link
PV Status                   available
Total PE                    4356
Free PE                     4356
Autoswitch                  On
Proactive Polling           On

PV Name                     /dev/dsk/c3t4d0s2
PV Name                     /dev/dsk/c2t4d0s2  Alternate Link
PV Status                   available
Total PE                    4356
Free PE                     269
Autoswitch                  On
Proactive Polling           On

The output shows that vg00 contains two PVs: c2t3d0s2 with its PV link c3t3d0s2, and c3t4d0s2 with its PV link c2t4d0s2. Because c2t3d0 has media errors, it has to be replaced.

Before the replacement, lvlnboot -v printed the following. Can you spot anything abnormal?

hostname#[/]lvlnboot -v
Current path "/dev/dsk/c3t3d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
        /dev/dsk/c3t3d0s2 (0/0/2/0/0/0/0.0.0.3.0)
        /dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)    // a "-- Boot Disk" marker should appear here as well
        /dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1     on:     /dev/dsk/c2t3d0s2
                        /dev/dsk/c3t3d0s2
                        /dev/dsk/c3t4d0s2
                        /dev/dsk/c2t4d0s2
Root: lvol3     on:     /dev/dsk/c2t3d0s2
                        /dev/dsk/c3t3d0s2
                        /dev/dsk/c3t4d0s2
                        /dev/dsk/c2t4d0s2
Swap: lvol2     on:     /dev/dsk/c2t3d0s2
                        /dev/dsk/c3t3d0s2
                        /dev/dsk/c3t4d0s2
                        /dev/dsk/c2t4d0s2
Dump: lvol2     on:     /dev/dsk/c2t3d0s2, 0
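The missing marker in the output above can also be checked mechanically. The Python sketch below parses lvlnboot -v text and flags any primary path in the root VG listing that lacks "-- Boot Disk". It assumes, based only on the outputs in this article, that lvlnboot prints a "Current path ... is an alternate link, skip." line for each alternate path and that only primary paths carry the marker. The function name pvs_missing_boot_marker is hypothetical and not part of any HP-UX tool.

```python
import re

# Abridged copy of the pre-replacement "lvlnboot -v" output shown above.
LVLNBOOT_BEFORE = """\
Current path "/dev/dsk/c3t3d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
        /dev/dsk/c3t3d0s2 (0/0/2/0/0/0/0.0.0.3.0)
        /dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
        /dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1     on:     /dev/dsk/c2t3d0s2
"""

# Paths that lvlnboot reports as alternate links (assumed never to carry the marker).
ALT_RE = re.compile(r'Current path "([^"]+)" is an alternate link')
# A PV line in the root VG listing: device file, hardware path, optional marker.
PV_RE = re.compile(r"^\s*(/dev/dsk/\S+)\s+\([^)]*\)(.*)$")


def pvs_missing_boot_marker(text):
    """Return primary PV paths that lack the '-- Boot Disk' marker."""
    alternates = set(ALT_RE.findall(text))
    missing = []
    for line in text.splitlines():
        m = PV_RE.match(line)
        if m is None or m.group(1) in alternates:
            continue  # not a PV line, or an alternate path
        if "-- Boot Disk" not in m.group(2):
            missing.append(m.group(1))
    return missing


print(pvs_missing_boot_marker(LVLNBOOT_BEFORE))  # ['/dev/dsk/c3t4d0s2']
```

Run against the post-repair output at the end of this article, the same function returns an empty list, because both c3t6d0s2 and c3t4d0s2 carry the marker there.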

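Next came the standard root disk replacement steps: populating the EFI partition, mirroring the LVs, and so on. The new root disk's device files are c2t6d0 and c3t6d0 (PV link). After the replacement, the PV section of vg00 looked like this: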

--- Physical volumes ---
PV Name                     /dev/dsk/c3t4d0s2
PV Name                     /dev/dsk/c2t4d0s2  Alternate Link
PV Status                   available
Total PE                    4356
Free PE                     269
Autoswitch                  On
Proactive Polling           On

PV Name                     /dev/dsk/c3t6d0s2
PV Name                     /dev/dsk/c2t6d0s2  Alternate Link
PV Status                   available
Total PE                    4356
Free PE                     4356
Autoswitch                  On
Proactive Polling           On

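At this point the disk replacement itself was finished, and it was time to run lvlnboot -R. Before lvlnboot -R was run, lvlnboot -v printed the following (the problem with the c3t4d0 disk is already visible here):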

hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
        /dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
        /dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
        /dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
No Boot Logical Volume configured
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured

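This output means the operating system had no valid boot definitions: the boot, root, swap and dump LVs were all undefined. Had the machine crashed or been rebooted at this point, it would certainly have failed to boot, and without a backup the operating system might have had to be reinstalled.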
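Whether all four definitions are present can be read off the same lvlnboot -v output. The Python sketch below assumes the "Boot:/Root:/Swap:/Dump: lvolN on:" and "No ... Logical Volume configured" line formats seen in this article; boot_definitions and missing_definitions are hypothetical helper names. Any name in the returned list is a sign that the node should not be rebooted yet.

```python
import re

# Copy of the "lvlnboot -v" output taken before "lvlnboot -R" (shown above).
LVLNBOOT_NO_DEFS = """\
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
        /dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
        /dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
        /dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
No Boot Logical Volume configured
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
"""

KINDS = ("Boot", "Root", "Swap", "Dump")
# Matches lines such as "Root: lvol3     on:     /dev/dsk/c3t6d0s2".
DEF_RE = re.compile(r"^\s*(Boot|Root|Swap|Dump):\s+(\S+)\s+on:", re.MULTILINE)


def boot_definitions(text):
    """Map Boot/Root/Swap/Dump to the configured LV name, or None if absent."""
    found = dict(DEF_RE.findall(text))
    return {kind: found.get(kind) for kind in KINDS}


def missing_definitions(text):
    """List the definitions that have no LV assigned."""
    return [kind for kind, lv in boot_definitions(text).items() if lv is None]


print(missing_definitions(LVLNBOOT_NO_DEFS))  # ['Boot', 'Root', 'Swap', 'Dump']
```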

Running lvlnboot -R did not update the boot definitions either. Setting them one at a time with lvlnboot -r, lvlnboot -d, lvlnboot -s and so on also failed:

hostname#[/]lvlnboot -r /dev/vg00/lvol3
lvlnboot: Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.

hostname#[/]lvlnboot -d /dev/vg00/lvol2
lvlnboot: A Root Logical Volume must be assigned before
a Dump or Swap Logical Volume can be assigned.

hostname#[/]lvlnboot -s /dev/vg00/lvol2
lvlnboot: A Root Logical Volume must be assigned before
a Dump or Swap Logical Volume can be assigned.

hostname#[/]lvlnboot -R
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf

hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
        /dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
        /dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
        /dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
Root LV not yet configured !! Mirror information will not be displayed
Boot: lvol1     on:     /dev/dsk/c3t6d0s2
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured

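Many attempts failed to fix the problem. This machine is a customer production server running critical business workloads, so downtime was absolutely not allowed. Eventually I read one line of the error output above more carefully: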

Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.

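Roughly, this says that c3t4d0s2, the partition on which lvol3 resides, is not a bootable PV, i.e. not a valid Boot Disk. Why does the system not treat it as a valid Boot Disk? This was in fact visible from the very beginning: before the repair, the lvlnboot output showed only one "-- Boot Disk" marker (on c2t3d0s2), and none on c3t4d0s2.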
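When scripting around lvlnboot, this particular error can be matched to find which LV and PV are involved, i.e. which disk was never initialized as a boot PV. The sketch below assumes the message text, including its line wrap, is exactly as printed above; non_boot_pvs is a hypothetical helper name.

```python
import re

# The "lvlnboot -r" error shown above, including its line wrap.
ERROR_TEXT = '''lvlnboot: Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.
'''

# \s+ lets the pattern span the wrap between "Logical" and "Volume".
NOT_BOOT_PV_RE = re.compile(
    r'Physical Volume "([^"]+)" on which Logical\s+'
    r'Volume "([^"]+)" resides is not a Boot Physical Volume'
)


def non_boot_pvs(text):
    """Return (lv, pv) pairs for every 'not a Boot Physical Volume' error."""
    return [(lv, pv) for pv, lv in NOT_BOOT_PV_RE.findall(text)]


print(non_boot_pvs(ERROR_TEXT))  # [('/dev/vg00/lvol3', '/dev/dsk/c3t4d0s2')]
```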

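After detailed checks and follow-up on the support case, all other causes (LV mirroring, the EFI partition, etc.) were ruled out. The cause turned out to be that when c3t4d0 was originally added to vg00, pvcreate had been run without the -B option: the command used was pvcreate /dev/rdsk/c3t4d0s2, whereas it should have been pvcreate -B /dev/rdsk/c3t4d0s2. Without -B, no BDRA (Boot Data Reserved Area) is created on the disk and the operating system does not consider it a Boot Disk, so the lvlnboot information could never be synchronized.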

Solution:

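Remove the disk c3t4d0 (c3t4d0s2 and its PV link c2t4d0s2) from vg00, after first removing every LV's mirror copy from it. Then run pvcreate -B on it again, add it back to vg00, and re-mirror all LVs in vg00. This resolved the problem.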
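The repair can be written out as a command list and reviewed before anything is run. This Python sketch only builds strings; nothing is executed. It assumes every LV in vg00 is a two-way mirror with one copy already on the healthy new disk c3t6d0, and that the disk to re-initialize is c3t4d0s2 with alternate link c2t4d0s2. The LV names lvol1 to lvol10 are hypothetical (vgdisplay reports 10 LVs but the names were omitted), and the alternate-first vgreduce order and "first path listed becomes primary" behavior of vgextend are assumptions to verify against the LVM documentation for your release. Integrity-specific steps such as the EFI partition contents and setboot are not covered.

```python
def remediation_plan(vg, primary, alternate, lvols):
    """Build (but do not run) the commands to re-create a root PV with -B.

    Assumes each LV in `lvols` is a two-way mirror with one copy on the
    disk reached via `primary` and the other on the healthy root disk.
    """
    raw = primary.replace("/dev/dsk/", "/dev/rdsk/", 1)  # pvcreate takes the raw device
    plan = []
    # 1. Drop the mirror copy that lives on the disk lacking the boot area.
    plan += [f"lvreduce -m 0 {vg}/{lv} {primary}" for lv in lvols]
    # 2. Remove the disk from the VG: alternate link first, then primary path.
    plan.append(f"vgreduce {vg} {alternate}")
    plan.append(f"vgreduce {vg} {primary}")
    # 3. Re-initialize it as a bootable PV; -f overwrites the old LVM header.
    plan.append(f"pvcreate -f -B {raw}")
    # 4. Add both paths back (assumed: the first path listed becomes primary).
    plan.append(f"vgextend {vg} {primary} {alternate}")
    # 5. Re-mirror every LV onto the disk, then rewrite the boot definitions.
    plan += [f"lvextend -m 1 {vg}/{lv} {primary}" for lv in lvols]
    plan.append(f"lvlnboot -R {vg}")
    return plan


# Hypothetical LV names: vgdisplay reports 10 LVs, but their names were omitted.
for cmd in remediation_plan("/dev/vg00", "/dev/dsk/c3t4d0s2",
                            "/dev/dsk/c2t4d0s2",
                            [f"lvol{i}" for i in range(1, 11)]):
    print(cmd)
```

Keeping the plan as plain strings makes it easy to review on a production machine and to run one step at a time, checking lvlnboot -v and vgdisplay -v between steps.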

After the fix, lvlnboot -v printed the following, as expected:

hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
        /dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
        /dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0) -- Boot Disk
        /dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1     on:     /dev/dsk/c3t6d0s2
                        /dev/dsk/c2t6d0s2
                        /dev/dsk/c3t4d0s2
                        /dev/dsk/c2t4d0s2
Root: lvol3     on:     /dev/dsk/c3t6d0s2
                        /dev/dsk/c2t6d0s2
                        /dev/dsk/c3t4d0s2
                        /dev/dsk/c2t4d0s2
Swap: lvol2     on:     /dev/dsk/c3t6d0s2
                        /dev/dsk/c2t6d0s2
                        /dev/dsk/c3t4d0s2
                        /dev/dsk/c2t4d0s2
Dump: lvol2     on:     /dev/dsk/c3t6d0s2, 0

As the "-- Boot Disk" markers above show, the system now identifies both disks as Boot Disks, and the lvlnboot information is back to normal.

This concludes the troubleshooting.

