
RAC: Cluster Outage Caused by a Dropped Disk

The application team reported that the databases on both hosts were down. This cluster keeps its data files on a regular filesystem, not on ASM.

First, log in and check the cluster status.

[grid@cxcsdb01 ~]$ crsctl stat res -t -init   -- most of the cluster-managed resources are OFFLINE
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown   
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                                                   
ora.crf
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  OFFLINE                               STARTING            
ora.cssdmonitor
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.gpnpd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.mdnsd
      1        ONLINE  ONLINE       cxcsdb01

 

[grid@cxcsdb01 ~]$ ps -ef | grep crs    -- the crsd.bin process is not running
grid     33095 30418  0 10:26 pts/2    00:00:00 grep --color=auto crs
[grid@cxcsdb01 ~]$ ps -ef | grep css
root     30844     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdmonitor
root     30856     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdagent
grid     30868     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/ocssd.bin
grid     33129 30418  0 10:26 pts/2    00:00:00 grep --color=auto css
[grid@cxcsdb01 ~]$ ps -ef | grep ohasd
root      1513     1  0 Oct17 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
root      4266     1  0 10:04 ?        00:00:07 /opt/oracle/11.2.0.4/grid/bin/ohasd.bin reboot
grid     33254 30418  0 10:26 pts/2    00:00:00 grep --color=auto ohasd
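The three `ps` checks above can be folded into one small sketch that reports which core clusterware daemons are up. The daemon names come from the output above; `pgrep` (procps) is assumed to be available, and the start-order comment reflects the usual 11.2 stack (ohasd first, ocssd needs the voting files, crsd needs ocssd).

```shell
#!/bin/sh
# Report the state of the core clusterware daemons seen in this incident.
# ohasd starts first; ocssd needs the voting files; crsd depends on ocssd.
check_stack() {
  for d in ohasd.bin ocssd.bin crsd.bin evmd.bin; do
    if pgrep -f "$d" >/dev/null 2>&1; then
      echo "UP:   $d"
    else
      echo "DOWN: $d"
    fi
  done
}
check_stack
```

In this outage the sketch would have shown ohasd.bin and ocssd.bin present but crsd.bin absent, matching the `ps` output above.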

 

Next, check the CSS logs.

[grid@cxcsdb01 cssd]$ tail -f ocssd.log   -- "Successful discovery of 0 disks" shows the voting files have disappeared
............................................................................................
2018-10-18 10:21:56.163: [    CSSD][2202380032]clssnmReadDiscoveryProfile: voting file discovery string(/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03)
2018-10-18 10:21:56.163: [    CSSD][2202380032]clssnmvDDiscThread: using discovery string /crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03 for initial discovery
2018-10-18 10:21:56.163: [   SKGFD][2202380032]Discovery with str:/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.163: [   SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata01/votedata01:
2018-10-18 10:21:56.163: [   SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata01/votedata01
2018-10-18 10:21:56.164: [   SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata01/votedata01:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Discovery advancing to nxt string :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata02/votedata02
2018-10-18 10:21:56.164: [   SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Discovery advancing to nxt string :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata03/votedata03
2018-10-18 10:21:56.164: [   SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvFindInitialConfigs: No voting files found
2018-10-18 10:21:56.164: [    CSSD][2202380032](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2018-10-18 10:21:56.478: [    CSSD][2204923648]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f6278060880) client((nil))
......................................................................................................................................................
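In a long ocssd.log these few lines are easy to miss, so grepping for the discovery-failure markers narrows the search quickly. A sketch run against an excerpt of the log above (in a real environment the file would be `$GRID_HOME/log/<node>/cssd/ocssd.log`):

```shell
#!/bin/sh
# Demo on an excerpt of the ocssd.log lines quoted in this post.
cat > /tmp/ocssd_excerpt.log <<'EOF'
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvFindInitialConfigs: No voting files found
2018-10-18 10:21:56.164: [    CSSD][2202380032](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
EOF
# Count the lines that indicate voting-file discovery has failed.
grep -cE "CSSNM00070|discovery of 0 disks|No voting files found" /tmp/ocssd_excerpt.log
# prints 3 (all three excerpt lines match)
```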

 

We confirmed with the application team that the /crsdata filesystem really was gone from the host. Once they remounted the disk, the cluster came back up on its own.

[grid@cxcsdb01 ~]$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/sda5                   474G   35G  439G   8% /
devtmpfs                    126G     0  126G   0% /dev
tmpfs                       126G     0  126G   0% /dev/shm
tmpfs                       126G   27M  126G   1% /run
tmpfs                       126G     0  126G   0% /sys/fs/cgroup
/dev/sda3                    20G   54M   20G   1% /home
/dev/sda1                   497M  166M  332M  34% /boot
tmpfs                       4.0K     0  4.0K   0% /dev/vx
tmpfs                        26G     0   26G   0% /run/user/50008
tmpfs                        26G     0   26G   0% /run/user/50007
tmpfs                        26G     0   26G   0% /run/user/1000
/dev/vx/dsk/crsdg/crsvol     14G  106M   14G   1% /crsdata
/dev/vx/dsk/archdg/archvol  199G  2.7G  195G   2% /archive
/dev/vx/dsk/oradg/oravol01 1000G  554G  443G  56% /oradata01
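Because the voting files live on a plain filesystem here rather than in ASM, a pre-start sanity check that the files actually exist would have pointed at the root cause immediately. A sketch, using the three paths from the ocssd.log discovery string above:

```shell
#!/bin/sh
# check_votefiles: print any missing voting-file path; return non-zero if any are absent.
check_votefiles() {
  rc=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING: $f"
      rc=1
    fi
  done
  return $rc
}
# Paths taken from the voting file discovery string in this incident:
check_votefiles /crsdata/votedisk/votedata01/votedata01 \
                /crsdata/votedisk/votedata02/votedata02 \
                /crsdata/votedisk/votedata03/votedata03 \
  || echo "voting files missing - fix /crsdata before starting CRS"
```

Run from an init hook or monitoring job, this turns "cluster mysteriously down" into an explicit "filesystem not mounted" alert.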

 

 

[grid@cxcsdb01 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown   
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.crf
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.crsd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.cssd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.cssdmonitor
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.ctssd
      1        ONLINE  ONLINE       cxcsdb01                 OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.gpnpd
      1        ONLINE  ONLINE       cxcsdb01                                     
ora.mdnsd
      1        ONLINE  ONLINE       cxcsdb01