Oracle 11gR2 RAC: Recovering a Corrupted OCR Without a Backup

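OCR corruption

Scenario:
This came up during an OCR backup-and-restore experiment. The OCR had four automatic backups. The OCR location was moved from +DATA to +OCR2 (/dev/raw/raw4), and a manual OCR backup was then taken with ocrconfig -manualbackup. After that, /dev/raw/raw4 was overwritten with dd. The cluster was shut down and started again, and it failed to start.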

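Problem analysis (assume the cause is not yet known and diagnose first):
1. Check the cluster services. The CRS and CSS services have not started properly.
crsctl check crs
2. Check the CRS and CSS logs. They show that the OCR disk is in an abnormal state.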
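The two checks above can be sketched in shell. This is only an illustration, not the author's script: the function name failed_services is hypothetical, and the log paths in the comments assume the default 11.2 layout under $GRID_HOME/log/<hostname>, which may differ on your system.

```shell
#!/usr/bin/env bash
# Print every message line of `crsctl check crs` output that does not
# report a daemon as online. Reads the command output on stdin.
# (failed_services is a hypothetical helper name.)
failed_services() {
    awk '/^CRS-[0-9]+:/ && !/is online$/'
}

# Typical use on a cluster node:
#   crsctl check crs | failed_services
#   ocrcheck                      # reports OCR location and integrity
#   host=$(hostname -s)
#   grep -i ocr "$GRID_HOME/log/$host/crsd/crsd.log"  | tail -20
#   grep -i ocr "$GRID_HOME/log/$host/cssd/ocssd.log" | tail -20
```

An empty result from failed_services means every daemon reported itself online; any line it prints names a service worth looking up in the logs.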
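3. Recover the OCR. This amounts to rebuilding the OCR with root.sh; after the rebuild, the related resources (listener, database, and so on) must be registered again.
Clear the cluster configuration on all nodes: as root, run $GRID_HOME/crs/install/rootcrs.pl

Node 1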
[root@node1 install]# ./rootcrs.pl
Using configuration parameter file: ./crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Node 2
[root@node2 install]# ./rootcrs.pl
Using configuration parameter file: ./crsconfig_params

User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Remove the cluster configuration from all nodes
Node 1
[root@node1 install]# ./rootcrs.pl -deconfig -force
Using configuration parameter file: ./crsconfig_params
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type

PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd

CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2673: Attempting to stop 'ora.crf' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
CRS-2673: Attempting to stop 'ora.asm' on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
CRS-2677: Stop of 'ora.crf' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node

Node 2
[root@node2 install]# ./rootcrs.pl -deconfig -force -lastnode
Using configuration parameter file: ./crsconfig_params
CRS-5702: Resource 'ora.cssd' is already running on 'node2'
CRS-4000: Command Start failed, or completed with errors.
CSS startup failed with return code 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd

CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-2673: Attempting to stop 'ora.ctssd' on 'node2'
CRS-2673: Attempting to stop 'ora.evmd' on 'node2'
CRS-2673: Attempting to stop 'ora.asm' on 'node2'
CRS-2677: Stop of 'ora.evmd' on 'node2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node2' succeeded
CRS-2677: Stop of 'ora.asm' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node2'
CRS-2677: Stop of 'ora.cssd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node2'
CRS-2676: Start of 'ora.cssdmonitor' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'node2'
CRS-2676: Start of 'ora.diskmon' on 'node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'node2' succeeded
CRS-4611: Successful deletion of voting disk +DATA.
ASM de-configuration trace file location: /tmp/asmcadc_clean2016-10-31_02-02-22-PM.log
ASM Clean Configuration START
ASM Clean Configuration END

ASM with SID +ASM1 deleted successfully. Check /tmp/asmcadc_clean2016-10-31_02-02-22-PM.log for details.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node2'
CRS-2673: Attempting to stop 'ora.asm' on 'node2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node2'
CRS-2677: Stop of 'ora.mdnsd' on 'node2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node2' succeeded
CRS-2677: Stop of 'ora.asm' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node2'
CRS-2677: Stop of 'ora.cssd' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'node2'
CRS-2677: Stop of 'ora.crf' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node2'
CRS-2677: Stop of 'ora.gipcd' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node2'
CRS-2677: Stop of 'ora.gpnpd' on 'node2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node
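The order of the two deconfiguration runs above can be summarized with a small shell sketch. It only prints the commands, it does not run them; the helper name deconfig_plan is hypothetical, and the commands are assumed to be run as root from $GRID_HOME/crs/install on each node, as in the transcripts.

```shell
#!/usr/bin/env bash
# Print the deconfiguration command for each node, in order. Only the final
# node in the list gets -lastnode, mirroring the runs shown above.
# (deconfig_plan is a hypothetical helper name.)
deconfig_plan() {
    total=$#
    i=0
    for node in "$@"; do
        i=$((i + 1))
        if [ "$i" -eq "$total" ]; then
            echo "$node: ./rootcrs.pl -deconfig -force -lastnode"
        else
            echo "$node: ./rootcrs.pl -deconfig -force"
        fi
    done
}

# Example: deconfig_plan node1 node2
```

Printing the plan first makes it easy to confirm that -lastnode appears exactly once before anything is executed.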

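Rebuild the OCR and OLR with the root.sh script. This is the same script that runs during the RAC installation; by default it is located in $GRID_HOME. Run it on node 1 first, then on node 2.

Node 1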
[root@node1 grid]# ./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded

ASM created and started successfully.

Disk Group DATA created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 4331dad495c14f71bfdb6d4f1a82d2f9.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4331dad495c14f71bfdb6d4f1a82d2f9 (/dev/raw/raw1) [DATA]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'node1'
CRS-2676: Start of 'ora.DATA.dg' on 'node1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Node 2
[root@node2 grid]# ./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Check the resource status

Node 1
[root@node1 grid]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node1
ora.asm        ora.asm.type   ONLINE    ONLINE    node1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    node1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node1
ora....SM1.asm application    ONLINE    ONLINE    node1
ora.node1.gsd  application    OFFLINE   OFFLINE
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora.node2.gsd  application    OFFLINE   OFFLINE
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node1
ora.ons        ora.ons.type   ONLINE    ONLINE    node1
ora....ry.acfs ora....fs.type ONLINE    ONLINE    node1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node1
[root@node1 grid]# crsctl stat res -t

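--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.asm
               ONLINE  ONLINE       node1                    Started
               ONLINE  ONLINE       node2                    Started
ora.gsd
               OFFLINE OFFLINE      node1
               OFFLINE OFFLINE      node2
ora.net1.network
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.ons
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.registry.acfs
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node1
ora.cvu
      1        ONLINE  ONLINE       node1
ora.node1.vip
      1        ONLINE  ONLINE       node1
ora.node2.vip
      1        ONLINE  ONLINE       node2
ora.oc4j
      1        ONLINE  ONLINE       node1
ora.scan1.vip
      1        ONLINE  ONLINE       node1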

Node 2
[root@node2 grid]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node1
ora.asm        ora.asm.type   ONLINE    ONLINE    node1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    node1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node1
ora....SM1.asm application    ONLINE    ONLINE    node1
ora.node1.gsd  application    OFFLINE   OFFLINE
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora.node2.gsd  application    OFFLINE   OFFLINE
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node1
ora.ons        ora.ons.type   ONLINE    ONLINE    node1
ora....ry.acfs ora....fs.type ONLINE    ONLINE    node1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node1
[root@node2 grid]# crsctl stat res -t

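--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.asm
               ONLINE  ONLINE       node1                    Started
               ONLINE  ONLINE       node2                    Started
ora.gsd
               OFFLINE OFFLINE      node1
               OFFLINE OFFLINE      node2
ora.net1.network
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.ons
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.registry.acfs
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node1
ora.cvu
      1        ONLINE  ONLINE       node1
ora.node1.vip
      1        ONLINE  ONLINE       node1
ora.node2.vip
      1        ONLINE  ONLINE       node2
ora.oc4j
      1        ONLINE  ONLINE       node1
ora.scan1.vip
      1        ONLINE  ONLINE       node1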
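
Reading these listings by eye gets tedious on larger clusters. As a rough sketch (the function name not_online is hypothetical), the crs_stat -t output can be filtered for resources whose target is ONLINE but whose state is something else. Resources with an OFFLINE target, such as ora.gsd in the listings above, are deliberately ignored.

```shell
#!/usr/bin/env bash
# List resources whose Target is ONLINE but whose State is not.
# Reads `crs_stat -t` output on stdin (columns: Name Type Target State Host).
# (not_online is a hypothetical helper name.)
not_online() {
    awk '$3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}

# Usage on a cluster node:
#   crs_stat -t | not_online
```

No output means every resource that is supposed to be online is online.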

Check the disk group status. If a disk group is not mounted, mount it manually:
SQL> select name,state from v$asm_diskgroup;
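
The manual mount itself is not shown in the original. A minimal sketch, assuming the query result has been saved as plain "NAME STATE" rows: the helper below (mount_statements, a hypothetical name) emits an ALTER DISKGROUP ... MOUNT statement for each disk group reported as DISMOUNTED. The file name diskgroups.txt and the disk group FRA used in the comments are made up for illustration.

```shell
#!/usr/bin/env bash
# Turn "NAME STATE" rows into MOUNT statements for every disk group that
# v$asm_diskgroup reports as DISMOUNTED. Reads the rows on stdin.
# (mount_statements is a hypothetical helper name.)
mount_statements() {
    awk 'toupper($2) == "DISMOUNTED" { printf "ALTER DISKGROUP %s MOUNT;\n", $1 }'
}

# Usage (as the grid user, with the ASM environment set):
#   mount_statements < diskgroups.txt | sqlplus -S / as sysasm
# With a row such as "FRA DISMOUNTED" this sends:
#   ALTER DISKGROUP FRA MOUNT;
```

Review the generated statements before piping them into sqlplus.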

4. Add the resources (listener, database, instances, and so on)

Add the listener
[grid@node1 ~]$ srvctl add listener -l listener
Check the listener configuration
[grid@node1 ~]$ srvctl config listener

Add the database and its instances
[oracle@node1 ~]$ srvctl add database -h
[oracle@node1 ~]$ srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/db_1 -c RAC
[oracle@node1 ~]$ srvctl add instance -h
[oracle@node1 ~]$ srvctl add instance -d orcl -i orcl1 -n node1
[oracle@node1 ~]$ srvctl add instance -d orcl -i orcl2 -n node2
[oracle@node1 ~]$ srvctl config database -d orcl
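
With more than two instances, typing the registration commands one by one invites typos. The sketch below only prints the same srvctl commands used above for a given database, Oracle home, and list of instance:node pairs; nothing is executed. The function name registration_commands and the instance:node argument format are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Print the srvctl commands that register one RAC database and its instances.
# Arguments: db name, Oracle home, then one "instance:node" pair per instance.
# (registration_commands is a hypothetical helper name.)
registration_commands() {
    db=$1
    home=$2
    shift 2
    echo "srvctl add database -d $db -o $home -c RAC"
    for pair in "$@"; do
        # ${pair%%:*} is the text before the colon, ${pair##*:} the text after it.
        echo "srvctl add instance -d $db -i ${pair%%:*} -n ${pair##*:}"
    done
    echo "srvctl config database -d $db"
}

# Example:
#   registration_commands orcl /u01/app/oracle/product/11.2.0/db_1 orcl1:node1 orcl2:node2
```

The printed lines can be reviewed and then run as the oracle user.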

5. Once the resources have been added, restart the cluster
[root@node1 grid]# crsctl stop cluster -all
[root@node1 grid]# crsctl start cluster -all

After the resources are added, the database may fail to start automatically. If that happens, try the following commands:
[oracle@node1 ~]$ srvctl enable database -d orcl
[oracle@node1 ~]$ srvctl enable instance -d orcl -i orcl1
[oracle@node1 ~]$ srvctl enable instance -d orcl -i orcl2
[oracle@node1 ~]$ srvctl start database -d orcl
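
To confirm that the start worked, the output of srvctl status database can be filtered for instances that are still down. This is a small sketch, not part of the original procedure; the function name stopped_instances is hypothetical.

```shell
#!/usr/bin/env bash
# Print the instances that `srvctl status database -d <db>` reports as
# not running. Reads the command output on stdin.
# (stopped_instances is a hypothetical helper name.)
stopped_instances() {
    awk '/^Instance .* is not running/ { print $2 }'
}

# Usage:
#   srvctl status database -d orcl | stopped_instances
```

An empty result means every registered instance reported itself as running.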
