Case study: root.sh fails while installing 12.1.0.2 cluster Grid Infrastructure (GRID/GI)

阿新 • Published: 2019-01-15

While installing the 12.1.0.2 cluster Grid Infrastructure (GRID/GI) software on Linux, root.sh failed on node 2. The error shown on screen was:

    OLR initialization - successful
    2015/12/15 13:16:55 CLSRSC-507: The root script cannot proceed on this node rac2 because either the first-node operations have not completed on node rac1 or there was an error in obtaining the status of the first-node operations.

This error means that node 2 could not confirm whether the installation on node 1 had completed. How does root.sh check that node 1 has finished? The log has to be examined:

    $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_rac2_2015-12-18_09-41-53PM.log

    2015-12-18 21:42:39: Trying to get the value of key: SYSTEM.rootcrs.checkpoints.firstnode in OCR.
    2015-12-18 21:42:39: setting ORAASM_UPGRADE to 1
    2015-12-18 21:42:39: Check the existence of key pair with key name: SYSTEM.rootcrs.checkpoints.firstnode in OCR.
    2015-12-18 21:42:39: setting ORAASM_UPGRADE to 1
    2015-12-18 21:42:39: Invoking "/u01/gridsoft/12.1.0/bin/cluutil -exec -keyexists -key checkpoints.firstnode"
    2015-12-18 21:42:39: trace file=/u01/gridbase/crsdata/rac2/crsconfig/cluutil9.log
    2015-12-18 21:42:39: Running as user grid: /u01/gridsoft/12.1.0/bin/cluutil -exec -keyexists -key checkpoints.firstnode
    2015-12-18 21:42:39: s_run_as_user2: Running /bin/su grid -c ' echo CLSRSC_START; /u01/gridsoft/12.1.0/bin/cluutil -exec -keyexists -key checkpoints.firstnode '
    2015-12-18 21:42:39: Removing file /tmp/filexr1WwO
    2015-12-18 21:42:39: Successfully removed file: /tmp/filexr1WwO
    2015-12-18 21:42:39: pipe exit code: 256
    2015-12-18 21:42:39: /bin/su exited with rc=1
    2015-12-18 21:42:39: oracle.ops.mgmt.rawdevice.OCRException: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]
    2015-12-18 21:42:39: Cannot get OCR key with CLUUTIL, try using OCRDUMP.
    2015-12-18 21:42:39: Check OCR key using ocrdump
    2015-12-18 21:42:54: ocrdump output: PROT-302: Failed to initialize ocrdump
    2015-12-18 21:42:54: The key pair with keyname: SYSTEM.rootcrs.checkpoints.firstnode does not exist in OCR.

The log shows that node 2 first ran cluutil -exec -keyexists -key checkpoints.firstnode to look up the key SYSTEM.rootcrs.checkpoints.firstnode in the OCR. When that failed, it fell back to OCRDUMP, which also failed. The next step is to find out why OCRDUMP failed:

    $GRID_BASE/diag/crs/<node>/crs/trace/ocrdump_13146.trc

    2015-12-18 21:42:48.098879 : OCRASM: ASM Error Stack : ORA-29701: unable to connect to Cluster Synchronization Service
    2015-12-18 21:42:48.098885 : OCRASM: proprasmo: ASM instance is down. Proceed to open the file in dirty mode.
    CLWAL: clsw_Initialize: Error [32] from procr_init_ext
    CLWAL: clsw_Initialize: Error [PROCL-32: Oracle High Availability Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]] from procr_init_ext
    2015-12-18 21:42:48.101773 : GPNP: clsgpnpkww_initclswcx: [at clsgpnpkww.c:351] Result: (56) CLSGPNP_OCR_INIT. (:GPNP01201: )Failed to init CLSW-OLR context. CLSW Error (3): CLSW-3: Error in the cluster registry (OCR) layer. [32] [PROCL-32: Oracle High Availability Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]]
    2015-12-18 21:42:48.112746 : OCRASM: proprasmo: Error [13] in opening the GPNP profile. Try to get offline profile
    2015-12-18 21:42:48.220769 : OCRRAW: kgfo_kge2slos error stack at kgfolclcpi1: AMDU-00210: No disks found in diskgroup OCR_VOTING
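A trace like this mixes several kinds of errors, so it can help to list the distinct error codes before reading it line by line. The following is a minimal sketch, not part of the original analysis: the function name list_error_codes and the PREFIX-NNNNN pattern are assumptions made for illustration, and the pattern is only a heuristic for Oracle-style message codes.

```shell
#!/bin/sh
# List the distinct error codes of the form PREFIX-NNNNN found on stdin.
# The regular expression is a heuristic chosen for this sketch; it is not
# an official definition of Oracle message codes.
list_error_codes() {
    grep -oE '[A-Z]{3,6}-[0-9]{1,5}' | LC_ALL=C sort -u
}

# Hypothetical usage; substitute the real path of the trace file:
#   list_error_codes < ocrdump_13146.trc
```

Each code in the resulting list can then be judged on its own: is it a consequence of the clusterware not running on this node, or does it point to something else?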

The failures to connect to CSS (ORA-29701) and to OHASD (PROCL-32) are expected here, because the clusterware on node 2 has not been started yet; these errors can distract from the analysis. The key error is AMDU-00210: No disks found in diskgroup OCR_VOTING. In other words, node 2 could not find the ASM disk, so OCRDUMP failed and the installation status of node 1 could not be confirmed. Next, run kfed to check whether there is a problem with the ASM disk.

Disk /dev/raw/raw1 as seen from node 1:

    $ /u01/gridsoft/12.1.0/bin/kfed read /dev/raw/raw1
    kfbh.endian: 1 ; 0x000: 0x01
    kfbh.hard: 130 ; 0x001: 0x82
    kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD <========= raw1 has type KFBTYP_DISKHEAD, a normal ASM disk
    kfbh.datfmt: 1 ; 0x003: 0x01
    kfbh.block.blk: 0 ; 0x004: blk=0
    kfbh.block.obj: 2147483648 ; 0x008: disk=0
    kfbh.check: 420965027 ; 0x00c: 0x19176aa3
    kfbh.fcn.base: 0 ; 0x010: 0x00000000
    kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
    ...
    kfdhdb.vfstart: 128 ; 0x0ec: 0x00000080 <========= the vfstart value shows that this disk holds a voting file
    kfdhdb.vfend: 160 ; 0x0f0: 0x000000a0 <========= the vfend value shows that this disk holds a voting file

Disk /dev/raw/raw1 as seen from node 2:

    $ /u01/gridsoft/12.1.0/bin/kfed read /dev/raw/raw1
    kfbh.endian: 0 ; 0x000: 0x00
    kfbh.hard: 0 ; 0x001: 0x00
    kfbh.type: 0 ; 0x002: KFBTYP_INVALID <========= on node 2, raw1 has the invalid type KFBTYP_INVALID
    kfbh.datfmt: 0 ; 0x003: 0x00
    kfbh.block.blk: 0 ; 0x004: blk=0
    kfbh.block.obj: 0 ; 0x008: file=0
    kfbh.check: 0 ; 0x00c: 0x00000000
    kfbh.fcn.base: 0 ; 0x010: 0x00000000
    kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
    kfbh.spare1: 0 ; 0x018: 0x00000000
    kfbh.spare2: 0 ; 0x01c: 0x00000000
    000000000 00000000 00000000 00000000 00000000 [................]
    Repeat 255 times
    KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

On node 1, /dev/raw/raw1 is reported as type KFBTYP_DISKHEAD and kfdhdb.vfstart has a value, so from node 1 raw1 is a normal ASM disk and also a voting disk. Node 2, reading the same disk, shows completely different content. A correctly configured shared device must look identical from node 1 and node 2; since the two nodes see different content in this case, the shared disk configuration is wrong.

In addition, running OCRDUMP manually on node 1 confirms that the key SYSTEM.rootcrs.checkpoints.firstnode exists and that its status is "SUCCESS":

    su - root
    ocrdump /tmp/ocrdump1.out
    more /tmp/ocrdump1.out
    [SYSTEM.rootcrs.checkpoints.firstnode]
    ORATEXT : SUCCESS

Finally, the problem was resolved by correcting the UDEV configuration file (/etc/udev/rules.d/99-oracle-asmdevices.rules).
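The kfed comparison above can be scripted so that it is easy to repeat for every candidate disk. This is a rough sketch under stated assumptions: the function names kfed_block_type and compare_kfed_dumps and the file names in the usage comment are hypothetical, and the parsing assumes the kfbh.type line has the layout shown in the kfed output above.

```shell
#!/bin/sh
# Print the block type (for example KFBTYP_DISKHEAD) from `kfed read`
# output supplied on stdin. Assumes the line looks like:
#   kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfed_block_type() {
    awk '$1 == "kfbh.type:" { print $5; exit }'
}

# Compare two saved kfed dumps of the same device, one taken on each node.
# Succeeds only when both dumps show a valid ASM disk header.
compare_kfed_dumps() {
    type_a=$(kfed_block_type < "$1")
    type_b=$(kfed_block_type < "$2")
    echo "node A: ${type_a:-unknown}, node B: ${type_b:-unknown}"
    [ "$type_a" = "KFBTYP_DISKHEAD" ] && [ "$type_b" = "KFBTYP_DISKHEAD" ]
}

# Hypothetical usage (the file names are placeholders):
#   on each node:  kfed read /dev/raw/raw1 > /tmp/kfed_$(hostname).out
#   afterwards:    compare_kfed_dumps /tmp/kfed_rac1.out /tmp/kfed_rac2.out
```

Matching block types alone do not prove that both nodes are reading the same physical disk, so this is a quick first filter and not a full verification.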

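I am reposting this document because I ran into the same problem, although in my case the shared storage itself was at fault.

I first used kfed to read the same shared disk from both nodes and found that the contents did not match.

I then cleared the ASM information with dd and partitioned the shared storage with fdisk on one node, but the other node could not see the new partition information.

I concluded that the shared storage was faulty, removed it, and added it again. After that, a partition created on node A was detected by node B after a rescan, so the shared storage can be considered to be working properly.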
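One rough way to check that both nodes really see the same device content is to checksum the beginning of the device on each node and compare the results. The sketch below is an illustration under assumptions, not the procedure used above: the function name first_mib_checksum is hypothetical, the 1 MiB size is arbitrary, reading a device normally requires root, and reads may be served from the operating system cache, so a match is an indication and not a proof.

```shell
#!/bin/sh
# Print a checksum of the first 1 MiB of a device or file.
# Returns non-zero if the path cannot be read.
first_mib_checksum() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    dd if="$1" bs=1024 count=1024 2>/dev/null | cksum | awk '{ print $1 }'
}

# Hypothetical usage: run on every node against the same shared device
# and compare the printed values by eye.
#   first_mib_checksum /dev/raw/raw1
```

If the values differ between nodes while nothing is writing to the disk, the nodes are most likely not looking at the same storage.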
