11g RAC: MMON Process Failure on Node 2

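First thing in the morning I noticed that the DB time monitoring curve for our core system had gone flat at a single value, which did not look right.
Our monitoring script is driven by the SNAP_ID values in the dba_hist_snapshot view, so I started generating an AWR report to check whether the SNAP_IDs looked abnormal: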

                          21220 19 Sep 2018 09:00      1
                          21221 19 Sep 2018 10:00      1
                          21222 19 Sep 2018 11:00      1
                          21223 19 Sep 2018 12:00      1
                          21224 19 Sep 2018 13:00      1
                          21225 19 Sep 2018 14:00      1
                          21226 19 Sep 2018 15:00      1
                          21227 19 Sep 2018 16:00      1
                          21228 19 Sep 2018 17:00      1
                          21229 19 Sep 2018 18:00      1
                          21230 19 Sep 2018 19:00      1

Specify the Begin and End Snapshot Ids


Enter value for begin_snap: 
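
The listing stops at 19 Sep 19:00 even though it was taken the next morning. A minimal Python sketch of a staleness check on such a listing follows. The helper names, the one-hour interval (read off the listing above) and the 15-minute slack are illustrative assumptions, not part of our actual monitoring script.

```python
from datetime import datetime, timedelta

def parse_snapshots(listing):
    """Parse lines like '21230 19 Sep 2018 19:00      1' into (snap_id, end_time)."""
    snaps = []
    for line in listing.splitlines():
        parts = line.split()
        # Expected fields: snap_id, day, month, year, HH:MM, level
        if len(parts) == 6 and parts[0].isdigit():
            # %b assumes English month abbreviations (C/English locale)
            when = datetime.strptime(" ".join(parts[1:5]), "%d %b %Y %H:%M")
            snaps.append((int(parts[0]), when))
    return snaps

def snapshot_is_stale(snaps, now, interval=timedelta(hours=1),
                      slack=timedelta(minutes=15)):
    """True if the newest snapshot is older than one interval plus some slack."""
    if not snaps:
        return True
    latest = max(when for _, when in snaps)
    return now - latest > interval + slack

listing = """
                          21229 19 Sep 2018 18:00      1
                          21230 19 Sep 2018 19:00      1
"""
snaps = parse_snapshots(listing)
# Checked at 10:40 on 20 Sep 2018, the time of the SQL*Plus session in this post
print(snapshot_is_stale(snaps, datetime(2018, 9, 20, 10, 40)))
```

Alerting on the age of the newest snapshot, rather than only on DB time itself, would have flagged the problem about an hour after the last snapshot instead of the next morning.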

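The system did see some CBC (cache buffers chains) related waits last night, but it recovered quickly. So what was going on? Was the archive destination full, or had one of the MM* background processes gone down? I tried creating a snapshot manually, and that worked: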

[oracle@bapdb2 trace]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Sep 20 10:40:33 2018

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

10:40:33 SYS@bapdb2(bapdb2)> set line 300 pages 1000
10:40:35 SYS@bapdb2(bapdb2)> BEGIN
10:40:37   2  DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT ();
10:40:37   3  END;
10:40:37   4  /

PL/SQL procedure successfully completed.

The archive disk group also has plenty of free space, so a full archive destination is not what broke the process:

10:43:57 SYS@b2(db2)> select group_number,block_size,name,allocation_unit_size,state,type,total_mb,free_mb,offline_disks from v$asm_diskgroup;

GROUP_NUMBER BLOCK_SIZE NAME                           ALLOCATION_UNIT_SIZE STATE       TYPE     TOTAL_MB    FREE_MB OFFLINE_DISKS
------------ ---------- ------------------------------ -------------------- ----------- ------ ---------- ---------- -------------
           1       4096 SAS_ARCH                                    1048576 CONNECTED   EXTERN    1024000     617921             0
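
As a quick worked check of "plenty of free space", the TOTAL_MB and FREE_MB values from the row above can be turned into a percentage. This is only an illustrative Python sketch; the function name is made up for this post.

```python
def free_pct(total_mb, free_mb):
    """Percentage of a disk group that is still free."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return 100.0 * free_mb / total_mb

# TOTAL_MB and FREE_MB of SAS_ARCH, taken from the v$asm_diskgroup output above
pct = free_pct(1024000, 617921)
print(f"SAS_ARCH free: {pct:.1f}%")
```

Roughly 60% of the disk group is free, which rules out a full archive destination.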

Processes on node 1:
[oracle@db1 ~]$ ps -ef |grep mm
grid       6634      1  0  2017 ?        00:33:47 asm_mman_+ASM1
grid       6648      1  0  2017 ?        01:52:06 asm_mmon_+ASM1
grid       6650      1  0  2017 ?        2-00:53:46 asm_mmnl_+ASM1
oracle     8610      1  0  2017 ?        00:33:56 ora_mman_db1
oracle     8650      1  0  2017 ?        3-11:28:35 ora_mmon_db1
oracle     8655      1  1  2017 ?        4-07:20:56 ora_mmnl_db1

Processes on node 2:
[oracle@bapdb2 ~]$ ps -ef |grep mm
oracle    54354  53982  0 11:09 pts/1    00:00:00 grep mm
grid     105256      1  0  2017 ?        00:23:52 asm_mman_+ASM2
grid     105295      1  0  2017 ?        01:15:06 asm_mmon_+ASM2
grid     105312      1  0  2017 ?        1-03:49:26 asm_mmnl_+ASM2
oracle   106889      1  0  2017 ?        00:28:00 ora_mman_db2
oracle   106927      1  0  2017 ?        3-04:47:42 ora_mmnl_db2
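
Comparing the two listings by eye works, but the same comparison can be automated. The Python sketch below scans `ps -ef` output for the three database background processes that node 1 shows (mman, mmon, mmnl) and reports whichever is absent. The function name and the expected-process list are assumptions made for illustration.

```python
import re

# The three ora_* background processes visible on node 1 in the listing above
EXPECTED = ("mman", "mmon", "mmnl")

def missing_background_processes(ps_output, instance, expected=EXPECTED):
    """Return the expected ora_<name>_<instance> processes absent from ps output."""
    pattern = re.compile(r"ora_([a-z0-9]+)_" + re.escape(instance))
    running = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The command name is the last column of `ps -ef` for these processes
        m = pattern.fullmatch(fields[-1])
        if m:
            running.add(m.group(1))
    return [name for name in expected if name not in running]

node2_ps = """\
oracle    54354  53982  0 11:09 pts/1    00:00:00 grep mm
grid     105256      1  0  2017 ?        00:23:52 asm_mman_+ASM2
grid     105295      1  0  2017 ?        01:15:06 asm_mmon_+ASM2
grid     105312      1  0  2017 ?        1-03:49:26 asm_mmnl_+ASM2
oracle   106889      1  0  2017 ?        00:28:00 ora_mman_db2
oracle   106927      1  0  2017 ?        3-04:47:42 ora_mmnl_db2
"""
print(missing_background_processes(node2_ps, "db2"))
```

Matching the full process name avoids counting the `grep mm` line itself or the ASM processes, which share the same suffixes.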

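The MMON process on node 2 is down: there is no ora_mmon_db2 in the list. I searched the alert logs for the process startup records (the OS ids in the first pair match node 1's processes; the second pair is from node 2):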
Tue Sep 19 03:49:00 2017
MMON started with pid=36, OS id=8650
Tue Sep 19 03:49:00 2017
MMNL started with pid=37, OS id=8655  

Tue Sep 19 04:01:47 2017
MMON started with pid=36, OS id=106923
Tue Sep 19 04:01:47 2017
MMNL started with pid=37, OS id=106927
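
The cross-check between the alert log and the process list can be sketched in Python as below: take the most recent "started with pid=..., OS id=..." record for MMON and MMNL and see whether that OS id is still alive. The function names are hypothetical, and the set of live PIDs is copied by hand from the node 2 `ps` output above.

```python
import re

STARTED = re.compile(r"^(MMON|MMNL) started with pid=(\d+), OS id=(\d+)")

def last_started_os_ids(alert_text):
    """Map process name -> OS id from its most recent startup record."""
    ids = {}
    for line in alert_text.splitlines():
        m = STARTED.match(line.strip())
        if m:
            ids[m.group(1)] = int(m.group(3))  # later records overwrite earlier ones
    return ids

def dead_processes(alert_text, live_pids):
    """Processes whose last recorded OS id is no longer in the live PID set."""
    return {name: os_id
            for name, os_id in last_started_os_ids(alert_text).items()
            if os_id not in live_pids}

node2_alert = """\
Tue Sep 19 04:01:47 2017
MMON started with pid=36, OS id=106923
Tue Sep 19 04:01:47 2017
MMNL started with pid=37, OS id=106927
"""
# PIDs from the node 2 `ps -ef | grep mm` output, excluding the grep itself
node2_live = {105256, 105295, 105312, 106889, 106927}
print(dead_processes(node2_alert, node2_live))
```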

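The process with OS id 106923 is indeed gone. I have dealt with a similar situation before: the MMON-related processes can be restarted directly on node 2: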

SQL> alter system enable restricted session; 
System altered. 
SQL> alter system disable restricted session; 
System altered. 

The alert log confirms the restart:
Thu Sep 20 11:10:28 2018
Stopping background process MMNL
Starting background process MMON
Starting background process MMNL
Thu Sep 20 11:10:29 2018
MMON started with pid=37, OS id=55936 
Thu Sep 20 11:10:29 2018
MMNL started with pid=236, OS id=55938 
ALTER SYSTEM enable restricted session;
minact-scn: Inst 2 is a slave inc#:16 mmon proc-id:55936 status:0x2
minact-scn status: grec-scn:0x0026.4dcf0d36 gmin-scn:0x0026.4dcf0d36 gcalc-scn:0x0026.4dcf1208
Thu Sep 20 11:11:05 2018
ALTER SYSTEM disable restricted session;
Thu Sep 20 11:13:25 2018
LGWR: Standby redo logfile selected for thread 2 sequence 154126 for destination LOG_ARCHIVE_DEST_3

Checking again, the processes have started normally:
11:10:29 SYS@db2(xxxdb2)> !ps -ef |grep mm
oracle    55936      1  0 11:10 ?        00:00:00 ora_mmon_db2
oracle    55938      1  0 11:10 ?        00:00:00 ora_mmnl_db2
grid     105256      1  0  2017 ?        00:23:52 asm_mman_+ASM2
grid     105295      1  0  2017 ?        01:15:06 asm_mmon_+ASM2
grid     105312      1  0  2017 ?        1-03:49:26 asm_mmnl_+ASM2
oracle   106889      1  0  2017 ?        00:28:00 ora_mman_db2

I then dug into the MMON process trace (.trc) file and found this at the very end:
*** 2018-09-19 18:46:41.432
minact-scn slave-status: grec-scn:0x0026.4db016c0 gmin-scn:0x0026.4db016c0 gcalc-scn:0x0026.4db0273c
minact-scn slave-status: grec-scn:0x0026.4dbdde59 gmin-scn:0x0026.4dbdde59 gcalc-scn:0x0026.4dbdf492

*** 2018-09-19 18:56:44.302
minact-scn slave-status: grec-scn:0x0026.4dca45db gmin-scn:0x0026.4dca45db gcalc-scn:0x0026.4dca5990

*** 2018-09-19 19:01:37.026
error 28 detected in background process
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-00028: your session has been killed
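
The error block is stamped 19:01:37 on 19 Sep, right after the last snapshot (21230, 19:00) in the AWR listing, which is consistent with MMON dying at that point. A small Python sketch for pulling the last error block out of such a trace is shown below. The function name is made up, and the timestamp pattern is an assumption based only on the `*** YYYY-MM-DD HH:MM:SS.mmm` headers seen in this trace.

```python
import re

HEADER = re.compile(r"\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")

def last_error_block(trace_text):
    """Return (timestamp, [ORA codes]) for the last '***' section with ORA- errors."""
    stamp, result = None, None
    for line in trace_text.splitlines():
        m = HEADER.match(line)
        if m:
            stamp = m.group(1)
            continue
        codes = re.findall(r"ORA-\d{5}", line)
        if codes:
            # Start a new result whenever errors appear under a new timestamp
            if result is None or result[0] != stamp:
                result = (stamp, [])
            result[1].extend(codes)
    return result

trace_tail = """\
*** 2018-09-19 18:56:44.302
minact-scn slave-status: grec-scn:0x0026.4dca45db gmin-scn:0x0026.4dca45db gcalc-scn:0x0026.4dca5990

*** 2018-09-19 19:01:37.026
error 28 detected in background process
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-00028: your session has been killed
"""
print(last_error_block(trace_tail))
```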

My guess is that it is related to the issue described in this note:
Fixed Objects Statistics (GATHER_FIXED_OBJECTS_STATS) Considerations (Doc ID 798257.1)
