Resolving ORA-04021: timeout occurred while waiting to lock object
Copyright notice: this is an original article by Buddy Yuan and may not be reproduced without permission. Original URL: http://www.dboracle.com/archivers/ORA-04021: TIMEOUT OCCURRED WHILE WAITING TO LOCK OBJECT的解決辦法.html
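Everyone was off for the National Day holiday, but we still had a small incident to handle. Here is what happened: an Active Data Guard (ADG) instance suddenly went down. The alert log showed that the LGWR process had terminated the instance. The relevant part of the alert log is: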
Mon Oct 01 00:15:49 2018
Media Recovery Waiting for thread 2 sequence 309643 (in transit)
Recovery of Online Redo Log: Thread 2 Group 64 Seq 309643 Reading mem 0
  Mem# 0: +DG_DATA/dgskgj/onlinelog/group_64.807.942196417
  Mem# 1: +DG_DATA/dgskgj/onlinelog/group_64.717.936131839
Mon Oct 01 00:15:54 2018
Archived Log entry 186436 added for thread 2 sequence 309642 ID 0x2f689337 dest 1:
Mon Oct 01 00:16:17 2018
Errors in file /oracle/app/product/diag/rdbms/dgskgj/skgj1/trace/skgj_lgwr_14418510.trc:
ORA-04021: timeout occurred while waiting to lock object
LGWR (ospid: 14418510): terminating the instance due to error 4021
Mon Oct 01 00:16:18 2018
System state dump requested by (instance=1, osid=14418510 (LGWR)), summary=[abnormal instance termination].
The alert log first reports ORA-04021: timeout occurred while waiting to lock object, and immediately afterwards LGWR terminates the instance. So the next step is to look at the LGWR trace file:
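Once the instance is back up, the trace directory and the current LGWR trace file can also be read from the data dictionary instead of browsing the ADR tree by hand. This is a minimal sketch, assuming a DBA connection to the restarted standby instance; these queries are an addition for convenience, not a step from the original troubleshooting. Note that the restarted LGWR has a new OS pid, so the crashed process's file is found by its old pid (14418510) in the directory returned by the first query, not by the second query.

```sql
-- Directory holding background-process trace files (ADR "Diag Trace").
SELECT value AS trace_dir
  FROM v$diag_info
 WHERE name = 'Diag Trace';

-- Trace file of the LGWR process currently running on this instance.
-- After a restart this is a NEW file; the crashed LGWR's file keeps its old pid.
SELECT b.name, p.spid, p.tracefile
  FROM v$bgprocess b
  JOIN v$process   p ON p.addr = b.paddr
 WHERE b.name = 'LGWR';
```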
Trace file /oracle/app/product/diag/rdbms/dgskgj/skgj1/trace/skgj1_lgwr_14418510.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/product/11.2/db
System name: AIX
Node name: drskgj1
Release: 1
Version: 6
Machine: 00F96EC64C00
Instance name: skgj1
Redo thread mounted by this instance: 1
Oracle process number: 23
Unix process pid: 14418510, image: oracle@drskgj1 (LGWR)

*** 2018-10-01 00:16:17.890
*** SESSION ID:(4348.1) 2018-10-01 00:16:17.890
*** CLIENT ID:() 2018-10-01 00:16:17.890
*** SERVICE NAME:(SYS$BACKGROUND) 2018-10-01 00:16:17.890
*** MODULE NAME:() 2018-10-01 00:16:17.890
*** ACTION NAME:() 2018-10-01 00:16:17.890

*** TRACE FILE RECREATED AFTER BEING REMOVED ***

error 4021 detected in background process
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+240<-kjzdssdmp()+240<-kjzduptcctx()+228<-kjzdicrshnfy()+120<-ksuitm()+5136<-ksbrdp()+4696<-opirip()+1620<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+276<-main()+204<-__start()+112
----- End of Abridged Call Stack Trace -----

*** 2018-10-01 00:16:17.896
LGWR (ospid: 14418510): terminating the instance due to error 4021
ksuitm: waiting up to [5] seconds before killing DIAG(15991690)
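The LGWR trace is not very informative either: apart from a call stack, it contains nothing of note. In this situation the only option is to search My Oracle Support (MOS) for ORA-04021: timeout occurred while waiting to lock object. The search turned up the note "ORA-04021: timeout occurred while waiting to lock object : DR Instance terminated by LGWR (Doc ID 2183882.1)", which matches our problem. How do we know it matches? First, the error shows up in the alert log in the same way; second, the LGWR trace contains the same call stack. According to the note, we hit either Bug 16717701 - ADG SHOULD GET THE INSTANCE PARSE LOCK WITH A TIMEOUT, or Bug 11712267 - ACTIVE DATA GUARD DATABASE HUNG ON 'LIBRARY CACHE: MUTEX X' WAIT EVENT.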
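Beyond matching the alert log pattern and the call stack, it can help to confirm that the affected instance really is an Active Data Guard standby on a release like the one in this case (11.2.0.4.0, per the trace header). A minimal sketch, assuming SQL*Plus access to the standby; these two checks are an assumed extra step, not something the MOS note or the original troubleshooting prescribes.

```sql
-- Release banner; in this case the trace header shows 11.2.0.4.0.
SELECT banner
  FROM v$version
 WHERE banner LIKE 'Oracle Database%';

-- An Active Data Guard standby reports DATABASE_ROLE = 'PHYSICAL STANDBY'
-- and OPEN_MODE = 'READ ONLY WITH APPLY' (open read-only with redo apply running).
SELECT database_role, open_mode
  FROM v$database;
```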
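The root cause is that during recovery (redo apply) on the ADG standby, LGWR takes the lock on the DB INSTANCE state object in exclusive mode. As a result, LGWR can block SQL parsing, and SQL parsing can in turn block LGWR, which is very bad behavior. This can be observed by querying the following views: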
SQL> select a.*, b.name
       from v$sesstat a, v$statname b
      where a.statistic# = b.statistic#
        and a.sid = (select distinct sid from v$mystat)
        and b.name like '%parse%';

SID STATISTIC# VALUE NAME
---------- ---------- ---------- ------------------------------
1172640 ADG parselock X get attempts
1172650 ADG parselock X get successes
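The query above only covers the current session. To watch the same counters for the whole instance, and for every node of a RAC standby, the statistic names can be read from gv$sysstat. A minimal sketch, assuming the names match the output above exactly; treating a growing gap between attempts and successes as a sign of parse-lock contention is an interpretation, not something the article states.

```sql
-- One row per RAC instance: exclusive ADG parse-lock get attempts vs. successes.
SELECT inst_id,
       SUM(CASE WHEN name = 'ADG parselock X get attempts'  THEN value END) AS x_get_attempts,
       SUM(CASE WHEN name = 'ADG parselock X get successes' THEN value END) AS x_get_successes
  FROM gv$sysstat
 WHERE name IN ('ADG parselock X get attempts', 'ADG parselock X get successes')
 GROUP BY inst_id
 ORDER BY inst_id;
```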
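The ways to prevent this problem are:
1. First try option 1: change cursor_sharing to FORCE. With FORCE, literals are replaced by system-generated bind variables so similar statements share a cursor, which reduces hard parsing and the time spent parsing SQL.
2. If the problem occurs again, set the hidden parameter "_adg_parselock_timeout" to 500. This hidden parameter can be changed dynamically, and its purpose is to prevent this timeout.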
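The two steps above could be applied roughly as follows. This is a sketch under stated assumptions: the standby is a RAC database using an spfile (hence SCOPE = BOTH and SID = '*', which applies the change to all instances), and you are connected as SYSDBA. The value 500 comes from the article; its unit is not stated there. Hidden parameters must be enclosed in double quotes.

```sql
-- Step 1: share cursors for literal SQL to reduce hard parsing.
ALTER SYSTEM SET cursor_sharing = FORCE SCOPE = BOTH SID = '*';

-- Step 2 (only if the problem recurs): hidden parameter from the article.
-- Assumption: the value 500 is used as given; the unit is not documented here.
ALTER SYSTEM SET "_adg_parselock_timeout" = 500 SCOPE = BOTH SID = '*';

-- Verify cursor_sharing through the regular parameter view.
SELECT name, value
  FROM v$parameter
 WHERE name = 'cursor_sharing';

-- Hidden parameters are not listed in v$parameter by default;
-- read them from the X$ tables instead (requires SYS).
SELECT i.ksppinm  AS name,
       v.ksppstvl AS value
  FROM x$ksppi  i
  JOIN x$ksppcv v ON v.indx = i.indx
 WHERE i.ksppinm = '_adg_parselock_timeout';
```

Applying step 1 first and keeping step 2 in reserve follows the order the article recommends.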