PostgreSQL 鎖等待排查實踐 - 珍藏級 - process xxx1 acquired RowExclusiveLock on relation xxx...
標籤
PostgreSQL , 鎖等待 , ddl , 大鎖 , 雪崩 , lock_timeout , deadlock_timeout , AB表切換 , 分割槽表
背景
當SQL請求鎖等待超過deadlock_timeout指定的時間時,報類似如下日誌:
LOG: process xxx1 acquired RowExclusiveLock on relation xxx2 of database xxx3 after xxx4 ms at xxx STATEMENT: INSERT ...........
解釋:
xxx1程序請求位於資料庫xxx3中的xxx2物件的RowExclusiveLock鎖,已等待xxx4秒。
同學們可能會納悶,怎麼insert也會等待?
其實不管什麼操作,都有可能出現等待,只要請求的鎖與已有或已經在佇列中的LOCK級別有衝突就會出現等待。
ofollow,noindex" target="_blank">https://www.postgresql.org/docs/10/static/explicit-locking.html
src/include/storage/lockdefs.h
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */ #define NoLock0 #define AccessShareLock1/* SELECT */ #define RowShareLock2/* SELECT FOR UPDATE/FOR SHARE */ #define RowExclusiveLock3/* INSERT, UPDATE, DELETE */ #define ShareUpdateExclusiveLock 4/* VACUUM (non-FULL),ANALYZE, CREATE INDEX * CONCURRENTLY */ #define ShareLock5/* CREATE INDEX (WITHOUT CONCURRENTLY) */ #define ShareRowExclusiveLock6/* like EXCLUSIVE MODE, but allows ROW * SHARE */ #define ExclusiveLock7/* blocks ROW SHARE/SELECT...FOR UPDATE */ #define AccessExclusiveLock8/* ALTER TABLE, DROP TABLE, VACUUM FULL, * and unqualified LOCK TABLE */
如何分析? - 例項講解
請參考萬能文章:
《PostgreSQL 鎖等待監控 珍藏級SQL - 誰堵塞了誰》
1、開啟審計日誌
log_destination = 'csvlog' logging_collector = on log_truncate_on_rotation = on log_statement = 'all'
2、psql 掛一個列印鎖等待的視窗
psql with t_wait as ( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted ), t_run as ( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted ), t_overlap as ( select r.* from t_wait w join t_run r on ( r.locktype is not distinct from w.locktype and r.database is not distinct from w.database and r.relation is not distinct from w.relation and r.page is not distinct from w.page and r.tuple is not distinct from w.tuple and r.virtualxid is not distinct from w.virtualxid and r.transactionid is not distinct from w.transactionid and r.classid is not distinct from w.classid and r.objid is not distinct from w.objid and r.objsubid is not distinct from w.objsubid and r.pid <> w.pid ) ), t_unionall as ( select r.* from t_overlap r union all select w.* from t_wait w ) select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid, string_agg( 'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)|| 'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)|| 'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)|| 'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)|| 'SQL (Current SQL in Transaction): '||chr(10)|| case when query is null then 'NULL' else query::text end, chr(10)||'--------'||chr(10) order by (case mode when 'INVALID' then 0 when 'AccessShareLock' then 1 when 'RowShareLock' then 2 when 'RowExclusiveLock' then 3 when 'ShareUpdateExclusiveLock' then 4 when 'ShareLock' then 5 when 'ShareRowExclusiveLock' then 6 when 'ExclusiveLock' then 7 when 'AccessExclusiveLock' then 8 else 0 end) desc, (case when granted then 0 else 1 end) ) as lock_conflict from t_unionall group by locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ; \watch 0.2
3、tail 掛一個日誌觀測視窗
for ((i=1;i>0;i=1)); do grep RowExclusiveLock *.csv ; sleep 0.2; done 或 for ((i=1;i>0;i=1)); do grep acquired *.csv ; sleep 0.2; done
4、發現問題
在業務執行過程中,如果問題復現,一定能觀測到日誌。
通過鎖等待的視窗,觀測到其中一個會話對錶xxx持有了 accessExclusiveLock
,也就是排他鎖。通過前面的鎖衝突表,你會發現這個鎖和所有鎖都衝突(實際上 DDL,VACUUM FULL
等操作都會持有排它鎖,或者人為的發出 lock table xxx in access exclusive mode;
)
所以,當然會堵塞其他對該表的INSERT操作了。
4.1、根據鎖等待的視窗拿到持有鎖的PID,到審計日誌裡面檢視這個PID在當前事務中,前面都發起了什麼SQL。
4.2、找到問題根源,原來這個事務發起了 ALTER TABLE XXX RENAME TO XXXXX;
的動作。
這個事件也引起了INSERT變慢(實際上是鎖等待,實踐都花在了等待上面)的問題。
很顯然,業務上應該經常會有一些觸發改表名的動作,比如為了防止一個表太大,經常做AB表切換的動作。
雖然切換表名只是修改元資料,但是這個瞬間的鎖,在高併發的業務場景中,也會帶來堵塞危害。
建議使用者採用分割槽表,而不要在高併發業務中頻繁使用DDL這樣的大鎖高危操作。
小結
本文講解了鎖等待問題的排查方法。原因實際上是使用者在業務中使用了AB表切換,雖然切換表名只是修改元資料,但是這個瞬間的鎖,在高併發的業務場景中,也會帶來堵塞危害。
建議使用者採用分割槽表,而不要在高併發業務中頻繁使用DDL這樣的大鎖高危操作。
《PostgreSQL 9.x, 10, 11 hash分割槽表 用法舉例》
《PostgreSQL 11 preview - 分割槽表 增強 彙總》
《PostgreSQL 查詢涉及分割槽表過多導致的效能問題 - 效能診斷與優化(大量BIND, spin lock, SLEEP程序)》
《PostgreSQL 商用版本EPAS(阿里雲ppas(Oracle 相容版)) - 分割槽表效能優化 (堪比pg_pathman)》
《PostgreSQL 10 內建分割槽 vs pg_pathman perf profiling》
《PostgreSQL 10.0 preview 功能增強 - 內建分割槽表》
《PostgreSQL 9.5+ 高效分割槽表實現 - pg_pathman》
另外。大鎖操作,一定要注意防止雪崩。
最後,遇到問題要冷靜思考,不要輕易認為資料庫有問題。
參考
《PostgreSQL 鎖等待監控 珍藏級SQL - 誰堵塞了誰》