Analysis of deadlocks caused by select for update

This article analyzes the deadlocks that select for update can cause in MySQL (InnoDB) under the Repeatable Read isolation level.

1. The case

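Our business needs to assign numbers to entities of various types. For entities of type x, for example, the numbers might look like x201712120001, x201712120002, x201712120003. Each number has two parts: a prefix made of x plus the date, and a serial number (four digits here).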
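The numbering scheme above can be sketched in a few lines of Java. This is only an illustration: the class name EntityNumbers and its two helper methods are hypothetical and do not come from the original code base.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class EntityNumbers {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20171212
    private static final DateTimeFormatter DATE = DateTimeFormatter.BASIC_ISO_DATE;

    /** Builds the prefix: entity type followed by the date. */
    static String prefix(String type, LocalDate date) {
        return type + date.format(DATE);
    }

    /** Appends the serial number, zero-padded to four digits. */
    static String format(String prefix, long serial) {
        return prefix + String.format(Locale.ROOT, "%04d", serial);
    }

    public static void main(String[] args) {
        String p = prefix("x", LocalDate.of(2017, 12, 12));
        System.out.println(format(p, 1)); // x201712120001
    }
}
```

Only the serial part has to be allocated under a lock; the prefix can be computed without touching the database.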

To hand out serial numbers from a database table, the obvious approach is a table like the following:

CREATE TABLE number (
  prefix VARCHAR(20) NOT NULL DEFAULT '' COMMENT 'prefix code',
  value BIGINT NOT NULL DEFAULT 0 COMMENT 'serial number',
  UNIQUE KEY uk_prefix (prefix)
);

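In the business layer, we derive the prefix from the business rules, for example x20171212, then open a transaction in code and use select for update to control allocation as follows.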

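@Transactional
long acquire(String prefix) {
    // select ... for update: lock the row for this prefix, if it exists
    SerialNumber current = dao.selectAndLock(prefix);
    if (current == null) {
        // No row for this prefix yet: start the sequence at 1
        dao.insert(new SerialNumber(prefix, 1));
        return 1;
    } else {
        // Row exists and is locked: increment and persist
        current.number++;
        dao.update(current);
        return current.number;
    }
}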

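This code locks and selects the row, updates it if it exists, and inserts it otherwise. Under the Repeatable Read isolation level, however, it has a latent deadlock problem. (Another transaction-related issue is also discussed later in this article.)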

2. Cause of the deadlock

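When the where condition of the select for update matches a record, the code above does not run into a deadlock. When the where condition matches no record, however, multiple threads executing the acquire method concurrently can deadlock.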

2.1 A simple reproduction of the deadlock

The following simple example reproduces the scenario.
First, initialize the table with three rows.

insert into number select 'bbb',2;
insert into number select 'hhh',8;
insert into number select 'yyy',25;

Then run the following operations in this order:

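| step | session 1 | session 2 |
|------|-----------|-----------|
| 1 | begin; | |
| 2 | | begin; |
| 3 | select * from number where prefix='ddd' for update; | |
| 4 | | select * from number where prefix='fff' for update; |
| 5 | insert into number select 'ddd',1; | |
| 6 | (blocked) | insert into number select 'fff',1; |
| 7 | insert succeeds | deadlock; session 2's transaction is rolled back |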

2.2 死鎖的分析

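Using show engine innodb status, let's look at what happens at each step (the per-transaction lock details shown below are only printed when innodb_status_output_locks is enabled):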

2.2.1 session 1 runs select for update

------------
TRANSACTIONS
------------
Trx id counter 238435
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

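Here we can see that transaction 238434 has acquired the gap lock before hhh (lock_mode X locks gap before rec), that is, a lock on the gap between bbb and hhh into which ddd would fall.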
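To read the PHYSICAL RECORD dump more easily, the hex fields can be decoded by hand. The sketch below assumes the usual InnoDB storage convention that a signed BIGINT is stored big-endian with its sign bit flipped, which is consistent with the dumps in this article (8000000000000008 for the value 8). The class and method names are hypothetical.

```java
// Hypothetical helper for reading the hex fields of an InnoDB record dump.
final class InnodbRecordDecode {

    /** Decodes a hex string such as 686868 into its ASCII text. */
    static String ascii(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    /** Decodes an 8-byte signed BIGINT field: undo the flipped sign bit. */
    static long signedBigint(String hex) {
        return Long.parseUnsignedLong(hex, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(ascii("686868"));                  // field 0: prefix
        System.out.println(signedBigint("8000000000000008")); // field 3: value
    }
}
```

Field 0 is the prefix column and field 3 the value column; the 6-byte and 7-byte fields in between are InnoDB's hidden transaction id and roll pointer columns.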

2.2.2 session 2 runs select for update

------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK table test.number trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 30 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

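Here we can see that transaction 238435 has also acquired the gap lock before hhh. Gap locks do not conflict with one another, so both transactions hold a lock on the same gap at the same time.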

2.2.3 session 1 attempts the insert

------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 28 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK table test.number trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 55 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
------------------
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

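Here we can see that when transaction 238434 tries to insert 'ddd',1, it finds that another transaction (238435) already holds a gap lock on this range. InnoDB therefore gives transaction 238434 an insert intention lock, with lock mode LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION, and makes it wait for transaction 238435 to release its gap lock.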

[Figure: excerpt from the implementation of the lock_rec_insert_check_and_lock method in the InnoDB source code]

2.2.4 session 2 attempts the insert

------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-12-21 22:50:40 0x70001028a000
*** (1) TRANSACTION:
TRANSACTION 238434, ACTIVE 81 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** (2) TRANSACTION:
TRANSACTION 238435, ACTIVE 54 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69159 localhost root executing
insert into number select 'fff',1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 84 sec
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 646464; asc ddd;;
1: len 6; hex 00000003a362; asc b;;
2: len 7; hex de000001e60110; asc ;;
3: len 8; hex 8000000000000001; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

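At this point the deadlock information shows that transaction 238435, on inserting, likewise found transaction 238434's gap lock, so it too requested an insert intention lock and waited for transaction 238434 to release its gap lock. Each transaction is now waiting for the other, which is the deadlock.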

2.3 Avoiding the deadlock

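We now know the cause: when two sessions both run select for update and neither matches a record, they can end up holding locks on the same gap (depending on the where conditions). If both then insert concurrently, the first insert enters a lock wait, and the second session's insert produces a deadlock. MySQL then chooses one transaction to roll back based on transaction weight.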
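The sequence can be replayed with a toy model of the two rules involved: gap locks never block each other, while an insert into a gap waits for every other transaction holding a gap lock there. The sketch below is a deliberately simplified illustration, not InnoDB's implementation. The class GapLockModel is hypothetical, and it always rolls back the transaction whose request closes the wait cycle, whereas InnoDB chooses its victim by transaction weight.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical): only gap locks and insert intention waits are modelled.
final class GapLockModel {
    // gap name -> transactions holding a gap lock on it
    private final Map<String, Set<Integer>> gapHolders = new HashMap<>();
    // waiting transaction -> transactions it is waiting for
    private final Map<Integer, Set<Integer>> waitsFor = new HashMap<>();

    /** select for update that matches no row: always granted, gap locks are compatible. */
    void lockGap(int trx, String gap) {
        gapHolders.computeIfAbsent(gap, g -> new HashSet<>()).add(trx);
    }

    /** Insert into a gap. Returns "inserted", "waiting" or "deadlock". */
    String insert(int trx, String gap) {
        Set<Integer> blockers =
                new HashSet<>(gapHolders.getOrDefault(gap, Collections.emptySet()));
        blockers.remove(trx); // a transaction's own gap lock does not block it
        if (blockers.isEmpty()) {
            waitsFor.remove(trx);
            return "inserted";
        }
        waitsFor.put(trx, blockers);
        if (reaches(trx, trx, new HashSet<>())) {
            rollback(trx); // simplification: the requester is always the victim
            return "deadlock";
        }
        return "waiting";
    }

    /** Releases every lock and wait belonging to trx. */
    void rollback(int trx) {
        gapHolders.values().forEach(holders -> holders.remove(trx));
        waitsFor.remove(trx);
        waitsFor.values().forEach(blockers -> blockers.remove(trx));
    }

    // Depth-first search of the wait-for graph: can 'from' reach 'target'?
    private boolean reaches(int from, int target, Set<Integer> seen) {
        for (int next : waitsFor.getOrDefault(from, Collections.emptySet())) {
            if (next == target) return true;
            if (seen.add(next) && reaches(next, target, seen)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        GapLockModel m = new GapLockModel();
        String gap = "(bbb, hhh)";
        m.lockGap(238434, gap);                    // session 1: select ... 'ddd' for update
        m.lockGap(238435, gap);                    // session 2: select ... 'fff' for update
        System.out.println(m.insert(238434, gap)); // waiting
        System.out.println(m.insert(238435, gap)); // deadlock
        System.out.println(m.insert(238434, gap)); // inserted, once 238435 is rolled back
    }
}
```

In the model, as in the trace above, neither select for update ever waits. The cycle appears only when the second insert arrives.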

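So how can this situation be avoided?
One solution is to lower the transaction isolation level to Read Committed. InnoDB then takes no gap lock for a select for update that matches nothing, so the deadlock above does not occur. Instead, when two sessions race to insert the same new prefix, one of them hits a unique index conflict, which the business code can catch and retry.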
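The catch-and-retry step might look like the following minimal sketch. It assumes that the JDBC driver reports the unique index conflict as java.sql.SQLIntegrityConstraintViolationException; with a framework in between, the exception type may differ. The class DuplicateKeyRetry and the SqlCall interface are hypothetical names introduced for illustration.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

// Hypothetical retry helper, for illustration only.
final class DuplicateKeyRetry {

    /** A unit of work that runs in its own transaction and may fail with a SQL error. */
    @FunctionalInterface
    interface SqlCall<T> {
        T run() throws SQLException;
    }

    /** Runs the call, retrying only when a unique index conflict is reported. */
    static <T> T withRetry(SqlCall<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (SQLIntegrityConstraintViolationException e) {
                // Another session inserted the same prefix first. On the next
                // attempt the select for update finds that row and updates it.
                last = e;
            } catch (SQLException e) {
                // Any other SQL error is not retried
                throw new IllegalStateException("non-retryable SQL error", e);
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

Each attempt should run the whole acquire logic in a fresh transaction, so that the retry sees the row committed by the winning session.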
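In addition, the propagation behavior of the @Transactional annotation in the example code above deserves attention. For a method like this, which has little to do with the transaction of the main flow, consider changing the propagation behavior to REQUIRES_NEW. Otherwise a thread acquiring a serial number may block simply because another thread's main business flow has not yet finished.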

3. References

InnoDB manual
Database Kernel Monthly Report - 2016 / 01
InnoDB source code
