SpinLock in Postgres

Spinlock · Published 2018-10-11 20:44:00

As we know, a database cannot do concurrency control without a variety of locks, and PostgreSQL is no exception.

PostgreSQL has three levels of locks, related as follows:

|  upper layer: RegularLock
|
|  LWLock
|
|  bottom layer: SpinLock
v

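Following that order, let's start with PostgreSQL's lowest-level lock, the SpinLock.

As PostgreSQL's lowest-level lock, the SpinLock is fairly simple. It is held only for a very short time, has no wait queue and no deadlock detection, and is not released automatically at transaction end. For that reason a SpinLock is generally not used on its own but serves as the underlying implementation of other locks (LWLock).

Being the lowest-level lock, its implementation depends on the operating system and the hardware. PostgreSQL therefore provides two SpinLock implementations:

A machine-dependent implementation built on the TAS instruction (defined in s_lock.h and s_lock.c);
A machine-independent implementation built on PostgreSQL's own semaphore, PGSemaphore (defined in spin.c).

The machine-dependent SpinLock is considerably faster than the machine-independent one, because it does not need a system call for each attempt. So if the machine PostgreSQL runs on supports a TAS instruction, the first implementation is used; otherwise PostgreSQL has to fall back to the second.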

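The machine-dependent implementation

We know that the machine-dependent implementation relies on the TAS instruction. So what is TAS?

TAS stands for Test and Set. It is an atomic operation that writes a value to a memory location and returns the previous value. While process P1 is performing a TAS on a memory location, no other process P2 may perform a TAS on that same location; P2 has to wait until P1's operation completes. This is why the operation can be used to implement mutual exclusion between processes.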
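To make the "set and return the old value" semantics concrete, here is a minimal sketch using the C11 atomic_flag type rather than PostgreSQL's own code. The names flag_lock, flag_tas and flag_release are hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical demo lock word: clear means free, set means held. */
static atomic_flag flag_lock = ATOMIC_FLAG_INIT;

/*
 * Atomically set the flag and report its previous value.
 * Returns false if the lock was free (the caller now holds it),
 * true if it was already held by someone else.
 */
bool
flag_tas(atomic_flag *lock)
{
    return atomic_flag_test_and_set(lock);
}

/* Release the lock by clearing the flag. */
void
flag_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}
```

The first caller sees false and owns the lock; every later caller sees true until flag_release() is called.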

With that concept in hand, let's look at the source code.

The code lives in:

src/include/storage/s_lock.h
src/backend/storage/lmgr/s_lock.c

Although there are two underlying SpinLock implementations, callers go through a single interface, found in src/backend/storage/lmgr/s_lock.c:

/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line)
{
...

while (TAS_SPIN(lock))    /* call site */
{

... 

}

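As you can see, TAS_SPIN(lock) is a macro:

#define TAS_SPIN(lock)    TAS(lock)

When the lock based on the TAS instruction is in use, we have:

#define TAS(lock) tas(lock)

The machine's TAS instruction is actually used inside the function tas(). The x86 version looks like this: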

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    /*
     * Use a non-locking test before asserting the bus lock.  Note that the
     * extra test appears to be a small loss on some x86 platforms and a small
     * win on others; it's by no means clear that we should keep it.
     *
     * When this was last tested, we didn't have separate TAS() and TAS_SPIN()
     * macros.  Nowadays it probably would be better to do a non-locking test
     * in TAS_SPIN() but not in TAS(), like on x86_64, but no-one's done the
     * testing to verify that.  Without some empirical evidence, better to
     * leave it alone.
     */
    __asm__ __volatile__(
        "    cmpb    $0,%1    \n"
        "    jne     1f       \n"
        "    lock             \n"
        "    xchgb   %0,%1    \n"
        "1: \n"
:       "+q"(_res), "+m"(*lock)
:       /* no inputs */
:       "memory", "cc");
    return (int) _res;
}

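As you can see, this inline assembly embedded in C is what invokes the machine's TAS instruction (here, xchgb with the lock prefix). Suppose lock initially holds 0. When P1 requests the lock, it gets it. If P2 then requests the lock it has to spin, because P1 has already changed the value of lock to 1.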
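For readers who do not want to decode the assembly, the same two steps (a cheap non-locking test, then an atomic exchange) can be sketched in portable C11. This is an illustration under assumptions, not PostgreSQL's code: sketch_slock_t, sketch_tas, sketch_spin_acquire and sketch_spin_release are hypothetical names, and the loop spins without the sleep logic that s_lock() adds.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for slock_t: 0 means free, nonzero means held. */
typedef atomic_uchar sketch_slock_t;

/* Returns 0 if the lock was acquired, nonzero if it was already held. */
int
sketch_tas(sketch_slock_t *lock)
{
    /* Non-locking test first: skip the exchange if the lock looks held. */
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;

    /* Atomic exchange: store 1 and get the previous value back. */
    return (int) atomic_exchange(lock, 1);
}

/* Spin until the lock is acquired; returns the number of failed attempts. */
int
sketch_spin_acquire(sketch_slock_t *lock)
{
    int failures = 0;

    while (sketch_tas(lock) != 0)
        failures++;
    return failures;
}

/* Release the lock by storing 0. */
void
sketch_spin_release(sketch_slock_t *lock)
{
    atomic_store(lock, 0);
}
```

As with the TAS macros, a return value of 0 from sketch_tas() means success, which is why the acquire loop keeps going while the result is nonzero.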

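When implementing a spin lock with TAS, note the use of volatile. volatile marks the variable as one that may change outside the compiler's view, so the compiler reads its value from memory every time instead of reusing a copy held in a register.

This avoids the problem where, with several threads updating the same variable, the value in memory and the value in a register fall out of sync and the variable appears corrupted. It also constrains the compiler's optimizations. Note that volatile alone does not make an access atomic; the atomicity here comes from the TAS instruction itself.

For a detailed walkthrough of the assembly, consult the relevant references.

In use, PostgreSQL does not call tas() directly; instead it requests a spin lock through:

int s_lock(volatile slock_t *lock, const char *file, int line, const char *func);

The return value is the number of delays (sleeps) incurred while waiting.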

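The machine-independent implementation

If the machine has no TAS instruction, PostgreSQL implements SpinLock with PGSemaphores.

PGSemaphore is built on the operating system's own semaphores. PG wraps them to provide a uniform semaphore interface inside the system. PG represents one of its semaphores with the PGSemaphore structure, packs the requested operation into a sembuf, and passes it to the underlying OS.

The implementation is in:

src/backend/storage/lmgr/spin.c

We know that TAS_SPIN(lock) is the abstract definition of the SpinLock operation:

#define TAS_SPIN(lock)    TAS(lock)

When the TAS instruction is not used, we have:

#define TAS(lock)    tas_sema(lock)

That is, SpinLock is implemented by calling tas_sema(lock):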

int
tas_sema(volatile slock_t *lock)
{
    /* Note that TAS macros return 0 if *success* */
    return !PGSemaphoreTryLock(&SpinlockSemaArray[*lock]);
}

PGSemaphoreTryLock is defined as:

bool
PGSemaphoreTryLock(PGSemaphore sema)
{
    int         errStatus;
    struct sembuf sops;    /* important! */

    sops.sem_op = -1;            /* decrement */
    sops.sem_flg = IPC_NOWAIT;   /* but don't block */
    sops.sem_num = sema->semNum;

    /*
     * Note: if errStatus is -1 and errno == EINTR then it means we returned
     * from the operation prematurely because we were sent a signal.  So we
     * try and lock the semaphore again.
     */
    do
    {
        errStatus = semop(sema->semId, &sops, 1);
    } while (errStatus < 0 && errno == EINTR);

...

That is, the SpinLock is implemented through PGSemaphores.

PGSemaphores is defined as:

typedef struct PGSemaphoreData
{
    int         semId;      /* semaphore set identifier */
    int         semNum;     /* semaphore number within set */
} PGSemaphoreData;

At the OS level, we have:

struct sembuf
{
    unsigned short int sem_num; /* semaphore number */
    short int sem_op;           /* semaphore operation */
    short int sem_flg;          /* operation flag */
};

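The do/while loop in PGSemaphoreTryLock is where the semop operation is executed.

semop is provided by the OS itself (declared in the <sys/sem.h> header):

extern int semop(int __semid, struct sembuf *opsptr, size_t nops);

Clearly, PostgreSQL wraps the OS-level semaphore and then operates on it through the OS's own system functions.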
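The semaphore-based try-lock can be tried out in isolation. The following is a minimal sketch, assuming a System V semaphore set with a single semaphore initialised to 1; every demo_ name is hypothetical and is not part of PostgreSQL. It keeps the TAS convention that 0 means the lock was acquired.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Argument type for semctl(); given a demo_ name to avoid clashing
 * with systems whose headers already define union semun. */
union demo_semun
{
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Create a private set holding one semaphore set to 1; -1 on failure. */
int
demo_sema_create(void)
{
    union demo_semun arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
        return -1;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) < 0)
    {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Try to take the lock without blocking; 0 means acquired. */
int
demo_tas_sema(int semid)
{
    struct sembuf sops;
    int status;

    sops.sem_num = 0;
    sops.sem_op = -1;           /* decrement */
    sops.sem_flg = IPC_NOWAIT;  /* fail with EAGAIN instead of blocking */

    do
    {
        status = semop(semid, &sops, 1);
    } while (status < 0 && errno == EINTR);   /* retry if interrupted */

    return (status == 0) ? 0 : 1;
}

/* Release the lock by incrementing the semaphore back to 1. */
int
demo_sema_unlock(int semid)
{
    struct sembuf sops = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };

    return semop(semid, &sops, 1);
}

/* Remove the semaphore set from the system. */
void
demo_sema_destroy(int semid)
{
    semctl(semid, 0, IPC_RMID);
}
```

Each call to demo_tas_sema() is a system call, which illustrates why this path is slower than a single TAS instruction.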

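Common operations

The SpinLock is implemented separately for the two cases; that is where they differ. Above the SpinLock there are some common operations worth describing. Acquiring a SpinLock does not succeed every time. When an attempt finds the object already locked, the current process does not block on that lock; it spins a certain number of times, then sleeps for a while, and then tries again. PostgreSQL uses an adaptive algorithm to decide how many times to spin and how long to sleep after each round of spinning.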

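The following value deserves attention:

spins_per_delay

It determines how many spins are made before sleeping starts. The default is 100, the maximum 1000, and the minimum 10.

The value of spins_per_delay stays essentially constant, whereas cur_delay grows to between 1 and 2 times its current value after each sleep. So the more spin delays occur, the longer the sleep becomes.

There is one more variable:

cur_delay

The current sleep time, with a maximum of 1000 and a minimum of 1, in ms.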
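The interplay of the two values can be sketched as follows. This is an illustration under assumptions rather than the code in s_lock.c: the DEMO_ constants take the numbers quoted above, DemoSpinState and both functions are hypothetical, the random fraction is passed in by the caller so the logic stays deterministic, and wrapping back to the minimum once the cap is exceeded is an assumption of this sketch.

```c
#define DEMO_SPINS_PER_DELAY  100     /* default quoted in the text */
#define DEMO_MIN_DELAY_MS     1
#define DEMO_MAX_DELAY_MS     1000

/* Hypothetical bookkeeping for one lock acquisition attempt. */
typedef struct
{
    int spins;      /* failed attempts since the last sleep */
    int delays;     /* number of sleeps so far */
    int cur_delay;  /* next sleep length in ms, 0 = never slept */
} DemoSpinState;

/*
 * Grow the delay by a fraction in [0, 1] of itself, so the result lies
 * between 1x and 2x the current value. Past the cap, wrap back to the
 * minimum (an assumption of this sketch).
 */
int
demo_next_delay_ms(int cur_delay, double fraction)
{
    int next = cur_delay + (int) (cur_delay * fraction + 0.5);

    if (next > DEMO_MAX_DELAY_MS)
        next = DEMO_MIN_DELAY_MS;
    return next;
}

/*
 * Call after each failed attempt. Returns how many ms the caller should
 * sleep now, or 0 if it should simply try again.
 */
int
demo_spin_delay(DemoSpinState *st, double fraction)
{
    int sleep_ms;

    if (++st->spins < DEMO_SPINS_PER_DELAY)
        return 0;                           /* keep spinning */

    if (st->cur_delay == 0)
        st->cur_delay = DEMO_MIN_DELAY_MS;  /* first sleep */
    sleep_ms = st->cur_delay;

    /* Prepare a longer delay for the next round and restart the count. */
    st->cur_delay = demo_next_delay_ms(st->cur_delay, fraction);
    st->spins = 0;
    st->delays++;
    return sleep_ms;
}
```

A caller would pass something like (double) rand() / RAND_MAX as the fraction and actually sleep for the returned number of milliseconds before retrying.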

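Summary

This post discussed PostgreSQL's SpinLock implementation and the related functions. SpinLock is PostgreSQL's lowest-level lock, and its main role is to support the locks above it. That's it for SpinLock; next time we'll talk about PostgreSQL's LWLock and RegularLock.

Note: this post also drew on this article, with thanks to its author.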
