1. 程式人生 > >深入理解Redis高可用方案-Sentinel

深入理解Redis高可用方案-Sentinel

Redis Sentinel是Redis的高可用方案。是Redis 2.8中正式引入的。

在之前的主從複製方案中,如果主節點出現問題,需要手動將一個從節點升級為主節點,然後將其它從節點指向新的主節點,並且需要修改應用方主節點的地址。整個過程都需要人工干預。

下面通過日誌具體看看Sentinel的切換流程。

Sentinel的切換流程

叢集拓撲圖如下。

角色                 IP              埠           runID

主節點             127.0.0.1   6379

從節點-1          127.0.0.1   6380 

從節點-2          127.0.0.1   6381

Sentinel-1        127.0.0.1   26379    d4424b8684977767be4f5abd1e364153fbb0adbd

Sentinel-2        127.0.0.1   26380    18311edfbfb7bf89fe4b67d08ef432053db62fff

Sentinel-3        127.0.0.1   26381    3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8

kill -9 將主節點程序殺死。

1. 最先反應的是從節點。

其會馬上輸出如下資訊。

28244:S 08 Oct 16:03:34.184 # Connection with master lost.
28244:S 08 Oct 16:03:34.184 * Caching the disconnected master state.
28244:S 08 Oct 16:03:34.548 * Connecting to MASTER 127.0.0.1:6379
28244:S 08 Oct 16:03:34.548 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:03:34.548 # Error condition on socket for
SYNC: Connection refused 28244:S 08 Oct 16:03:35.556 * Connecting to MASTER 127.0.0.1:6379 28244:S 08 Oct 16:03:35.556 * MASTER <-> SLAVE sync started ...

2. Sentinel的日誌30s後才有輸出,這個與“sentinel down-after-milliseconds mymaster 30000”的設定有關。

下面,依次貼出哨兵各個節點及slave的日誌輸出。

Sentinel-1

28087:X 08 Oct 16:04:04.277 # +sdown master mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:04.379 # +new-epoch 1
28087:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28087:X 08 Oct 16:04:05.388 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28087:X 08 Oct 16:04:05.388 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:35.656 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Sentinel-2

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.554 # -odown master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

Sentinel-3

28234:X 08 Oct 16:04:04.288 # +sdown master mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:04.378 # +new-epoch 1
28234:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28234:X 08 Oct 16:04:04.385 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28234:X 08 Oct 16:04:05.630 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:05.630 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:35.709 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

slave2

28244:S 08 Oct 16:04:04.762 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:04.762 # Error condition on socket for SYNC: Connection refused
28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=
32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

slave3

28253:S 08 Oct 16:04:03.655 * MASTER <-> SLAVE sync started
28253:S 08 Oct 16:04:03.655 # Error condition on socket for SYNC: Connection refused
28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdc
c5c0203128253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-
free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.
28253:M 08 Oct 16:04:05.770 * Slave 127.0.0.1:6380 asks for synchronization
28253:M 08 Oct 16:04:05.770 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 156 bytes of backlog starting from offset 24302.

結合上面的日誌,可以看到,

各個Sentinel節點都判斷127.0.0.1 6379為主觀下線(Subjectively Down,縮寫為sdown)。

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379

達到quorum的設定,Sentinel-2判斷其為客觀下線(Objectively Down,縮寫為odown)。結合其它兩個Sentinel節點的日誌,可以看到,Sentinel-2最先判定其客觀下線。接下來,會進行Sentinel的領導者選舉。一般來說,誰先完成客觀下線的判定,誰就是領導者,只有Sentinel領導者才能進行failover。

28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379

尋找合適的slave作為master

28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379

+failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable slave for promotion.

將127.0.0.1 6381設定為新主

28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+selected-slave <instance details> -- We found the specified good slave to promote.

命令6381節點執行slaveof no one,使其成為主節點

28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.

等待6381節點升級為主節點

28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

確認6381節點已經升級為主節點

28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

再來看看16:04:04.528到16:04:05.543這個時間段slave3的日誌輸出。可以看到,其開啟了MASTER模式,且重寫了配置檔案。

28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdcc5c0203128253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.

failover進入重新配置從節點階段

28163:X 08 Oct 16:04:05.543 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379

命令6380節點複製新的主節點

28163:X 08 Oct 16:04:05.629 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-sent <instance details> -- The leader sentinel sent the SLAVEOF command to this instance in order to reconfigure it for the new slave.

看看這個時間點slave2的日誌輸出,基本吻合。其進行的是增量同步。

28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

同時,在這個時間點,sentinel也有日誌輸出,以sentinel1為例。從日誌中,可以看到,在這個時間點它會更改配置資訊。

28087:X 08 Oct 16:04:05.631 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28087:X 08 Oct 16:04:05.631 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28087:X 08 Oct 16:04:05.631 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

switch-master <master name> <oldip> <oldport> <newip> <newport> -- The master new IP and address is the specified one after a configuration change. This is the message most external users are interested in.

同步過程尚未完成。

28163:X 08 Oct 16:04:06.555 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-inprog <instance details> -- The slave being reconfigured showed to be a slave of the new master ip:port pair, but the synchronization process is not yet complete.

主從同步完成。

28163:X 08 Oct 16:04:06.555 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

+slave-reconf-done <instance details> -- The slave is now synchronized with the new master.

failover切換完成。

28163:X 08 Oct 16:04:06.606 # +failover-end master mymaster 127.0.0.1 6379

failover成功後,釋出主節點的切換訊息

28163:X 08 Oct 16:04:06.606 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381

 關聯新主節點的slave資訊,需要注意的是,原來的主節點會作為新主節點的slave。

28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28163:X 08 Oct 16:04:06.606 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

+slave <instance details> -- A new slave was detected and attached.

過了30s後,判定原來的主節點主觀下線。

28163:X 08 Oct 16:04:36.687 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

綜合來看,Sentinel進行failover的流程如下

1. 每隔1秒,每個Sentinel節點會向主節點、從節點、其餘Sentinel節點發送一條ping命令做一次心跳檢測,來確認這些節點當前是否可達。當這些節點超過down-after-milliseconds沒有進行有效回覆,Sentinel節點就會判定該節點為主觀下線。

2. 如果被判定為主觀下線的節點是主節點,該Sentinel節點會通過sentinel is master-down-by-addr命令向其他Sentinel節點詢問對主節點的判斷,當超過<quorum>個數,Sentinel節點會判定該節點為客觀下線。如果從節點、Sentinel節點被判定為主觀下線,並不會進行後續的故障切換操作。

3. 對Sentinel進行領導者選舉,由其來進行後續的故障切換(failover)工作。選舉演算法基於Raft。

4. Sentinel領導者節點開始進行故障切換。

5. 選擇合適的從節點作為新主節點。

6. Sentinel領導者節點對上一步選出來的從節點執行slaveof no one命令讓其成為主節點。

7. 向剩餘的從節點發送命令,讓它們成為新主節點的從節點,複製規則和parallel-syncs引數有關。

8. 將原來的主節點更新為從節點,並將其納入到Sentinel的管理,讓其恢復後去複製新的主節點。

Sentinel的領導者選舉流程。

Sentinel的領導者選舉基於Raft協議。

1. 每個線上的Sentinel節點都有資格成為領導者,當它確認主節點主觀下線時候,會向其他Sentinel節點發送sentinel is-master-down-by-addr命令,要求將自己設定為領導者。

2. 收到命令的Sentinel節點,如果沒有同意過其他Sentinel節點的sentinel is-master-down-by-addr命令,將同意該請求,否則拒絕。

3. 如果該Sentinel節點發現自己的票數已經大於等於max(quorum,num(sentinels)/2+1),那麼它將成為領導者。

新主節點的選擇流程。

1. 刪除所有已經處於下線或斷線狀態的從節點。

2. 刪除最近5秒沒有回覆過領導者Sentinel的INFO命令的從節點。

3. 刪除所有與已下線主節點連線斷開超過down-after-milliseconds*10毫秒的從節點。

4. 選擇優先順序最高的從節點。

5. 選擇複製偏移量最大的從節點。

6. 選擇runid最小的從節點。 

三個定時監控任務

1. 每隔10秒,每個Sentinel節點會向主節點和從節點發送info命令獲取最新的拓撲結構。其作用如下:

    1> 通過向主節點執行info命令,獲取從節點的資訊,這也是為什麼Sentinel節點不需要顯式配置監控從節點。    2> 當有新的從節點加入時可立刻感知出來。    3> 節點不可達或者故障切換後,可通過info命令實時更新節點拓撲資訊。

2. 每隔2秒,每個Sentinel節點會向Redis資料節點的__sentinel__:hello頻道上傳送該Sentinel節點對於主節點的判斷以及當前Sentinel節點的資訊,同時每個Sentinel節點也會訂閱該頻道,來了解其它Sentinel節點以及它們對主節點的判斷。其作用如下:

   1> 發現新的Sentinel節點:通過訂閱主節點的__sentinel__:hello瞭解其它Sentinel節點資訊,如果是新加入的Sentinel節點,將該Sentinel節點資訊儲存起來,並與該Sentinel節點建立連線。   2> Sentinel節點之間交換主節點的狀態,作為後面客觀下線以及領導者選舉的依據。

3. 每隔1秒,每個Sentinel節點會向主節點、從節點、其餘Sentinel節點發送一條ping命令做一次心跳檢測,來確認這些節點當前是否可達。這個定時任務是節點失敗判定的重要依據。

Sentinel的相關引數

# bind 127.0.0.1 192.168.1.1
# protected-mode no
port 26379
# sentinel announce-ip <ip>
# sentinel announce-port <port>
dir /tmp
sentinel monitor mymaster 127.0.0.1 6379 2
# sentinel auth-pass <master-name> <password>
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
# sentinel notification-script mymaster /var/redis/notify.sh
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
sentinel deny-scripts-reconfig yes

其中,

dir:設定Sentinel的工作目錄。

sentinel monitor mymaster 127.0.0.1 6379 2:其中2是quorum,即權重,代表至少需要兩個Sentinel節點認為主節點主觀下線,才可判定主節點為客觀下線。一般建議將其設定為Sentinel節點的一半加1。不僅如此,quorum還與Sentinel節點的領導者選舉有關。為了選出Sentinel的領導者,至少需要max(quorum, num(sentinels) / 2 + 1)個Sentinel節點參與選舉。

sentinel down-after-milliseconds mymaster 30000:每個Sentinel節點都要通過定期傳送ping命令來判斷Redis節點和其餘Sentinel節點是否可達。

如果在指定的時間內,沒有收到主節點的有效回覆,則判斷其為主觀下線。需要注意的是,該引數不僅用來判斷主節點狀態,同樣也用來判斷該主節點下面的從節點及其它Sentinel的狀態。其預設值為30s。

sentinel parallel-syncs mymaster 1:在failover期間,允許多少個slave同時指向新的主節點。如果numslaves設定較大的話,雖然複製操作並不會阻塞主節點,但多個節點同時指向新的主節點,會增加主節點的網路和磁碟IO負載。

sentinel failover-timeout mymaster 180000:定義故障切換超時時間。預設180000,單位秒,即3min。需要注意的是,該時間不是總的故障切換的時間,而是適用於故障切換的多個場景。

# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
#   already tried against the same master by a given Sentinel, is two
#   times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
#   to a Sentinel current configuration, to be forced to replicate
#   with the right master, is exactly the failover timeout (counting since
#   the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
#   did not produced any configuration change (SLAVEOF NO ONE yet not
#   acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
#   reconfigured as slaves of the new master. However even after this time
#   the slaves will be reconfigured by the Sentinels anyway, but not with
#   the exact parallel-syncs progression as specified.

第一種適用場景:如果Redis Sentinel對一個主節點故障切換失敗,那麼下次再對該主節點做故障切換的起始時間是failover-timeout的2倍。這點從Sentinel的日誌就可體現出來(28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018)

sentinel notification-script:定義通知指令碼,當Sentinel出現WARNING級別的事件時,會呼叫該指令碼,其會傳入兩個引數:事件型別,事件描述。

sentinel client-reconfig-script:當主節點發生切換時,會呼叫該引數定義的指令碼,其會傳入以下引數:<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>

關於指令碼,其必須遵循一定的規則。

# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limit is
# reached the script is terminated with a SIGKILL and the execution retried.

sentinel deny-scripts-reconfig:不允許使用SENTINEL SET設定notification-script和client-reconfig-script。

Sentinel的常見操作

  • PING This command simply returns PONG.
  • SENTINEL masters Show a list of monitored masters and their state.
  • SENTINEL master <master name> Show the state and info of the specified master.
  • SENTINEL slaves <master name> Show a list of slaves for this master, and their state.
  • SENTINEL sentinels <master name> Show a list of sentinel instances for this master, and their state.
  • SENTINEL get-master-addr-by-name <master name> Return the ip and port number of the master with that name. If a failover is in progress or terminated successfully for this master it returns the address and port of the pr