Redis Sentinel Mode Source Code (Reposted)

Recommended reading:
1. For the theory behind Sentinel, see:

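I. The Big Picture

1. A Sentinel is also a Redis server, but with a different job from an ordinary server: it monitors Redis servers in order to improve the reliability of the deployment. Sentinel shares one framework with the ordinary server (the networking layer, the underlying data structures, and the publish/subscribe mechanism), but it also has its own independent code path.

To see how the Sentinel system is kept running, let us first look at the data structure Redis maintains for Sentinel: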

/* Main state. */
/* src/sentinel.c/sentinelState */
struct sentinelState {

    uint64_t current_epoch;     /* Current epoch. */

    dict *masters;      /* Dictionary of master sentinelRedisInstances.
                           Key is the instance name, value is the
                           sentinelRedisInstance structure pointer. */

    int tilt;           /* Are we in TILT mode? */

    int running_scripts;    /* Number of scripts in execution right now. */

    mstime_t tilt_start_time;   /* When TILT started. */

    mstime_t previous_time;     /* Last time we ran the time handler. */

    list *scripts_queue;    /* FIFO queue of user scripts to execute. */

} sentinel;  

2. The main function shows how a server turns into a Sentinel:

/* src/redis.c/main */
int main(int argc, char **argv) {
    struct timeval tv;

    // Seed the random number generators (used later by rand() and the hash function)
    srand(time(NULL)^getpid());
    gettimeofday(&tv,NULL);
    dictSetHashFunctionSeed(tv.tv_sec^tv.tv_usec^getpid());
    // Check the command-line arguments to decide whether to start in Sentinel mode
    server.sentinel_mode = checkForSentinelMode(argc,argv);
    // Initialize the server configuration, mainly by filling in the redisServer struct
    initServerConfig();
    // Configure the server for Sentinel mode, which differs from an ordinary Redis server
    /* We need to init sentinel right now as parsing the configuration file
    * in sentinel mode will have the effect of populating the sentinel
    * data structures with master nodes to monitor. */
    if (server.sentinel_mode) {
        // initSentinelConfig() only sets the port of the Sentinel server
        initSentinelConfig();
        initSentinel();
    }
    ......
    // Ordinary Redis server mode
    if (!server.sentinel_mode) {
        ......
    // Sentinel server mode
    } else {
        // Check that Sentinel mode is configured correctly
        sentinelIsRunning();
    }
    ......
    // Enter the event loop
    aeMain(server.el);
    // Tear down the event loop
    aeDeleteEventLoop(server.el);
    return 0;
}

II. Sentinel Initialization

1. As the program above shows, when Sentinel mode is requested, initSentinel is called to perform the Sentinel-specific initialization:

/* Perform the Sentinel mode initialization. */
/* src/sentinel.c/initSentinel */
void initSentinel(void) {
    int j;

    /* Remove usual Redis commands from the command table, then just add
     * the SENTINEL command. */
    dictEmpty(server.commands,NULL);
    for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
        int retval;
        struct redisCommand *cmd = sentinelcmds+j;

        retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
        redisAssert(retval == DICT_OK);
    }

    /* Initialize various data structures. */
    // Initialize the epoch
    sentinel.current_epoch = 0;

    // Initialize the dictionary that holds the monitored masters
    sentinel.masters = dictCreate(&instancesDictType,NULL);

    // Initialize the TILT mode fields
    sentinel.tilt = 0;
    sentinel.tilt_start_time = 0;
    sentinel.previous_time = mstime();

    // Initialize the script-related fields
    sentinel.running_scripts = 0;
    sentinel.scripts_queue = listCreate();
}

2. So that Sentinel can manage Redis servers automatically, a periodic task is added to the serverCron function:

/* src/redis.c/serverCron */
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ......
    run_with_period(100) {
        // sentinelTimer is the main function of Sentinel
        if (server.sentinel_mode) sentinelTimer();
    }
}  
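
The periodic check above can be sketched as follows. This is a minimal illustration, assuming the cron function is invoked `hz` times per second and keeps a count of its iterations; the helper `should_run` and its parameters are hypothetical names, not the Redis macro itself.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when a task with the given period
 * (in milliseconds) is due on this cron iteration.
 * hz        - how many times per second the cron function runs (assumed 1..1000)
 * cronloops - number of cron iterations executed so far */
bool should_run(int period_ms, int hz, long cronloops) {
    int tick_ms = 1000 / hz; /* duration of one cron iteration */

    /* A period no longer than one tick means "run on every iteration". */
    if (period_ms <= tick_ms) return true;

    /* Otherwise run once every (period_ms / tick_ms) iterations. */
    return cronloops % (period_ms / tick_ms) == 0;
}
```

With `hz` set to 10, a 100 ms task runs on every iteration, while a 1000 ms task runs on every tenth one.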

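III. The Sentinel Main Function: sentinelTimer

The work done by sentinelTimer includes monitoring ordinary Redis servers, performing failover, and running scripts.

// Main function of Sentinel mode, called by redis.c/serverCron
/* src/sentinel.c/sentinelTimer */
void sentinelTimer(void) {

    // Record the time of this call
    // and decide whether TILT mode must be entered
    sentinelCheckTiltCondition();

    // Perform the periodic operations:
    // PING instances, analyze the INFO replies of masters and slaves,
    // send hello messages to the other Sentinels monitoring the same master,
    // receive the hello messages they send,
    // perform failover, and so on
    sentinelHandleDictOfRedisInstances(sentinel.masters);

    // Run the scripts that are waiting to be executed
    sentinelRunPendingScripts();

    // Clean up finished scripts and retry the ones that failed
    sentinelCollectTerminatedScripts();

    // Kill scripts that have run past their timeout
    sentinelKillTimedoutScripts();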

    /* We continuously change the frequency of the Redis "timer interrupt"
     * in order to desynchronize every Sentinel from every other.
     * This non-determinism avoids that Sentinels started at the same time
     * exactly continue to stay synchronized asking to be voted at the
     * same time again and again (resulting in nobody likely winning the
     * election because of split brain voting). */
    server.hz = REDIS_DEFAULT_HZ + rand() % REDIS_DEFAULT_HZ;
}
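
The last statement of sentinelTimer re-randomizes the timer frequency on every call. A minimal sketch of that idea follows; `DEFAULT_HZ` is an assumed stand-in for REDIS_DEFAULT_HZ, and `jittered_hz` is a hypothetical helper name.

```c
#include <stdlib.h>

#define DEFAULT_HZ 10 /* assumed value standing in for REDIS_DEFAULT_HZ */

/* Returns a timer frequency in the range [DEFAULT_HZ, 2*DEFAULT_HZ - 1].
 * Because each Sentinel draws its own value on every tick, Sentinels that
 * were started together drift apart instead of asking for votes in lockstep. */
int jittered_hz(void) {
    return DEFAULT_HZ + rand() % DEFAULT_HZ;
}
```

Under the assumed default of 10, the frequency varies between 10 and 19 calls per second.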

IV. Connections Between Sentinel and Redis Servers

1. Each Sentinel can connect to multiple Redis servers, and it maintains a struct sentinelRedisInstance for each of them:

// Sentinel creates a sentinelRedisInstance for every Redis instance it monitors
// (a monitored instance can be a master, a slave, or another Sentinel)
typedef struct sentinelRedisInstance {
    ......
    /* Master specific. */
    dict *sentinels; /* Other sentinels monitoring the same master. */
    dict *slaves; /* Slaves for this master instance. */
    ......
    struct sentinelRedisInstance *master; /* Master instance if it's slave. */
    ......
} sentinelRedisInstance;

As shown, the instances a Sentinel monitors can be masters, slaves, or other Sentinels. (The original figure of a complete sentinel.masters structure is not reproduced here.)

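2. Before a Sentinel can monitor a Redis server it must connect to it, and before connecting it must be configured (IP, port, and so on).

To monitor a Redis server, add the following line to the configuration file:
sentinel monitor <master-name> <ip> <redis-port> <quorum>

The quorum parameter is the number of Sentinels that must agree before the server is considered down. This directive is parsed and applied by the function sentinelHandleConfiguration: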

// Parse and handle the Sentinel configuration file
/* src/sentinel.c/sentinelHandleConfiguration */
char *sentinelHandleConfiguration(char **argv, int argc) {
    sentinelRedisInstance *ri;
    if (!strcasecmp(argv[0],"monitor") && argc == 5) {
        /* monitor <name> <host> <port> <quorum> */
        int quorum = atoi(argv[4]);
        // quorum must be greater than 0
        if (quorum <= 0) return "Quorum must be 1 or greater.";
        if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
            atoi(argv[3]),quorum,NULL) == NULL)
        {
            switch(errno) {
            case EBUSY: return "Duplicated master name.";
            case ENOENT: return "Can't resolve master instance hostname.";
            case EINVAL: return "Invalid port number";
            }
        }
    }
    ......
}  
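
To make the validation of the monitor directive concrete, here is a standalone sketch that parses the same five arguments without any Redis internals. The `monitor_conf` struct, the `parse_monitor` function, and the port range check are assumptions made for illustration; they are not part of sentinel.c.

```c
#include <stdlib.h>
#include <strings.h>

/* Hypothetical container for a parsed "monitor" directive. */
typedef struct {
    const char *name;
    const char *host;
    int port;
    int quorum;
} monitor_conf;

/* Parses the arguments that follow the "sentinel" keyword, e.g.
 * {"monitor", "mymaster", "127.0.0.1", "6379", "2"}.
 * Returns NULL on success or a static error string on failure. */
const char *parse_monitor(char **argv, int argc, monitor_conf *out) {
    if (argc != 5 || strcasecmp(argv[0], "monitor") != 0)
        return "Expected: monitor <name> <host> <port> <quorum>";

    int port = atoi(argv[3]);
    int quorum = atoi(argv[4]);

    /* Same rule as the walkthrough: at least one Sentinel must agree. */
    if (quorum <= 0) return "Quorum must be 1 or greater.";
    /* Assumed range check for this sketch only. */
    if (port <= 0 || port > 65535) return "Invalid port number";

    out->name = argv[1];
    out->host = argv[2];
    out->port = port;
    out->quorum = quorum;
    return NULL;
}
```

Returning an error string instead of an error code mirrors the style of the function shown above, where the caller can print the message directly.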

sentinelHandleConfiguration mainly calls createSentinelRedisInstance, whose job is to initialize a sentinelRedisInstance struct.

/* ========================== sentinelRedisInstance ========================= */

/* Create a redis instance, the following fields must be populated by the
 * caller if needed:
 *
 * runid: set to NULL but will be populated once INFO output is received.
 *
 * info_refresh: is set to 0 to mean that we never received INFO so far.
 *
 * If SRI_MASTER is set into initial flags the instance is added to
 * sentinel.masters table.
 *
 * if SRI_SLAVE or SRI_SENTINEL is set then 'master' must be not NULL and the
 * instance is added into master->slaves or master->sentinels table.
 *
 * If the instance is a slave or sentinel, the name parameter is ignored and
 * is created automatically as hostname:port.
 *
 * The function fails if hostname can't be resolved or port is out of range.
 * When this happens NULL is returned and errno is set accordingly to the
 * createSentinelAddr() function.
 *
 * The function may also fail and return NULL with errno set to EBUSY if
 * a master or slave with the same name already exists.
 */
sentinelRedisInstance *createSentinelRedisInstance(char *name, int flags, char *hostname, int port, int quorum, sentinelRedisInstance *master) {
    sentinelRedisInstance *ri;
    sentinelAddr *addr;
    dict *table = NULL;
    char slavename[128], *sdsname;

    redisAssert(flags & (SRI_MASTER|SRI_SLAVE|SRI_SENTINEL));
    redisAssert((flags & SRI_MASTER) || master != NULL);

    /* Check address validity. */
    // Store the IP address and port in addr
    addr = createSentinelAddr(hostname,port);
    if (addr == NULL) return NULL;

    /* For slaves and sentinel we use ip:port as name. */
    if (flags & (SRI_SLAVE|SRI_SENTINEL)) {
        snprintf(slavename,sizeof(slavename),
            strchr(hostname,':') ? "[%s]:%d" : "%s:%d",
            hostname,port);
        name = slavename;
    }

    /* Make sure the entry is not duplicated. This may happen when the same
     * name for a master is used multiple times inside the configuration or
     * if we try to add multiple times a slave or sentinel with same ip/port
     * to a master. */
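    // Duplicates can arise when the configuration file names the same master
    // more than once, or when a slave or Sentinel with the same ip/port is
    // added to a master again. To avoid this, check whether the instance
    // already exists before adding it; only new instances are added.

    // Choose the table to add to:
    // masters go into sentinel.masters, while slaves and Sentinels go into
    // the slaves and sentinels tables of their master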
    if (flags & SRI_MASTER) table = sentinel.masters;
    else if (flags & SRI_SLAVE) table = master->slaves;
    else if (flags & SRI_SENTINEL) table = master->sentinels;
    sdsname = sdsnew(name);
    if (dictFind(table,sdsname)) {

        // The instance already exists: return immediately

        sdsfree(sdsname);
        errno = EBUSY;
        return NULL;
    }

    /* Create the instance object. */
    ri = zmalloc(sizeof(*ri));
    /* Note that all the instances are started in the disconnected state,
     * the event loop will take care of connecting them. */
    ri->flags = flags | SRI_DISCONNECTED;
    ri->name = sdsname;
    ri->runid = NULL;
    ri->config_epoch = 0;
    ri->addr = addr;
    ri->cc = NULL;
    ri->pc = NULL;
    ri->pending_commands = 0;
    ri->cc_conn_time = 0;
    ri->pc_conn_time = 0;
    ri->pc_last_activity = 0;
    /* We set the last_ping_time to "now" even if we actually don't have yet
     * a connection with the node, nor we sent a ping.
     * This is useful to detect a timeout in case we'll not be able to connect
     * with the node at all. */
    ri->last_ping_time = mstime();
    ri->last_avail_time = mstime();
    ri->last_pong_time = mstime();
    ri->last_pub_time = mstime();
    ri->last_hello_time = mstime();
    ri->last_master_down_reply_time = mstime();
    ri->s_down_since_time = 0;
    ri->o_down_since_time = 0;
    ri->down_after_period = master ? master->down_after_period :
                            SENTINEL_DEFAULT_DOWN_AFTER;
    ri->master_link_down_time = 0;
    ri->auth_pass = NULL;
    ri->slave_priority = SENTINEL_DEFAULT_SLAVE_PRIORITY;
    ri->slave_reconf_sent_time = 0;
    ri->slave_master_host = NULL;
    ri->slave_master_port = 0;
    ri->slave_master_link_status = SENTINEL_MASTER_LINK_STATUS_DOWN;
    ri->slave_repl_offset = 0;
    ri->sentinels = dictCreate(&instancesDictType,NULL);
    ri->quorum = quorum;
    ri->parallel_syncs = SENTINEL_DEFAULT_PARALLEL_SYNCS;
    ri->master = master;
    ri->slaves = dictCreate(&instancesDictType,NULL);
    ri->info_refresh = 0;

    /* Failover state. */
    ri->leader = NULL;
    ri->leader_epoch = 0;
    ri->failover_epoch = 0;
    ri->failover_state = SENTINEL_FAILOVER_STATE_NONE;
    ri->failover_state_change_time = 0;
    ri->failover_start_time = 0;
    ri->failover_timeout = SENTINEL_DEFAULT_FAILOVER_TIMEOUT;
    ri->failover_delay_logged = 0;
    ri->promoted_slave = NULL;
    ri->notification_script = NULL;
    ri->client_reconfig_script = NULL;

    /* Role */
    ri->role_reported = ri->flags & (SRI_MASTER|SRI_SLAVE);
    ri->role_reported_time = mstime();
    ri->slave_conf_change_time = mstime();

    /* Add into the right table. */
    dictAdd(table, ri->name, ri);

    // Return the new instance
    return ri;
}  
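
The naming rule for slaves and Sentinels (plain `host:port`, or `[host]:port` when the host contains a colon, as an IPv6 literal does) can be isolated in a small sketch. The function name `format_instance_name` is hypothetical; only the formatting rule comes from the listing above.

```c
#include <stdio.h>
#include <string.h>

/* Writes "host:port" into buf, or "[host]:port" when host contains ':'
 * (which is how an IPv6 literal is recognized here). Returns buf. */
char *format_instance_name(char *buf, size_t len, const char *host, int port) {
    if (strchr(host, ':'))
        snprintf(buf, len, "[%s]:%d", host, port);
    else
        snprintf(buf, len, "%s:%d", host, port);
    return buf;
}
```

The brackets keep the port separator unambiguous, since an IPv6 address already uses colons internally.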

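3. At this point Sentinel has not yet connected to the Redis server; it has only set SRI_DISCONNECTED in sentinelRedisInstance.flags. The actual connection is made in the periodic task. Whether it is the link between a master and a slave or the link between a Sentinel and a Redis server, keeping a connection alive requires periodic checks, so connection setup is handled in the same periodic task.

The call chain is:
sentinelTimer()->sentinelHandleDictOfRedisInstances()->sentinelHandleRedisInstance()->sentinelReconnectInstance()

sentinelReconnectInstance() connects to instances flagged SRI_DISCONNECTED. It opens two kinds of connection to a Redis server:
· Command connection: used to send Sentinel's commands to the server and receive the replies (here Sentinel acts as a client of the server).
· Dedicated Pub/Sub connection: used to subscribe to the server's __sentinel__:hello channel. Redis Pub/Sub does not store published messages on the server, so to avoid missing any message on the __sentinel__:hello channel, Sentinel dedicates a connection to receiving them.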

/* Create the async connections for the specified instance if the instance
 * is disconnected. Note that the SRI_DISCONNECTED flag is set even if just
 * one of the two links (commands and pub/sub) is missing. */
// If Sentinel is disconnected from the instance, create the asynchronous connections to it.
/* src/sentinel.c/sentinelReconnectInstance */
void sentinelReconnectInstance(sentinelRedisInstance *ri) {

    // The instance is not disconnected (already connected): return
    if (!(ri->flags & SRI_DISCONNECTED)) return;

    /* Commands connection. */
    // Every instance (master, slave, or other Sentinel) gets a connection for sending Redis commands
    if (ri->cc == NULL) {

        // Connect to the instance
        ri->cc = redisAsyncConnect(ri->addr->ip,ri->addr->port);

        // Connection error
        if (ri->cc->err) {
            sentinelEvent(REDIS_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
                ri->cc->errstr);
            sentinelKillLink(ri,ri->cc);

        // Connection succeeded
        } else {
            // Set the connection attributes
            ri->cc_conn_time = mstime();
            ri->cc->data = ri;
            redisAeAttach(server.el,ri->cc);
            // Set the connect callback
            redisAsyncSetConnectCallback(ri->cc,
                                            sentinelLinkEstablishedCallback);
            // Set the disconnect callback
            redisAsyncSetDisconnectCallback(ri->cc,
                                            sentinelDisconnectCallback);
            // Send AUTH to authenticate, if needed
            sentinelSendAuthIfNeeded(ri,ri->cc);
            sentinelSetClientName(ri,ri->cc,"cmd");

            /* Send a PING ASAP when reconnecting. */
            sentinelSendPing(ri);
        }
    }

    /* Pub / Sub */
    // Masters and slaves also get a connection for subscribing to the channel
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && ri->pc == NULL) {

        // Connect to the instance
        ri->pc = redisAsyncConnect(ri->addr->ip,ri->addr->port);

        // Connection error
        if (ri->pc->err) {
            sentinelEvent(REDIS_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
                ri->pc->errstr);
            sentinelKillLink(ri,ri->pc);

        // Connection succeeded
        } else {
            int retval;

            // Set the connection attributes
            ri->pc_conn_time = mstime();
            ri->pc->data = ri;
            redisAeAttach(server.el,ri->pc);
            // Set the connect callback
            redisAsyncSetConnectCallback(ri->pc,
                                            sentinelLinkEstablishedCallback);
            // Set the disconnect callback
            redisAsyncSetDisconnectCallback(ri->pc,
                                            sentinelDisconnectCallback);
            // Send AUTH to authenticate, if needed
            sentinelSendAuthIfNeeded(ri,ri->pc);

            // Name this client "pubsub"
            sentinelSetClientName(ri,ri->pc,"pubsub");

            /* Now we subscribe to the Sentinels "Hello" channel. */
            // Send SUBSCRIBE __sentinel__:hello to subscribe to the channel
            retval = redisAsyncCommand(ri->pc,
                sentinelReceiveHelloMessages, NULL, "SUBSCRIBE %s",
                    SENTINEL_HELLO_CHANNEL);
            
            // Subscription failed: drop the connection
            if (retval != REDIS_OK) {
                /* If we can't subscribe, the Pub/Sub connection is useless
                 * and we can simply disconnect it and try again. */
                sentinelKillLink(ri,ri->pc);
                return;
            }
        }
    }

    /* Clear the DISCONNECTED flags only if we have both the connections
     * (or just the commands connection if this is a sentinel instance). */
    if (ri->cc && (ri->flags & SRI_SENTINEL || ri->pc))
        ri->flags &= ~SRI_DISCONNECTED;
}  

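4. The code above shows that Sentinel maintains two connections to masters and slaves but only a command connection to other Sentinels. The reason is that the subscription connection exists for auto-discovery.
A Sentinel learns of other Sentinels by analyzing the messages it receives on the subscribed channel, and it announces itself by publishing to that channel (it sends the message to the master and slaves, which publish it to every Sentinel monitoring them). Users therefore do not need to supply the addresses of the individual Sentinels: Sentinels monitoring the same server discover each other automatically, and a single command connection between them is enough for communication.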

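V. HELLO

1. As sentinelReconnectInstance above shows, Sentinel does two things when it sets up the subscription connection: it sends the server a SUBSCRIBE command for the hello channel, and it registers the callback sentinelReceiveHelloMessages, which processes the messages arriving on the subscribed channel and thereby implements auto-discovery.

2. In the periodic task, along the path sentinelTimer()->sentinelHandleDictOfRedisInstances()->sentinelHandleRedisInstance()->sentinelSendPeriodicCommands(), Sentinel publishes data to the server's hello channel. This is implemented by the function sentinelSendHello: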

/* src/sentinel.c/sentinelSendHello */
/* Send an "Hello" message via Pub/Sub to the specified 'ri' Redis
 * instance in order to broadcast the current configuration for this
 * master, and to advertise the existence of this Sentinel at the same time.
 *
 * The message has the following format:
 *
 * sentinel_ip,sentinel_port,sentinel_runid,current_epoch,
 * master_name,master_ip,master_port,master_config_epoch.
 *
 * Returns REDIS_OK if the PUBLISH was queued correctly, otherwise
 * REDIS_ERR is returned. 
 */
int sentinelSendHello(sentinelRedisInstance *ri) {
    char ip[REDIS_IP_STR_LEN];
    char payload[REDIS_IP_STR_LEN+1024];
    int retval;

    // If the instance is a master, use its own information;
    // if it is a slave, use the information of that slave's master
    sentinelRedisInstance *master = (ri->flags & SRI_MASTER) ? ri : ri->master;

    // Get the master's current address
    sentinelAddr *master_addr = sentinelGetCurrentMasterAddress(master);

    /* Try to obtain our own IP address. */
    if (anetSockName(ri->cc->c.fd,ip,sizeof(ip),NULL) == -1) return REDIS_ERR;
    if (ri->flags & SRI_DISCONNECTED) return REDIS_ERR;

    /* Format and send the Hello message. */
    snprintf(payload,sizeof(payload),
        "%s,%d,%s,%llu," /* Info about this sentinel. */
        "%s,%s,%d,%llu", /* Info about current master. */
        ip, server.port, server.runid,
        (unsigned long long) sentinel.current_epoch,
        /* --- */
        master->name,master_addr->ip,master_addr->port,
        (unsigned long long) master->config_epoch);
    
    // Publish the message
    retval = redisAsyncCommand(ri->cc,
        sentinelPublishReplyCallback, NULL, "PUBLISH %s %s",
            SENTINEL_HELLO_CHANNEL,payload);

    if (retval != REDIS_OK) return REDIS_ERR;

    ri->pending_commands++;

    return REDIS_OK;
}

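3. When Redis receives a message published by a Sentinel, it forwards the data to every Sentinel subscribed to the hello channel, so the callback registered earlier, sentinelReceiveHelloMessages, is invoked. It mainly does two things:

· Discovers other Sentinels that monitor this server;
· Updates configuration information;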
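
To illustrate what a receiver has to extract from a hello message, the sketch below parses the eight comma-separated fields listed in the sentinelSendHello comment. It uses `sscanf` for brevity and is not how sentinel.c parses the message; the `hello_msg` struct, the field sizes, and `parse_hello` are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical holder for the eight fields of a hello message. */
typedef struct {
    char ip[64];                            /* sentinel_ip */
    int port;                               /* sentinel_port */
    char runid[64];                         /* sentinel_runid */
    unsigned long long current_epoch;       /* current_epoch */
    char master_name[64];                   /* master_name */
    char master_ip[64];                     /* master_ip */
    int master_port;                        /* master_port */
    unsigned long long master_config_epoch; /* master_config_epoch */
} hello_msg;

/* Parses "ip,port,runid,epoch,name,master_ip,master_port,config_epoch".
 * Returns 1 if all eight fields were read, 0 otherwise. */
int parse_hello(const char *payload, hello_msg *m) {
    int n = sscanf(payload,
                   "%63[^,],%d,%63[^,],%llu,%63[^,],%63[^,],%d,%llu",
                   m->ip, &m->port, m->runid, &m->current_epoch,
                   m->master_name, m->master_ip, &m->master_port,
                   &m->master_config_epoch);
    return n == 8;
}
```

The first four fields identify the sending Sentinel, which is what makes discovery possible; the last four describe the master configuration the sender currently believes in.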

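VI. INFO

1. Sentinel sends the INFO command to each monitored master once every ten seconds.

The call chain is:
sentinelTimer()->sentinelHandleDictOfRedisInstances()->sentinelHandleRedisInstance()->sentinelSendPeriodicCommands()

Here, too, Sentinel does two things: it sends the INFO command, and it registers the callback sentinelInfoReplyCallback().

When the reply to INFO arrives from the server (it includes information about the master itself and about the slaves connected to it), the callback is invoked. It processes the reply (replication information, the number of stored key-value pairs, input for Sentinel's down-state decisions, and so on) and, based on the slave information it obtained, starts monitoring those slaves. This is also part of Sentinel's auto-discovery.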

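VII. Heartbeat Detection

1. Heartbeats are a common way to check whether two machines are still properly connected. When the receiver gets a heartbeat packet it records the time of receipt. If at some later point it finds that no heartbeat has arrived for too long (a timeout), the network is in poor shape or the peer is very busy. This also guides what happens next, for example postponing follow-up operations until heartbeats return to normal.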

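VIII. Online Status Monitoring

1. Sentinel monitors online status using two kinds of judgment, subjective and objective:
Subjectively down: based on what this Sentinel itself observes about a server;
Objectively down: based on the combined reports of all Sentinels monitoring that server;

This, too, is implemented by sending PING as a heartbeat.

2. Subjective down check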

/* src/sentinel.c/sentinelCheckSubjectivelyDown */
/* Is this instance down from our point of view? */
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {

    mstime_t elapsed = 0;

    if (ri->last_ping_time)
        elapsed = mstime() - ri->last_ping_time;

    /* Check if we are in need for a reconnection of one of the 
     * links, because we are detecting low activity.
     *
     * 1) Check if the command link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have a
     *    pending ping for more than half the timeout. */
    // Consider dropping the instance's cc (command) link
    if (ri->cc &&
        (mstime() - ri->cc_conn_time) > SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        ri->last_ping_time != 0 && /* There is a pending ping... */
        /* The pending ping is delayed, and we did not received
         * error replies as well. */
        (mstime() - ri->last_ping_time) > (ri->down_after_period/2) &&
        (mstime() - ri->last_pong_time) > (ri->down_after_period/2))
    {
        sentinelKillLink(ri,ri->cc);
    }

    /* 2) Check if the pubsub link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have no
     *    activity in the Pub/Sub channel for more than
     *    SENTINEL_PUBLISH_PERIOD * 3.
     */
    // Consider dropping the instance's pc (Pub/Sub) link
    if (ri->pc &&
        (mstime() - ri->pc_conn_time) > SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        (mstime() - ri->pc_last_activity) > (SENTINEL_PUBLISH_PERIOD*3))
    {
        sentinelKillLink(ri,ri->pc);
    }

    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. 
     */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            // Emit the event
            sentinelEvent(REDIS_WARNING,"+sdown",ri,"%@");
            // Record when the SDOWN state was entered
            ri->s_down_since_time = mstime();
            // Set the SDOWN flag
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            // Emit the event
            sentinelEvent(REDIS_WARNING,"-sdown",ri,"%@");
            // Clear the related flags
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}
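
Stripped of link management and role checks, the core of the subjective-down decision is a timeout on the oldest unanswered PING. The sketch below shows only that part, assuming a reduced `instance_view` struct and a caller-supplied clock; both are hypothetical simplifications of the listing above.

```c
#include <stdbool.h>

typedef long long mstime_t;

/* Hypothetical, reduced view of an instance: only what the timeout needs. */
typedef struct {
    mstime_t last_ping_time;    /* when the pending PING was sent; 0 if none is pending */
    mstime_t down_after_period; /* configured timeout in milliseconds */
} instance_view;

/* Returns true when the pending PING has gone unanswered for longer than
 * down_after_period. With no pending PING the elapsed time counts as 0. */
bool is_subjectively_down(const instance_view *ri, mstime_t now) {
    mstime_t elapsed = 0;
    if (ri->last_ping_time) elapsed = now - ri->last_ping_time;
    return elapsed > ri->down_after_period;
}
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the check deterministic and easy to test.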
  

3. Objective down check

/* src/sentinel.c/sentinelCheckObjectivelyDown */
/* Is this instance down according to the configured quorum?
 *
 * Note that ODOWN is a weak quorum, it only means that enough Sentinels
 * reported in a given time range that the instance was not reachable.
 *
 * However messages can be delayed so there are no strong guarantees about
 * N instances agreeing at the same time about the down state. 
 */
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    int quorum = 0, odown = 0;

    // If this Sentinel considers the master subjectively down,
    // check whether other Sentinels agree;
    // with enough agreement the master is judged objectively down
    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */

        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);
                
            // This Sentinel also considers the master down
            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        
        // Enter ODOWN when the number of agreeing Sentinels
        // reaches the configured quorum
        if (quorum >= master->quorum) odown = 1;
    }

    /* Set the flag accordingly to the outcome. */
    if (odown) {

        // The master is ODOWN

        if ((master->flags & SRI_O_DOWN) == 0) {
            // Emit the event
            sentinelEvent(REDIS_WARNING,"+odown",master,"%@ #quorum %d/%d",
                quorum, master->quorum);
            // Set the ODOWN flag
            master->flags |= SRI_O_DOWN;
            // Record when ODOWN was entered
            master->o_down_since_time = mstime();
        }
    } else {

        // Not ODOWN

        if (master->flags & SRI_O_DOWN) {

            // The master was ODOWN before: leave that state

            // Emit the event
            sentinelEvent(REDIS_WARNING,"-odown",master,"%@");
            // Clear the ODOWN flag
            master->flags &= ~SRI_O_DOWN;
        }
    }
}
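
The vote counting above reduces to a few lines once the dictionary iteration is replaced by a plain array. This is a minimal sketch under that assumption; `is_objectively_down` and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* other_says_down[i] is true when the i-th other Sentinel has reported
 * the master as down. Returns true when the quorum is reached. */
bool is_objectively_down(bool self_sdown, const bool *other_says_down,
                         size_t n, int quorum) {
    /* Without our own subjective-down verdict nothing is counted. */
    if (!self_sdown) return false;

    int votes = 1; /* this Sentinel's own opinion */
    for (size_t i = 0; i < n; i++)
        if (other_says_down[i]) votes++;

    return votes >= quorum;
}
```

Note that the local Sentinel always counts as one vote, so a quorum of 1 makes the subjective and objective verdicts coincide.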
  

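IX. Failure Recovery

1. In a typical master-slave Redis deployment, only the master serves both reads and writes, while slaves serve reads only (writes reach the slaves through command propagation during replication). So when the master goes down, failure recovery is needed.

This also starts from the sentinelTimer() periodic function:

sentinelTimer()->sentinelHandleDictOfRedisInstances()->sentinelHandleRedisInstance()->sentinelStartFailoverIfNeeded() & sentinelFailoverStateMachine()

Once the master has been judged objectively down, sentinelStartFailoverIfNeeded() decides whether to start a failover, and sentinelFailoverStateMachine() carries it out: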

/* src/sentinel.c/sentinelFailoverStateMachine */
// Failover state machine: performs the action that matches the current state
void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
    redisAssert(ri->flags & SRI_MASTER);
    if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;

    switch(ri->failover_state) {
        case SENTINEL_FAILOVER_STATE_WAIT_START:
            sentinelFailoverWaitStart(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
            sentinelFailoverSelectSlave(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
            sentinelFailoverSendSlaveOfNoOne(ri);
            break;
        case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
            sentinelFailoverWaitPromotion(ri);
            break;
        case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
            sentinelFailoverReconfNextSlave(ri);
            break;
    }
}

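The cases above are the failover states that the state machine dispatches on.

sentinelFailoverStateMachine uses the current state to tell how far the failover has progressed and calls the matching function. Below we look at the work done in each of these states.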
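
As an overview, the success path through these states can be written as a small transition function. This is only a sketch: the `failover_state` enum and `next_on_success` are hypothetical names, and the abort paths (timeouts, no suitable slave) are deliberately left out.

```c
/* Hypothetical mirror of the failover states handled by the switch. */
typedef enum {
    FS_NONE,
    FS_WAIT_START,
    FS_SELECT_SLAVE,
    FS_SEND_SLAVEOF_NOONE,
    FS_WAIT_PROMOTION,
    FS_RECONF_SLAVES
} failover_state;

/* Returns the state that follows `s` when the current step succeeds.
 * States outside the success path map to themselves. */
failover_state next_on_success(failover_state s) {
    switch (s) {
    case FS_WAIT_START:         return FS_SELECT_SLAVE;        /* elected leader */
    case FS_SELECT_SLAVE:       return FS_SEND_SLAVEOF_NOONE;  /* slave chosen */
    case FS_SEND_SLAVEOF_NOONE: return FS_WAIT_PROMOTION;      /* command queued */
    case FS_WAIT_PROMOTION:     return FS_RECONF_SLAVES;       /* INFO reports master role */
    default:                    return s;
    }
}
```

Each call of the real state machine performs at most one step, so the failover advances over several timer ticks rather than in a single pass.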

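9.1 WAIT_START

1. When a master is judged objectively down, the Sentinels monitoring it negotiate and elect a leader Sentinel, and the leader performs the failover on that master.

In this state the function sentinelFailoverWaitStart mainly checks whether this Sentinel is the leader:

// Prepare to perform the failover
/* src/sentinel.c/sentinelFailoverWaitStart */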
void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
    char *leader;
    int isleader;

    /* Check if we are the leader for the failover epoch. */
    leader = sentinelGetLeader(ri, ri->failover_epoch);
    // Is this Sentinel the leader?
    isleader = leader && strcasecmp(leader,server.runid) == 0;
    sdsfree(leader);

    /* If I'm not the leader, and it is not a forced failover via
     * SENTINEL FAILOVER, then I can't continue with the failover. */
    if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
        int election_timeout = SENTINEL_ELECTION_TIMEOUT;

        /* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
         * and the configured failover timeout. */
        if (election_timeout > ri->failover_timeout)
            election_timeout = ri->failover_timeout;

        /* Abort the failover if I'm not the leader after some time. */
        if (mstime() - ri->failover_start_time > election_timeout) {
            sentinelEvent(REDIS_WARNING,"-failover-abort-not-elected",ri,"%@");
            // Cancel the failover
            sentinelAbortFailover(ri);
        }
        return;
    }

    // This Sentinel is the leader: start the failover

    sentinelEvent(REDIS_WARNING,"+elected-leader",ri,"%@");

    // Move to the slave selection state
    ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
    ri->failover_state_change_time = mstime();

    sentinelEvent(REDIS_WARNING,"+failover-state-select-slave",ri,"%@");
}

If this Sentinel is the leader, the state is updated to SELECT_SLAVE.

9.2 SELECT_SLAVE

This state selects a slave to become the new master:

// Select a suitable slave as the new master
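
The waiting rule for a Sentinel that has not been elected can be isolated as below. It is a sketch under stated assumptions: `ELECTION_TIMEOUT_MS` is an assumed stand-in for SENTINEL_ELECTION_TIMEOUT, and `should_abort_unelected` is a hypothetical helper that takes the clock as a parameter.

```c
typedef long long mstime_t;

#define ELECTION_TIMEOUT_MS 10000 /* assumed stand-in for SENTINEL_ELECTION_TIMEOUT */

/* Returns 1 when a Sentinel that is not the leader should give up on the
 * failover: the effective timeout is the smaller of the election timeout
 * and the configured failover timeout. */
int should_abort_unelected(mstime_t now, mstime_t failover_start_time,
                           mstime_t failover_timeout) {
    mstime_t election_timeout = ELECTION_TIMEOUT_MS;
    if (election_timeout > failover_timeout)
        election_timeout = failover_timeout;
    return now - failover_start_time > election_timeout;
}
```

Taking the minimum means a short configured failover timeout also shortens how long an unelected Sentinel keeps waiting.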
void sentinelFailoverSelectSlave(sentinelRedisInstance *ri) {

    // Pick the new master from the slaves of the old master
    sentinelRedisInstance *slave = sentinelSelectSlave(ri);

    /* We don't handle the timeout in this state as the function aborts
     * the failover or go forward in the next state. */
    if (slave == NULL) {

        // No slave can be promoted to master, so the failover cannot proceed
        sentinelEvent(REDIS_WARNING,"-failover-abort-no-good-slave",ri,"%@");

        // Abort the failover
        sentinelAbortFailover(ri);

    } else {

        // A new master was selected

        // Emit the event
        sentinelEvent(REDIS_WARNING,"+selected-slave",slave,"%@");

        // Flag the instance as promoted
        slave->flags |= SRI_PROMOTED;

        // Remember the selected slave
        ri->promoted_slave = slave;

        // Update the failover state
        ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;

        // Record when the state changed
        ri->failover_state_change_time = mstime();

        // Emit the event
        sentinelEvent(REDIS_NOTICE,"+failover-state-send-slaveof-noone",
            slave, "%@");
    }
}

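At this point the state becomes SEND_SLAVEOF_NOONE.

9.3 SEND_SLAVEOF_NOONE

This state sends the SLAVEOF no one command to the selected slave so that it becomes a real master:

// Send SLAVEOF no one to the selected slave
// to promote it to the new master
/* src/sentinel.c/sentinelFailoverSendSlaveOfNoOne */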
void sentinelFailoverSendSlaveOfNoOne(sentinelRedisInstance *ri) {
    int retval;

    /* We can't send the command to the promoted slave if it is now
     * disconnected. Retry again and again with this state until the timeout
     * is reached, then abort the failover. */
    // This case should be rare: disconnected slaves are never selected as the
    // new master, so it can only happen in the window between selecting the
    // slave and sending it SLAVEOF NO ONE.
    if (ri->promoted_slave->flags & SRI_DISCONNECTED) {

        // Past the time limit: stop retrying
        if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
            sentinelEvent(REDIS_WARNING,"-failover-abort-slave-timeout",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }

    /* Send SLAVEOF NO ONE command to turn the slave into a master.
     *
     * We actually register a generic callback for this command as we don't
     * really care about the reply. We check if it worked indirectly observing
     * if INFO returns a different role (master instead of slave). 
     */
    retval = sentinelSendSlaveOf(ri->promoted_slave,NULL,0);
    if (retval != REDIS_OK) return;
    sentinelEvent(REDIS_NOTICE, "+failover-state-wait-promotion",
        ri->promoted_slave,"%@");

    // Update the state:
    // in this state Sentinel waits for the selected slave to be promoted to master
    ri->failover_state = SENTINEL_FAILOVER_STATE_WAIT_PROMOTION;

    // Record when the state changed
    ri->failover_state_change_time = mstime();
}   

9.4 WAIT_PROMOTION

This state only checks the time limit: sentinelFailoverWaitPromotion does nothing but the timeout check, and aborts the failure recovery if the timeout is exceeded:

/* We actually wait for promotion indirectly checking with INFO when the
 * slave turns into a master. */
/* src/sentinel.c/sentinelFailoverWaitPromotion */
void sentinelFailoverWaitPromotion(sentinelRedisInstance *ri) {
    /* Just handle the timeout. Switching to the next state is handled
     * by the function parsing the INFO command of the promoted slave. */
    if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
        sentinelEvent(REDIS_WARNING,"-failover-abort-slave-timeout",ri,"%@");
        sentinelAbortFailover(ri);
    }
}  
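
As the comment in the listing notes, the move to the next state happens in the code that parses the promoted slave's INFO reply, not in the timer path. The sketch below models that split with two entry points. It is a simplified model under assumptions: `failover_model`, `on_timer_tick`, `on_info_role`, and the `FS_*` states are hypothetical names, and the real INFO handling checks more conditions than the role string alone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced set of failover states for illustration. */
typedef enum { FS_WAIT_PROMOTION, FS_RECONF_SLAVES, FS_ABORTED } failover_state;

typedef struct {
    failover_state state;
    long long state_change_time;  /* milliseconds */
    long long failover_timeout;   /* milliseconds */
} failover_model;

/* Timer path: while waiting for the promotion, only the timeout is checked. */
void on_timer_tick(failover_model *f, long long now)
{
    assert(f != NULL);
    if (f->state != FS_WAIT_PROMOTION) return;
    if (now - f->state_change_time > f->failover_timeout)
        f->state = FS_ABORTED;
}

/* INFO path: when the promoted slave reports the "master" role,
 * advance to the state that reconfigures the remaining slaves. */
void on_info_role(failover_model *f, const char *role, long long now)
{
    assert(f != NULL && role != NULL);
    if (f->state != FS_WAIT_PROMOTION) return;
    if (strcmp(role, "master") == 0) {
        f->state = FS_RECONF_SLAVES;
        f->state_change_time = now;
    }
}
```

Keeping the timeout check separate from the role check means a slow or lost INFO reply can never leave the failover stuck: the timer path eventually aborts it.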

9.5 RECONF_SLAVE

This state sends SLAVEOF <promoted slave address> to the remaining slaves, so that the promoted slave becomes their new master:

/* Send SLAVE OF <new master address> to all the remaining slaves that
 * still don't appear to have the configuration updated. */
// Send SLAVEOF <new-master-address> to every slave that is not yet replicating from the new master
void sentinelFailoverReconfNextSlave(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    int in_progress = 0;

    // Count the slaves that are currently syncing with the new master
    di = dictGetIterator(master->slaves);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *slave = dictGetVal(de);

        // SLAVEOF has already been sent, or the sync is in progress
        if (slave->flags & (SRI_RECONF_SENT|SRI_RECONF_INPROG))
            in_progress++;
    }
    dictReleaseIterator(di);

    // While fewer slaves are syncing than the parallel-syncs option allows,
    // keep iterating and tell further slaves to sync with the new master
    di = dictGetIterator(master->slaves);
    while(in_progress < master->parallel_syncs &&
          (de = dictNext(di)) != NULL)
    {
        sentinelRedisInstance *slave = dictGetVal(de);
        int retval;

        /* Skip the promoted slave, and already configured slaves. */
        // Skip the new master and any slave that has already finished syncing
        if (slave->flags & (SRI_PROMOTED|SRI_RECONF_DONE)) continue;

        /* If too much time elapsed without the slave moving forward to
         * the next state, consider it reconfigured even if it is not.
         * Sentinels will detect the slave as misconfigured and fix its
         * configuration later. */
        if ((slave->flags & SRI_RECONF_SENT) &&
            (mstime() - slave->slave_reconf_sent_time) >
            SENTINEL_SLAVE_RECONF_TIMEOUT)
        {
            // Emit the event reporting that the reconfiguration timed out
            sentinelEvent(REDIS_NOTICE,"-slave-reconf-sent-timeout",slave,"%@");
            // Clear the "SLAVEOF sent" flag and mark the slave as done
            slave->flags &= ~SRI_RECONF_SENT;
            slave->flags |= SRI_RECONF_DONE;
        }

        /* Nothing to do for instances that are disconnected or already
         * in RECONF_SENT state. */
        // Skip the slave if SLAVEOF has already been sent to it, if its sync
        // is in progress, or if it is disconnected
        if (slave->flags & (SRI_DISCONNECTED|SRI_RECONF_SENT|SRI_RECONF_INPROG))
            continue;

        /* Send SLAVEOF <new master>. */
        // Send SLAVEOF to the slave so that it syncs with the new master
        retval = sentinelSendSlaveOf(slave,
                master->promoted_slave->addr->ip,
                master->promoted_slave->addr->port);
        if (retval == REDIS_OK) {

            // Mark the slave as "SLAVEOF sent"
            slave->flags |= SRI_RECONF_SENT;
            // Record when the SLAVEOF command was sent
            slave->slave_reconf_sent_time = mstime();
            sentinelEvent(REDIS_NOTICE,"+slave-reconf-sent",slave,"%@");
            // One more slave is now syncing
            in_progress++;
        }
    }
    dictReleaseIterator(di);

    /* Check if all the slaves are reconfigured and handle timeout. */
    // Check whether every slave has finished syncing
    sentinelFailoverDetectEnd(master);
}
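
The throttling logic of this function is easier to see without the dictionary iteration and networking. Below is a minimal sketch that works on a plain array of flag words. It assumes hypothetical `F_*` flag values (the real `SRI_*` constants are not shown in this article), treats "sending SLAVEOF" as simply setting a flag, and leaves out the reconf-sent timeout branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the SRI_* flags. */
enum {
    F_PROMOTED      = 1 << 0,
    F_DISCONNECTED  = 1 << 1,
    F_RECONF_SENT   = 1 << 2,
    F_RECONF_INPROG = 1 << 3,
    F_RECONF_DONE   = 1 << 4
};

/* Mark as many slaves as allowed with F_RECONF_SENT, never letting more
 * than `parallel_syncs` of them be syncing at once.
 * Returns how many slaves were newly marked in this call. */
size_t reconf_next_slaves(unsigned *flags, size_t n, int parallel_syncs)
{
    int in_progress = 0;
    size_t sent = 0;
    size_t i;

    assert(flags != NULL || n == 0);

    /* First pass: count the slaves that are already syncing. */
    for (i = 0; i < n; i++)
        if (flags[i] & (F_RECONF_SENT | F_RECONF_INPROG)) in_progress++;

    /* Second pass: start new syncs while there is room. */
    for (i = 0; i < n && in_progress < parallel_syncs; i++) {
        /* Skip the promoted slave and slaves that are already done. */
        if (flags[i] & (F_PROMOTED | F_RECONF_DONE)) continue;
        /* Skip disconnected slaves and slaves already being handled. */
        if (flags[i] & (F_DISCONNECTED | F_RECONF_SENT | F_RECONF_INPROG))
            continue;
        flags[i] |= F_RECONF_SENT;  /* stands in for sending SLAVEOF */
        in_progress++;
        sent++;
    }
    return sent;
}
```

With `parallel_syncs` set to 1, repeated calls reconfigure the slaves one at a time: a new slave is picked only after the previous one leaves the sent/in-progress states.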

9.6 UPDATE_CONFIG

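Once the failover has finished, the master enters this state. The function sentinelFailoverSwitchToPromotedSlave is called to remove the old, offline master from the master table and put the new master in its place: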

/* This function is called when the slave is in
 * SENTINEL_FAILOVER_STATE_UPDATE_CONFIG state. In this state we need
 * to remove it from the master table and add the promoted slave instead. */
// Called when the master is down and the failover for it has completed.
// The master is removed from the master table and replaced by the new master.
void sentinelFailoverSwitchToPromotedSlave(sentinelRedisInstance *master) {

    // Pick the instance whose address the master will take over
    sentinelRedisInstance *ref = master->promoted_slave ?
                                 master->promoted_slave : master;

    // Emit the switch-master event
    sentinelEvent(REDIS_WARNING,"+switch-master",master,"%s %s %d %s %d",
        // Old master information
        master->name, master->addr->ip, master->addr->port,
        // New master information
        ref->addr->ip, ref->addr->port);

    // Replace the old master's information with that of the new master
    sentinelResetMasterAndChangeAddress(master,ref->addr->ip,ref->addr->port);
}  
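
The essence of this step is "keep the master name, swap the address, announce the change". The sketch below models only that. `instance_model`, `addr_model`, and `switch_to_promoted` are hypothetical names; clearing `promoted_slave` afterwards is an assumption made for the model, and the real sentinelResetMasterAndChangeAddress performs additional bookkeeping that is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced model of an instance for illustration. */
typedef struct { char ip[46]; int port; } addr_model;

typedef struct instance_model {
    char name[32];
    addr_model addr;
    struct instance_model *promoted_slave;  /* NULL if none was promoted */
} instance_model;

/* Format the "+switch-master" payload into `event`, then give the master
 * entry the address of the promoted slave. Returns snprintf's result. */
int switch_to_promoted(instance_model *master, char *event, size_t event_len)
{
    instance_model *ref;
    addr_model new_addr;
    int n;

    assert(master != NULL && event != NULL);

    /* Fall back to the master itself when no slave was promoted. */
    ref = master->promoted_slave ? master->promoted_slave : master;
    new_addr = ref->addr;  /* copy before the master entry is modified */

    n = snprintf(event, event_len, "+switch-master %s %s %d %s %d",
                 master->name, master->addr.ip, master->addr.port,
                 new_addr.ip, new_addr.port);

    master->addr = new_addr;        /* same name, new address */
    master->promoted_slave = NULL;  /* assumption: reset for the next failover */
    return n;
}
```

Keeping the name stable while the address changes is what lets clients keep asking for the same master name and simply receive the new address.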

At this point the failover is complete.

[References]
[1] 《Redis設計與實現》 (Redis Design and Implementation)
[2] 《Redis原始碼日誌》 (Redis Source Code Notes)



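Author: wenmingxing
Link: https://www.jianshu.com/p/ec837cd18faf
Source: 簡書 (Jianshu)
Copyright on Jianshu belongs to the author. For any form of reproduction, contact the author for authorization and cite the source.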