Galera MariaDB Cluster Recovery Strategy

阿新 • Published: 2018-08-05

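1 Galera MariaDB
MariaDB is a database that can be regarded as a branch of MySQL. MySQL was acquired by Sun Microsystems, and when Sun was in turn acquired by Oracle, MySQL faced the risk of becoming closed source. Michael Widenius, the original author of MySQL, left Sun and developed a new branch based on the MySQL code. He named it MariaDB and made it fully open source.

Galera here means Galera Cluster, a newer high-availability solution designed for databases: shared-nothing and highly redundant. Galera MariaDB is a MariaDB cluster with the Galera plugin integrated. Galera itself is multi-master, so a Galera MariaDB cluster is not a traditional primary/standby cluster but a multi-master architecture in which every node accepts writes.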

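2 How to configure Galera MariaDB

One of my blog posts on OpenStack high-availability modules includes a part that describes how to set up Galera MariaDB (section 2.2.1, high-availability configuration of the database service): OpenStack高可用方案及配置 (OpenStack High-Availability Solutions and Configuration)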

3 Some basic Galera MariaDB concepts
(1) Database state of the current node

MariaDB [(none)]> show status like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name | Value |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

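State reference table:

State	Description
Open	The node has started successfully and is trying to connect to the cluster
Primary	The node is already part of the cluster; this state appears when a new node joins and a donor is being selected for data synchronization
Joiner	The node is waiting to receive, or is currently receiving, the state transfer files
Joined	The node has finished the state transfer, but part of its data is not yet current and it is still catching up with the cluster
Synced	The node is serving normally; its data is consistent with the cluster
Donor	The node has been selected as the donor and is performing a full data transfer to a newly joined node; during this time it does not serve clients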
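As a small illustration of how a monitoring script might consume the query above, here is a minimal sketch that parses the table printed by the mysql client and checks for the Synced state. The function names `parse_status_table` and `is_node_synced` are made up for this example and are not part of Galera or MariaDB.

```python
def parse_status_table(output):
    """Parse the ASCII table printed by the mysql client into a dict."""
    status = {}
    for line in output.splitlines():
        line = line.strip()
        # Data rows look like "| name | value |"; skip borders and blank lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not a name/value pair.
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def is_node_synced(output):
    """Return True only when the node reports the Synced state."""
    status = parse_status_table(output)
    return status.get("wsrep_local_state_comment") == "Synced"


sample = """
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
"""
print(is_node_synced(sample))
```

Any state other than Synced (for example while the node is a joiner or a donor) makes `is_node_synced` return False, which matches the table: only Synced means the node is serving normally.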

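(2) Primary Component
When a network failure occurs, the cluster may be split into several smaller groups of nodes. Only one of these groups may continue to modify data; that part of the cluster is called the Primary Component.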
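Which group keeps the Primary Component is decided by quorum. The sketch below is a simplified model of that rule, assuming every node has the same weight and ignoring nodes that left gracefully; `has_quorum` is a hypothetical helper name, not a Galera API.

```python
def has_quorum(partition_size, previous_size):
    """Return True if a partition may remain the Primary Component.

    Simplified rule: the partition must hold strictly more than half of the
    nodes that formed the last Primary Component (equal weights assumed).
    """
    return 2 * partition_size > previous_size


# A 5-node cluster split 3/2: only the 3-node side keeps accepting writes.
print(has_quorum(3, 5))
print(has_quorum(2, 5))
# An even 2/2 split of a 4-node cluster leaves no side with a majority.
print(has_quorum(2, 4))
```

The even-split case is the reason clusters are usually built with an odd number of nodes.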

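(3) GTID
GTID stands for Global Transaction ID. It consists of a UUID and a sequence number (an offset). It is the cluster-wide global transaction ID defined in the wsrep API: an ordered ID that uniquely identifies each state change in the cluster and its offset in the queue.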
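A GTID is commonly written as `uuid:seqno`. The following sketch, with the hypothetical helpers `parse_gtid` and `is_newer`, shows how two such values could be compared; the UUID and seqno are the ones from the grastate.dat example in this post.

```python
from collections import namedtuple

Gtid = namedtuple("Gtid", ["uuid", "seqno"])


def parse_gtid(text):
    """Split a 'uuid:seqno' string into its two parts."""
    uuid, sep, seqno = text.strip().rpartition(":")
    if not sep or not uuid:
        raise ValueError("expected 'uuid:seqno', got %r" % text)
    return Gtid(uuid, int(seqno))


def is_newer(a, b):
    """Return True if GTID a is ahead of GTID b.

    Sequence numbers are only comparable inside one cluster history,
    that is, when both GTIDs carry the same UUID.
    """
    if a.uuid != b.uuid:
        raise ValueError("GTIDs belong to different cluster histories")
    return a.seqno > b.seqno


node_a = parse_gtid("30ae87da-8e8e-11e8-810c-6a8da854119b:33557")
node_b = parse_gtid("30ae87da-8e8e-11e8-810c-6a8da854119b:33550")
print(is_newer(node_a, node_b))
```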

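(4) SST
SST stands for State Snapshot Transfer: a complete copy of the data (a full copy) is transferred from one node to another. When a new node joins the cluster, it synchronizes its data from an existing node, and a state snapshot transfer begins.
Galera has two different kinds of state snapshot transfer:
<1> Logical transfer: uses the mysqldump command. This is a blocking method.
<2> Physical transfer: uses rsync, rsync_wan, xtrabackup and similar methods to copy the data files directly between servers. The receiving server starts its service after the copy has finished.
The SST method can be changed in the configuration file:
wsrep_sst_method=rsync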

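(5) IST
IST stands for Incremental State Transfer: a node in the cluster identifies the transactions that the joining node is missing and sends only those, instead of copying the full data set as SST does. The most common case is a node that was already a member of the cluster and was simply shut down and restarted; when it rejoins the cluster it is synchronized through IST.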
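The choice between the two transfer types can be sketched roughly as follows. This is a simplified model, not Galera's actual code: it assumes the donor keeps recent write sets in a cache, and `donor_cache_first_seqno` is a hypothetical name for the lowest seqno still available there.

```python
def choose_state_transfer(joiner_uuid, joiner_seqno,
                          cluster_uuid, donor_cache_first_seqno):
    """Return 'IST' when the joiner can catch up incrementally, else 'SST'."""
    # A node with a different history (or none) needs a full copy.
    if joiner_uuid != cluster_uuid:
        return "SST"
    # seqno -1 means the node does not know its own position.
    if joiner_seqno < 0:
        return "SST"
    # The joiner needs every write set from joiner_seqno + 1 onwards;
    # IST only works if the donor still holds all of them.
    if joiner_seqno + 1 >= donor_cache_first_seqno:
        return "IST"
    return "SST"


CLUSTER = "30ae87da-8e8e-11e8-810c-6a8da854119b"
# Restarted member that is only slightly behind: incremental transfer.
print(choose_state_transfer(CLUSTER, 33557, CLUSTER, 33000))
# Member that was down so long the donor no longer has the missing write sets.
print(choose_state_transfer(CLUSTER, 120, CLUSTER, 33000))
```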

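(6) grastate.dat
This file shows the uuid and seqno recorded by the node, which together form the GTID described above. When a node leaves the Galera cluster normally, it writes the current GTID into this file, for example: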

[root@abc3 ~]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 30ae87da-8e8e-11e8-810c-6a8da854119b
seqno: 33557
safe_to_bootstrap: 0

If the database service on the node is currently running, the seqno value is -1.
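Reading this file from a script is straightforward. A minimal sketch, assuming the `key: value` layout shown above (`parse_grastate` is a name invented for this example):

```python
def parse_grastate(text):
    """Parse the contents of grastate.dat into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# GALERA saved state" comment.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    # seqno is the value the recovery logic compares, so make it an int.
    if "seqno" in state:
        state["seqno"] = int(state["seqno"])
    return state


sample = """# GALERA saved state
version: 2.1
uuid: 30ae87da-8e8e-11e8-810c-6a8da854119b
seqno: 33557
safe_to_bootstrap: 0
"""
state = parse_grastate(sample)
print(state["uuid"], state["seqno"])
```

In real use the text would come from reading /var/lib/mysql/grastate.dat on each node.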

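(7) gvwstate.dat
Whenever a node forms or changes the Primary Component, it creates or updates this file so that it keeps the state of the most recent Primary Component. If the node shuts down normally, the file is deleted.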
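The recovery strategy later in this post compares the my_uuid and view_id entries of this file. The sketch below assumes the file contains a `my_uuid: <uuid>` line and a `view_id: <type> <uuid> <seq>` line, as in the Galera documentation; the UUIDs in the sample are shortened, made-up values, and the function names are hypothetical.

```python
def parse_gvwstate(text):
    """Extract my_uuid, the view UUID and the member list from gvwstate.dat."""
    info = {"my_uuid": None, "view_uuid": None, "members": []}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the "#vwbeg" / "#vwend" markers.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields = value.split()
        if key == "my_uuid" and fields:
            info["my_uuid"] = fields[0]
        elif key == "view_id" and len(fields) >= 2:
            # Assumed layout: view_id: <view type> <uuid> <sequence>
            info["view_uuid"] = fields[1]
        elif key == "member" and fields:
            info["members"].append(fields[0])
    return info


def my_uuid_equals_view_id(text):
    """Return True when this node's UUID is the one recorded in view_id."""
    info = parse_gvwstate(text)
    return info["my_uuid"] is not None and info["my_uuid"] == info["view_uuid"]


sample = """my_uuid: aaaa-1111
#vwbeg
view_id: 3 aaaa-1111 12
bootstrap: 0
member: aaaa-1111 1
member: bbbb-2222 1
#vwend
"""
print(my_uuid_equals_view_id(sample))
```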

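4 Recovering from some failure scenarios
(1) Scenario 1

One node, call it node A, has gone down. Usually it is enough to restart the service on node A.

(2) Scenario 2

All nodes have gone down. In this case you cannot simply restart the service on every node. You need to find the node with the most recent state and start it first with the --wsrep-new-cluster option; once that node is up, the other nodes can start their services normally.
The key question is how to find the node with the most recent state. Section 5 describes the strategy for finding it.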

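5 Recovery strategy and automatic recovery script
(1) Recovery strategy
<1> First check whether the database service is still running on any node of the cluster. If it is, simply start the local service.
<2> If the database service is down on all nodes, find the node with the most recent state and start it with the --wsrep-new-cluster option. Once it is up, the other nodes can start their services directly.
Strategy for finding the most recent node:
First read the seqno value from the grastate.dat file on every node; the node with the largest value is the most recent one. If the seqno is -1 on all nodes, compare my_uuid and view_id in each node's gvwstate.dat file; a node on which the two are equal becomes the first node to start, and once it is up the other nodes can start normally. If no node is found this way either, manual intervention is required.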
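The selection rule above can be sketched as a pure function that is independent of how the values are collected. The function name and the example IP addresses are invented for illustration; the inputs correspond to the per-node seqno values and the per-node my_uuid/view_id comparison results.

```python
def choose_first_boot_node(seqno_by_node, uv_equal_by_node):
    """Pick the node that should be started with --wsrep-new-cluster.

    seqno_by_node:    {node: seqno from grastate.dat, -1 if unknown}
    uv_equal_by_node: {node: 1 if my_uuid equals view_id in gvwstate.dat}
    Returns the chosen node, or None if manual intervention is needed.
    """
    # Step 1: the highest known seqno wins. Nodes are visited in sorted
    # order so that every node running this logic picks the same winner
    # when two nodes report the same seqno.
    best_node, best_seqno = None, -1
    for node in sorted(seqno_by_node):
        if seqno_by_node[node] > best_seqno:
            best_node, best_seqno = node, seqno_by_node[node]
    if best_node is not None:
        return best_node

    # Step 2: every seqno is -1 (for example after a power failure),
    # so fall back to the gvwstate.dat comparison.
    for node in sorted(uv_equal_by_node):
        if uv_equal_by_node[node] == 1:
            return node

    # Step 3: nothing conclusive; leave the decision to an administrator.
    return None


print(choose_first_boot_node(
    {"10.0.0.1": 33550, "10.0.0.2": 33557, "10.0.0.3": -1}, {}))
```

Keeping the decision separate from the network and file access makes it easy to test every branch, including the case where all nodes report -1.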
Below is the automatic recovery script I wrote myself:

#!/usr/bin/python2
# -*- coding: utf-8 -*-

import json
import os
import time
import traceback
import logging
import sys

# Initialize the logger
logger = logging.getLogger("check-or-recover-galera")
log_file = '/var/log/check-or-recover-galera/check-or-recover-galera.log'
if not os.path.exists(log_file):
    os.system('mkdir -p /var/log/check-or-recover-galera/')
    os.system('touch ' + log_file)

formatter = logging.Formatter('%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s')
file_handler = logging.FileHandler(log_file)
file_handler.setFormatter(formatter)

logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)

import socket

PORT = 10000
BUFF_SIZE = 10240

# Raises an exception if the node cannot be reached on PORT within 3 seconds
def test_connect_ok(ip):
    client_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_sock.settimeout(3)
    client_sock.connect((ip, PORT))
    client_sock.close()

# This function requires a process on the remote node that listens on PORT
# and waits for commands to handle
def send_request(ip, data, timeout=60):
    test_connect_ok(ip)
    client_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_sock.settimeout(timeout)
    client_sock.connect((ip, PORT))
    client_sock.sendall(data)
    ret_data = client_sock.recv(BUFF_SIZE)
    client_sock.close()
    return ret_data

# Send a JSON request and return the decoded JSON reply; raise on failure
def remote_send_request(ip, data, timeout=60):
    res_remote = send_request(ip, json.dumps(data), timeout=timeout)
    if res_remote is None or res_remote == '':
        raise Exception('res_remote is null')
    res_remote = json.loads(res_remote)
    if res_remote['ret_state'] != 'success':
        raise Exception('ret_state is not success')
    return res_remote
    
# By default the address of vmbr0 is taken as the local IP
def get_local_ip():
    cmd_out = os.popen('cat /etc/sysconfig/network-scripts/ifcfg-vmbr0 2>/dev/null |grep IPADDR').read()
    if cmd_out:
        cmd_out = cmd_out.strip()
        cmd_out = cmd_out.replace('"', '').replace(' ', '')
        tmp = cmd_out.split('=')
        if len(tmp) >= 2:
            ip = tmp[1]
            return ip
    return None
    
# Get the seqno value of every node (-1 if a node cannot be queried)
def get_all_nodes_seqno(node_ips_arr):
    seqno_dict = {}
    data = {'req_type': 'get_seqno'}
    for node_ip in node_ips_arr:
        try:
            res_remote = remote_send_request(node_ip, data)
            seqno_dict[node_ip] = res_remote['seqno']
        except Exception:
            seqno_dict[node_ip] = -1
            logger.error(traceback.format_exc())
    return seqno_dict

# Get, for every node, the result of comparing my_uuid and view_id in its
# gvwstate.dat file (0 if a node cannot be queried)
def get_all_nodes_uv_is_equal(node_ips_arr):
    uv_equal_dict = {}
    data = {'req_type': 'get_uv_equal_value'}
    for node_ip in node_ips_arr:
        try:
            res_remote = remote_send_request(node_ip, data)
            uv_equal_dict[node_ip] = res_remote['equal']
        except Exception:
            uv_equal_dict[node_ip] = 0
            logger.error(traceback.format_exc())
    return uv_equal_dict

# Check whether the local mariadb service is already running
def check_is_active_now():
    is_active = os.popen('systemctl is-active mysqld_safe 2>/dev/null').read()
    is_active = is_active.strip()
    if is_active == 'active':
        logger.info('the mariadb is already up')
        return True
    return False
    
# Start this node as the first node of the cluster
def start_mariadb_with_wsrep():
    # Add --wsrep-new-cluster to the unit file (removing any stale copy first)
    os.system("sed -i 's/--wsrep-new-cluster//' /usr/lib/systemd/system/mysqld_safe.service")
    os.system("sed -i 's/user=mysql/user=mysql --wsrep-new-cluster/' /usr/lib/systemd/system/mysqld_safe.service")
    os.system("sed -i 's/safe_to_bootstrap:.*/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat")
    os.system('systemctl daemon-reload')
    os.system('systemctl start mysqld_safe')
    # Restore the unit file to its original content
    os.system("sed -i 's/--wsrep-new-cluster//' /usr/lib/systemd/system/mysqld_safe.service")
    os.system('systemctl daemon-reload')
    time.sleep(10)
    if check_is_active_now() is True:
        return True
    else:
        logger.error('use option wsrep-new-cluster start mariadb failed')
    return False
    
    
def main():
    while True:
        try:
            time.sleep(10)
            # First check whether the local mariadb is already running
            if check_is_active_now() is True:
                time.sleep(60)
                continue

            # The thintaskd service must be running before we go on;
            # if it has not started yet, wait and try again
            is_thintaskd_active = os.popen('/etc/init.d/thintaskd status 2>/dev/null |grep active |grep running').read()
            if not is_thintaskd_active:
                logger.info('wait thintaskd service start')
                time.sleep(5)
                continue

            # Get the IPs of all nodes in the current galera cluster
            node_ips_info = os.popen("cat /etc/my.cnf.d/mariadb-server.cnf |grep '^wsrep_cluster_address'").read()
            node_ips_str = node_ips_info.split('gcomm://')[1]
            node_ips_str = node_ips_str.strip()
            node_ips_arr = node_ips_str.split(',')

            # Check whether mariadb is already running on any other node
            data = {'req_type': 'check_mariadb_service'}
            has_mariadb_service_on = False
            for node_ip in node_ips_arr:
                try:
                    res_remote = remote_send_request(node_ip, data)
                    state = res_remote['state']
                    if state == 'active':
                        has_mariadb_service_on = True
                        # Found a node with a running service
                        logger.info('find the running mariadb service node:' + node_ip)
                        # Simply start the local service
                        os.system('systemctl start mysqld_safe')
                        time.sleep(10)
                        if check_is_active_now() is True:
                            time.sleep(60)
                        else:
                            logger.info('start mariadb service error')
                        break
                except Exception as e:
                    logger.error(traceback.format_exc())
                    logger.error('check_mariadb_service for ' + node_ip + ' failed, error:' + str(e))
            if has_mariadb_service_on is True:
                continue
                    
            # If mariadb is not running on any node, one node has to be
            # chosen to start first
            seqno_dict = get_all_nodes_seqno(node_ips_arr)
            logger.info('get seqno_dict:%s', seqno_dict)
            # Choose the first node to start by its seqno value; a seqno
            # of -1 means "unknown" and must not be selected here
            first_boot_node = None
            max_seqno = -1
            for key in seqno_dict:
                if seqno_dict[key] > max_seqno:
                    max_seqno = seqno_dict[key]
                    first_boot_node = key
            if first_boot_node is not None:
                logger.info('find the first_boot_node by seqno, first_boot_node:' + first_boot_node)
                # If the first node is this node, start it; otherwise wait
                # for the other node to come up
                if first_boot_node == get_local_ip():
                    if start_mariadb_with_wsrep() is True:
                        time.sleep(60)
                else:
                    logger.info('wait node ' + first_boot_node + ' start mariadb service')
                    time.sleep(5)
                continue
            else:
                logger.info("all node's seqno is -1")
                
            # If the seqno of every node is -1, all hosts probably stopped
            # abnormally, for example because of a power failure.
            # In that case compare my_uuid and view_id in gvwstate.dat to
            # decide which node to start from.
            # When the cluster is shut down cleanly this file is deleted.
            uv_equal_dict = get_all_nodes_uv_is_equal(node_ips_arr)
            # Pick the first node from the returned values: 1 means yes, 0 means no
            for key in uv_equal_dict:
                if uv_equal_dict[key] == 1:
                    first_boot_node = key
                    logger.info('find the first_boot_node by uv_equal_dict, first_boot_node:' + first_boot_node)
                    break
            if first_boot_node is not None:
                # If the first node is this node, start it; otherwise wait
                # for the other node to come up
                if first_boot_node == get_local_ip():
                    if start_mariadb_with_wsrep() is True:
                        time.sleep(60)
                else:
                    logger.info('wait node ' + first_boot_node + ' start mariadb service')
                    time.sleep(5)
                continue
            else:
                logger.info("can not find first_boot_node by gvwstate.dat file")
                
            # If no first node was found by the steps above, manual
            # intervention is needed; alternatively a node could be
            # picked at random and started
            logger.error('can not find first_boot_node, maybe you should ask admin to deal with this problem')
            time.sleep(5)
        except Exception as e:
            logger.error(traceback.format_exc())
            logger.error('error:' + str(e))

if __name__ == "__main__":
    sys.exit(main())
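The script relies on a process on every node that listens on PORT and answers the three request types, but that process is not shown in this post. The following is only a sketch of what its reply logic might look like. The field names (req_type, ret_state, seqno, equal, state) are taken from the script above; the function names, the 'failed' value and the gvwstate.dat layout (`view_id: <type> <uuid> <seq>`) are assumptions.

```python
import json


def read_seqno(grastate_text):
    """Return the seqno recorded in grastate.dat text, or -1 if absent."""
    for line in grastate_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "seqno":
            try:
                return int(value.strip())
            except ValueError:
                return -1
    return -1


def uv_equal(gvwstate_text):
    """Return 1 if my_uuid matches the UUID in view_id, else 0."""
    my_uuid, view_uuid = None, None
    for line in gvwstate_text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()
        if key.strip() == "my_uuid" and fields:
            my_uuid = fields[0]
        elif key.strip() == "view_id" and len(fields) >= 2:
            view_uuid = fields[1]
    return 1 if my_uuid is not None and my_uuid == view_uuid else 0


def handle_request(raw_request, grastate_text, gvwstate_text, service_state):
    """Build the JSON reply for one request from the recovery script.

    The file contents and the service state are passed in as arguments so
    that the function does no I/O itself and is easy to test.
    """
    try:
        req_type = json.loads(raw_request)["req_type"]
    except (ValueError, TypeError, KeyError):
        return json.dumps({"ret_state": "failed"})

    if req_type == "get_seqno":
        reply = {"seqno": read_seqno(grastate_text)}
    elif req_type == "get_uv_equal_value":
        reply = {"equal": uv_equal(gvwstate_text)}
    elif req_type == "check_mariadb_service":
        reply = {"state": service_state}
    else:
        return json.dumps({"ret_state": "failed"})
    reply["ret_state"] = "success"
    return json.dumps(reply)


print(handle_request('{"req_type": "get_seqno"}', "seqno: 33557\n", "", "inactive"))
```

A real responder would wrap this in a socket server, read the two files from /var/lib/mysql, and obtain the service state from systemctl.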

Below is the custom mysqld_safe.service unit file. You can place it at /usr/lib/systemd/system/mysqld_safe.service

[Unit]
Description=Thinputer API Server
After=syslog.target network.target

[Service]
Type=notify
NotifyAccess=all
TimeoutStartSec=0
User=root

ExecStartPre=/usr/libexec/mysql-check-socket
ExecStartPre=/usr/libexec/mysql-prepare-db-dir %n
ExecStart=/bin/mysqld_safe --defaults-file=/etc/my.cnf.d/mariadb-server.cnf --user=mysql


[Install]
WantedBy=multi-user.target
