[MariaDB] MHA High-Availability Deployment: Lab

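Contents

  • 1. Introduction
    • 1.1 MHA roles
  • 2. MHA tools
  • 3. MHA deployment
    • 3.1.1 Configuration
    • 3.1.2 Environment plan
    • 3.1.3 Setting up one master with multiple slaves
    • 3.2 MHA configuration
    • 3.2.1 Granting privileges on the master
    • 3.2.2 SSH mutual trust
    • 3.2.3 Installing the MHA packages
    • 3.2.4 Configuring the MHA manager node
    • 3.2.5 Checking the MHA nodes
    • 3.2.6 Starting MHA
    • 3.2.7 Simulating a failure
    • 3.2.8 Repairing the failed node and rejoining it to MHA
  • MHA troubleshooting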

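1. Introduction

MHA (Master High Availability) keeps a MySQL replication cluster available by having a standby master ready to take over. When the master fails, MHA can complete the failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which gives high availability in a relative sense.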

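1.1 MHA roles

As shown in the figure below, the MHA architecture consists of two kinds of node:

  • MHA Manager node
  • MHA Node node

The MHA Manager is deployed on a single node, while an MHA Node is deployed on every MySQL server that needs to be monitored. The Manager periodically probes the master of the cluster. When the master fails, the Manager automatically promotes the standby master or the slave with the most recent data to be the new master, and then repoints the remaining slaves to it.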

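2. MHA tools

Manager node:

  • masterha_check_ssh: checks the SSH environment that MHA depends on;
  • masterha_check_repl: checks the MySQL replication environment;
  • masterha_manager: the main MHA service program;
  • masterha_check_status: reports the running state of MHA;
  • masterha_master_monitor: monitors the availability of the MySQL master;
  • masterha_master_switch: switches the master node;
  • masterha_conf_host: adds or removes configured hosts;
  • masterha_stop: stops the MHA service.

Node tools (normally triggered by the MHA Manager scripts, no manual operation needed):

  • save_binary_logs: saves and copies the master's binary logs;
  • apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves;
  • purge_relay_logs: purges relay logs (without blocking the SQL thread).

Custom extensions:

  • secondary_check_script: checks the availability of the master over multiple network routes;
  • master_ip_failover_script: updates the master IP used by the application;
  • report_script: sends reports;
  • init_conf_load_script: loads initial configuration parameters;
  • master_ip_online_change_script: updates the IP address of the master node.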

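3. MHA deployment

3.1.1 Configuration

MHA has specific requirements for the MySQL replication environment: every node must enable both the binary log and the relay log, every slave must explicitly enable read_only, and relay_log_purge must be turned off. These settings are explained here up front.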

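3.1.2 Environment plan

Hostname  IP              Role                    Notes
manager   172.30.200.100  MHA manager             Management and failover
master    172.30.200.101  Database master server  binlog and relay log enabled; relay_log_purge disabled
slave1    172.30.200.102  Database slave server   binlog and relay log enabled; relay_log_purge disabled
slave2    172.30.200.103  Database slave server   binlog and relay log enabled; relay_log_purge disabled

Add the following entries to /etc/hosts on every node: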

[root@localhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.200.100  arpmgr
172.30.200.101  arpmaster
172.30.200.102  arpslave1
172.30.200.103  arpslave2

Create the binlog directory:

mkdir -p /data/mysqldata/binlog
chown -R mysql:mysql /data/mysqldata/binlog

Configuration for node 101:

server-id = 200101

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14

lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

Configuration for node 102:

server-id = 200102

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14

read_only = ON
relay_log_purge = 0

lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

Configuration for node 103:


server-id = 200103

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
read_only = ON
relay_log_purge = 0
expire-logs-days = 14

lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1
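
As a quick sanity check, the slave-side requirements from section 3.1.1 can be verified against a configuration file before MariaDB is restarted. The following is a minimal sketch, not part of MHA: the function name check_slave_cnf is hypothetical, and it assumes one option per line, written in the same style as the settings above.

```shell
# check_slave_cnf FILE
# Prints one line per requirement that is missing or has the wrong value,
# and returns non-zero if any requirement is not met.
# Note: plain POSIX sh, so the helper variables are not local.
check_slave_cnf() {
    cnf=$1
    missing=0
    # Each pattern accepts '-' or '_' in option names and spaces around '='.
    for pattern in \
        '^[[:space:]]*log[-_]bin[[:space:]]*=' \
        '^[[:space:]]*relay[-_]log[[:space:]]*=' \
        '^[[:space:]]*read[-_]only[[:space:]]*=[[:space:]]*(ON|1)[[:space:]]*$' \
        '^[[:space:]]*relay[-_]log[-_]purge[[:space:]]*=[[:space:]]*(OFF|0)[[:space:]]*$'
    do
        if ! grep -Eiq "$pattern" "$cnf"; then
            echo "missing or wrong: $pattern"
            missing=1
        fi
    done
    return $missing
}
```

Run against a file holding the 102 or 103 settings it should print nothing and return 0. Run against the 101 settings it reports read_only and relay_log_purge, which is expected for the master.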

3.1.3 Setting up one master with multiple slaves

On the master node:

MariaDB [(none)]>grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      548 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

On each slave node:

grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';

change master to master_host='172.30.200.101', 
master_user='repl', 
master_password='repl7101',
master_log_file='mysql-bin.000001',
master_log_pos=548;

start slave;
show slave status\G

At this point the one-master, multi-slave setup is complete.
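
The binlog file and position in the change master statement must match what show master status reported on the master, and copying them by hand is easy to get wrong. A minimal sketch of deriving the statement from that table follows. The helper name build_change_master is hypothetical, and it assumes the table layout shown above and credentials without backslashes or quotes.

```shell
# build_change_master HOST USER PASSWORD
# Reads the table printed by "show master status" on stdin and prints the
# matching "change master to" statement for the slaves.
build_change_master() {
    awk -F'|' -v host="$1" -v user="$2" -v pass="$3" '
        # Only the data row has a purely numeric Position column ($3);
        # the header row and the +---+ border lines do not match.
        $2 ~ /[0-9]+ *$/ && $3 ~ /^ *[0-9]+ *$/ {
            gsub(/ /, "", $2)
            gsub(/ /, "", $3)
            # \047 is a single quote inside an awk string.
            printf "change master to master_host=\047%s\047, master_user=\047%s\047, master_password=\047%s\047, master_log_file=\047%s\047, master_log_pos=%s;\n", host, user, pass, $2, $3
        }'
}
```

Feeding it the show master status table above with the arguments 172.30.200.101 repl repl7101 yields the same statement that was entered on the slaves. Passing a password as a command-line argument is only acceptable in a lab like this one.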

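3.2 MHA configuration

3.2.1 Granting privileges on the master

The MHA admin account needs administrative privileges. It can be created on every node, but since replication is already running and the grants are replicated to the slaves, for now it only needs to be created on the master: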

grant all on *.* to 'mhaadmin'@'172.30.%.%' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';

3.2.2 SSH mutual trust

Run the following on all four nodes:

ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@arpmgr

The authorized_keys file on the arpmgr node should then contain the following:

[root@localhost .ssh]# cat authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDY3yFhR5uzcFEpd+q+1Uw/cRF9ZRraygms7OZoefLFzY/ydSi6yYuCighG8WquvRep7XDNjFI71HAUagSoXiyPoCe1lqEnzpxSc+fQpIeQqEhUmLJ2bk+R83EskzwRGh+S/D4yp/swWz1vRgUGoTWevLCs33q7ZrsM8i+jB0uwZmzOV+CyQAPW9vLkRjZa4y1sx65lbR0HbdTQWQYZ4IyZauoU8XQjAIOs/CdLw2nBt8dPO53jT7NS7Ywx6eu/Wj9k/sYVVZT3jTb+pBIVs+Du5+tdUDX5aLKzxINpLlqNhorNevoC9iE0Ame1qvYonQfyWQ52Ae0y+58vFfG6PyV3 [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/ZPihYSC6ArawKRU75aQRVSFsQ5S89SrYHGWdzyluB4spj+UDUmWH1kLGYr715/HD5hh22KdLmIs7R4jviOeao1HK52fpMvklYaNtYRHuV63Zkg5sOLvLfhrHdta9wuHlW1NyWx75+wIl2LvKBRtnSddwf5ZvitJ/kChf2gpNhHAWidyjGsPoJdr0OBCNHvz1y6oON6cnMb07ExaIjptRnkbCOU0QSVjFq4+Jmh8zTTbJC2up50s15gSfWXH0+WLXmJXJGkvgHdSYqw4vJt/l25f5qAKKZsfnyfC0iyct4GyHPF6trpvQ/c2lqr/Rg4xLWgdxlyt4aBJYl5adIRK/ [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDba26wV0KwQNTb4pKuiFDCcVMNRLGMXSiJC8ucN4/KIqzoOYJ747QL8GL5F8ePnRaZ1rtOwdjnlTiC0a4Tcg4JLs+JSnJgzvepuixmGgSJfLbJ36iN1WFh6fP2GZEDdR7Qum4sBUpQyYJ20Kf9rKfQQv2wq6csK5IlFk/OoO+zTySauLnYvRxvKY2avVDXPPFJvpqimKXn59MIAoJr6YEKvncbYyqvrSgUy7klZDys9IIjYcWfO7VKjQ5bwbHrrKtNbedME+KPQld7e8ZVL66Omik4Z6ip7DQEHRKWMmuBIpL99AgOOjPLbzJFWLUPOwvy3DtmEBnZ+0NVf/1obC11 [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwrgZtGC31EgixeY4SVl4h64m1r8LdL3hM4Be2/I+6Xw7hCzZyKKTAFgz9W/ukfx6WmZwoqp1VO/7Jp6KO1FhYOi5u0q6J1KIObFNp+3E6cB2P0q39WqmZpQ9cNPYrbs9U2Ej0L0JwUtf/xLh334PaSlv/LcNy+p1dWya2OqsBeraiXZ4MgEBzcb+0twkpfpD327VgT/mRHPmA6fPRJOOJti1u4isHeotE4i13YIqQYfBfmbfiLdXKAvgI8FuTf0i91Re/FUBOgBfBcJbqIQNR0Nh5wZ/LvNxkstDQvypZIZwiK+wN+aZZOQ7jF/+997Z9QQleC9OOoHOJR7+fisLb [email protected]

There are exactly four ssh-rsa public keys, one for each node.

Copy this authorized_keys file to the other three servers:

scp authorized_keys root@arpmaster:~/.ssh/
scp authorized_keys root@arpslave1:~/.ssh/
scp authorized_keys root@arpslave2:~/.ssh/

Test that SSH works:

[root@localhost .ssh]# ssh arpmaster
[root@localhost ~]# ssh arpslave1
[root@localhost ~]# ssh arpslave2
[root@localhost ~]# ssh arpmgr
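
The manual logins above can also be scripted so that a missing key shows up as a failure instead of a password prompt. This is a minimal sketch under a few assumptions: the function name check_ssh_trust and the SSH_BIN override are hypothetical (SSH_BIN exists only so the loop can be dry-run), and the login user is root as in the rest of this walkthrough.

```shell
# check_ssh_trust HOST...
# Attempts a key-based login to every host, prints the hosts that fail,
# and returns non-zero if at least one failed.
check_ssh_trust() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if ! "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 \
                "root@$host" true </dev/null; then
            echo "ssh to $host failed"
            failed=1
        fi
    done
    return $failed
}
```

It would be run on each of the four nodes as `check_ssh_trust arpmgr arpmaster arpslave1 arpslave2`. It is only a pre-check and does not replace masterha_check_ssh.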

3.2.3 Installing the MHA packages

MHA ships as two packages: node and manager.

Install on all four nodes: mha4mysql-node-0.57-0.el7.centos.noarch.rpm

Install on the manager node: mha4mysql-manager-0.57-0.el7.centos.noarch.rpm

mha4mysql-node-0.57-0.el7.centos.noarch.rpm depends on perl-DBD-mysql and perl-DBI, so install them first:

yum install perl-DBD-mysql perl-DBI

mha4mysql-manager-0.57-0.el7.centos.noarch.rpm depends on several Perl modules. First install the EPEL repository package:

yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-Config-Tiny perl-Log-Dispatch-* perl-Parallel-ForkManager

Both packages then install successfully:

[root@localhost ~]# rpm -ivh mha4mysql-node-0.57-0.el7.centos.noarch.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
[root@localhost ~]# rpm -ivh mha4mysql-manager-0.57-0.el7.centos.noarch.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-manager-0.58-0.el7.cent################################# [100%]

Version 0.58 checks super_read_only, which is not available in MariaDB, so version 0.57 is used here.

3.2.4 Configuring the MHA manager node

[root@localhost ~]# cd /etc/mha_master/
[root@localhost mha_master]# vi /etc/mha_master/mha.cnf

The configuration file contains:

[server default]
user=mhaadmin
password=mha7101
manager_workdir=/etc/mha_master/app1
manager_log=/etc/mha_master/manager.log
remote_workdir=/data/mha_master/app1
repl_user=repl
repl_password=repl7101
ping_interval=1
[server1]
hostname=172.30.200.101
ssh_port=22
[server2]
hostname=172.30.200.102
ssh_port=22
candidate_master=1
[server3]
hostname=172.30.200.103
ssh_port=22
no_master=1
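
In this file server2 is marked candidate_master=1, so it is preferred for promotion, and server3 is marked no_master=1, so it is never promoted. To see that role assignment at a glance, the [serverN] blocks can be summarized with a small helper. This is only a sketch: list_mha_servers is a hypothetical name, and it assumes the key=value layout shown above, with no spaces around the equals sign.

```shell
# list_mha_servers FILE
# Prints "section hostname role" for every [serverN] block, where role is
# candidate_master, no_master, or "-" when neither flag is set.
list_mha_servers() {
    awk -F= '
        function emit() {
            if (sec != "") print sec, host, (role == "" ? "-" : role)
        }
        /^\[/ {
            emit()
            # Only [serverN] blocks describe hosts; [server default] is skipped.
            sec = ($0 ~ /^\[server[0-9]+\]$/) ? substr($0, 2, length($0) - 2) : ""
            host = ""; role = ""
            next
        }
        sec != "" && $1 == "hostname"                    { host = $2 }
        sec != "" && $1 == "candidate_master" && $2 == 1 { role = "candidate_master" }
        sec != "" && $1 == "no_master" && $2 == 1        { role = "no_master" }
        END { emit() }
    ' "$1"
}
```

For the file above, `list_mha_servers /etc/mha_master/mha.cnf` lists server1 with "-", server2 with candidate_master, and server3 with no_master.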

3.2.5 Checking the MHA nodes

  1. Check SSH connectivity from the manager node:

    [root@localhost ~]# masterha_check_ssh --conf=/etc/mha_master/mha.cnf

    Log output like the following means the check passed:

    Thu Jan  9 14:43:09 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Thu Jan  9 14:43:09 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
    root@172.30.200.103(172.30.200.103:22) to root@172.30.200.101(172.30.200.101:22)..
    Thu Jan  9 14:43:11 2020 - [debug]   ok.
    Thu Jan  9 14:43:11 2020 - [debug]  Connecting via SSH from root@172.30.200.103(172.30.200.103:22) to root@172.30.200.102(172.30.200.102:22)..
    Thu Jan  9 14:43:11 2020 - [debug]   ok.
    Thu Jan  9 14:43:12 2020 - [info] All SSH connection tests passed successfully.
  2. Check that MySQL replication is healthy:

    masterha_check_repl --conf=/etc/mha_master/mha.cnf

    Log output like the following means replication is healthy:

    Thu Jan  9 14:44:54 2020 - [info] Slaves settings check done.
    Thu Jan  9 14:44:54 2020 - [info] 
    172.30.200.101(172.30.200.101:3306) (current master)
     +--172.30.200.102(172.30.200.102:3306)
     +--172.30.200.103(172.30.200.103:3306)
    
    Thu Jan  9 14:44:54 2020 - [info] Checking replication health on 172.30.200.102..
    Thu Jan  9 14:44:54 2020 - [info]  ok.
    Thu Jan  9 14:44:54 2020 - [info] Checking replication health on 172.30.200.103..
    Thu Jan  9 14:44:54 2020 - [info]  ok.
    Thu Jan  9 14:44:54 2020 - [warning] master_ip_failover_script is not defined.
    Thu Jan  9 14:44:54 2020 - [warning] shutdown_script is not defined.
    Thu Jan  9 14:44:54 2020 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
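
The two checks above are worth running together before every start of the manager. A minimal sketch of such a gate follows. The function name preflight is hypothetical, and the sketch assumes that masterha_check_ssh and masterha_check_repl return a non-zero exit status when their check fails.

```shell
# preflight CONF
# Runs the SSH check and then the replication check against the same
# configuration file, stopping at the first failure.
preflight() {
    conf=$1
    # Assumption: both tools exit non-zero when the check does not pass.
    masterha_check_ssh --conf="$conf" || { echo "SSH check failed"; return 1; }
    masterha_check_repl --conf="$conf" || { echo "replication check failed"; return 1; }
    echo "preflight ok"
}
```

A call such as `preflight /etc/mha_master/mha.cnf && echo "safe to start masterha_manager"` then only proceeds when both checks pass.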

3.2.6 Starting MHA

  1. Start the MHA manager:

    nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
  2. Check the status of the master node:

    [root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
    mha (pid:31709) is running(0:PING_OK), master:172.30.200.101

    This shows that the manager is running and that the master database 172.30.200.101 is responding normally.

  3. Stop the MHA manager:

    masterha_stop --conf=/etc/mha_master/mha.cnf
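
For monitoring scripts it can be handy to pull the current master out of the status line. The sketch below assumes the output format shown in step 2. The function name current_master is hypothetical, and any other status line simply produces no output.

```shell
# current_master
# Reads masterha_check_status output on stdin and prints the master host
# when the manager reports PING_OK; prints nothing otherwise.
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p'
}
```

For example, `masterha_check_status --conf=/etc/mha_master/mha.cnf | current_master` would print 172.30.200.101 for the status line shown above.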

3.2.7 Simulating a failure

  1. On the master, kill the MySQL processes directly:

    [root@localhost ~]# ps -ef |grep mysql
    root     19864     1  0 08:51 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysqldata --pid-file=/data/mysqldata/localhost.localdomain.pid
    mysql    19976 19864  0 08:51 ?        00:00:13 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysqldata --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysqldata/mysqld.log --pid-file=/data/mysqldata/localhost.localdomain.pid --socket=/tmp/mysql.sock
    root     22166 21525  0 14:55 pts/0    00:00:00 grep --color=auto mysql
    [root@localhost ~]# kill -9 19864 19976
  2. Watch the MHA failover log:

    [root@localhost ~]# tail -f /etc/mha_master/manager.log
    
    From:
    172.30.200.101(172.30.200.101:3306) (current master)
     +--172.30.200.102(172.30.200.102:3306)
     +--172.30.200.103(172.30.200.103:3306)
    
    To:
    172.30.200.102(172.30.200.102:3306) (new master)
     +--172.30.200.103(172.30.200.103:3306)
    
    
    Master 172.30.200.101(172.30.200.101:3306) is down!
    
    Check MHA Manager logs at localhost.localdomain:/etc/mha_master/manager.log for details.
    
    Started automated(non-interactive) failover.
    The latest slave 172.30.200.102(172.30.200.102:3306) has all relay logs for recovery.
    Selected 172.30.200.102(172.30.200.102:3306) as a new master.
    172.30.200.102(172.30.200.102:3306): OK: Applying all logs succeeded.
    172.30.200.103(172.30.200.103:3306): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    172.30.200.103(172.30.200.103:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.30.200.102(172.30.200.102:3306)
    172.30.200.102(172.30.200.102:3306): Resetting slave info succeeded.
    Master failover to 172.30.200.102(172.30.200.102:3306) completed successfully.

    The log shows that 172.30.200.102 has become the new master, while 172.30.200.103 remains a slave.
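
The outcome of a failover can also be read from the log by a script instead of by eye. This is a minimal sketch that relies only on the "Master failover to ... completed successfully." line shown above. The function name new_master_from_log is hypothetical.

```shell
# new_master_from_log [FILE...]
# Prints the host named in the "Master failover to HOST(...) completed
# successfully." line; reads stdin when no file is given.
new_master_from_log() {
    sed -n 's/^[[:space:]]*Master failover to \([^(]*\)(.*) completed successfully\.[[:space:]]*$/\1/p' "$@"
}
```

With the log above, `new_master_from_log /etc/mha_master/manager.log` would print 172.30.200.102. Empty output means no completed failover was recorded in that file.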

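3.2.8 Repairing the failed node and rejoining it to MHA

Because this is a lab environment, exporting a mysqldump backup can be skipped. To recover in production, stop the SQL thread on a slave, record the corresponding binlog position, and take a backup. Restore that backup on the failed node so the data is consistent, then resynchronize and bring the node back. On the repaired node, point replication at the new master: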

change master to master_host='172.30.200.102', 
master_user='repl', 
master_password='repl7101',
master_log_file='mysql-bin.000003',
master_log_pos=401;

Start replication and check the slave status:

MariaDB [(none)]> start slave;

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.30.200.102
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000003
           Read_Master_Log_Pos: 401
                Relay_Log_File: relay-bin.000002
                 Relay_Log_Pos: 555
         Relay_Master_Log_File: mysql-bin.000003
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
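
The two fields that matter in this output are Slave_IO_Running and Slave_SQL_Running, and both must be Yes before the node is treated as a healthy slave again. A minimal sketch of checking them in a script follows. The function name slave_threads_ok is hypothetical, and it expects the vertical \G format shown above.

```shell
# slave_threads_ok
# Reads "show slave status\G" output on stdin and succeeds only when both
# the IO thread and the SQL thread report Yes.
slave_threads_ok() {
    awk '
        # Exact field-name matches, so Slave_SQL_Running_State is ignored.
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

Piping the status output into the function, for example from `mysql -e 'show slave status\G'`, gives an exit status that a rejoin script can test before restarting the manager.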

Start the manager again:

[root@localhost ~]# nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
[root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
mha (pid:31905) is running(0:PING_OK), master:172.30.200.101

This completes the MHA lab. Production environments also use a VIP, which will be covered in a follow-up article.

MHA troubleshooting

Error 1

Log error:

Thu Jan  9 11:31:36 2020 - [info]   Connecting to root@172.30.200.102(172.30.200.102:22).. 
Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options

Fix:

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

Error 2

Log error:

Checking if super_read_only is defined and turned on..DBI connect(';host=172.30.200.102;port=3306','mhaadmin',...) failed: Access denied for user 'mhaadmin'@'arpslave1' (using password: YES) at /usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm line 239

Fix:

On the master database node, run:

grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmaster' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave1' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave2' identified by 'mha7101';

Error 3

Log output:

    Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!

Fix:

ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
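
Errors 1 and 3 share a cause: MHA runs mysql and mysqlbinlog on the nodes over SSH, and /usr/local/mysql/bin is not on the PATH of that non-interactive shell. A minimal sketch for spotting this before MHA does is shown below. The function name missing_commands is hypothetical.

```shell
# missing_commands NAME...
# Prints every command that cannot be found on the current PATH and
# returns non-zero if at least one is missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return $rc
}
```

To reproduce what MHA sees, the check should run through SSH rather than in a login shell, for example `ssh root@arpslave1 'command -v mysql mysqlbinlog'`. A binary missing from that output needs a symlink like the ones above.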