Building a MySQL High-Availability Architecture with MHA

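I. MHA architecture overview

MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and automates master failover and slave promotion in a MySQL replication environment. When the master fails, MHA can complete the failover automatically within 0 to 30 seconds, and while doing so it preserves data consistency as far as possible.

MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager probes the master of each cluster at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The whole failover is intended to be transparent to the application.

During an automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. If the master has a hardware failure or cannot be reached over SSH, for example, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: even if only one slave has received the latest binary log events, MHA can apply them to all the other slaves and so keep every node consistent.

MHA currently targets a one-master, multiple-slave topology. A replication cluster managed by MHA needs at least three database servers: one master, one candidate master, and one more slave. Because three servers is a real hardware cost, Taobao modified MHA on this basis, and its TMHA now supports one master with a single slave. (Source: 《深入淺出MySQL(第二版)》)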

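Official project page: https://code.google.com/p/mysql-master-ha/

[Figure: one MHA Manager managing several master-slave replication groups]

MHA's failover procedure can be summarized as follows:

(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave with the most recent updates;
(3) Apply the differential relay logs to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.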

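MHA ships as two packages, the Manager toolkit and the Node toolkit, described below.

The Manager toolkit mainly includes the following tools:

masterha_check_ssh              Check the SSH configuration used by MHA
masterha_check_repl             Check the MySQL replication status
masterha_manager                Start MHA
masterha_check_status           Check the current running state of MHA
masterha_master_monitor         Detect whether the master is down
masterha_master_switch          Control failover (automatic or manual)
masterha_conf_host              Add or remove server entries in the configuration

The Node toolkit (these tools are normally invoked by MHA Manager scripts and need no manual operation) mainly includes the following tools:

save_binary_logs                Save and copy the master's binary logs
apply_diff_relay_logs           Identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog              Strip unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs                Purge relay logs (without blocking the SQL thread)

Note: to minimize data loss when the master goes down because of hardware damage, it is recommended (but not required) to enable MySQL 5.5 semi-synchronous replication alongside MHA. How semi-synchronous replication works is not covered here.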

II. Environment preparation

Synchronize the clocks (afterwards confirm that every server shows the same time; if they differ, fix the time zone)

Disable the firewall

Install MySQL (this lab uses MySQL 5.6)

Software packages: https://pan.baidu.com/s/1o934VZc

III. Configure passwordless SSH login between all hosts (it must work in both directions; it is also best not to disable password login, since doing so may cause problems)

On master-db1 (192.168.1.11):

# echo -e "\n" | ssh-keygen -t dsa -N ""
# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.1.12
# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.1.13
# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.1.14

Configure the other three servers the same way.
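
Since the same commands have to be repeated on all four machines, it can help to generate them. The following is a minimal sketch and not part of the original setup: the function name ssh_copy_cmds and the HOSTS variable are made up for illustration, and the host list is simply the four lab addresses used in this article. It only prints the commands; nothing is executed.

```shell
#!/bin/sh
# Lab addresses from this article; adjust for your environment.
HOSTS="192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14"

# Print the ssh-copy-id commands that host $1 has to run so that it can
# log in to every other host. Pipe the output to sh to actually run them.
ssh_copy_cmds() {
    self=$1
    for h in $HOSTS; do
        if [ "$h" != "$self" ]; then
            echo "ssh-copy-id -i ~/.ssh/id_dsa.pub root@$h"
        fi
    done
}

# Default to the master when no address is given on the command line.
ssh_copy_cmds "${1:-192.168.1.11}"
```

Run it once per machine with that machine's own address as the argument, after ssh-keygen has created the key pair there.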

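IV. Set up MySQL master-slave replication

Note: the binlog-do-db and replicate-ignore-db settings must be identical on every server. MHA checks the replication filtering rules at startup; if they differ, it refuses to start monitoring and failover.

1. Back up the master's data

# mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql

2. Create the replication user on the master (192.168.1.11) and on the candidate master (192.168.1.12). A slave configured with no_master does not need it; any other slave should have it as well: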

mysql> grant replication slave on *.* to 'repl'@'192.168.1.%' identified by '123456'; 
mysql> flush privileges;

3. Check the binlog file name and position on the master at the time of the backup:

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |      407 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

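
Because the dump was taken with --master-data=2, mysqldump also records these coordinates inside all.sql as a commented-out CHANGE MASTER TO statement. That is the safer source if the master received writes after the backup. A small sketch for pulling them out follows; the function name dump_coordinates is an illustrative assumption, and the pattern assumes the comment format written by the mysqldump shipped with MySQL 5.6.

```shell
#!/bin/sh
# Print "<binlog file> <position>" taken from the commented CHANGE MASTER TO
# line that mysqldump --master-data=2 writes near the top of the dump ($1).
dump_coordinates() {
    sed -n "s/^-- CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" "$1" |
        head -n 1
}

# Usage (on the machine that holds the dump):
#   dump_coordinates all.sql
```

The two values are what MASTER_LOG_FILE and MASTER_LOG_POS expect in the CHANGE MASTER TO statement used in step 6.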

4. Copy the backup to 192.168.1.12 and 192.168.1.13

# scp all.sql 192.168.1.12:/root
# scp all.sql 192.168.1.13:/root

5. Import the backup on each of the two servers (run this on both 192.168.1.12 and 192.168.1.13)

# mysql < all.sql

6. Run the replication commands on each of the two servers

mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.11',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000002',MASTER_LOG_POS=407;
Query OK, 0 rows affected, 2 warnings (0.12 sec)

mysql> start slave;
Query OK, 0 rows affected (0.08 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.11
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 407
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

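
Both Slave_IO_Running and Slave_SQL_Running must report Yes before continuing. As a rough sketch (the function name replication_threads_ok is made up here), the check can be scripted by piping the \G output through awk:

```shell
#!/bin/sh
# Read SHOW SLAVE STATUS\G output on stdin and succeed only if both the
# IO thread and the SQL thread report "Yes".
replication_threads_ok() {
    awk -F': ' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !((io == "Yes") && (sql == "Yes")) }
    '
}

# Usage (on a slave):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_threads_ok && echo "replication ok"
```

The trailing colon in each pattern keeps the longer Slave_SQL_Running_State field, which MySQL 5.6 also prints, from being matched by mistake.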

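7. Create the account that MHA uses to manage MySQL. Run this on every MySQL server:

mysql> grant all privileges on *.* to 'root'@'192.168.1.%' identified by '123456';
mysql> flush privileges;

If the manager is installed on a slave server, also create an account that connects by that machine's own hostname; otherwise the masterha_check_repl test does not pass. In the statement below, 'master' stands for that hostname:

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'master' IDENTIFIED BY '123456';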

V. Install MHA

1. Install the Perl dependencies for MHA

On every MySQL server (192.168.1.11-13):

# yum install perl-DBD-MySQL -y

On mha-monitor (192.168.1.14), install the Perl modules that MHA Manager depends on:

# yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

2. Install the MHA Node package on all servers (192.168.1.11-14)

# tar xf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
# make && make install

3. Install the MHA Manager package on mha-monitor (192.168.1.14)

# tar xf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...loaded. (1.9721)
- Config::Tiny          ...loaded. (2.12)
- Log::Dispatch         ...loaded. (2.26)
- Parallel::ForkManager ...loaded. (0.7.9)
- MHA::NodeConst        ...missing.
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Warning: prerequisite MHA::NodeConst 0 not found.
Writing Makefile for mha4mysql::manager
# make && make install

After installation, the following scripts are present under /usr/local/bin. Their purposes were described above and are not repeated here.

# ll /usr/local/bin
total 124
-r-xr-xr-x  1 root root 16367 Jan 17 22:28 apply_diff_relay_logs
-r-xr-xr-x  1 root root  4807 Jan 17 22:28 filter_mysqlbinlog
-r-xr-xr-x  1 root root  1995 Jan 17 22:29 masterha_check_repl
-r-xr-xr-x  1 root root  1779 Jan 17 22:29 masterha_check_ssh
-r-xr-xr-x  1 root root  1865 Jan 17 22:29 masterha_check_status
-r-xr-xr-x  1 root root  3201 Jan 17 22:29 masterha_conf_host
-r-xr-xr-x  1 root root  2517 Jan 17 22:29 masterha_manager
-r-xr-xr-x  1 root root  2165 Jan 17 22:29 masterha_master_monitor
-r-xr-xr-x  1 root root  2373 Jan 17 22:29 masterha_master_switch
-r-xr-xr-x  1 root root  5171 Jan 17 22:29 masterha_secondary_check
-r-xr-xr-x  1 root root  1739 Jan 17 22:29 masterha_stop
-r-xr-xr-x  1 root root  8261 Jan 17 22:28 purge_relay_logs
-r-xr-xr-x  1 root root  7525 Jan 17 22:28 save_binary_logs

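The directory /root/mha4mysql-manager-0.56/samples/scripts/ contains sample scripts, which are copied to /usr/local/bin/ below. They are incomplete and have to be adapted by hand; the developers left that part to the user. If a configuration parameter that refers to one of these scripts is enabled while the script itself has not been adapted, MHA reports an error.

# ll /root/mha4mysql-manager-0.56/samples/scripts/
total 32
-rwxr-xr-x 1 4984 users  3648 Apr  1  2014 master_ip_failover       # manages the VIP during automatic failover; optional. With keepalived you can write your own script to manage the VIP instead, e.g. monitor MySQL and stop keepalived when MySQL fails so that the VIP floats to the other node
-rwxr-xr-x 1 4984 users  9870 Apr  1  2014 master_ip_online_change  # manages the VIP during an online switch; optional, a simple shell script of your own also works
-rwxr-xr-x 1 4984 users 11867 Apr  1  2014 power_manager            # powers off the host after a failure; optional
-rwxr-xr-x 1 4984 users  1360 Apr  1  2014 send_report              # sends an alert after a failover; optional, a simple shell script of your own also works
# cp /root/mha4mysql-manager-0.56/samples/scripts/* /usr/local/bin/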

VI. Configure MHA

1. Create the MHA working directory and the configuration file (a sample configuration file is included in the unpacked source tree).

# mkdir -p /etc/masterha
# cp /root/mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
# ll /etc/masterha/
total 4
-rw-r--r-- 1 root root 257 Jan 17 22:40 app1.cnf

2. Edit app1.cnf. The modified file looks like this:

[server default]
# Manager log file
manager_log=/var/log/masterha/app1/manager.log
# Manager working directory
manager_workdir=/var/log/masterha/app1
# Where the master stores its binlogs, so that MHA can find them; here it is the MySQL data directory
master_binlog_dir=/Data/apps/mysql-5.6.36/data/
# Script that switches the VIP during automatic failover
master_ip_failover_script=/usr/local/bin/master_ip_failover
# Script that switches the VIP during a manual (online) switch
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
# Password of the MySQL root user, i.e. the monitoring account created earlier
password=123456
# Monitoring user
user=root
# Interval in seconds between pings to the master (default 3); failover starts after three attempts without a reply
ping_interval=1
# Directory on the remote MySQL hosts where binlogs are saved during a switch
remote_workdir=/tmp
# Password of the replication user
repl_password=123456
# Replication user
repl_user=repl
# Script that sends an alert after a switch
report_script=/usr/local/bin/send_report
# Check the master's availability over additional network routes
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.1.11 -s 192.168.1.12
# Script that powers off the failed host to prevent split brain (not used here)
shutdown_script=""
# SSH login user
ssh_user=root

[server1]
hostname=192.168.1.11
port=3306

[server2]
# Candidate master: after a failover this slave is promoted to master even if it is not the slave with the most recent events in the cluster
candidate_master=1
# By default MHA does not pick a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering it would take a long time. With check_repl_delay=0 MHA ignores replication delay when choosing the new master. This is very useful together with candidate_master=1, because it ensures that the candidate becomes the new master during the switch
check_repl_delay=0
hostname=192.168.1.12
port=3306

[server3]
hostname=192.168.1.13
port=3306
no_master=1

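3. Configure how relay logs are purged (on every slave node):

# mysql -e 'set global relay_log_purge=0'

Note:

During a switch, MHA relies on relay log information to recover the slaves, so automatic relay log purging must be turned OFF and the relay logs purged manually instead. By default, relay logs on a slave are deleted automatically once the SQL thread has finished executing them. In an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging must take replication delay into account: on an ext3 file system, deleting a large file takes a noticeable amount of time and can cause serious replication delay. To avoid this, a hard link to the relay log is created first, because on Linux removing a large file through a hard link is fast. (The same hard-link technique is commonly used when dropping large tables in MySQL.)

The MHA Node package includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds so that the SQL thread switches to a new relay log, and then executes SET GLOBAL relay_log_purge=0.

The purge_relay_logs script accepts the following parameters:

--user                            MySQL user name
--password                        MySQL password
--port                            Port number
--workdir                         Where to create the hard links of the relay logs; the default is /var/tmp. Hard links cannot be created across partitions, so specify a location on the same partition as the relay logs. After the script finishes successfully, the hard-linked relay log files are deleted
--disable_relay_log_purge         By default the script purges nothing and exits if relay_log_purge=1. With this option it sets relay_log_purge to 0 in that case, purges the relay logs, and leaves the parameter OFF afterwards.

Set up a script that purges relay logs periodically (on both slave servers):

# cat purge_relay_log.sh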
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/data/masterha/log'
work_dir='/Data/apps'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
   mkdir $log_dir -p
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

Make the script executable and add it to crontab so that it runs periodically; do the same on the other slave:

# chmod +x purge_relay_log.sh
# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh

purge_relay_logs removes relay logs without blocking the SQL thread. Running it by hand shows what happens:

# purge_relay_logs --user=root --password=123456 --port=3306 --host=192.168.1.12 --disable_relay_log_purge --workdir=/Data/apps/
2018-01-17 23:07:59: purge_relay_logs script started.
 Found relay_log.info: /Data/apps/mysql-5.6.36/data/relay-log.info
 Opening /Data/apps/mysql-5.6.36/data/relay-log.000001 ..
 Opening /Data/apps/mysql-5.6.36/data/relay-log.000002 ..
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2018-01-17 23:08:02: All relay log purging operations succeeded.

4. The bundled master_ip_failover script has some problems and has to be modified by hand before it can be used.
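
The modified script itself is missing from this copy of the article. Purely as an illustration, and not as the author's script, the sketch below shows the general shape of a VIP-handling failover script written in shell. Several things are assumptions: the VIP, interface and alias are borrowed from the master_ip_online_change script shown later in this article; the option names follow the bundled sample script; and the reading that MHA calls the script with --command=stop, stopssh, start or status should be checked against your MHA version. With DRY_RUN=1 (the default here) it only prints the ssh commands.

```shell
#!/bin/sh
# Hypothetical stand-in for master_ip_failover; values below are assumptions.
VIP='192.168.0.88/24'
IFACE='eth1:1'
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

master_ip_failover() {
    cmd='' ssh_user=root orig_master_host='' new_master_host=''
    for arg in "$@"; do
        case $arg in
            --command=*)          cmd=${arg#*=} ;;
            --ssh_user=*)         ssh_user=${arg#*=} ;;
            --orig_master_host=*) orig_master_host=${arg#*=} ;;
            --new_master_host=*)  new_master_host=${arg#*=} ;;
            *) ;;   # ignore the other options MHA passes (ports, IPs, passwords)
        esac
    done
    case $cmd in
        # stopssh: the old master is still reachable over SSH, so drop the VIP there
        stopssh) run ssh "$ssh_user@$orig_master_host" "/sbin/ifconfig $IFACE down" ;;
        # stop: SSH may be unreachable; try anyway and do not fail the failover
        stop)    run ssh "$ssh_user@$orig_master_host" "/sbin/ifconfig $IFACE down" || true ;;
        # start: bring the VIP up on the new master
        start)   run ssh "$ssh_user@$new_master_host" "/sbin/ifconfig $IFACE $VIP" ;;
        status)  return 0 ;;
        *)       echo "unknown command: $cmd" >&2; return 1 ;;
    esac
}

# When installed as /usr/local/bin/master_ip_failover, dispatch on the arguments.
if [ $# -gt 0 ]; then master_ip_failover "$@"; fi
```

Keeping a dry-run mode makes it possible to see what would be executed before pointing master_ip_failover_script at the file.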

5. Check the SSH configuration

# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Jan 17 23:13:30 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jan 17 23:13:30 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jan 17 23:13:30 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jan 17 23:13:30 2018 - [info] Starting SSH connection tests..
Wed Jan 17 23:13:33 2018 - [debug] 
Wed Jan 17 23:13:30 2018 - [debug]  Connecting via SSH from root@192.168.1.11(192.168.1.11:22) to root@192.168.1.12(192.168.1.12:22)..
Wed Jan 17 23:13:32 2018 - [debug]   ok.
Wed Jan 17 23:13:32 2018 - [debug]  Connecting via SSH from root@192.168.1.11(192.168.1.11:22) to root@192.168.1.13(192.168.1.13:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [debug] 
Wed Jan 17 23:13:31 2018 - [debug]  Connecting via SSH from root@192.168.1.12(192.168.1.12:22) to root@192.168.1.11(192.168.1.11:22)..
Wed Jan 17 23:13:32 2018 - [debug]   ok.
Wed Jan 17 23:13:32 2018 - [debug]  Connecting via SSH from root@192.168.1.12(192.168.1.12:22) to root@192.168.1.13(192.168.1.13:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [debug] 
Wed Jan 17 23:13:31 2018 - [debug]  Connecting via SSH from root@192.168.1.13(192.168.1.13:22) to root@192.168.1.11(192.168.1.11:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [debug]  Connecting via SSH from root@192.168.1.13(192.168.1.13:22) to root@192.168.1.12(192.168.1.12:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [info] All SSH connection tests passed successfully.

Every node passes the SSH check.

6. Check the status of the whole replication environment.

# masterha_check_repl --conf=/etc/masterha/app1.cnf
.....
Checking the Status of the script.. OK 
Wed Jan 17 23:18:04 2018 - [info]  OK.
Wed Jan 17 23:18:04 2018 - [warning] shutdown_script is not defined.
Wed Jan 17 23:18:04 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

7. Start MHA Manager monitoring

# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 7191
# jobs
[1]+  Running                 nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

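Startup options:

--remove_dead_master_conf       After a failover, the old master's entry is removed from the configuration file.
--manager_log                   Location of the log file.
--ignore_last_failover          By default, if MHA detects consecutive failures less than 8 hours apart, it does not fail over again; this restriction avoids a ping-pong effect. The option tells MHA to ignore the file left behind by the previous failover. After a failover MHA creates app1.failover.complete in the manager working directory (manager_workdir, /var/log/masterha/app1 in the configuration above); if that file still exists at the next failure, failover is refused unless the file has been deleted manually after the first switch. For convenience, --ignore_last_failover is set here.

8. Check the MHA Manager monitoring status:

# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:7191) is running(0:PING_OK), master:192.168.1.11

Monitoring is active, and the current master is 192.168.1.11.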
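
For monitoring scripts it can be handy to extract the master from that status line. The sketch below is an assumption-laden example: the function name current_master is invented, and the pattern is written against the exact output shown above, so it may need adjusting for other MHA versions.

```shell
#!/bin/sh
# Read masterha_check_status output on stdin. Print the master host if the
# manager reports PING_OK; return non-zero if it does not.
current_master() {
    sed -n 's/^.* is running(0:PING_OK), master:\(.*\)$/\1/p' | grep .
}

# Usage (on the manager):
#   masterha_check_status --conf=/etc/masterha/app1.cnf | current_master
```

The trailing grep only serves to turn "no match" into a failing exit status, so the function can be used directly in an if statement.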

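VII. Failure testing

1. Simulate a MySQL failure and observe the VIP move and the automatic MySQL failover

Note: the MHA Manager service stops by itself after a failover. The official documentation explains: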

Running MHA Manager from daemontools
Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was killed by accident, 
the manager stops working. To run as a daemon, daemontool. or any external daemon program can be used.
 Here is an example to run from daemontools.
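
Following that note, a daemontools service for the manager needs a run script. The sketch below writes one. Treat the details as assumptions: the function name write_mha_run_script and the service directory in the usage line are invented for illustration, the configuration and log paths are the ones used in this article, and the two --wait_on_* options are, as far as I know, masterha_manager options that make it sleep before exiting on an error so that supervise does not restart it in a tight loop. Verify them against your MHA version.

```shell
#!/bin/sh
# Write a daemontools "run" script for MHA Manager into directory $1.
write_mha_run_script() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/run" <<'EOF'
#!/bin/sh
# exec replaces this shell so that supervise tracks masterha_manager itself.
exec masterha_manager --conf=/etc/masterha/app1.cnf \
    --wait_on_monitor_error=60 --wait_on_failover_error=60 \
    >> /var/log/masterha/app1/manager.log 2>&1
EOF
    chmod +x "$dir/run"
}

# Usage (hypothetical service directory):
#   write_mha_run_script /service/mha_app1
```

Whether the manager should come back automatically after a completed failover is a design decision; without --ignore_last_failover it will refuse a second failover within 8 hours, as described above.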

Stop the MySQL server on the master:

# service mysqld stop
Shutting down MySQL.....                                   [  OK  ]

On the manager, check the MHA service and the failover log:

# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[1]+  Done                    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

# tail -20 /var/log/masterha/app1/manager.log

----- Failover Report -----

app1: MySQL Master failover 192.168.1.11(192.168.1.11:3306) to 192.168.1.12(192.168.1.12:3306) succeeded

Master 192.168.1.11(192.168.1.11:3306) is down!

Check MHA Manager logs at mha-monitor:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.11(192.168.1.11:3306)
The latest slave 192.168.1.12(192.168.1.12:3306) has all relay logs for recovery.
Selected 192.168.1.12(192.168.1.12:3306) as a new master.
192.168.1.12(192.168.1.12:3306): OK: Applying all logs succeeded.
192.168.1.12(192.168.1.12:3306): OK: Activated master IP address.
192.168.1.13(192.168.1.13:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.13(192.168.1.13:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.12(192.168.1.12:3306)
192.168.1.12(192.168.1.12:3306): Resetting slave info succeeded.
Master failover to 192.168.1.12(192.168.1.12:3306) completed successfully.

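The last line, "Master failover to 192.168.1.12(192.168.1.12:3306) completed successfully.", shows that the candidate master has taken over.

The output above reflects the whole MHA failover, which consists of these steps:

1. Configuration check: verify the configuration of the whole cluster
2. Dead master handling: remove the virtual IP and power off the host (the power-off part is not implemented here yet and needs further study)
3. Copy the binary log events that the dead master has but the latest slave lacks, and save them in a directory on MHA Manager
4. Identify the slave with the most recent updates
5. Apply the binary log events saved from the master
6. Promote one slave to be the new master
7. Point the other slaves at the new master and resume replication

Check replication on slave-db2 (192.168.1.13):

mysql> show slave status\G;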
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 635947
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

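Start MHA Manager monitoring again and check which server is now the master of the cluster.

2. Bringing the failed MySQL server back into the MHA environment

1. Configure the failed server as a new slave
2. Restart MHA Manager
3. Check the MHA status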
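
For the first step, the replication coordinates of the new master are needed. To my knowledge, MHA Manager writes a ready-made CHANGE MASTER TO statement into its log during failover (with the password masked), so one way to find it is to search the log. The sketch below assumes that log line exists in your MHA version; the function name rejoin_statement is made up.

```shell
#!/bin/sh
# Print the last CHANGE MASTER TO statement recorded in the manager log ($1).
rejoin_statement() {
    grep 'CHANGE MASTER TO' "$1" | tail -n 1 |
        sed 's/^.*\(CHANGE MASTER TO.*\)$/\1/'
}

# Usage (on the manager):
#   rejoin_statement /var/log/masterha/app1/manager.log
```

Run the printed statement on the repaired server after replacing the masked password with the real replication password, then START SLAVE. If --remove_dead_master_conf was used, the server's section also has to be added back to app1.cnf before MHA Manager is restarted.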

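3. Manual online master switch

In many situations the current master has to be moved to another server: the master has a hardware problem, a RAID controller needs rebuilding, the master should move to more powerful hardware, and so on. Maintenance on the master degrades performance and causes downtime during which, at the very least, no data can be written. Blocking or killing the sessions that are currently running can also leave the old and the new master inconsistent. MHA offers a fast switch with graceful write blocking: the switch takes only about 0.5-2 seconds, during which writes are blocked. In many cases a 0.5-2 second write block is acceptable, so switching the master does not require a scheduled maintenance window.

The online switch roughly proceeds as follows:

1. Verify the replication settings and identify the current master
2. Determine the new master
3. Block writes on the current master
4. Wait for all slaves to catch up with replication
5. Allow writes on the new master
6. Reconfigure the slaves to replicate from the new master

Note that the application architecture has to address two issues for an online switch:

1. Automatically identifying master and slave (the master may move to another machine). Using a VIP largely solves this.
2. Load balancing (you can define an approximate read/write ratio and the share of load each machine carries; this has to be reconsidered when a machine leaves the cluster).

The switch steps are as follows:

1. When the original master has failed
masterha_stop --conf=/etc/masterha/app1.cnf   # stop the manager
masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.11 --dead_master_port=3306 --new_master_host=192.168.1.12 --new_master_port=3306 --ignore_last_failover

2. Online switch that turns the original master into a slave
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.12 --new_master_port=3306 --orig_master_is_new_slave

Note: an online switch calls the master_ip_online_change script, but the bundled script is incomplete and has to be modified. The version I found via Google still had a problem: the new_master_password variable was never populated, so the online switch failed. I therefore hard-coded it, assigning the MySQL root password directly to new_master_password. If anyone knows the cause, pointers are welcome. The script can also manage the VIP. Here is the script: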

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,          $orig_master_host, $orig_master_ip,
  $orig_master_port, $orig_master_user, 
  $new_master_host,  $new_master_ip,    $new_master_port,
  $new_master_user,  
);


my $vip = '192.168.0.88/24';  # Virtual IP 
my $key = "1"; 
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";
my $ssh_user = "root";
my $new_master_password='123456';
my $orig_master_password='123456';
GetOptions(
  'command=s'              => \$command,
  #'ssh_user=s'             => \$ssh_user,  
  'orig_master_host=s'     => \$orig_master_host,
  'orig_master_ip=s'       => \$orig_master_ip,
  'orig_master_port=i'     => \$orig_master_port,
  'orig_master_user=s'     => \$orig_master_user,
  #'orig_master_password=s' => \$orig_master_password,
  'new_master_host=s'      => \$new_master_host,
  'new_master_ip=s'        => \$new_master_ip,
  'new_master_port=i'      => \$new_master_port,
  'new_master_user=s'      => \$new_master_user,
  #'new_master_password=s'  => \$new_master_password,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      #$orig_master_handler->disable_log_bin_local();
      #print current_time_us() . " Drpping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }



                print "Disabling the VIP on old master: $orig_master_host \n";
                &stop_vip();     


      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      #$orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      #$new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      #$new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
                print "Enabling the VIP - $vip on the new master - $new_master_host \n";
                &start_vip();
                $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

# A simple system call that enable the VIP on the new master 
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

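To keep the data fully consistent and finish the switch as quickly as possible, an MHA online switch succeeds only when all of the following conditions hold; otherwise it fails.

1. The IO thread is running on every slave
2. The SQL thread is running on every slave
3. Seconds_Behind_Master in the SHOW SLAVE STATUS output of every slave is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.
4. On the master, SHOW PROCESSLIST shows no update that has been running for longer than running_updates_limit seconds.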
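
Condition 3 is easy to check by hand before starting a switch. A minimal sketch, assuming the \G output format shown earlier in this article; the function name lag_within_limit is invented, and the default of 1 mirrors the running_updates_limit default mentioned above:

```shell
#!/bin/sh
# Read SHOW SLAVE STATUS\G output on stdin. Succeed only if
# Seconds_Behind_Master is a number no greater than the limit in $1
# (default 1). A NULL value, i.e. replication not running, fails the check.
lag_within_limit() {
    awk -F': ' -v limit="${1:-1}" '
        /Seconds_Behind_Master:/ { lag = $2; seen = 1 }
        END { exit !(seen && (lag ~ /^[0-9]+$/) && ((lag + 0) <= (limit + 0))) }
    '
}

# Usage (on each slave):
#   mysql -e 'SHOW SLAVE STATUS\G' | lag_within_limit 1 || echo "slave is lagging"
```

This only mirrors the precondition; masterha_master_switch performs its own checks and remains the authority on whether the switch may proceed.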

Finally, here is the mail-sending script send_report:

#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}



# Do whatever you want here

exit 0;

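Summary:

Several existing solutions achieve database high availability to some degree, such as MMM, heartbeat+drbd, and Cluster, which were covered in earlier articles, as well as Percona's Galera Cluster. Each has its strengths and weaknesses. The choice mainly depends on the business and on its data consistency requirements. Where both database high availability and data consistency matter, the MHA architecture is recommended.

Reposted from: https://www.cnblogs.com/gomysql/p/3675429.html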