MySQL MHA High-Availability Solution [Part 5: Failover]
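5.1 Failure Simulation
01: On db01 (Master), check that master-slave replication and MHA are working normally
02: Stop the MySQL service on db01 (Master)
03: On db04, check the MHA log (/var/log/mha/app/app1/manager.log)
04: Check whether the VIP has floated to the new Master's server, and check the replication status on the new Master
05: On db04, check the state of the MHA service and the changes to its configuration file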
#On db01 (Master), check that master-slave replication and MHA are working normally
[root@db01 ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock
mysql> show processlist\G
*************************** 1. row ***************************
Id: 4
User: rep
Host: 172.16.1.12:36522
db: NULL
Command: Binlog Dump GTID
Time: 14070
State: Master has sent all binlog to slave; waiting for more updates
Info: NULL
*************************** 2. row ***************************
Id: 5
User: rep
Host: 172.16.1.13:59189
db: NULL
Command: Binlog Dump GTID
Time: 13380
State: Master has sent all binlog to slave; waiting for more updates
Info: NULL
*************************** 3. row ***************************
Id: 6
User: rep
Host: 172.16.1.14:22492
db: NULL
Command: Binlog Dump GTID
Time: 12999
State: Master has sent all binlog to slave; waiting for more updates
Info: NULL
*************************** 4. row ***************************
Id: 33
User: mha
Host: 172.16.1.14:22720
db: NULL
Command: Sleep
Time: 2
State:
Info: NULL
*************************** 5. row ***************************
Id: 34
User: root
Host: localhost
db: NULL
Command: Query
Time: 0
State: starting
Info: show processlist
5 rows in set (0.00 sec)
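Instead of reading the processlist row by row, the replica dump threads can be counted from the shell. This is a minimal sketch: `count_binlog_dump` is a hypothetical helper name, and it assumes the `mysql` client can run `show processlist\G` non-interactively via `-e` with the same socket and credentials used above.

```shell
# count_binlog_dump (hypothetical helper): read `show processlist\G` output on
# stdin and print how many rows are replica dump threads
# ("Binlog Dump" or "Binlog Dump GTID").
count_binlog_dump() {
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps the 0.
    grep -c '^[[:space:]]*Command: Binlog Dump' || true
}

# Example on db01 (the current master):
#   mysql -uroot -pchenliang -S /data/3306/mysql.sock -e 'show processlist\G' | count_binlog_dump
# With the three slaves shown above attached, this should print 3.
```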
#Stop the MySQL service on db01 (Master)
[root@db01 ~]# /data/3306/mysqld stop
MySQL [3306] is not running
[root@db01 ~]# lsof -i :3306
[root@db01 ~]#
#On db04, check the MHA log (/var/log/mha/app/app1/manager.log)
[root@db04 ~]# tailf /var/log/mha/app/app1/manager.log
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.1.11(172.16.1.11:3306)
Selected 172.16.1.12(172.16.1.12:3306) as a new master.
172.16.1.12(172.16.1.12:3306): OK: Applying all logs succeeded.
172.16.1.12(172.16.1.12:3306): OK: Activated master IP address.
172.16.1.14(172.16.1.14:3306): OK: Slave started, replicating from 172.16.1.12(172.16.1.12:3306)
172.16.1.13(172.16.1.13:3306): OK: Slave started, replicating from 172.16.1.12(172.16.1.12:3306)
172.16.1.12(172.16.1.12:3306): Resetting slave info succeeded.
Master failover to 172.16.1.12(172.16.1.12:3306) completed successfully.
^=This shows that the Master failover to 172.16.1.12 succeeded
^=So the next step is to check on 172.16.1.12 whether it now holds the VIP (172.16.1.10) and whether replication is working
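The failover result can also be pulled out of the manager log by a script, for example to feed monitoring. A sketch, assuming the summary line keeps the `Master failover to IP(IP:port) completed successfully.` wording shown above (other MHA versions may word it differently); `new_master_from_log` is a hypothetical name.

```shell
# new_master_from_log (hypothetical helper): print the IP of the most recent
# successful failover target recorded in an MHA manager log ($1).
new_master_from_log() {
    _log=${1:-/var/log/mha/app/app1/manager.log}
    # Keep only "Master failover to <IP>(...) completed successfully" lines,
    # reduce each to the IP, and keep the last (most recent) one.
    sed -n 's/.*Master failover to \([0-9.]*\)(.*completed successfully.*/\1/p' "$_log" |
        tail -n 1
}

# Example on db04:
#   new_master_from_log /var/log/mha/app/app1/manager.log
```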
#Check whether the VIP has floated to the new Master, and check the replication status on the new Master (db02)
[root@db02 ~]# ifconfig eth1:1
eth1:1 Link encap:Ethernet HWaddr 00:0C:29:D3:59:E8
inet addr:172.16.1.10 Bcast:172.16.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
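The VIP check can be scripted as well. A sketch with a hypothetical helper `has_vip` that inspects an interface listing on stdin; it accepts the old `ifconfig` format used here (`inet addr:IP`), the newer `ifconfig` format (`inet IP`) and `ip -4 addr` output (`inet IP/prefix`).

```shell
# has_vip (hypothetical helper): succeed if the interface listing on stdin
# carries the IPv4 address given as $1.
has_vip() {
    _vip=$1
    # -F matches fixed strings, so the dots in the address are not wildcards.
    # The trailing space or "/" stops 172.16.1.10 from matching 172.16.1.100.
    grep -Fq -e "inet addr:$_vip " -e "inet $_vip " -e "inet $_vip/"
}

# Example on db02:
#   ifconfig eth1:1 | has_vip 172.16.1.10 && echo "VIP is on this host"
```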
[root@db02 ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock
mysql> show processlist\G
*************************** 1. row ***************************
Id: 41
User: rep
Host: 172.16.1.14:45688
db: NULL
Command: Binlog Dump GTID
Time: 269
State: Master has sent all binlog to slave; waiting for more updates
Info: NULL
*************************** 2. row ***************************
Id: 42
User: rep
Host: 172.16.1.13:16598
db: NULL
Command: Binlog Dump GTID
Time: 269
State: Master has sent all binlog to slave; waiting for more updates
Info: NULL
*************************** 3. row ***************************
Id: 43
User: root
Host: localhost
db: NULL
Command: Query
Time: 0
State: starting
Info: show processlist
3 rows in set (0.00 sec)
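^=The output shows that the current Master (db02) has two slaves, 172.16.1.13 and 172.16.1.14
^=The MHA manager has also stopped, so the mha monitoring connection seen earlier on db01 does not appear here (this is expected: after one successful master switch, masterha_manager exits automatically)
#On db04, check the state of the MHA service and the changes to its configuration file
[root@db04 ~]# ps -ef|grep mha|grep -v grep
[root@db04 ~]#
[root@db04 ~]# cat /etc/mha/app/app1/app1.cnf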
[server default]
manager_log=/var/log/mha/app/app1/manager.log
manager_workdir=/var/log/mha/app/app1
master_binlog_dir=/data/3306/binlog
master_ip_failover_script=/server/scripts/master_ip_failover
password=mha
ping_interval=2
repl_password=chenliang
repl_user=rep
ssh_port=921
ssh_user=toor
user=mha
[server2]
hostname=172.16.1.12
port=3306
[server3]
hostname=172.16.1.13
port=3306
[server4]
hostname=172.16.1.14
no_master=1
port=3306
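^=The MHA service has stopped (expected), and the [server1] block is gone from the configuration file (also expected: with --remove_dead_master_conf, MHA deletes the dead master's block after a successful failover)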
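To see at a glance which hosts the MHA configuration still manages, the `[serverN]` blocks can be listed with awk. A minimal sketch; `list_mha_servers` is a hypothetical name, and it assumes `hostname=` lines without spaces around `=`, as in app1.cnf above. For the file above it would list server2, server3 and server4 with their addresses.

```shell
# list_mha_servers (hypothetical helper): print "<block> <hostname>" for every
# [serverN] block in the MHA application config given as $1.
list_mha_servers() {
    awk '
        # Entering a [serverN] block: remember its name without the brackets.
        /^\[server[0-9]+\]/ { block = substr($0, 2, index($0, "]") - 2); next }
        # Any other section, e.g. [server default]: stop collecting.
        /^\[/               { block = ""; next }
        block != "" && /^hostname=/ { sub(/^hostname=/, ""); print block, $0 }
    ' "$1"
}

# Example on db04:
#   list_mha_servers /etc/mha/app/app1/app1.cnf
```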
5.2 Failure Recovery
01: Start the MySQL service on db01
02: On db04, find the CHANGE MASTER statement in the MHA log
03: On db01, run the CHANGE MASTER statement to make db01 a slave of the new Master (db02)
04: Add the [server1] block back to the MHA configuration file (on db04)
05: On db04, start the MHA service again as the unprivileged user toor
#Start the MySQL service on db01
[root@db01 ~]# /data/3306/mysqld start
Start MySQL [3306] [ OK ]
[root@db01 ~]# netstat -lntup|grep mysqld
tcp 0 0 :::3306 :::* LISTEN 6184/mysqld
#On db04, find the CHANGE MASTER statement in the MHA log
[root@db04 ~]# grep -i "change master" /var/log/mha/app/app1/manager.log
Fri Nov 16 14:15:16 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Fri Nov 16 14:15:17 2018 - [info] Executed CHANGE MASTER.
Fri Nov 16 14:15:17 2018 - [info] Executed CHANGE MASTER.
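Because the log masks the replication password as `'xxx'`, the statement has to be edited before it is run on db01. A sketch that extracts the most recent statement and substitutes the password; `change_master_from_log` is a hypothetical name, and it assumes the password contains no characters that are special to sed or to SQL quoting (such as `/`, `&`, `\` or `'`).

```shell
# change_master_from_log (hypothetical helper): print the latest
# "CHANGE MASTER TO ..." statement MHA logged for the other slaves ($1 = log),
# with the masked password 'xxx' replaced by the real repl_password ($2).
change_master_from_log() {
    _log=$1 _pw=$2
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*;\)$/\1/p' "$_log" |
        tail -n 1 |
        sed "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='$_pw'/"
}

# Example on db04 (repl_password is chenliang in app1.cnf):
#   change_master_from_log /var/log/mha/app/app1/manager.log chenliang
```

The printed statement can then be run in the mysql client on db01, followed by `start slave;`.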
#On db01, run the CHANGE MASTER statement to make db01 a slave of the new Master (db02); the log masks the password as 'xxx', so use the real repl_password
[root@db01 ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock
mysql>
mysql> CHANGE MASTER TO MASTER_HOST='172.16.1.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='chenliang';
Query OK, 0 rows affected, 2 warnings (0.07 sec)
mysql> start slave;
Query OK, 0 rows affected (0.06 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.1.12
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: db02_mysql_bin.000003
Read_Master_Log_Pos: 1495
Relay_Log_File: db01_relay_bin.000003
Relay_Log_Pos: 469
Relay_Master_Log_File: db02_mysql_bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1495
Relay_Log_Space: 1294
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 12
Master_UUID: 1386976f-e7b8-11e8-b34b-000c29d359de
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1386976f-e7b8-11e8-b34b-000c29d359de:1-2
Executed_Gtid_Set: 1386976f-e7b8-11e8-b34b-000c29d359de:1-2,
3ad8129b-e7b2-11e8-817e-000c296b2e4b:1-6
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.03 sec)
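Of the many fields in `show slave status\G`, the two that say whether replication is running are Slave_IO_Running and Slave_SQL_Running. A sketch of a check on just those two; `replication_ok` is a hypothetical name, and it deliberately ignores lag (Seconds_Behind_Master) and the error fields.

```shell
# replication_ok (hypothetical helper): read `show slave status\G` output on
# stdin; succeed only if Slave_IO_Running and Slave_SQL_Running are both "Yes".
replication_ok() {
    awk -F': *' '
        # Anchoring at "$" keeps Slave_SQL_Running_State from matching.
        $1 ~ /Slave_IO_Running$/  { io = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example on db01:
#   mysql -uroot -pchenliang -S /data/3306/mysql.sock -e 'show slave status\G' | replication_ok && echo OK
```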
#Add the [server1] block back to the MHA configuration file (on db04)
[root@db04 ~]# cat /etc/mha/app/app1/app1.cnf
[server default]
manager_log=/var/log/mha/app/app1/manager.log
manager_workdir=/var/log/mha/app/app1
master_binlog_dir=/data/3306/binlog
master_ip_failover_script=/server/scripts/master_ip_failover
password=mha
ping_interval=2
repl_password=chenliang
repl_user=rep
ssh_port=921
ssh_user=toor
user=mha
[server1]
hostname=172.16.1.11
port=3306
[server2]
hostname=172.16.1.12
port=3306
[server3]
hostname=172.16.1.13
port=3306
[server4]
hostname=172.16.1.14
no_master=1
port=3306
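Re-adding the block can also be scripted so that running it twice does not duplicate it. A sketch with a hypothetical helper `ensure_server_block`, assuming the config file already exists; note that it appends the block at the end of the file, whereas above [server1] was placed before [server2], so edit by hand if you want to keep the blocks in order.

```shell
# ensure_server_block (hypothetical helper): append a [block] section with
# hostname/port to the MHA config unless a section with that name exists.
# Arguments: config file, block name, hostname, port.
ensure_server_block() {
    _conf=$1 _block=$2 _host=$3 _port=$4
    if grep -qx "\[$_block\]" "$_conf"; then
        return 0   # block already present; leave the file untouched
    fi
    # If the file does not end with a newline, add one before appending.
    if [ -n "$(tail -c 1 "$_conf")" ]; then
        echo >> "$_conf"
    fi
    printf '[%s]\nhostname=%s\nport=%s\n' "$_block" "$_host" "$_port" >> "$_conf"
}

# Example on db04:
#   ensure_server_block /etc/mha/app/app1/app1.cnf server1 172.16.1.11 3306
```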
#On db04, switch to the unprivileged user toor, check SSH and replication, then start the MHA service again
[root@db04 ~]# su - toor
[toor@db04 ~]$ masterha_check_ssh --conf=/etc/mha/app/app1/app1.cnf
.........................
Fri Nov 16 14:56:42 2018 - [info] All SSH connection tests passed successfully.
[toor@db04 ~]$ masterha_check_repl --conf=/etc/mha/app/app1/app1.cnf
........
172.16.1.12(172.16.1.12:3306) (current master)
+--172.16.1.11(172.16.1.11:3306)
+--172.16.1.13(172.16.1.13:3306)
+--172.16.1.14(172.16.1.14:3306)
..........
MySQL Replication Health is OK
#Confirm that masterha_manager is running again under toor (the command used to start it is not shown)
[toor@db04 ~]$ ps -ef|grep mha|grep -v grep
toor 6349 1 4 14:58 pts/0 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app/app1/app1.cnf --remove_dead_master_conf --ignore_last_failover
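The command used to start the manager is not shown above, only its result in the ps output. One common way to launch it in the background with the same options is sketched below; `start_mha_manager` and the `manager.nohup.log` file name are hypothetical. `--remove_dead_master_conf` is the option that removes the dead master's block from app1.cnf after a failover, and `--ignore_last_failover` makes MHA go ahead with a new failover regardless of the outcome and timing of the previous one.

```shell
# start_mha_manager (hypothetical helper): launch masterha_manager in the
# background for one MHA application, with the options seen in ps above.
# Assumptions: run as toor on db04, masterha_manager is on PATH, and MHA's own
# log already goes to manager_log from app1.cnf, so nohup's stdout/stderr go
# to a separate, hypothetical file.
start_mha_manager() {
    _conf=${1:-/etc/mha/app/app1/app1.cnf}
    _out=${2:-/var/log/mha/app/app1/manager.nohup.log}
    # nohup ignores hangups on logout; stdin comes from /dev/null so the
    # background job never reads from the terminal.
    nohup masterha_manager --conf="$_conf" \
        --remove_dead_master_conf --ignore_last_failover \
        < /dev/null >> "$_out" 2>&1 &
}

# Usage on db04, as toor:
#   start_mha_manager
#   masterha_check_status --conf=/etc/mha/app/app1/app1.cnf
```

`masterha_check_status` then reports whether the manager for that application is running.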
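5.3 Promoting the Original Master (db01) Back to Master
#Stop the MySQL service on the current Master (db02) to trigger another failover
[root@db02 ~]# /data/3306/mysqld stop
Stop MySQL[3306]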
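#On db04, find the new CHANGE MASTER statement in the MHA log; MHA has chosen 172.16.1.11 (db01) as the new master
[root@db04 ~]# grep -i "change master" /var/log/mha/app/app1/manager.log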
Fri Nov 16 15:50:29 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.11', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Fri Nov 16 15:50:29 2018 - [info] Executed CHANGE MASTER.
Fri Nov 16 15:50:29 2018 - [info] Executed CHANGE MASTER.
#Check that the VIP 172.16.1.10 has moved back to db01
[root@db01 ~]# ifconfig eth1:1
eth1:1 Link encap:Ethernet HWaddr 00:0C:29:6B:2E:55
inet addr:172.16.1.10 Bcast:172.16.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
#Start MySQL on db02 again and make it a slave of db01, using the statement from the log with the real repl_password
[root@db02 ~]# /data/3306/mysqld start
Start MySQL [3306] [ OK ]
[root@db02 ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock
mysql> CHANGE MASTER TO MASTER_HOST='172.16.1.11', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='chenliang';
Query OK, 0 rows affected, 2 warnings (0.07 sec)
mysql> start slave;
Query OK, 0 rows affected (0.20 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.1.11
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: db01_mysql_bin.000010
Read_Master_Log_Pos: 234
Relay_Log_File: db02_relay_bin.000002
Relay_Log_Pos: 377
Relay_Master_Log_File: db01_mysql_bin.000010
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 234
Relay_Log_Space: 583
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 11
Master_UUID: 3ad8129b-e7b2-11e8-817e-000c296b2e4b
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 1386976f-e7b8-11e8-b34b-000c29d359de:1-2,
3ad8129b-e7b2-11e8-817e-000c296b2e4b:1-6
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
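#On db04, make sure the MHA configuration file lists all four servers again ([server2] is removed by --remove_dead_master_conf during the failover and has to be added back)
[root@db04 ~]# cat /etc/mha/app/app1/app1.cnf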
[server default]
manager_log=/var/log/mha/app/app1/manager.log
manager_workdir=/var/log/mha/app/app1
master_binlog_dir=/data/3306/binlog
master_ip_failover_script=/server/scripts/master_ip_failover
password=mha
ping_interval=2
repl_password=chenliang
repl_user=rep
ssh_port=921
ssh_user=toor
user=mha
[server1]
hostname=172.16.1.11
port=3306
[server2]
hostname=172.16.1.12
port=3306
[server3]
hostname=172.16.1.13
port=3306
[server4]
hostname=172.16.1.14
no_master=1
port=3306
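#Start the MHA manager again as toor on db04
[root@db04 ~]# su - toor
[toor@db04 ~]$ mha
^=mha is not an MHA command; it appears to be a local alias or script (not shown) that starts masterha_manager, whose full command line appears in the ps output below
[toor@db04 ~]$ ps -ef|grep mha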
toor 9310 1 16 15:56 pts/0 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app/app1/app1.cnf --remove_dead_master_conf --ignore_last_failover
toor 9322 9290 0 15:56 pts/0 00:00:00 grep --color=auto mha
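`ps -ef|grep mha` also lists the grep process itself, as in the last line above (the earlier check filtered it out with `grep -v grep`). `pgrep -f` avoids this because pgrep never reports itself as a match. A sketch with a hypothetical helper name `is_mha_running`, assuming procps `pgrep` is installed.

```shell
# is_mha_running (hypothetical helper): succeed if a masterha_manager process
# started with the config file given as $1 is running.
is_mha_running() {
    # -f matches against the full command line, not just the process name.
    pgrep -f -- "masterha_manager --conf=$1" > /dev/null
}

# Example on db04:
#   is_mha_running /etc/mha/app/app1/app1.cnf && echo "MHA manager is running"
```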