1. 程式人生 > >MySQL MHA高可用方案【五、故障切換】

MySQL MHA高可用方案【五、故障切換】

5.1 故障模擬

01:在db01(Master)伺服器上檢視主從複製及mha是否正常

02:停止db01(Master)伺服器上的mysql服務

02:在db04上檢查MHA的日誌(/var/log/mha/app/app1/manager.log)

03:檢視VIP是否飄移到新Master的伺服器上,在新Master上檢視主從複製的狀態

04:在db04伺服器上檢視mha服務的狀態及配置檔案的變化

#在db01(Master)伺服器上檢視主從複製及mha是否正常

[[email protected] ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock

mysql> show processlist\G

*************************** 1. row ***************************

Id: 4

User: rep

Host: 172.16.1.12:36522

db: NULL

Command: Binlog Dump GTID

Time: 14070

State: Master has sent all binlog to slave; waiting for more updates

Info: NULL

*************************** 2. row ***************************

Id: 5

User: rep

Host: 172.16.1.13:59189

db: NULL

Command: Binlog Dump GTID

Time: 13380

State: Master has sent all binlog to slave; waiting for more updates

Info: NULL

*************************** 3. row ***************************

Id: 6

User: rep

Host: 172.16.1.14:22492

db: NULL

Command: Binlog Dump GTID

Time: 12999

State: Master has sent all binlog to slave; waiting for more updates

Info: NULL

*************************** 4. row ***************************

Id: 33

User: mha

Host: 172.16.1.14:22720

db: NULL

Command: Sleep

Time: 2

State:

Info: NULL

*************************** 5. row ***************************

Id: 34

User: root

Host: localhost

db: NULL

Command: Query

Time: 0

State: starting

Info: show processlist

5 rows in set (0.00 sec)


#停止db01(Master)伺服器上的mysql服務

[[email protected] ~]# /data/3306/mysqld stop

MySQL [3306] is not running

[[email protected] ~]# lsof -i :3306

[[email protected] ~]#


#在db04上檢查MHA的日誌(/var/log/mha/app/app1/manager.log)

[[email protected] ~]# tailf /var/log/mha/app/app1/manager.log

Started automated(non-interactive) failover.

Invalidated master IP address on 172.16.1.11(172.16.1.11:3306)

Selected 172.16.1.12(172.16.1.12:3306) as a new master.

172.16.1.12(172.16.1.12:3306): OK: Applying all logs succeeded.

172.16.1.12(172.16.1.12:3306): OK: Activated master IP address.

172.16.1.14(172.16.1.14:3306): OK: Slave started, replicating from 172.16.1.12(172.16.1.12:3306)

172.16.1.13(172.16.1.13:3306): OK: Slave started, replicating from 172.16.1.12(172.16.1.12:3306)

172.16.1.12(172.16.1.12:3306): Resetting slave info succeeded.

Master failover to 172.16.1.12(172.16.1.12:3306) completed successfully.


^=可以看出Master failover(故障轉移)至172.16.1.12伺服器上成功

^=那麼就可以去172.16.1.12伺服器上檢查是否有VIP地址(172.16.1.10)及主從複製的狀態


#檢視VIP是否飄移到新Master的伺服器上,在新Master(db02)上檢視主從複製的狀態

[[email protected] ~]# ifconfig eth1:1

eth1:1 Link encap:Ethernet HWaddr 00:0C:29:D3:59:E8

inet addr:172.16.1.10 Bcast:172.16.1.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

[[email protected] ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock

mysql> show processlist\G

*************************** 1. row ***************************

Id: 41

User: rep

Host: 172.16.1.14:45688

db: NULL

Command: Binlog Dump GTID

Time: 269

State: Master has sent all binlog to slave; waiting for more updates

Info: NULL

*************************** 2. row ***************************

Id: 42

User: rep

Host: 172.16.1.13:16598

db: NULL

Command: Binlog Dump GTID

Time: 269

State: Master has sent all binlog to slave; waiting for more updates

Info: NULL

*************************** 3. row ***************************

Id: 43

User: root

Host: localhost

db: NULL

Command: Query

Time: 0

State: starting

Info: show processlist

3 rows in set (0.00 sec)

^=從上面可以看出,當前Master(db02)的從庫有172.16.1.13和172.16.1.14這兩臺伺服器

^=mha服務也已經停止了(因為mha成功切換一次主庫,它的服務是會自動停止的)



#在db04伺服器上檢視mha服務的狀態及配置檔案的變化

[[email protected] ~]# ps -ef|grep mha|grep -v grep

[[email protected] ~]#

[[email protected] ~]# cat /etc/mha/app/app1/app1.cnf

[server default]

manager_log=/var/log/mha/app/app1/manager.log

manager_workdir=/var/log/mha/app/app1

master_binlog_dir=/data/3306/binlog

master_ip_failover_script=/server/scripts/master_ip_failover

password=mha

ping_interval=2

repl_password=chenliang

repl_user=rep

ssh_port=921

ssh_user=toor

user=mha


[server2]

hostname=172.16.1.12

port=3306


[server3]

hostname=172.16.1.13

port=3306


[server4]

hostname=172.16.1.14

no_master=1

port=3306

^=從上面可以看出,mha服務已停止(正常的),配置檔案中少了[server1]標籤(正常的)

5.2 故障恢復

01:啟動db01伺服器上的mysql服務

02:在db04伺服器的mha日誌中找到change master語句

03:在db01伺服器操作change master語句,把db01指向為新master(db02)的從庫

04:在mha的配置檔案(在db04伺服器上)中加入[server1]標籤

05:在db04伺服器上的toor普通使用者下再次啟動mha服務


#啟動db01伺服器上的mysql服務

[[email protected] ~]# /data/3306/mysqld start

Start MySQL [3306] [ OK ]

[[email protected] ~]# netstat -lntup|grep mysqld

tcp 0 0 :::3306 :::* LISTEN 6184/mysqld


#在db04伺服器的mha日誌中找到change master語句

[[email protected] ~]# grep -i "change master" /var/log/mha/app/app1/manager.log

Fri Nov 16 14:15:16 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';

Fri Nov 16 14:15:17 2018 - [info] Executed CHANGE MASTER.

Fri Nov 16 14:15:17 2018 - [info] Executed CHANGE MASTER.


#在db01伺服器操作change master語句,把db01指向為新master(db02)的從庫

[[email protected] ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock

mysql>

mysql> CHANGE MASTER TO MASTER_HOST='172.16.1.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='chenliang';

Query OK, 0 rows affected, 2 warnings (0.07 sec)

mysql> start slave;

Query OK, 0 rows affected (0.06 sec)

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 172.16.1.12

Master_User: rep

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: db02_mysql_bin.000003

Read_Master_Log_Pos: 1495

Relay_Log_File: db01_relay_bin.000003

Relay_Log_Pos: 469

Relay_Master_Log_File: db02_mysql_bin.000003

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 1495

Relay_Log_Space: 1294

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 12

Master_UUID: 1386976f-e7b8-11e8-b34b-000c29d359de

Master_Info_File: mysql.slave_master_info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set: 1386976f-e7b8-11e8-b34b-000c29d359de:1-2

Executed_Gtid_Set: 1386976f-e7b8-11e8-b34b-000c29d359de:1-2,

3ad8129b-e7b2-11e8-817e-000c296b2e4b:1-6

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.03 sec)


#在mha的配置檔案(在db04伺服器上)中加入[server1]標籤

[[email protected] ~]# cat /etc/mha/app/app1/app1.cnf

[server default]

manager_log=/var/log/mha/app/app1/manager.log

manager_workdir=/var/log/mha/app/app1

master_binlog_dir=/data/3306/binlog

master_ip_failover_script=/server/scripts/master_ip_failover

password=mha

ping_interval=2

repl_password=chenliang

repl_user=rep

ssh_port=921

ssh_user=toor

user=mha


[server1]

hostname=172.16.1.11

port=3306


[server2]

hostname=172.16.1.12

port=3306


[server3]

hostname=172.16.1.13

port=3306


[server4]

hostname=172.16.1.14

no_master=1

port=3306


#在db04伺服器上的toor普通使用者下再次啟動mha服務

[[email protected] ~]# su - toor

[toor[email protected] ~]$ masterha_check_ssh --conf=/etc/mha/app/app1/app1.cnf

.........................

Fri Nov 16 14:56:42 2018 - [info] All SSH connection tests passed successfully.

[[email protected] ~]$ masterha_check_repl --conf=/etc/mha/app/app1/app1.cnf

........

172.16.1.12(172.16.1.12:3306) (current master)

+--172.16.1.11(172.16.1.11:3306)

+--172.16.1.13(172.16.1.13:3306)

+--172.16.1.14(172.16.1.14:3306)

..........

MySQL Replication Health is OK

[[email protected] ~]$ ps -ef|grep mha|grep -v grep

toor 6349 1 4 14:58 pts/0 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app/app1/app1.cnf --remove_dead_master_conf --ignore_last_failover

5.3 再次提升原master為主庫

[[email protected] ~]# /data/3306/mysqld stop

Stop MySQL[3306]

[[email protected] ~]# grep -i "change master" /var/log/mha/app/app1/manager.log

Fri Nov 16 15:50:29 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.11', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';

Fri Nov 16 15:50:29 2018 - [info] Executed CHANGE MASTER.

Fri Nov 16 15:50:29 2018 - [info] Executed CHANGE MASTER.

[[email protected] ~]# ifconfig eth1:1

eth1:1 Link encap:Ethernet HWaddr 00:0C:29:6B:2E:55

inet addr:172.16.1.10 Bcast:172.16.1.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

[[email protected] ~]# /data/3306/mysqld start

Start MySQL [3306] [ OK ]

[[email protected] ~]# mysql -uroot -pchenliang -S /data/3306/mysql.sock

mysql>

mysql> CHANGE MASTER TO MASTER_HOST='172.16.1.11', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='chenliang';

Query OK, 0 rows affected, 2 warnings (0.07 sec)


mysql> start slave;

Query OK, 0 rows affected (0.20 sec)


mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 172.16.1.11

Master_User: rep

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: db01_mysql_bin.000010

Read_Master_Log_Pos: 234

Relay_Log_File: db02_relay_bin.000002

Relay_Log_Pos: 377

Relay_Master_Log_File: db01_mysql_bin.000010

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 234

Relay_Log_Space: 583

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 11

Master_UUID: 3ad8129b-e7b2-11e8-817e-000c296b2e4b

Master_Info_File: mysql.slave_master_info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set: 1386976f-e7b8-11e8-b34b-000c29d359de:1-2,

3ad8129b-e7b2-11e8-817e-000c296b2e4b:1-6

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.00 sec)


[[email protected] ~]# cat /etc/mha/app/app1/app1.cnf

[server default]

manager_log=/var/log/mha/app/app1/manager.log

manager_workdir=/var/log/mha/app/app1

master_binlog_dir=/data/3306/binlog

master_ip_failover_script=/server/scripts/master_ip_failover

password=mha

ping_interval=2

repl_password=chenliang

repl_user=rep

ssh_port=921

ssh_user=toor

user=mha


[server1]

hostname=172.16.1.11

port=3306


[server2]

hostname=172.16.1.12

port=3306


[server3]

hostname=172.16.1.13

port=3306


[server4]

hostname=172.16.1.14

no_master=1

port=3306


[[email protected] ~]# su - toor

[[email protected] ~]$ mha

[[email protected] ~]$ ps -ef|grep mha

toor 9310 1 16 15:56 pts/0 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app/app1/app1.cnf --remove_dead_master_conf --ignore_last_failover

toor 9322 9290 0 15:56 pts/0 00:00:00 grep --color=auto mha