1. 程式人生 > >分析MHA的manager.log檢視failover的四個階段

分析MHA的manager.log檢視failover的四個階段

MHA  failover5個階段:

* Phase 1: Configuration Check Phase..  檢查配置檔案,選候選slave為新的master
* Phase 2: Dead Master Shutdown Phase.. 強制關閉,並執行master ip漂移
* Phase 3: Master Recovery Phase..      構建新的master
** Phase 3.1: Getting Latest Slaves Phase..    獲得最新的binary log file/position
    The latest binary log file/position on all slaves is mysql_bin.000005:190   最新的binary log file和日誌位置
    The oldest binary log file/position on all slaves is mysql_bin.000005:190   最舊的binary log file和日誌位置
    差異日誌
** Phase 3.2: Saving Dead Master's Binlog Phase..  儲存Dead master's binlog
   Fetching dead master's binary logs..
** Phase 3.3: Determining New Master Phase..       選取New Master
   From:
     centos129 (current master)
        +--centos128
        +--centos130
   To:
     centos128 (new master)
        +--centos130
** Phase 3.3: New Master Diff Log Generation Phase..  New Master差異日誌生成階段
** Phase 3.4: Master Log Apply Phase.. 
   All relay logs were successfully applied.          所有relay log被運用之後,new master的binlog和position資訊:
   Getting new master's binlog name and position..
   mysql_bin.000020:230                               其它的slave 開始複製的日誌點
   vip漂移:
   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=centos129 --orig_master_ip=192.168.44.129 --orig_master_port=3306 --new_master_host=centos128 --new_master_ip=192.168.44.128 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' 
   IN SCRIPT TEST====/usr/sbin/ifconfig ens33:1 down==/usr/sbin/ifconfig ens33:1 192.168.44.211===
* Phase 4: Slaves Recovery Phase..   slave的恢復階段
* Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. 開始並行slave差異日誌的產生階段
* Phase 4.2: Starting Parallel Slave Log Apply Phase..           開始並行slave差異日誌的應用階段
* Phase 5: New master cleanup phase..                            新master的清理階段

通過指令碼masterha_secondary_check  多種線路檢測master是否存活

masterha_secondary_check -s centos129 -s centos128  -user=root --master_host=centos128 --master_ip=192.168.44.128 --master_port=3306  --user=root  --master_host=centos129  --master_ip=192.168.44.129  --master_port=3306
決定是否Failover

Monitoring server centos129 is reachable, Master is not reachable from centos129. OK.
Tue May 22 13:53:54 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.44.129' (111))
Tue May 22 13:53:54 2018 - [warning] Connection failed 1 time(s)..
Monitoring server centos128 is reachable, Master is not reachable from centos128. OK.
Tue May 22 13:53:54 2018 - [info] Master is not reachable from al l other monitoring servers. Failover should start.
Tue May 22 13:53:55 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.44.129' (111))
Tue May 22 13:53:55 2018 - [warning] Connection failed 2 time(s)..
Tue May 22 13:53:56 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.44.129' (111))
Tue May 22 13:53:56 2018 - [warning] Connection failed 3 time(s)..
Tue May 22 13:53:56 2018 - [warning] Master is not reachable from health checker!
Tue May 22 13:53:56 2018 - [warning] Master centos129(192.168.44.129:3306) is not reachable!


讀配置檔案/etc/masterha/app1.cnf,選擇侯選的slave為新的master
 Reading server configurations from /etc/masterha/app1.cnf..
Tue May 22 13:53:57 2018 - [info] Dead Servers:
Tue May 22 13:53:57 2018 - [info]   centos129(192.168.44.129:3306)
Tue May 22 13:53:57 2018 - [info] Alive Servers:
Tue May 22 13:53:57 2018 - [info]   centos128(192.168.44.128:3306)
Tue May 22 13:53:57 2018 - [info]   centos130(192.168.44.130:3306)
Tue May 22 13:53:57 2018 - [info] Alive Slaves:
Tue May 22 13:53:57 2018 - [info]   centos128(192.168.44.128:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:57 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:57 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 22 13:53:57 2018 - [info]   centos130(192.168.44.130:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:57 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)

 * Phase 1: Configuration Check Phase..  檢查配置檔案,選候選slave為新的master
Tue May 22 13:53:57 2018 - [info] Tue May 22 13:53:58 2018 - [info] Dead Servers: Tue May 22 13:53:58 2018 - [info] centos129(192.168.44.129:3306) Tue May 22 13:53:58 2018 - [info] Checking master reachability via mysql(double check).. Tue May 22 13:53:58 2018 - [info] ok. Tue May 22 13:53:58 2018 - [info] Alive Servers: Tue May 22 13:53:58 2018 - [info] centos128(192.168.44.128:3306) Tue May 22 13:53:58 2018 - [info] centos130(192.168.44.130:3306) Tue May 22 13:53:58 2018 - [info] Alive Slaves: Tue May 22 13:53:58 2018 - [info] centos128(192.168.44.128:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:53:58 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:53:58 2018 - [info] Primary candidate for the new Master (candidate_master is set) Tue May 22 13:53:58 2018 - [info] centos130(192.168.44.130:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:53:58 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:53:58 2018 - [info] ** Phase 1: Configuration Check Phase completed. * Phase 2: Dead Master Shutdown Phase..強制關閉,並執行master ip漂移 Tue May 22 13:53:58 2018 - [info] Tue May 22 13:53:58 2018 - [info] Forcing shutdown so that applications never connect to the current master.. Tue May 22 13:53:58 2018 - [info] Executing master IP deactivatation script: Tue May 22 13:53:58 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=centos129 --orig_master_ip=192.168.44.129 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/usr/sbin/ifconfig ens33:1 down==/usr/sbin/ifconfig ens33:1 192.168.44.211=== Disabling the VIP on old master: centos129 SIOCSIFFLAGS: 鏃犳硶鎸囧畾琚奼傜殑鍦板潃 Tue May 22 13:53:58 2018 - [info] done. Tue May 22 13:53:58 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue May 22 13:53:59 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue May 22 13:53:59 2018 - [info] * Phase 3: Master Recovery Phase.. 構建新的master Tue May 22 13:53:59 2018 - [info] Tue May 22 13:53:59 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase.. 獲得最新的binary log file/position Tue May 22 13:53:59 2018 - [info] Tue May 22 13:53:59 2018 - [info] The latest binary log file/position on all slaves is mysql_bin.000005:190 最新的binary log file和日誌位置 Tue May 22 13:53:59 2018 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue May 22 13:53:59 2018 - [info] centos128(192.168.44.128:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:53:59 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:53:59 2018 - [info] Primary candidate for the new Master (candidate_master is set) Tue May 22 13:53:59 2018 - [info] centos130(192.168.44.130:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:53:59 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:53:59 2018 - [info] The oldest binary log file/position on all slaves is mysql_bin.000005:190 最舊的binary log file和日誌位置 Tue May 22 13:53:59 2018 - [info] Oldest slaves: Tue May 22 13:53:59 2018 - [info] centos128(192.168.44.128:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:53:59 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:53:59 2018 - [info] Primary candidate for the new Master (candidate_master is set) Tue May 22 13:53:59 2018 - [info] centos130(192.168.44.130:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:53:59 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:53:59 2018 - [info] Tue May 22 13:53:59 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. 儲存Dead master's binlog Tue May 22 13:53:59 2018 - [info] Tue May 22 13:53:59 2018 - [info] Fetching dead master's binary logs.. Tue May 22 13:53:59 2018 - [info] Executing command on the dead master centos129(192.168.44.129:3306): save_binary_logs --command=save --start_file=mysql_bin.000005 --start_pos=190 --binlog_dir=/data/3306/logs/log --output_file=/tmp/saved_master_binlog_from_centos129_3306_20180522135357.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 Creating /tmp if not exists.. ok. Concat binary/relay logs from mysql_bin.000005 pos 190 to mysql_bin.000005 EOF into /tmp/saved_master_binlog_from_centos129_3306_20180522135357.binlog .. Dumping binlog format description event, from position 0 to 123.. ok. No need to dump effective binlog data from /data/3306/logs/log/mysql_bin.000005 (pos starts 190, filesize 190). Skipping. /tmp/saved_master_binlog_from_centos129_3306_20180522135357.binlog has no effective data events. Event not exists. Tue May 22 13:54:00 2018 - [info] Additional events were not found from the orig master. No need to save. Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] * Phase 3.3: Determining New Master Phase.. 選取New Master Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Tue May 22 13:54:00 2018 - [info] All slaves received relay logs to the same position. No need to resync each other. Tue May 22 13:54:00 2018 - [info] Searching new master from slaves.. Tue May 22 13:54:00 2018 - [info] Candidate masters from the configuration file: Tue May 22 13:54:00 2018 - [info] centos128(192.168.44.128:3306) Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled Tue May 22 13:54:00 2018 - [info] Replicating from 192.168.44.129(192.168.44.129:3306) Tue May 22 13:54:00 2018 - [info] Primary candidate for the new Master (candidate_master is set) Tue May 22 13:54:00 2018 - [info] Non-candidate masters: Tue May 22 13:54:00 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Tue May 22 13:54:00 2018 - [info] New master is centos128(192.168.44.128:3306) Tue May 22 13:54:00 2018 - [info] Starting master failover.. Tue May 22 13:54:00 2018 - [info] From: centos129 (current master) +--centos128 +--centos130 To: centos128 (new master) +--centos130 Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] * Phase 3.4: Master Log Apply Phase.. Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Tue May 22 13:54:00 2018 - [info] Starting recovery on centos128(192.168.44.128:3306).. Tue May 22 13:54:00 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. Tue May 22 13:54:00 2018 - [info] done. Tue May 22 13:54:00 2018 - [info] All relay logs were successfully applied. 所有relay log被運用之後,new master的binlog和position資訊: Tue May 22 13:54:00 2018 - [info] Getting new master's binlog name and position.. Tue May 22 13:54:00 2018 - [info] mysql_bin.000020:230 其它的slave 開始複製的日誌點 Tue May 22 13:54:00 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='centos128 or 192.168.44.128', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000020', MASTER_LOG_POS=230, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue May 22 13:54:00 2018 - [info] Executing master IP activate script: Tue May 22 13:54:00 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=centos129 --orig_master_ip=192.168.44.129 --orig_master_port=3306 --new_master_host=centos128 --new_master_ip=192.168.44.128 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' IN SCRIPT TEST====/usr/sbin/ifconfig ens33:1 down==/usr/sbin/ifconfig ens33:1 192.168.44.211=== Enabling the VIP - 192.168.44.211 on the new master - centos128 Undefined subroutine &main::FIXME_xxx_create_user called at /usr/local/bin/master_ip_failover line 120. Set read_only=0 on the new master. Creating app user on the new master.. Tue May 22 13:54:00 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln1261] Failed to activate master IP address for centos128 with return code 10:0 Tue May 22 13:54:00 2018 - [warning] Proceeding. Tue May 22 13:54:00 2018 - [info] ** Finished master recovery successfully. Tue May 22 13:54:00 2018 - [info] * Phase 3: Master Recovery Phase completed. Tue May 22 13:54:00 2018 - [info] * Phase 4: Slaves Recovery Phase.. slave的恢復階段 Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. 開始並行slave差異日誌的產生階段 Tue May 22 13:54:00 2018 - [info] Tue May 22 13:54:00 2018 - [info] -- Slave diff file generation on host centos130(192.168.44.130:3306) started, pid: 100996. Check tmp log /var/log/masterha/app1/centos130_3306_20180522135357.log if it takes time.. Tue May 22 13:54:01 2018 - [info] Tue May 22 13:54:01 2018 - [info] Log messages from centos130 ... Tue May 22 13:54:01 2018 - [info] Tue May 22 13:54:00 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Tue May 22 13:54:01 2018 - [info] End of log messages from centos130. Tue May 22 13:54:01 2018 - [info] -- centos130(192.168.44.130:3306) has the latest relay log events. Tue May 22 13:54:01 2018 - [info] Generating relay diff files from the latest slave succeeded. Tue May 22 13:54:01 2018 - [info] Tue May 22 13:54:01 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. 開始並行slave差異日誌的應用階段 Tue May 22 13:54:01 2018 - [info] Tue May 22 13:54:01 2018 - [info] -- Slave recovery on host centos130(192.168.44.130:3306) started, pid: 101008. Check tmp log /var/log/masterha/app1/centos130_3306_20180522135357.log if it takes time.. Tue May 22 13:54:02 2018 - [info] Tue May 22 13:54:02 2018 - [info] Log messages from centos130 ... Tue May 22 13:54:02 2018 - [info] Tue May 22 13:54:01 2018 - [info] Starting recovery on centos130(192.168.44.130:3306).. Tue May 22 13:54:01 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. Tue May 22 13:54:01 2018 - [info] done. Tue May 22 13:54:01 2018 - [info] All relay logs were successfully applied. Tue May 22 13:54:01 2018 - [info] Resetting slave centos130(192.168.44.130:3306) and starting replication from the new master centos128(192.168.44.128:3306).. Tue May 22 13:54:01 2018 - [info] Executed CHANGE MASTER. Tue May 22 13:54:01 2018 - [info] Slave started. Tue May 22 13:54:02 2018 - [info] End of log messages from centos130. Tue May 22 13:54:02 2018 - [info] -- Slave recovery on host centos130(192.168.44.130:3306) succeeded. Tue May 22 13:54:02 2018 - [info] All new slave servers recovered successfully. Tue May 22 13:54:02 2018 - [info] Tue May 22 13:54:02 2018 - [info] * Phase 5: New master cleanup phase.. 新master的清理階段 Tue May 22 13:54:02 2018 - [info] Tue May 22 13:54:02 2018 - [info] Resetting slave info on the new master.. Tue May 22 13:54:02 2018 - [info] centos128: Resetting slave info succeeded. Tue May 22 13:54:02 2018 - [info] Master failover to centos128(192.168.44.128:3306) completed successfully. Tue May 22 13:54:02 2018 - [info] ----- Failover Report ----- app1: MySQL Master failover centos129 to centos128 succeeded Master centos129 is down! Check MHA Manager logs at centos128:/var/log/masterha/app1/manager.log for details.