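1. Preface
MySQL Group Replication (MGR below) has been GA for a little over a month. After a round of simple tests, MGR's guarantee of eventual data consistency turns out to be quite reliable (note: eventual consistency; real-time consistency is not achievable, and there is still some lag). As earlier articles on MGR described, MGR has two modes, single-primary and multi-primary. Given the current restrictions on multi-primary mode and some problems found in testing, multi-primary is probably not very practical yet.
In fact, MGR's single-primary mode already removes a risk of the traditional master-slave architecture, where data may be inconsistent after a failover. One problem remains, though: HA. How do we get automatic failover when the MGR primary fails?
We know that in single-primary mode, when the primary fails, MGR starts an election internally and picks a new primary; MGR decides and carries this out by itself. But MGR does not cover everything for us: when the primary goes down, application connections are not switched automatically. In other words, MGR has no built-in mechanism that makes a primary failover transparent to the application.
The official MGR documentation says: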
Quite obviously, regardless the mode Group Replication is deployed, it does not handle client-side fail-over. That must be handled by the application itself, connector or a middleware framework such as a proxy or router.
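In other words, MGR does not handle client-side failover for you. The application has to do it itself, or rely on a connector, middleware, or a proxy.
In practice, of course, we want the application to keep serving after the primary goes down, with its connections automatically redirected to the new primary and no restart required. With that in mind I searched Google for related material and was inspired by this blog post: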
http://lefred.be/content/ha-with-mysql-group-replication-and-proxysql/
The problem described above can be solved with ProxySQL, a MySQL middleware.
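2. Goal
After all that, here is our goal in brief:
In MGR single-primary mode, make primary failover transparent to the application.
Reaching this goal depends on the ProxySQL middleware.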
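3. Implementation approach
As mentioned above, ProxySQL can be used to reach our goal. Rather than more words, here is a diagram of the approach:
[Figure: diagram of the implementation approach]
To describe the approach above: the application connects to the ProxySQL middleware and reaches the backend MGR primary indirectly. ProxySQL keeps configuration tables that hold the access information for the MGR nodes, and its scheduler runs a check script (a shell script) that periodically inspects the state of the backend MGR nodes. If the MGR primary goes down, the scheduled script detects the error, determines the new primary, discards the old connections it was holding, and creates connections to the new node (ProxySQL maintains connections to every backend MGR node internally, the concept of a multi-source connection pool).
Throughout this process the application needs no changes. From the moment the application notices the failure until its connections point at the new primary and service is back to normal, the gap is on the order of seconds.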
[Important] The checking logic of the script is shown in the following pseudocode:
set flag switchOver = false;
find current write node;
for(read node in readhostgroup) {
    isOk = check read node status;
    if(read node is current write node) {
        if(!isOk) {
            // current write node is down: need to find new write node
            set flag switchOver = true;
            update current read node status to be 'OFFLINE_SOFT';
            update current write node status to be 'OFFLINE_HARD';
        } else {
            // node is reachable; confirm it is still the MGR primary
            isPrimaryNode = check current write node is really the primary node;
            if(!isPrimaryNode) {
                // need to find new write node
                set flag switchOver = true;
                update current write node status to be 'OFFLINE_HARD';
                if(read node status != 'ONLINE') {
                    update read node status to be 'ONLINE';
                }
                continue;
            }
            // is primary node
            if(read node status != 'ONLINE') {
                update current write node status to be 'ONLINE';
                update read node status to be 'ONLINE';
            }
        }
    } else if(!isOk) { // node is not current write node and status is not ok
        update read node status to be 'OFFLINE_SOFT';
    } else if(isOk and read node status == 'OFFLINE_SOFT') {
        update read node status to be 'ONLINE';
    }
}
if(switchOver) {
    // need to find new write node
    successSwitchOver = false;
    for(read node in readhostgroup and status is 'ONLINE') {
        isNewPrimaryNode = check node is the new primary node;
        if(isNewPrimaryNode) {
            update current write node info as read node;
            successSwitchOver = true;
            break;
        }
    }
    if(!successSwitchOver) {
        // can not find the new write node
        report error msg;
    }
}
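To make the control flow of the pseudocode easier to follow, here is a minimal Java sketch that runs one check pass against an in-memory model. It is not the actual shell script: the `SwitchOverSketch` and `Node` names, and the `isHealthy` / `isPrimary` predicates that stand in for the real SQL checks, are assumptions made for illustration. The sketch tracks only the read-hostgroup status and leaves out the `OFFLINE_HARD` update of the write hostgroup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SwitchOverSketch {

    /** Hypothetical in-memory stand-in for one node's row in the read hostgroup. */
    static final class Node {
        final int port;
        String readStatus = "ONLINE"; // ONLINE or OFFLINE_SOFT in this sketch

        Node(int port) {
            this.port = port;
        }
    }

    /**
     * Runs one check pass and returns the node that should be the writer
     * afterwards, or null when no online primary can be found.
     */
    static Node checkOnce(List<Node> readGroup, Node writer,
                          Predicate<Node> isHealthy, Predicate<Node> isPrimary) {
        boolean switchOver = false;
        for (Node node : readGroup) {
            boolean ok = isHealthy.test(node);
            if (node == writer) {
                if (!ok) {
                    // Current writer is down: take it out of rotation.
                    switchOver = true;
                    node.readStatus = "OFFLINE_SOFT";
                } else {
                    // Reachable, but it may no longer be the MGR primary.
                    if (!isPrimary.test(node)) {
                        switchOver = true;
                    }
                    node.readStatus = "ONLINE";
                }
            } else if (!ok) {
                node.readStatus = "OFFLINE_SOFT";
            } else if ("OFFLINE_SOFT".equals(node.readStatus)) {
                node.readStatus = "ONLINE";
            }
        }
        if (!switchOver) {
            return writer;
        }
        // Look for the new primary among the nodes that are still online.
        for (Node candidate : readGroup) {
            if ("ONLINE".equals(candidate.readStatus) && isPrimary.test(candidate)) {
                return candidate;
            }
        }
        return null; // the real script would report an error here
    }

    public static void main(String[] args) {
        List<Node> group = new ArrayList<>();
        Node n1 = new Node(24801);
        Node n2 = new Node(24802);
        Node n3 = new Node(24803);
        group.add(n1);
        group.add(n2);
        group.add(n3);
        // Simulated state: 24802 (the old writer) is down, 24803 was elected.
        Node writer = checkOnce(group, n2, n -> n.port != 24802, n -> n.port == 24803);
        System.out.println(writer == null ? "no writer found" : "new writer port: " + writer.port);
    }
}
```

With the simulated state in `main`, the pass marks 24802 as `OFFLINE_SOFT` and picks 24803 as the new writer.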
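4. Implementation
The following describes how to configure ProxySQL and MGR to work together to reach our goal.
4.1 Environment
- Ubuntu 14.04 (also deployed on CentOS 6.5)
- MySQL 5.7.17
- ProxySQL 1.3.1
Installing and deploying this software (including setting up MGR) is outside the scope of this article.
4.2 Configuration
Assume a 3-node MGR cluster in single-primary mode is already deployed on a single machine (machine resources are limited):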
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 4a48f63a-d47b-11e6-a16f-a434d920fb4d | CrazyPig-PC |       24801 | ONLINE       |
| group_replication_applier | 592b4ea5-d47b-11e6-a3cd-a434d920fb4d | CrazyPig-PC |       24802 | ONLINE       |
| group_replication_applier | 6610aa92-d47b-11e6-a60e-a434d920fb4d | CrazyPig-PC |       24803 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)
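ProxySQL is also installed and running on this machine. The remaining configuration steps are as follows:
1) Create the user on the MGR cluster and grant privileges
So that ProxySQL can periodically check the state of the MGR nodes and act as the proxy layer in front of the MGR backend, a user that ProxySQL uses to log in to the MGR nodes must be created and granted privileges.
Run on the MGR primary: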
grant all privileges on *.* to 'proxysql'@'%' identified by 'proxysql';
flush privileges;
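2) Create the functions and views that check MGR node state
Following the blog post mentioned earlier, run the SQL from the link below on the MGR primary: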
https://github.com/lefred/mysql_gr_routing_check/blob/master/addition_to_sys.sql
3) Configure ProxySQL
Add the MGR member nodes to ProxySQL's mysql_servers table:
insert into mysql_servers (hostgroup_id, hostname, port) values(1, '127.0.0.1', 24801);
insert into mysql_servers (hostgroup_id, hostname, port) values(2, '127.0.0.1', 24801);
insert into mysql_servers (hostgroup_id, hostname, port) values(2, '127.0.0.1', 24802);
insert into mysql_servers (hostgroup_id, hostname, port) values(2, '127.0.0.1', 24803);
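hostgroup_id = 1 is the write group; given the single-primary restriction stated above, only one node is configured here. hostgroup_id = 2 is the read group and contains all MGR nodes.
ProxySQL can also be configured for read/write splitting, which this article does not cover. With the hostgroup configuration above, all reads and writes are sent by default to the hostgroup whose hostgroup_id is 1, that is, to the write node.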
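If the member list changes often, the rows above could be generated rather than typed by hand. The following is only a sketch: the `ServerRowsSketch` class and its `insertStatements` method are hypothetical names, and it assumes, as in this article, that all members share one hostname and differ only by port. It produces one row for the write group and one row per member for the read group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ServerRowsSketch {

    /**
     * Builds the INSERT statements for mysql_servers: the current writer goes
     * into writeGroup, and every member (writer included) into readGroup.
     */
    static List<String> insertStatements(String host, int writerPort, int[] memberPorts,
                                         int writeGroup, int readGroup) {
        List<String> sql = new ArrayList<>();
        sql.add(row(writeGroup, host, writerPort));
        for (int port : memberPorts) {
            sql.add(row(readGroup, host, port));
        }
        return sql;
    }

    private static String row(int hostgroup, String host, int port) {
        // Locale.ROOT keeps the digits plain ASCII regardless of the JVM locale.
        return String.format(Locale.ROOT,
                "insert into mysql_servers (hostgroup_id, hostname, port) values(%d, '%s', %d);",
                hostgroup, host, port);
    }

    public static void main(String[] args) {
        int[] members = {24801, 24802, 24803};
        for (String statement : insertStatements("127.0.0.1", 24801, members, 1, 2)) {
            System.out.println(statement);
        }
    }
}
```

Run with the values from this article, it prints the same four statements shown above.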
Next, change ProxySQL's monitor username and password to the user and password created in step 1):
UPDATE global_variables SET variable_value='proxysql' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='proxysql' WHERE variable_name='mysql-monitor_password';
Also add the user that applications use to reach the backend MGR nodes through ProxySQL:
insert into mysql_users(username, password) values('proxysql', 'proxysql');
Finally, load the contents of the global_variables, mysql_servers, and mysql_users tables to RUNTIME, and then save them to DISK:
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
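4) Configure the scheduler
First, download gr_sw_mode_checker.sh from the GitHub repository https://github.com/ZzzCrazyPig/proxysql_groupreplication_checker
Next, put the script gr_sw_mode_checker.sh in the directory /var/lib/proxysql/
Finally, insert the following record into ProxySQL's scheduler table, load it to RUNTIME so that it takes effect, and optionally persist it to disk: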
insert into scheduler(id, active, interval_ms, filename, arg1, arg2, arg3, arg4)
values(1, 1, 3000, '/var/lib/proxysql/gr_sw_mode_checker.sh', 1, 2, 1, '/var/lib/proxysql/checker.log');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
- active: 1 enables scheduling of the script
- interval_ms: how often the script is run (e.g. 3000 (ms) = 3 s means the script is called every 3 s)
- filename: the full path of the script, /var/lib/proxysql/gr_sw_mode_checker.sh in the example above
- arg1~arg4: the arguments passed to the script
The script and its arguments are:
gr_sw_mode_checker.sh writehostgroup_id readhostgroup_id [writeNodeCanRead] [log file]
- arg1 -> writehostgroup_id
- arg2 -> readhostgroup_id
- arg3 -> whether the write node can also serve reads: 1 (YES, the default value), 0 (NO)
- arg4 -> log file, default: './checker.log'
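The argument list above, with its two optional values and their defaults, can be summarized in a few lines of code. This is a sketch in Java rather than the script's actual shell implementation; the `CheckerArgs` class is a hypothetical name, and treating any arg3 other than "1" as NO is an assumption.

```java
final class CheckerArgs {
    final int writeHostgroupId;
    final int readHostgroupId;
    final boolean writeNodeCanRead;
    final String logFile;

    private CheckerArgs(int writeHostgroupId, int readHostgroupId,
                        boolean writeNodeCanRead, String logFile) {
        this.writeHostgroupId = writeHostgroupId;
        this.readHostgroupId = readHostgroupId;
        this.writeNodeCanRead = writeNodeCanRead;
        this.logFile = logFile;
    }

    /** Parses arg1..arg4 in the order described above, applying the documented defaults. */
    static CheckerArgs parse(String... args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "usage: writehostgroup_id readhostgroup_id [writeNodeCanRead] [log file]");
        }
        int writeGroup = Integer.parseInt(args[0]);
        int readGroup = Integer.parseInt(args[1]);
        // arg3 defaults to 1 (YES); assumption: anything other than "1" means NO.
        boolean canRead = args.length < 3 || "1".equals(args[2]);
        // arg4 defaults to ./checker.log
        String logFile = args.length >= 4 ? args[3] : "./checker.log";
        return new CheckerArgs(writeGroup, readGroup, canRead, logFile);
    }

    public static void main(String[] args) {
        CheckerArgs parsed = parse("1", "2", "1", "/var/lib/proxysql/checker.log");
        System.out.println(parsed.writeHostgroupId + " " + parsed.readHostgroupId + " "
                + parsed.writeNodeCanRead + " " + parsed.logFile);
    }
}
```

The values in `main` mirror the scheduler record inserted earlier (arg1 = 1, arg2 = 2, arg3 = 1, arg4 = the log path).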
That's it, everything is in place.
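5. Testing
Since I work in Java, I wrote a small Java program that connects to ProxySQL over JDBC (note: the client should connect to ProxySQL), runs select @@port to see which backend MGR node it is currently connected to, and then I simulated a primary failure by hand and observed what happened. The code is as follows: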
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TestMgrHAWithProxysql {

    // Connect to ProxySQL, not to an MGR node directly.
    private static final String JDBC_URL = "jdbc:mysql://10.202.7.88:6033/test";
    private static final String USER = "proxysql";
    private static final String PASSWORD = "proxysql";

    public static void main(String[] args) throws InterruptedException {
        // Reconnect in a loop rather than by recursion, so repeated failures
        // neither grow the call stack nor leave old connections open.
        while (true) {
            tryAgain();
        }
    }

    // Opens one connection and prints the backend port every 500 ms
    // until a SQLException (for example a lost link) occurs.
    private static void tryAgain() throws InterruptedException {
        try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASSWORD);
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(false);
            String sql = "select @@port";
            while (true) {
                try (ResultSet rs = stmt.executeQuery(sql)) {
                    if (rs.next()) {
                        System.out.println("port : " + rs.getString(1));
                    }
                }
                Thread.sleep(500);
            }
        } catch (SQLException e) {
            // Report the failure; main() opens a new connection.
            e.printStackTrace();
        }
    }
}
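The test program retries immediately and forever, which is fine for a demo. In an application you would more likely bound the retries and pause between them. The following is a minimal sketch of that idea, not part of the original test: the `RetrySketch` class, its `retry` method, and the attempt and pause values are assumptions, and the failing JDBC call is simulated so the example runs without a database.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

final class RetrySketch {

    /**
     * Calls the action until it succeeds or maxAttempts is reached,
     * sleeping pauseMillis between attempts.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated query: fails twice as if the link to the old primary broke,
        // then succeeds as if the connection now reaches the new primary.
        String port = retry(() -> {
            if (++calls[0] < 3) {
                throw new SQLException("Communications link failure");
            }
            return "24803";
        }, 5, 200L);
        System.out.println("port : " + port + " after " + calls[0] + " attempts");
    }
}
```

In a real program the lambda would open a connection to ProxySQL and run the query; the pause gives the scheduler script (called every 3 s in the configuration above) time to promote the new write node.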
The MGR primary was initially on port 24802, so the program kept printing:
port : 24802
...
...
...
With the program still running, simulate a primary failure:
mysql> stop group_replication;
Query OK, 0 rows affected (8.34 sec)
The Java program then printed an exception, after which it kept printing the new port:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 567 milliseconds ago. The last packet sent successfully to the server was 66 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
at TestMgrHAWithProxysql.tryAgain(TestMgrHAWithProxysql.java:26)
at TestMgrHAWithProxysql.main(TestMgrHAWithProxysql.java:15)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 9 more
port : 24803
...
...
The primary has now switched to port 24803, the third node. To verify again, first add the previously removed node on port 24802 back to the group:
mysql> start group_replication;
Query OK, 0 rows affected (2.64 sec)
Then log in to the node on port 24803 and run:
mysql> stop group_replication;
Query OK, 0 rows affected (8.15 sec)
The primary switches again, this time to 24802:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 547 milliseconds ago. The last packet sent successfully to the server was 47 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
at TestMgrHAWithProxysql.tryAgain(TestMgrHAWithProxysql.java:26)
at TestMgrHAWithProxysql.main(TestMgrHAWithProxysql.java:15)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 9 more
port : 24802
...
...
This time, instead of rejoining 24803, stop 24802 again so that the whole MGR group has only one node left, and see whether the node on port 24801 is chosen. It is:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 542 milliseconds ago. The last packet sent successfully to the server was 41 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
at TestMgrHAWithProxysql.tryAgain(TestMgrHAWithProxysql.java:26)
at TestMgrHAWithProxysql.main(TestMgrHAWithProxysql.java:15)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 9 more
port : 24801
...
...
Of course, if 24801 leaves the group as well, the game is over!
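6. Impact of introducing ProxySQL
This solution introduces a middleware so that primary failover in MGR single-primary mode is transparent to the application. It has the following costs:
- Introducing ProxySQL requires extra machine resources and adds operational difficulty and complexity
- ProxySQL itself is a single point of failure; making it highly available needs further thought (HAProxy? Keepalived?)
7. References
7.1 ProxySQL
- For DBA - MySQL middleware ProxySQL: installation and deployment
- For DBA - MySQL middleware ProxySQL: configuration system
- For DBA - MySQL middleware ProxySQL: read/write splitting and query rewrite configuration
- ProxySQL official website
- ProxySQL GitHub (see the Wiki and the documents under the doc directory)
7.2 MGR HA
- lefred's Blog - MGR HA with HAProxy
- lefred's Blog - MGR HA with ProxySQL
7.3 Related to this article
- My GitHub - MGR HA with ProxySQL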