ZooKeeper Notes: Implementing Cluster Leader Election with zk
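I. Requirements
In a master-slave cluster, we assume the hardware is fragile and may go down at any moment. When the master dies, one of the slaves has to be chosen as the new master. With ZooKeeper, this kind of leader election is simple to implement.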
II. Analysis
Leader election involves two questions:
1. Who becomes the leader?
2. When the leader dies, how do the followers find out?
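Start with the first question: who becomes the leader. This can be viewed as contention for a mutex in multithreaded code: there is only one lock, and only one party can grab it. Here a ZooKeeper node, /leader-info, plays the role of the lock. Every machine in the cluster tries to create this node. Because node creation in ZooKeeper is atomic, exactly one machine succeeds and all the others fail. The machine that succeeds becomes the leader, and the rest become followers. Usually /leader-info also stores some information about the leader so that followers can connect to it to exchange data or receive commands. That happens after the election and is outside the scope of this article.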
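The "one lock, one winner" idea can be tried without a ZooKeeper server. The following is a minimal sketch that uses an in-memory map as a stand-in for the ZooKeeper namespace; it is not the ZooKeeper client API. The class AtomicCreateDemo and its methods tryCreate and race are hypothetical names introduced only for illustration, and putIfAbsent is assumed to model the atomic create.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCreateDemo {

    // Hypothetical in-memory stand-in for the ZooKeeper namespace: path -> data
    private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

    /** Atomic "create": returns true only for the first caller to create the path. */
    public boolean tryCreate(String path, byte[] data) {
        return nodes.putIfAbsent(path, data) == null;
    }

    /** Lets the given number of threads race for /leader-info and returns how many won. */
    public static int race(int machines) {
        AtomicCreateDemo store = new AtomicCreateDemo();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(machines);
        for (int i = 0; i < machines; i++) {
            String name = "node-" + i;
            new Thread(() -> {
                try {
                    start.await(); // all threads start competing at the same moment
                    byte[] info = name.getBytes(StandardCharsets.UTF_8);
                    if (store.tryCreate("/leader-info", info)) {
                        winners.incrementAndGet(); // this thread would be the leader
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }, name).start();
        }
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("winners: " + race(10)); // prints "winners: 1"
    }
}
```

However many threads compete, only one putIfAbsent call can observe an empty slot, which is the same guarantee the article relies on from ZooKeeper's create.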
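The second question is how the followers are notified when the leader dies. By lifetime, ZooKeeper nodes are either persistent or ephemeral. An ephemeral node is bound to a session: when the session ends (it is closed, or it expires after the session timeout), the ephemeral nodes it created are deleted. This property makes it possible to detect whether a machine is still alive, and so lets followers notice that the leader has gone offline. All that is needed is to create /leader-info as an ephemeral node and have each follower set a watcher on it for the delete event. When the leader dies, ZooKeeper deletes /leader-info and sends an event notification to every follower watching it. As soon as the followers see that the leader is gone they spring into action: each sets its status to looking and a new round of election begins.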
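The ephemeral-node and watcher mechanism described above can be modeled in a few lines. This is only a sketch under stated assumptions: EphemeralWatchDemo, createEphemeral, watchDelete, and closeSession are hypothetical names for an in-memory model, not ZooKeeper API calls, and the session ids are made up for the demo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class EphemeralWatchDemo {

    // path -> id of the session that owns the ephemeral node (hypothetical model)
    private final Map<String, Long> owners = new HashMap<>();
    // path -> one-shot watchers waiting for the node to be deleted
    private final Map<String, List<Runnable>> deleteWatchers = new HashMap<>();

    public synchronized boolean createEphemeral(String path, long sessionId) {
        return owners.putIfAbsent(path, sessionId) == null;
    }

    public synchronized boolean exists(String path) {
        return owners.containsKey(path);
    }

    /** Registers a one-shot delete watcher; returns false if the node does not exist. */
    public synchronized boolean watchDelete(String path, Runnable watcher) {
        if (!owners.containsKey(path)) {
            return false;
        }
        deleteWatchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        return true;
    }

    /** Ends a session: its ephemeral nodes are removed and their watchers fire once. */
    public void closeSession(long sessionId) {
        List<Runnable> toFire = new ArrayList<>();
        synchronized (this) {
            Iterator<Map.Entry<String, Long>> it = owners.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (entry.getValue() == sessionId) {
                    String path = entry.getKey();
                    it.remove();
                    List<Runnable> watchers = deleteWatchers.remove(path);
                    if (watchers != null) {
                        toFire.addAll(watchers);
                    }
                }
            }
        }
        // Notify outside the lock so a watcher may call back into this object
        toFire.forEach(Runnable::run);
    }

    /** Session 1 leads, two followers watch, the leader's session ends. */
    public static int demo() {
        EphemeralWatchDemo zk = new EphemeralWatchDemo();
        int[] notified = {0};
        zk.createEphemeral("/leader-info", 1L);
        zk.watchDelete("/leader-info", () -> notified[0]++); // follower A
        zk.watchDelete("/leader-info", () -> notified[0]++); // follower B
        zk.closeSession(1L);                                 // leader dies
        return zk.exists("/leader-info") ? -1 : notified[0];
    }

    public static void main(String[] args) {
        System.out.println("followers notified: " + demo()); // prints "followers notified: 2"
    }
}
```

The watchers in this model are removed when they fire, mirroring the one-shot nature of ZooKeeper watches: a follower that wants to keep watching has to register again after each notification.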
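To summarize the election flow:
1. Every machine in the cluster sets its status to looking and gets ready for the election.
2. Every machine in the looking state tries to create the /leader-info node.
3. The machine whose create succeeds changes its status to leader and writes some information about itself to the node. Machines whose create fails set their status to follower and try to read the leader information from /leader-info to run whatever logic a leader change requires.
4. When a follower reads the data of /leader-info, a KeeperException.NoNodeException is possible, because the leader may have died right after becoming leader (or there was network jitter; either way its session became invalid). A follower that catches KeeperException.NoNodeException knows the cluster no longer has a leader, so it sets its status back to looking and starts a new round of election.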
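The four steps above boil down to a small state transition, sketched here as a pure function. ElectionRound and attempt are hypothetical names, and the Optional parameter is an assumption of this sketch: an empty value stands in for the KeeperException.NoNodeException case in step 4.

```java
import java.util.Optional;

public class ElectionRound {

    public enum Status { LOOKING, LEADER, FOLLOWER }

    /**
     * Result of one election attempt by a machine that is currently LOOKING.
     *
     * @param created    whether this machine's create of /leader-info succeeded
     * @param leaderInfo data read from /leader-info after a failed create;
     *                   empty models the node having vanished before the read
     */
    public static Status attempt(boolean created, Optional<String> leaderInfo) {
        if (created) {
            return Status.LEADER;   // step 3: won the race
        }
        if (leaderInfo.isPresent()) {
            return Status.FOLLOWER; // step 3: lost the race, leader is known
        }
        return Status.LOOKING;      // step 4: leader already gone, try again
    }

    public static void main(String[] args) {
        System.out.println(attempt(true, Optional.empty()));       // prints "LEADER"
        System.out.println(attempt(false, Optional.of("node-4"))); // prints "FOLLOWER"
        System.out.println(attempt(false, Optional.empty()));      // prints "LOOKING"
    }
}
```

A machine that ends an attempt in LOOKING simply runs another attempt, which is the retry described in step 4.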
III. Implementation
Node.java:
package cc11001100.zookeeper.leaderElection;

// ZooKeeperUtil is the author's own helper class; its source is not shown in this article
import cc11001100.zookeeper.utils.ZooKeeperUtil;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Represents one node in the cluster; an election decides whether it is leader or follower.
 *
 * @author CC11001100
 */
public class Node {

    // volatile: written on the ZooKeeper event thread during re-election,
    // read by the node's own thread through getStatus()
    private volatile Status status;

    private String nodeForLeaderInfo;

    private ZooKeeper zooKeeper;

    public Node(String listenerNodeForLeader) throws IOException {
        this.nodeForLeaderInfo = listenerNodeForLeader;
        this.zooKeeper = ZooKeeperUtil.getZooKeeper();
        lookingForLeader();
    }

    public void lookingForLeader() {
        status = Status.LOOKING;
        try {
            String leaderInfo = Thread.currentThread().getName();
            zooKeeper.create(nodeForLeaderInfo, leaderInfo.getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            // No exception from the create above means this node is now the leader
            status = Status.LEADER;
            String logMsg = Thread.currentThread().getName() + " is leader";
            System.out.println(logMsg);
        } catch (KeeperException.NodeExistsException e) {
            // The node already exists: someone else registered as leader, so this node is a follower
            status = Status.FOLLOWER;
            try {
                // Read the leader info and watch for deletion of the node in one call
                byte[] leaderInfoBytes = zooKeeper.getData(nodeForLeaderInfo, event -> {
                    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                        lookingForLeader();
                    }
                }, null);
                String logMsg = Thread.currentThread().getName() + " is follower, master is "
                        + new String(leaderInfoBytes, StandardCharsets.UTF_8);
                System.out.println(logMsg);
            } catch (KeeperException.NoNodeException e1) {
                // The node vanished before it could be read: the leader was short-lived
                // and died right after winning, so start a new election
                lookingForLeader();
            } catch (KeeperException | InterruptedException e1) {
                e1.printStackTrace();
            }
        } catch (KeeperException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    public void shutdown() {
        try {
            if (zooKeeper != null) {
                zooKeeper.close();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    public Status getStatus() {
        return status;
    }

    // The role of the current node
    public enum Status {
        LOOKING,  // election in progress
        LEADER,   // election finished, this node is the leader
        FOLLOWER; // election finished, this node is a follower
    }

}
LeaderElectionTest.java:
package cc11001100.zookeeper.leaderElection;

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * @author CC11001100
 */
public class LeaderElectionTest {

    private static void sleep(long mils) {
        try {
            TimeUnit.MILLISECONDS.sleep(mils);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {

        final String LEADER_INFO_NODE = "/leader-info";
        int nodeNum = 10;
        AtomicLong idGenerator = new AtomicLong();
        AtomicInteger activeNodeCount = new AtomicInteger();

        // Keep the cluster at nodeNum machines: whenever one shuts down, start a replacement
        while (true) {
            if (activeNodeCount.get() >= nodeNum) {
                sleep(10);
                continue;
            }

            // Thread startup takes a while; treat it as the boot process and
            // count the machine as joined before it has finished booting
            activeNodeCount.incrementAndGet();
            new Thread(() -> {
                try {
                    Node node = new Node(LEADER_INFO_NODE);
                    while (true) {
                        sleep(1000);
                        // For the experiment, give the leader a mild suicidal tendency...
                        if (node.getStatus() == Node.Status.LEADER && Math.random() < 0.3) {
                            String logMsg = "----------------------------- " + Thread.currentThread().getName()
                                    + " shutdown -----------------------------";
                            System.out.println(logMsg);
                            node.shutdown();
                            break;
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    activeNodeCount.decrementAndGet();
                }
            }, "node-" + idGenerator.getAndIncrement()).start();
        }

    }

}
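Console output (names ending in -EventThread come from re-elections that run inside the watcher callback, where Thread.currentThread().getName() returns the name of the ZooKeeper client's event thread):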
...
node-4 is leader
node-3 is follower, master is node-4
node-0 is follower, master is node-4
node-9 is follower, master is node-4
node-7 is follower, master is node-4
node-5 is follower, master is node-4
node-1 is follower, master is node-4
node-6 is follower, master is node-4
node-8 is follower, master is node-4
node-2 is follower, master is node-4
----------------------------- node-4 shutdown -----------------------------
node-0-EventThread is leader
node-6-EventThread is follower, master is node-0-EventThread
node-3-EventThread is follower, master is node-0-EventThread
node-7-EventThread is follower, master is node-0-EventThread
node-1-EventThread is follower, master is node-0-EventThread
node-5-EventThread is follower, master is node-0-EventThread
node-9-EventThread is follower, master is node-0-EventThread
node-2-EventThread is follower, master is node-0-EventThread
node-8-EventThread is follower, master is node-0-EventThread
node-10 is follower, master is node-0-EventThread
----------------------------- node-0 shutdown -----------------------------
node-6-EventThread is leader
node-7-EventThread is follower, master is node-6-EventThread
node-1-EventThread is follower, master is node-6-EventThread
node-3-EventThread is follower, master is node-6-EventThread
node-10-EventThread is follower, master is node-6-EventThread
node-9-EventThread is follower, master is node-6-EventThread
node-5-EventThread is follower, master is node-6-EventThread
node-2-EventThread is follower, master is node-6-EventThread
node-8-EventThread is follower, master is node-6-EventThread
node-11 is follower, master is node-6-EventThread
----------------------------- node-6 shutdown -----------------------------
node-1-EventThread is leader
node-10-EventThread is follower, master is node-1-EventThread
node-7-EventThread is follower, master is node-1-EventThread
node-11-EventThread is follower, master is node-1-EventThread
node-8-EventThread is follower, master is node-1-EventThread
node-5-EventThread is follower, master is node-1-EventThread
node-9-EventThread is follower, master is node-1-EventThread
node-3-EventThread is follower, master is node-1-EventThread
node-2-EventThread is follower, master is node-1-EventThread
node-12 is follower, master is node-1-EventThread
----------------------------- node-1 shutdown -----------------------------
node-3-EventThread is leader
node-12-EventThread is follower, master is node-3-EventThread
node-11-EventThread is follower, master is node-3-EventThread
node-5-EventThread is follower, master is node-3-EventThread
node-7-EventThread is follower, master is node-3-EventThread
node-9-EventThread is follower, master is node-3-EventThread
node-2-EventThread is follower, master is node-3-EventThread
node-10-EventThread is follower, master is node-3-EventThread
node-8-EventThread is follower, master is node-3-EventThread
node-13 is follower, master is node-3-EventThread
----------------------------- node-3 shutdown -----------------------------
node-5-EventThread is leader
node-13-EventThread is follower, master is node-5-EventThread
node-12-EventThread is follower, master is node-5-EventThread
node-7-EventThread is follower, master is node-5-EventThread
node-11-EventThread is follower, master is node-5-EventThread
node-10-EventThread is follower, master is node-5-EventThread
node-9-EventThread is follower, master is node-5-EventThread
node-2-EventThread is follower, master is node-5-EventThread
node-8-EventThread is follower, master is node-5-EventThread
node-14 is follower, master is node-5-EventThread
----------------------------- node-5 shutdown -----------------------------
node-7-EventThread is leader
node-13-EventThread is follower, master is node-7-EventThread
node-12-EventThread is follower, master is node-7-EventThread
node-9-EventThread is follower, master is node-7-EventThread
node-11-EventThread is follower, master is node-7-EventThread
node-14-EventThread is follower, master is node-7-EventThread
node-10-EventThread is follower, master is node-7-EventThread
node-8-EventThread is follower, master is node-7-EventThread
node-2-EventThread is follower, master is node-7-EventThread
node-15 is follower, master is node-7-EventThread
----------------------------- node-7 shutdown -----------------------------
node-14-EventThread is leader
node-13-EventThread is follower, master is node-14-EventThread
node-11-EventThread is follower, master is node-14-EventThread
node-2-EventThread is follower, master is node-14-EventThread
node-12-EventThread is follower, master is node-14-EventThread
node-15-EventThread is follower, master is node-14-EventThread
node-10-EventThread is follower, master is node-14-EventThread
node-9-EventThread is follower, master is node-14-EventThread
node-8-EventThread is follower, master is node-14-EventThread
node-16 is follower, master is node-14-EventThread
----------------------------- node-14 shutdown -----------------------------
node-13-EventThread is leader
node-12-EventThread is follower, master is node-13-EventThread
node-15-EventThread is follower, master is node-13-EventThread
node-9-EventThread is follower, master is node-13-EventThread
node-10-EventThread is follower, master is node-13-EventThread
node-2-EventThread is follower, master is node-13-EventThread
node-8-EventThread is follower, master is node-13-EventThread
node-11-EventThread is follower, master is node-13-EventThread
node-16-EventThread is follower, master is node-13-EventThread
node-17 is follower, master is node-13-EventThread
...