1. 程式人生 > >Quorum 機制在開源分散式系統中的應用

Quorum 機制在開源分散式系統中的應用

目錄

Introduction

A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system.
A quorum-based technique is implemented to enforce consistent operation in a distributed system.

Quorum 即某個操作允許執行所要獲得的最小投票數,在分散式系統中,用於保證一致性操作。

Quorum-based voting in commit protocols

In a distributed database system, a transaction could be executing its operations at multiple sites.
Since atomicity requires every distributed transaction to be atomic, the transaction must have the same fate (commit or abort) at every site.
In case of network partitioning, sites are partitioned and the partitions may not be able to communicate with each other.

Every site in the system is assigned a vote Vi. Let us assume that the total number of votes in the system is V and the abort and commit quorums are Va and Vc, respectively.
Then the following rules must be obeyed in the implementation of the commit protocol:

  1. Va + Vc > V, where 0 < Vc, Va <= V

    .

  2. Before a transaction commits, it must obtain a commit quorum Vc.
    The total of at least one site that is prepared to commit and zero or more sites waiting >= Vc.

  3. Before a transaction aborts, it must obtain an abort quorum Va
    The total of zero or more sites that are prepared to abort or any sites waiting >= Va.

The first rule ensures that a transaction cannot be committed and aborted at the same time.
The next two rules indicate the votes that a transaction has to obtain before it can terminate one way or the other.

假設一個叢集有 5 臺 server, 某臺 server 擁有 1 票投票權來決定事務是 commit 還是 abort. 那麼,想要提交事務或終止事務都至少獲得 3 票。

References

Quorum 機制應用 - Zookeeper

Atomic broadcast and leader election use the notion of quorum to guarantee a consistent view of the system.
One example is acknowledging a leader proposal: the leader can only commit once it receives an acknowledgement from a quorum of servers.

Zookeeper 中原子性廣播和 leader 選舉應用了 quorum 機制。

Zookeeper 的 quorum 模式有如下特性:

  • 1 leader + n followers(還有一種 observer 角色,這裡不考慮)
  • 每個 server 都儲存一份資料副本
  • 讀請求(不改變狀態)直接本地處理,寫請求(改變狀態)統一轉發給 leader 處理,然後同步至 follower

Read operations

ZooKeeper servers process read requests (exists, getData, and getChildren) locally.

When a server receives, say, a getData request from a client, it reads its state and returns it to the client. Because it serves requests locally, ZooKeeper is pretty fast at serving read dominated workloads. We can add more servers to the ZooKeeper ensemble to serve more read requests, increasing overall throughput capacity.

[ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals. Requests, Transactions, and Identifiers]

Write operations

Upon receiving a write request, a follower forwards it to the leader. The leader executes the request speculatively and broadcasts the result of the execution as a state update, in the form of a transaction. A transaction comprises the exact set of changes that a server must apply to the data tree when the transaction is committed.

How a server determines that a transaction has been committed. This follows a protocol called Zab: the ZooKeeper Atomic Broadcast protocol.

[ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals. Zab: Broadcasting State Updates]

Zookeeper transaction

A transaction is treated as a unit, in the sense that all changes it contains must be applied atomically.
When a ZooKeeper ensemble applies transactions, it makes sure that all changes are applied atomically and there is no interference from other transactions. There is no rollback mechanism like with traditional relational databases.
When the leader generates a new transaction, it assigns to the transaction an identifier that we call a ZooKeeper transaction ID (zxid).

[ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals. Requests, Transactions, and Identifiers]

問題 - 是否會讀到不一致的資料(多個客戶端是否會有不同的檢視)?

答案是 YES. 因為寫操作只在 quorum 個 server 成功之後返回,這樣剩餘的 server 可能還沒來得及更新資料,這也是讀操作高效能的代價。如果一定要保證讀到最新的資料,客戶端可以呼叫 sync 之後再讀。由於並不是所有應用都對資料及時性有要求,因此,Zookeeper 並沒有再內部 sync.

One drawback of using fast reads is not guaranteeing precedence order for read operations. That is, a read operation may return a stale value, even though a more recent update to the same znode has been committed.
To guarantee that a given read operation returns the latest updated value, a client calls sync followed by the read operation.
ZooKeeper: Wait-free coordination for Internet-scale systems

Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:
Simultaneously Conistent Cross-Client Views
ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should should call the sync() method from the ZooKeeper API method before it performs its read.
ZooKeeper Programmer’s Guide. Consistency Guarantees

Leader Election and Atomic Broadcast

這兩部分是 Zookeeper 的核心,具體可以參考一下資料,這裡不再贅述:

Quorum 機制應用 - Redis Sentinel