
Apache ZooKeeper Official Documentation - 3. Getting Started: Coordinating Distributed Applications with ZooKeeper

Original link  Translator: softliumin  Proofreader: 方騰飛

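This section gets you started quickly with ZooKeeper. It is aimed mainly at developers who want to try it out, and it contains installation instructions for a single ZooKeeper server, a few commands to verify that it is running, and a simple programming example. For convenience, some of the more complicated installation topics, such as running replicated deployments and optimizing the transaction log, are not covered in this document. For complete instructions on commercial deployments, please refer to the Administrator's Guide.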

1: Prerequisites

2: Download

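3: Standalone Setup

Setting up a ZooKeeper server in standalone mode is straightforward. The server is contained in a single JAR file, so installation consists of creating a configuration file. Once you have downloaded a stable ZooKeeper release, unpack it and cd to the ZooKeeper root directory.

To start ZooKeeper you need a configuration file. Here is a sample; create it as conf/zoo.cfg: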

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

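conf/zoo.cfg is a plain text file that any editor can open. You can change the value of dataDir to point to a different directory. The meaning of each field is as follows:

  • tickTime: the length of the basic time unit used by ZooKeeper, in milliseconds. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
  • dataDir: the directory where the server stores its in-memory database snapshots. Unless configured otherwise, the transaction log is also stored in this directory.
  • clientPort: the port on which the server listens for client connections.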
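The relationship between these fields can be checked with a few lines of code. The following is a minimal sketch, not part of ZooKeeper itself: the helper name `parse_zoo_cfg` is hypothetical, and the parser only handles the simple `key=value` lines shown in the sample above.

```python
def parse_zoo_cfg(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# The sample standalone configuration from this section.
SAMPLE_CFG = """\
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
"""

cfg = parse_zoo_cfg(SAMPLE_CFG)
tick_ms = int(cfg["tickTime"])

# The minimum session timeout is twice the tickTime.
min_session_timeout_ms = 2 * tick_ms
print(min_session_timeout_ms)  # 4000
```

With tickTime=2000, a client cannot negotiate a session timeout shorter than 4 seconds.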

Once the configuration file is in place, start ZooKeeper with the following command:

bin/zkServer.sh start

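ZooKeeper logs messages using log4j. More detail is available in the Logging section of the ZooKeeper Programmer's Guide.

Log messages appear on the console, or in a log file, depending on the log4j configuration.

The steps in this section run ZooKeeper in standalone mode. There is no replication, so if the ZooKeeper process fails, the service goes down. This is fine for most development situations, but to run ZooKeeper in replicated mode, see Running Replicated ZooKeeper.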

4: Managing ZooKeeper Storage

For long-running production systems, ZooKeeper storage (dataDir and logs) must be maintained regularly. See the maintenance section for more details.

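5: Connecting to ZooKeeper

Once ZooKeeper is running, you have several options for connecting to it. For example, using the command-line client:

$ bin/zkCli.sh -server 127.0.0.1:2181

Once you have connected, you should see something like: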

Connecting to localhost:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
Welcome to ZooKeeper! JLine support is enabled
[zkshell: 0]

From the shell, type help to get a listing of the commands that can be executed from the client, as in:

[zkshell: 0] help
ZooKeeper host:port cmd args
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create path data acl
stat path [watch]
listquota path
history
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
deleteall path
setquota -n|-b val path

From here, you can try a few simple commands to get a feel for this command-line interface. First, start by issuing the list command, ls:

[zkshell: 8] ls /

[zookeeper]

Next, create a new znode by running create /zk_test my_data. This creates a new znode and associates the string "my_data" with the node. You should see:

[zkshell: 9] create /zk_test my_data

Created /zk_test

Issue another ls / command to see what the directory looks like:

[zkshell: 11] ls /

[zookeeper, zk_test]

Notice that the zk_test directory has now been created. Next, change the data associated with zk_test by issuing the set command, as in:

[zkshell: 14] set /zk_test junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0

[zkshell: 15] get /zk_test
junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0

(Notice that we ran get after set to verify that the data had indeed changed.) Finally, let's delete the node:

[zkshell: 16] delete /zk_test

[zkshell: 17] ls /

[zookeeper]

[zkshell: 18]
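Two of the stat fields printed by set and get in the session above, dataVersion and dataLength, follow directly from the commands that were run. The following toy model is only an illustration of that bookkeeping: `ToyZnode` is a hypothetical in-memory class, not a ZooKeeper client, and it does not talk to a server.

```python
class ToyZnode:
    """In-memory stand-in for a znode, tracking two stat fields."""

    def __init__(self, data: bytes):
        self.data = data
        self.data_version = 0  # a freshly created znode starts at version 0

    def set(self, data: bytes) -> None:
        """Replace the data; each successful set bumps dataVersion by one."""
        self.data = data
        self.data_version += 1

    def stat(self) -> dict:
        return {
            "dataVersion": self.data_version,
            "dataLength": len(self.data),  # length of the data in bytes
        }


node = ToyZnode(b"my_data")   # create /zk_test my_data
node.set(b"junk")             # set /zk_test junk
print(node.stat())            # {'dataVersion': 1, 'dataLength': 4}
```

This matches the session output: one set after creation gives dataVersion = 1, and "junk" is four bytes long.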

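6: Programming to ZooKeeper

ZooKeeper has Java bindings and C bindings, and they are functionally equivalent. The C bindings exist in two variants: single-threaded and multi-threaded. These differ only in how the messaging loop is done. For more information, see the Programming Examples in the ZooKeeper Programmer's Guide, which provide sample code using the different APIs.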

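7: Running Replicated ZooKeeper

Running ZooKeeper in standalone mode is convenient for learning, development, and testing. In production, however, you should run ZooKeeper in replicated mode. A replicated group of servers serving the same application is called a quorum, and in replicated mode all servers in the quorum have copies of the same configuration file.

Note: Replicated mode requires a minimum of three servers, and an odd number of servers is strongly recommended. If you have only two servers and one of them fails, there are not enough machines left to form a majority quorum. Two servers are inherently less stable than a single server, because there are two single points of failure.

Replicated mode uses a conf/zoo.cfg file just like standalone mode, with a few differences. Here is an example: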
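The arithmetic behind the three-server minimum and the odd-number recommendation can be sketched as follows. This is a minimal illustration assuming simple majority voting; the function names are hypothetical.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - majority(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print(f"servers={n} majority={majority(n)} "
          f"tolerated_failures={tolerated_failures(n)}")
```

Two servers tolerate zero failures, the same as one server, while three tolerate one. Four servers also tolerate only one, which is why adding an even-numbered server buys no extra fault tolerance.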

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

The new entry, initLimit, is a timeout ZooKeeper uses to limit the length of time the ZooKeeper servers in the quorum have to connect to a leader. The entry syncLimit limits how far out of date a server can be from a leader.

With both of these timeouts, you specify the unit of time using tickTime. In this example, the timeout for initLimit is 5 ticks at 2000 milliseconds a tick, or 10 seconds.
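The same conversion applies to syncLimit. As a small worked example using the values from the sample configuration (the variable names are illustrative only):

```python
# Values taken from the replicated conf/zoo.cfg example.
tick_time_ms = 2000
init_limit_ticks = 5
sync_limit_ticks = 2

# Both limits are expressed in ticks, so multiply by tickTime to get time.
init_timeout_s = init_limit_ticks * tick_time_ms / 1000
sync_timeout_s = sync_limit_ticks * tick_time_ms / 1000

print(init_timeout_s)  # 10.0
print(sync_timeout_s)  # 4.0
```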

The entries of the form server.X list the servers that make up the ZooKeeper service. When a server starts up, it determines which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII.
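Creating the myid file can be scripted. The sketch below is one hedged way to do it: the helper `write_myid` is hypothetical, and the example writes into a temporary directory rather than a real dataDir.

```python
import tempfile
from pathlib import Path


def write_myid(data_dir: str, server_id: int) -> Path:
    """Write the server number, as ASCII text, to <data_dir>/myid."""
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{server_id}\n", encoding="ascii")
    return path


# Demonstration in a throwaway directory; on a real host, pass the
# dataDir from that server's zoo.cfg and the X from its server.X entry.
with tempfile.TemporaryDirectory() as tmp:
    myid_path = write_myid(tmp, 1)
    print(myid_path.name, myid_path.read_text(encoding="ascii").strip())
```

Each server gets a different number, and that number must match the X of its own server.X line in the shared configuration.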

Finally, note the two port numbers after each server name: "2888" and "3888". Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, a second port is currently required for leader election. This is the second port in the server entry.

Note

If you want to test multiple servers on a single machine, specify the server name as localhost with unique quorum and leader election ports (for example 2888:3888, 2889:3889, 2890:3890 in place of the ports in the example above) for each server.X in that server's config file. Separate dataDirs and distinct clientPorts are also necessary, so even when the replicated example above runs on a single localhost, you still have three config files.

Please be aware that setting up multiple servers on a single machine will not create any redundancy. If something were to happen which caused the machine to die, all of the zookeeper servers would be offline. Full redundancy requires that each server have its own machine. It must be a completely separate physical server. Multiple virtual machines on the same physical host are still vulnerable to the complete failure of that host.
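For the single-machine test setup described in the note above, the three config files can be generated rather than written by hand. This is a minimal sketch for testing only: the function name, the base directory /tmp/zk-test, and the client ports 2181 to 2183 are assumptions chosen for illustration.

```python
def local_ensemble_configs(count: int = 3, base_dir: str = "/tmp/zk-test") -> dict:
    """Build one zoo.cfg body per server for a localhost-only test ensemble."""
    # Every server lists all peers, each with unique quorum/election ports:
    # 2888:3888, 2889:3889, 2890:3890, ...
    server_lines = [
        f"server.{i}=localhost:{2887 + i}:{3887 + i}"
        for i in range(1, count + 1)
    ]
    configs = {}
    for i in range(1, count + 1):
        lines = [
            "tickTime=2000",
            f"dataDir={base_dir}/server{i}",   # separate dataDir per server
            f"clientPort={2180 + i}",          # distinct clientPort per server
            "initLimit=5",
            "syncLimit=2",
            *server_lines,
        ]
        configs[i] = "\n".join(lines) + "\n"
    return configs


configs = local_ensemble_configs()
print(configs[2])
```

Each server's dataDir would also need its own myid file containing that server's number.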

8: Other Optimizations

There are a couple of other configuration parameters that can greatly increase performance:

  • To get low latencies on updates it is important to have a dedicated transaction log directory. By default, transaction logs are put in the same directory as the data snapshots and the myid file. The dataLogDir parameter indicates a different directory to use for the transaction logs.
  • [tbd: what is the other config param?]