High Availability and PyMongo
************************************

PyMongo makes it easy to write highly available applications, whether you use a single replica set or a large sharded cluster.

Connecting to a Replica Set
============================


PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.

Note
Replica sets require server version >= 1.6.0. Connecting to replica sets also requires PyMongo version >= 1.8.0.

See also: the general MongoDB documentation on replica sets (http://dochub.mongodb.org/core/rs).

Starting a Replica Set
============================

The main replica set documentation contains extensive information about setting up a new replica set or migrating an existing MongoDB setup; be sure to check it out. Here, we’ll do just the bare minimum to set up a three-node replica set locally.

Warning
Replica sets should always use multiple physical nodes in production; putting all set members on the same physical node is recommended only for testing and development.

We start three mongod processes, each on a different port and with a different dbpath, but all using the same replica set name “foo”. The first process relies on mongod’s default port (27017) and default dbpath. In the example we use the hostname “morton.local”; replace it with your own hostname when running the commands:

$ hostname
morton.local
$ mongod --replSet foo/morton.local:27018,morton.local:27019 --rest

$ mongod --port 27018 --dbpath /data/db1 --replSet foo/morton.local:27017 --rest

$ mongod --port 27019 --dbpath /data/db2 --replSet foo/morton.local:27017 --rest
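
The three command lines above follow a regular pattern: one port and one data directory per member, plus a seed list naming the other members. As a rough sketch, they could be generated with a small helper. The function name `mongod_commands` and the `/data/dbN` paths are illustrative assumptions, the `--rest` flag is left out, and each member here lists every other member as a seed, which differs slightly from the commands above.

```python
import shlex


def mongod_commands(hostname, set_name="foo", ports=(27017, 27018, 27019)):
    """Return one mongod command line per replica set member."""
    commands = []
    for index, port in enumerate(ports):
        # Seed each member with the addresses of all the other members.
        peers = ",".join(f"{hostname}:{p}" for p in ports if p != port)
        argv = [
            "mongod",
            "--port", str(port),
            "--dbpath", f"/data/db{index}",  # hypothetical data directory
            "--replSet", f"{set_name}/{peers}",
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for line in mongod_commands("morton.local"):
        print("$", line)
```

Each data directory must exist before the corresponding mongod process is started.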

Initializing the Set
============================

At this point all of our nodes are up and running, but the set has yet to be initialized. Until the set is initialized, no node will become the primary, and the set is essentially “offline”.

To initialize the set we need to connect to a single node and run the initiate command. Since we don’t have a primary yet, we need to tell PyMongo that it’s okay to connect to a slave/secondary:

>>> from pymongo import MongoClient, ReadPreference
>>> c = MongoClient("morton.local:27017",
...                 read_preference=ReadPreference.SECONDARY)

Note
We could have connected to any of the other nodes instead, but only the node we initiate from is allowed to contain any initial data.

After connecting, we run the initiate command to get things started. Here we rely on an implicit configuration; for more advanced configuration options, see the replica set documentation:

>>> c.admin.command("replSetInitiate")
{u'info': u'Config now saved locally. Should come online in about a minute.',
 u'info2': u'no configuration explicitly specified -- making one', u'ok': 1.0}
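
Instead of the implicit configuration, the initiate command also accepts an explicit configuration document that names the set and lists its members. The sketch below only builds such a document; the helper name `replica_set_config` is an assumption made for illustration, and the call that would send it to the server is shown as a comment because it needs a live connection.

```python
def replica_set_config(set_name, hosts):
    """Build a configuration document for the replSetInitiate command."""
    return {
        "_id": set_name,
        # Each member needs a unique integer _id and a host:port address.
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


config = replica_set_config(
    "foo",
    ["morton.local:27017", "morton.local:27018", "morton.local:27019"],
)

# With a connection `c` opened as shown earlier, the document would be passed
# as the command's value:
#     c.admin.command("replSetInitiate", config)
```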

The three mongod servers we started earlier will now coordinate and come online as a replica set.

Connecting to a Replica Set
============================

The initial connection made above is a special case for an uninitialized replica set. Normally we’ll want to connect differently (translator’s note: application code rarely needs to initialize a replica set itself). A connection to a replica set can be made using the normal MongoClient() constructor, specifying one or more members of the set. For example, any of the following will create a connection to the set we just created:

(Translator’s note: can these forms connect to an uninitialized replica set? Probably not.)


>>> MongoClient("morton.local", replicaset='foo')
MongoClient([u'morton.local:27019', 'morton.local:27017', u'morton.local:27018'])
>>> MongoClient("morton.local:27018", replicaset='foo')
MongoClient([u'morton.local:27019', u'morton.local:27017', 'morton.local:27018'])
>>> MongoClient("morton.local", 27019, replicaset='foo')
MongoClient(['morton.local:27019', u'morton.local:27017', u'morton.local:27018'])
>>> MongoClient(["morton.local:27018", "morton.local:27019"])
MongoClient(['morton.local:27019', u'morton.local:27017', 'morton.local:27018'])
>>> MongoClient("mongodb://morton.local:27017,morton.local:27018,morton.local:27019")
MongoClient(['morton.local:27019', 'morton.local:27017', 'morton.local:27018'])

The nodes passed to MongoClient() are called the seeds. If only one host is specified, the replicaset parameter must be used to indicate that this isn’t a connection to a single node. As long as at least one of the seeds is online, the driver will be able to “discover” all of the nodes in the set and make a connection to the current primary.
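
A seed list and a replica set name can also be combined into a single connection string. The following is a minimal sketch that assembles such a `mongodb://` URI using the standard `replicaSet` URI option; the helper name `replica_set_uri` is hypothetical.

```python
def replica_set_uri(seeds, set_name):
    """Join host:port seeds into a mongodb:// URI that names the replica set."""
    return "mongodb://{}/?replicaSet={}".format(",".join(seeds), set_name)


uri = replica_set_uri(["morton.local:27017", "morton.local:27018"], "foo")
print(uri)
# The resulting string can be passed to MongoClient() as its first argument.
```

Listing more than one seed means the driver can still discover the set when one of the listed hosts happens to be down.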


Handling Failover
============================

When a failover occurs, PyMongo will automatically attempt to find the new primary node and perform subsequent operations on that node. This can’t happen completely transparently, however. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to the replica set and perform a couple of basic operations:

>>> db = MongoClient("morton.local", replicaSet='foo').test
>>> db.test.save({"x": 1})
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}

By checking the host and port, we can see that we’re connected to morton.local:27017, which is the current primary:

>>> db.connection.host
'morton.local'
>>> db.connection.port
27017

Now let’s bring down that node and see what happens when we run our query again:

>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...

We get an AutoReconnect exception. This means that the driver was not able to connect to the old primary (which makes sense, as we killed the server), but that it will attempt to automatically reconnect on subsequent operations. When this exception is raised, our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.

On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will fail over and elect a new primary (this generally takes a couple of seconds). At that point the driver will connect to the new primary and the operation will succeed:

>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.connection.host
'morton.local'
>>> db.connection.port
27018
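
One way to act on the retry-or-continue decision described above is a small wrapper that retries an operation a bounded number of times while a new primary is elected. This is a sketch under assumptions, not part of PyMongo: the names `retry_on_autoreconnect`, `attempts`, and `delay` are invented for illustration, and the fallback exception class exists only so the sketch runs where the driver is not installed.

```python
import time

try:
    from pymongo.errors import AutoReconnect
except ImportError:
    # Stand-in so the sketch can run without the driver installed.
    class AutoReconnect(Exception):
        pass


def retry_on_autoreconnect(operation, attempts=5, delay=1.0):
    """Call operation(), retrying while it raises AutoReconnect."""
    for attempt in range(attempts):
        try:
            return operation()
        except AutoReconnect:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(delay)  # give the set time to elect a new primary


# Usage with a database handle `db` like the one above (needs a live server):
#     doc = retry_on_autoreconnect(lambda: db.test.find_one())
```

Retrying blindly is only safe for operations that can be repeated without harm, such as reads. A write that raised AutoReconnect may or may not have been applied, so retrying it could apply it twice.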


MongoReplicaSetClient
============================

Using a MongoReplicaSetClient instead of a simple MongoClient offers two key features: secondary reads and replica set health monitoring. To connect using MongoReplicaSetClient, just provide a host:port pair and the name of the replica set:

>>> from pymongo import MongoReplicaSetClient
>>> MongoReplicaSetClient("morton.local:27017", replicaSet='foo')
MongoReplicaSetClient([u'morton.local:27019', u'morton.local:27017', u'morton.local:27018'])

Secondary Reads
------------------

By default, an instance of MongoReplicaSetClient will only send queries to the primary member of the replica set. To use secondaries for queries we have to change the ReadPreference:

>>> db = MongoReplicaSetClient("morton.local:27017", replicaSet='foo').test
>>> from pymongo.read_preferences import ReadPreference
>>> db.read_preference = ReadPreference.SECONDARY_PREFERRED

Now all queries will be sent to the secondary members of the set. If there are no secondary members, the primary will be used as a fallback. If you have queries you would prefer never to send to the primary, you can specify that using the SECONDARY read preference:

>>> db.read_preference = ReadPreference.SECONDARY

Read preference can be set on a client, database, or collection, or on a per-query basis, e.g.:

>>> db.collection.find_one(read_preference=ReadPreference.PRIMARY)

Reads are configured using three options: read_preference, tag_sets, and secondary_acceptable_latency_ms.

read_preference:
- - - - - - - - -

* PRIMARY:
  Read from the primary. This is the default, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
* PRIMARY_PREFERRED:
  Read from the primary if available, or if there is none, read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms.
* SECONDARY:
  Read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms. If no matching secondary is available, raise AutoReconnect.
* SECONDARY_PREFERRED:
  Read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms if available, otherwise from the primary (regardless of the primary’s tags and latency).
* NEAREST:
  Read from any member matching your choice of tag_sets and secondary_acceptable_latency_ms.
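
The five modes listed above can be summarized as a small decision function. This is a simplified model written for illustration, not PyMongo’s implementation: the function name `candidates` and its arguments are assumptions, the `secondaries` list is taken to be already filtered by tags and latency, and NEAREST is simplified to treat the primary as always matching.

```python
def candidates(mode, primary, secondaries):
    """Return the members a read may be sent to under the given mode.

    primary is a member name or None if no primary is available;
    secondaries is a list of member names already filtered by tags and latency.
    An empty result corresponds to the driver raising AutoReconnect.
    """
    primary_list = [primary] if primary else []
    if mode == "PRIMARY":
        return primary_list
    if mode == "PRIMARY_PREFERRED":
        return primary_list or list(secondaries)
    if mode == "SECONDARY":
        return list(secondaries)
    if mode == "SECONDARY_PREFERRED":
        return list(secondaries) or primary_list
    if mode == "NEAREST":
        return primary_list + list(secondaries)
    raise ValueError("unknown read preference mode: %r" % (mode,))


print(candidates("SECONDARY_PREFERRED", "a:27017", []))
```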

tag_sets:
- - - - - -

Replica-set members can be tagged according to any criteria you choose. By default, MongoReplicaSetClient ignores tags when choosing a member to read from, but it can be configured with the tag_sets parameter. tag_sets must be a list of dictionaries, each dict providing tag values that the replica set member must match. MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member. For example, to prefer reads from the New York data center, but fall back to the San Francisco data center, tag your replica set members according to their location and create a MongoReplicaSetClient like so:

>>> rsc = MongoReplicaSetClient(
...     "morton.local:27017",
...     replicaSet='foo',
...     read_preference=ReadPreference.SECONDARY,
...     tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}]
... )

MongoReplicaSetClient tries to find secondaries in New York, then San Francisco, and raises AutoReconnect if none are available. As an additional fallback, specify a final, empty tag set, {}, which means “read from any member that matches the mode, ignoring tags.”
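
The “try each tag set in turn” rule, including the empty-set fallback, can be modeled in a few lines. This sketch illustrates the matching logic only and is not the driver’s code; the function name `select_by_tag_sets` and the member names and tags are made up for the example.

```python
def select_by_tag_sets(members, tag_sets):
    """Return the members matching the first tag set that matches anything.

    members maps a member name to its dict of tags.
    """
    for tag_set in tag_sets:
        matching = [
            name for name, tags in members.items()
            # A member matches when it carries every key/value in the tag set;
            # the empty tag set {} therefore matches every member.
            if all(tags.get(key) == value for key, value in tag_set.items())
        ]
        if matching:
            return matching
    return []  # the driver would raise AutoReconnect in this situation


members = {
    "sf1.example:27017": {"dc": "sf"},
    "sf2.example:27017": {"dc": "sf"},
    "tokyo.example:27017": {"dc": "tokyo"},
}
print(select_by_tag_sets(members, [{"dc": "ny"}, {"dc": "sf"}]))
```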

secondary_acceptable_latency_ms:
- - - - - - - - - - - - - - - - -

If multiple members match the mode and tag sets, MongoReplicaSetClient reads from among the nearest members, chosen according to ping time. By default, only members whose ping times are within 15 milliseconds of the nearest are used for queries. You can choose to distribute reads among members with higher latencies by setting secondary_acceptable_latency_ms to a larger number. In that case, MongoReplicaSetClient distributes reads among matching members within secondary_acceptable_latency_ms of the closest member’s ping time.

Note
secondary_acceptable_latency_ms is ignored when talking to a replica set through a mongos. The equivalent setting for mongos is its localThreshold command line option.
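
The latency window described in this subsection amounts to a filter relative to the fastest member. A minimal sketch follows; the function name `within_latency_window` and the ping times are invented for illustration, and treating the 15 ms boundary as inclusive is an assumption.

```python
def within_latency_window(ping_ms, acceptable_latency_ms=15):
    """Return the members whose ping time is close enough to the fastest one.

    ping_ms maps a member name to its average ping time in milliseconds.
    """
    if not ping_ms:
        return []
    fastest = min(ping_ms.values())
    return [
        name for name, ping in ping_ms.items()
        if ping - fastest <= acceptable_latency_ms  # inclusive bound (assumed)
    ]


pings = {"a:27017": 5, "b:27017": 18, "c:27017": 40}
print(within_latency_window(pings))      # default 15 ms window
print(within_latency_window(pings, 50))  # wider window admits slower members
```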



Health Monitoring
------------------------

When MongoReplicaSetClient is initialized, it launches a background task to monitor the replica set for changes in:

* Health: detect when a member goes down or comes up, or if a different member becomes primary
* Configuration: detect changes in tags
* Latency: track a moving average of each member’s ping time

Replica-set monitoring ensures queries are continually routed to the proper members as the state of the replica set changes.

It is critical to call close() to terminate the monitoring task before your process exits.
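
One way to make sure close() is always called, even when an exception interrupts the work, is the standard library’s `contextlib.closing`. The sketch below uses a stand-in class, `StubClient`, which is hypothetical and exists only so the example runs without a server; in real code the object would be the MongoReplicaSetClient instance.

```python
from contextlib import closing


class StubClient:
    """Hypothetical stand-in for a client object that has a close() method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = StubClient()
with closing(client):
    # Application work that uses the client would go here.
    pass

# Leaving the with-block calls client.close(), which for a real
# MongoReplicaSetClient would terminate the monitoring task.
print(client.closed)
```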


High Availability and mongos
============================

An instance of MongoClient can be configured to automatically connect to a different mongos if the instance it is currently connected to fails. If a failure occurs, PyMongo will attempt to find the nearest mongos to perform subsequent operations. As with a replica set, this can’t happen completely transparently. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to a sharded cluster, using a seed list, and perform a couple of basic operations:

>>> db = MongoClient('morton.local:30000,morton.local:30001,morton.local:30002').test
>>> db.test.save({"x": 1})
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}

Each member of the seed list passed to MongoClient must be a mongos. By checking the host, port, and is_mongos attributes, we can see that we’re connected to morton.local:30001, a mongos:

>>> db.connection.host
'morton.local'
>>> db.connection.port
30001
>>> db.connection.is_mongos
True

Now let’s shut down that mongos instance and see what happens when we run our query again:

>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...

As in the replica set example earlier in this document, we get an AutoReconnect exception. This means that the driver was not able to connect to the original mongos at port 30001 (which makes sense, since we shut it down), but that it will attempt to connect to a new mongos on subsequent operations. When this exception is raised, our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.

As long as one of the seed list members is still available, the next operation will succeed:

>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.connection.host
'morton.local'
>>> db.connection.port
30002
>>> db.connection.is_mongos
True
