
Consistent Hashing Explained in Three Minutes

Inspired by an article that promised understanding in five minutes, I am going with an equally attention-seeking title.

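Consistent hashing is a way to assign data to nodes in distributed computing. When nodes are added or removed, it moves far less data than traditional modulo or range partitioning.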

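In telecom billing, it can serve as the algorithm that distributes sessions between multiple message interface machines and online charging hosts, keyed on session_id. When the charging hosts scale up or down dynamically, noticeably fewer sessions have to be let through because their session_id is missing from the cache.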

The traditional modulo approach

Take 10 data items (0 through 9) and 3 nodes. Assigning each item by its value modulo the number of nodes gives:

node a: 0,3,6,9

node b: 1,4,7

node c: 2,5,8

When one node is added, the distribution changes to:

node a: 0,4,8

node b: 1,5,9

node c: 2,6

node d: 3,7

Summary: items 3, 4, 5, 6, 7, 8 and 9 (7 of the 10) all have to move when the node is added, which is too costly.
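The migration cost can be checked with a few lines of Python. This is only a minimal sketch: the helper name assign_by_modulo and the node labels are made up for illustration.

```python
def assign_by_modulo(keys, nodes):
    """Map each integer key to nodes[key % len(nodes)]."""
    return {key: nodes[key % len(nodes)] for key in keys}

keys = range(10)
before = assign_by_modulo(keys, ["a", "b", "c"])
after = assign_by_modulo(keys, ["a", "b", "c", "d"])

# Keys whose owning node changed once node d was added.
moved = [key for key in keys if before[key] != after[key]]
print(moved)  # [3, 4, 5, 6, 7, 8, 9]
```

Only items 0, 1 and 2 stay put, because they are the only values whose remainder is the same for both 3 and 4 nodes.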

The consistent hashing approach

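The key difference is that both the nodes and the data are hashed, and the two sets of hash values are then compared. Each data item is stored on the node whose hash is the first one greater than or equal to the item's own hash, wrapping around to the smallest node hash if there is none. This keeps the amount of affected data to a minimum when a node is added or removed.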

Back to the same example, using the sum of the ASCII codes of a string as the hash:

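Compute the hash of each of the ten data items (the keys are the strings '0000' through '9999', as in the test code at the end):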

0:192

1:196

2:200

3:204

4:208

5:212

6:216

7:220

8:224

9:228

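Compute the hash of each of the three nodes (the keys are 'a:0', 'g:0' and 'z:0', that is, the node name plus a replica index):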

node a: 203

node g: 209

node z: 228
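The values in the two lists above can be reproduced with a toy hash that sums character codes. This is a sketch for checking the numbers, not a hash to use in practice; the name ascii_sum is made up here, and the key formats ('0000' for item 0, 'a:0' for node a) are taken from the test code at the end of the article.

```python
def ascii_sum(text):
    """Toy hash: sum of the character codes of `text`."""
    return sum(ord(ch) for ch in text)

# Data keys: one digit repeated four times, '0000' ... '9999'.
data_hashes = {d: ascii_sum(str(d) * 4) for d in range(10)}
# Node keys: '<name>:0', i.e. the node name plus replica index 0.
node_hashes = {name: ascii_sum("%s:0" % name) for name in "agz"}

print(data_hashes[0], data_hashes[9])  # 192 228
print(node_hashes)  # {'a': 203, 'g': 209, 'z': 228}
```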

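Now compare the two sets of hash values: each item goes to the first node whose hash is greater than or equal to its own. A hash greater than 228 wraps around to the first node, 203, so the whole hash space effectively forms a ring. The resulting mapping: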

node a: 0,1,2

node g: 3,4

node z: 5,6,7,8,9
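The lookup rule can be sketched with a sorted list of node hashes and a binary search. The function names owner and ascii_sum are illustrative, and the ring is rebuilt from the toy ASCII-sum hash used in this example.

```python
import bisect

def ascii_sum(text):
    """Toy hash: sum of the character codes of `text`."""
    return sum(ord(ch) for ch in text)

def owner(ring, key_hash):
    """Return the first node at or after key_hash, wrapping to the start."""
    points = sorted(ring)
    idx = bisect.bisect_left(points, key_hash)
    return ring[points[idx % len(points)]]

ring = {ascii_sum("%s:0" % n): n for n in "agz"}  # {203: 'a', 209: 'g', 228: 'z'}
placement = {}
for d in range(10):
    placement.setdefault(owner(ring, ascii_sum(str(d) * 4)), []).append(d)

print(placement)  # {'a': [0, 1, 2], 'g': [3, 4], 'z': [5, 6, 7, 8, 9]}
print(owner(ring, 230))  # a  (230 > 228, so it wraps around the ring)
```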

Now add node n and compute its hash:

node n: 216

The data is then redistributed as follows:

node a: 0,1,2

node g: 3,4

node n: 5,6

node z: 7,8,9

This time only items 5 and 6 need to move (from node z to node n).
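The same before-and-after comparison used for the modulo scheme can be run against the ring. A minimal sketch, again with the toy ASCII-sum hash; place and ascii_sum are illustrative names.

```python
import bisect

def ascii_sum(text):
    """Toy hash: sum of the character codes of `text`."""
    return sum(ord(ch) for ch in text)

def place(node_names, data_keys):
    """Map each data key to the first node at or after its hash on the ring."""
    ring = {ascii_sum("%s:0" % n): n for n in node_names}
    points = sorted(ring)
    result = {}
    for key in data_keys:
        idx = bisect.bisect_left(points, ascii_sum(key))
        result[key] = ring[points[idx % len(points)]]
    return result

data_keys = [str(d) * 4 for d in range(10)]
before = place(["a", "g", "z"], data_keys)
after = place(["a", "g", "n", "z"], data_keys)

# Keys whose owning node changed once node n joined the ring.
moved = [k for k in data_keys if before[k] != after[k]]
print(moved)  # ['5555', '6666']
```

Only the keys between node g (209) and the new node n (216) change owner; everything else stays where it was.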

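In addition, with only three hash values on the ring, the data is easily distributed unevenly. This is why virtual nodes are introduced: by appending an ID suffix to each node name (or a similar trick), every node gets n hash values spread around the ring, so the data hashes are split more evenly among the nodes (see replicas in the code below).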
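The effect of virtual nodes can be explored with a short experiment. This sketch uses MD5 instead of the ASCII sum, because the toy hash clusters and collides too easily to show anything useful; the session-N key names and all function names are hypothetical, and no particular output is claimed.

```python
import bisect
import hashlib
from collections import Counter

def md5_hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

def build_ring(node_names, replicas):
    """Place `replicas` virtual points per node, keyed '<name>:<i>'."""
    return {md5_hash("%s:%d" % (name, i)): name
            for name in node_names for i in range(replicas)}

def load_per_node(ring, keys):
    """Count how many keys land on each physical node."""
    points = sorted(ring)
    counts = Counter()
    for key in keys:
        idx = bisect.bisect_left(points, md5_hash(key))
        counts[ring[points[idx % len(points)]]] += 1
    return counts

keys = ["session-%d" % i for i in range(10000)]
for replicas in (1, 100):
    ring = build_ring(["a", "g", "z"], replicas)
    print(replicas, dict(load_per_node(ring, keys)))
```

Comparing the two printed lines shows how the per-node counts change when each node owns 100 points on the ring instead of one.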

Distributing data with this algorithm greatly reduces the amount of data that has to be migrated when nodes are added or removed.

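The hash ring code reproduced below keeps the original md5-based gen_key alongside the ASCII-sum version described above. The ASCII-sum version is the one that reproduces the numbers in this article and makes the results easy to verify.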

import hashlib  # only needed for the md5 variant of gen_key


class HashRing(object):
    def __init__(self, nodes=None, replicas=3):
        """Manages a hash ring.
        `nodes` is a list of objects that have a proper __str__ representation.
        `replicas` indicates how many virtual points should be used per node;
        replicas are required to improve the distribution.
        """
        self.replicas = replicas
        self.ring = dict()
        self._sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def add_node(self, node):
        """Adds a `node` to the hash ring (including a number of replicas).
        """
        for i in range(self.replicas):
            key = self.gen_key('%s:%s' % (node, i))
            print("node %s-%s key is %d" % (node, i, key))
            self.ring[key] = node
            self._sorted_keys.append(key)
        self._sorted_keys.sort()

    def remove_node(self, node):
        """Removes `node` from the hash ring and its replicas.
        """
        for i in range(self.replicas):
            key = self.gen_key('%s:%s' % (node, i))
            del self.ring[key]
            self._sorted_keys.remove(key)

    def get_node(self, string_key):
        """Given a string key a corresponding node in the hash ring is returned.
        If the hash ring is empty, `None` is returned.
        """
        return self.get_node_pos(string_key)[0]

    def get_node_pos(self, string_key):
        """Given a string key a corresponding node in the hash ring is returned
        along with its position in the ring.
        If the hash ring is empty, (`None`, `None`) is returned.
        """
        if not self.ring:
            return None, None
        key = self.gen_key(string_key)
        # Take the first point on the ring whose hash is >= the key's hash.
        for i, node_key in enumerate(self._sorted_keys):
            if key <= node_key:
                print("string_key %s key %d" % (string_key, key))
                print("get node %s-%d " % (self.ring[node_key], i))
                return self.ring[node_key], i
        # The key is larger than every point: wrap around to the first one.
        return self.ring[self._sorted_keys[0]], 0

    def print_ring(self):
        """Prints every slot of the ring in hash order."""
        for i, node_key in enumerate(self._sorted_keys):
            print("ring slot %d is node %s, hash value is %s" % (i, self.ring[node_key], node_key))

    def get_nodes(self, string_key):
        """Given a string key it returns the nodes as a generator that can hold the key.
        The generator is never ending and iterates through the ring
        starting at the correct position.
        """
        if not self.ring:
            # Stop here; otherwise the loop below would spin forever on an empty ring.
            yield None
            return
        node, pos = self.get_node_pos(string_key)
        for key in self._sorted_keys[pos:]:
            yield self.ring[key]
        while True:
            for key in self._sorted_keys:
                yield self.ring[key]

    def gen_key(self, key):
        """Given a string key it returns an integer,
        this value represents a place on the hash ring.
        The sum of the character codes is used here so that the results can
        be checked by hand. It collides easily (e.g. 'a:1' and 'b:0'), so
        switch to the md5 variant below for anything beyond this demo.
        """
        return sum(ord(ch) for ch in key)
        # md5 variant (mixes well):
        # return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)


def show_placement(servers):
    """Builds a ring with one point per server and looks up the ten test keys."""
    ring = HashRing(servers, 1)
    ring.print_ring()
    for digit in range(10):
        ring.get_node(str(digit) * 4)  # '0000', '1111', ..., '9999'


show_placement(['a', 'g', 'z'])

print('----------------------------------------------------------')

show_placement(['a', 'g', 'n', 'z'])