Redis Source Code Analysis and Reflections (3): Two Hash Algorithms for Dictionary Keys

阿新 • Published: 2018-12-11

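In a Redis dictionary, computing the hash of a key is especially important: it determines both how evenly entries are spread across the table and how well lookups perform. A good hash function tends to map different inputs to different hash values. Redis implements two algorithms for hashing keys: djb2 and MurmurHash2.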

The djb2 Algorithm

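djb2 is a hash function that Daniel J. Bernstein posted to comp.lang.c many years ago. It has since been widely adopted and is often regarded as one of the better hash functions for strings, because it is fast to compute and distributes values fairly evenly. Redis implements it as follows: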

static uint32_t dict_hash_function_seed = 5381;
unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {
    unsigned int hash = (unsigned int)dict_hash_function_seed;
    while (len--)
        hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c, with c lowercased */
    return hash;
}
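To make the `hash * 33 + c` step concrete, here is a minimal self-contained sketch of the same loop. The name `djb2_case_hash` and the hard-coded initial value are illustrative assumptions, not part of the Redis source; the logic is meant to mirror `dictGenCaseHashFunction` above.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for dictGenCaseHashFunction (hypothetical name).
 * Each step computes hash * 33 + lowercase(byte). */
uint32_t djb2_case_hash(const unsigned char *buf, size_t len) {
    uint32_t hash = 5381; /* same initial value as dict_hash_function_seed */
    while (len--)
        hash = ((hash << 5) + hash) + (uint32_t)tolower(*buf++);
    return hash;
}
```

For the one-byte key "a" (ASCII 97) the result is 5381 * 33 + 97 = 177670. Because every byte is lowercased first, "A" gives the same value, so keys such as "KEY" and "key" hash identically.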

The MurmurHash2 Algorithm

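MurmurHash2 was created by Austin Appleby in 2008. Its strength is that even when the input keys follow a regular pattern, it still produces a well-scattered, random-looking distribution, and it is also fast to compute. This is the algorithm Redis uses to compute the hash of a key in the version analyzed here. The implementation is: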

unsigned int dictGenHashFunction(const void *key, int len) {
    /* 'm' and 'r' are mixing constants generated offline.
     They're not really 'magic', they just happen to work well.  */ 

    uint32_t seed = dict_hash_function_seed;
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    /* Initialize the hash to a 'random' value */
    uint32_t h = seed ^ len;
    /* Mix 4 bytes at a time into the hash */
    const unsigned char *data = (const unsigned char *)key;
    while(len >= 4) {
        uint32_t k = *(uint32_t*)data;
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        data += 4;
        len -= 4;
    }
    /* Handle the last few bytes of the input array  */
    switch(len) {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0]; h *= m;
    };
    /* Do a few final mixes of the hash to ensure the last few
     * bytes are well-incorporated. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return (unsigned int)h;
}
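The claim that regular keys are still scattered well can be explored with a small sketch. It assumes two hypothetical names, `murmur2_sketch` and `hamming32`, and takes the seed as an explicit parameter. It follows the same steps as `dictGenHashFunction` above, except that `memcpy` replaces the pointer cast so the 4-byte load is also valid for unaligned input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the MurmurHash2 steps shown above (hypothetical name,
 * explicit seed parameter, memcpy-based load). */
uint32_t murmur2_sketch(const void *key, size_t len, uint32_t seed) {
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    const unsigned char *data = (const unsigned char *)key;
    uint32_t h = seed ^ (uint32_t)len;

    while (len >= 4) {
        uint32_t k;
        memcpy(&k, data, sizeof k);   /* native-endian 4-byte load */
        k *= m; k ^= k >> r; k *= m;  /* mix the block */
        h *= m; h ^= k;               /* fold it into the state */
        data += 4;
        len -= 4;
    }
    switch (len) {                    /* 0 to 3 trailing bytes */
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0]; h *= m;
    }
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

/* Number of bit positions in which two 32-bit hashes differ. */
int hamming32(uint32_t a, uint32_t b) {
    uint32_t x = a ^ b;
    int bits = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        bits++;
    }
    return bits;
}
```

Calling `hamming32(murmur2_sketch("key1", 4, 5381), murmur2_sketch("key2", 4, 5381))` reports how many output bits change between two keys that differ only in their last character. For comparison, the djb2 loop gives two values for this pair that differ numerically by exactly 1, since only the final added byte changes.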

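To turn a hash into a bucket index, Redis takes the bitwise AND of the key's hash and the table's size mask (sizemask). The result can never exceed sizemask, so the index always stays inside the table and cannot run past the end of the bucket array. The following code demonstrates this: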

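#include <cctype>
#include <iostream>
#include <string>
using std::string;

// djb2 with each character lowercased, as in dictGenCaseHashFunction
unsigned int djb2(const string &s)
{
    unsigned int hash = 5381u;
    for (string::size_type i = 0; i < s.length(); ++i) {
        hash = ((hash << 5) + hash) + std::tolower(static_cast<unsigned char>(s[i]));
    }
    return hash;
}

int main()
{
    // Redis sets sizemask to size - 1, where size is a power of two (here 8)
    unsigned int sizemask = 7;
    string s[10];
    // Read ten keys from standard input
    for (int i = 0; i < 10; ++i) {
        std::cin >> s[i];
    }
    // Print each key's bucket index; every value lies in [0, sizemask]
    for (int j = 0; j < 10; ++j) {
        std::cout << (djb2(s[j]) & sizemask) << std::endl;
    }
    return 0;
}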

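Result: [Figure: program output; image not available in the source.] The output confirms this. My understanding of these algorithms is limited, and I do not know the detailed theory behind them; if any expert is reading, please let me know. There is a Stack Overflow answer that clears up some questions about djb2 and explains why 5381 was chosen.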
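As a complement to the masking demo, the sketch below isolates the index computation in C. The helper name `bucket_index` is hypothetical. It assumes the table size is a nonzero power of two, which is how Redis sizes its hash tables, so that sizemask is size - 1 and the AND is equivalent to a modulo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a 32-bit hash to a bucket index.
 * Assumes size is a nonzero power of two, so sizemask = size - 1
 * has all low bits set and (hash & sizemask) == (hash % size). */
uint32_t bucket_index(uint32_t hash, uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    uint32_t sizemask = size - 1;
    return hash & sizemask;
}
```

With a table of 8 buckets the mask is 7, so a hash of 177670 (the djb2 value of the one-byte key "a", 5381 * 33 + 97) maps to bucket 177670 & 7 = 6, the same as 177670 % 8.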
