Bitcoin Source Code Scenario Analysis: A Close Reading of the Bloom Filter

The previous article, on UTXO synchronization in SPV wallets, mentioned the bloom filter. In this chapter we dissect it in depth by reading the source code.

Basic principle of the Bloom filter

Figure: An example of a Bloom filter, representing the set {x, y, z}. The colored arrows show the positions in the bit array that each set element is mapped to. The element w is not in the set {x, y, z}, because it hashes to one bit-array position containing 0. For this figure, m = 18 and k = 3.
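The bit array in the figure is an array of m bits. The filter's key set is {x, y, z}. When x is added to the filter, the algorithm hashes x three different ways to produce 3 values, uses each value as an index into the bit array, and sets the bit at each index to 1; the three blue arrows are the 3 indexes of x. To test w, it is hashed the same three ways to produce 3 indexes, and the bits at those indexes are looked up: if all of them are 1, w is reported as present; if any one of them is not 1, w is definitely absent.

A hash function always produces the same output (index) for the same input, but different inputs may produce the same output (index). A bloom filter can therefore say with certainty that a key does not belong to the set, but it may wrongly report a key that is not in the set as belonging to it (a false positive).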
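The mechanism described above can be sketched in a few lines of C++. This is only an illustrative sketch, not Bitcoin's implementation: `ToyHash`, `ToyBloomFilter` and `EstimatedFalsePositiveRate` are hypothetical names, the seeded FNV-1a hash merely stands in for a real seeded hash, and the rate formula is the usual textbook approximation, which assumes independent, uniformly distributed hashes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical seeded hash: FNV-1a with the seed mixed into the initial state.
// It only stands in for a real seeded hash function.
inline std::uint32_t ToyHash(std::uint32_t seed, const std::string& key) {
    std::uint32_t h = 2166136261u ^ seed;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Toy bloom filter: m bits, k hashes per key.
class ToyBloomFilter {
public:
    ToyBloomFilter(std::size_t m, unsigned int k) : bits_(m, false), k_(k) {
        assert(m > 0 && k > 0);
    }

    // Set the k bits that the key maps to.
    void Insert(const std::string& key) {
        for (unsigned int i = 0; i < k_; i++)
            bits_[ToyHash(i, key) % bits_.size()] = true;
    }

    // false: the key was definitely never inserted.
    // true:  the key was possibly inserted (false positives can occur).
    bool MaybeContains(const std::string& key) const {
        for (unsigned int i = 0; i < k_; i++)
            if (!bits_[ToyHash(i, key) % bits_.size()])
                return false;
        return true;
    }

private:
    std::vector<bool> bits_;
    unsigned int k_;
};

// Textbook approximation of the false-positive rate after n insertions
// into m bits with k hashes: p ~= (1 - e^(-k*n/m))^k
inline double EstimatedFalsePositiveRate(double m, double k, double n) {
    return std::pow(1.0 - std::exp(-k * n / m), k);
}
```

After constructing `ToyBloomFilter f(18, 3)` and inserting "x", "y" and "z", a call to `f.MaybeContains("w")` returns false whenever at least one of the three bits for w is still 0; a true result only means "possibly present". With the figure's parameters (m = 18, k = 3) and n = 3 keys, the approximation gives a false-positive rate of roughly 6%.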
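Lowering the error rate comes down to lowering the probability that different keys produce the same outputs across all of their hashes. Let m be the length of the bit array and k the number of hashes. Increasing m lowers the probability that two different inputs collide on a single hash, and increasing k lowers the probability that all k hashes collide. Each extra hash also sets more bits, however, so for a given m and number of keys n there is an optimal k (approximately (m/n) ln 2) beyond which the error rate rises again. Choosing suitable values of m and k is therefore key to reducing the error rate. There are standard mathematical formulas for choosing m and k, which readers may consult.

Bitcoin bloom filter flow

1) load filter (net_processing.cpp)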
    else if (strCommand == NetMsgType::FILTERLOAD)
    {
        CBloomFilter filter;
        vRecv >> filter;

        if (!filter.IsWithinSizeConstraints())
        {
            // There is no excuse for sending a too-large filter
            LOCK(cs_main);
            Misbehaving(pfrom->GetId(), 100);
        }
        else
        {
            LOCK(pfrom->cs_filter);
            pfrom->pfilter.reset(new CBloomFilter(filter));
            pfrom->pfilter->UpdateEmptyFull();
            pfrom->fRelayTxes = true;
        }
    }

Serialization of the filter data

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        // vData is the bit array of the bloom filter
        READWRITE(vData);
        // number of hash functions
        READWRITE(nHashFuncs);
        READWRITE(nTweak);
        READWRITE(nFlags);
    }

2) add to the filter

    else if (strCommand == NetMsgType::FILTERADD)
    {
        std::vector<unsigned char> vData;
        vRecv >> vData;

        // Nodes must NEVER send a data item > 520 bytes (the max size for a script data object,
        // and thus, the maximum size any matched object can have) in a filteradd message
        bool bad = false;
        if (vData.size() > MAX_SCRIPT_ELEMENT_SIZE) {
            bad = true;
        } else {
            LOCK(pfrom->cs_filter);
            if (pfrom->pfilter) {
                pfrom->pfilter->insert(vData);
            } else {
                bad = true;
            }
        }
        if (bad) {
            LOCK(cs_main);
            Misbehaving(pfrom->GetId(), 100);
        }
    }

This simply follows the bloom filter algorithm: the new key is hashed several times and the bit array is updated accordingly.

void CBloomFilter::insert(const std::vector<unsigned char>& vKey)
{
    if (isFull)
        return;
    // n different hashes do not require n different hash functions;
    // changing the hash seed according to the index is enough
    for (unsigned int i = 0; i < nHashFuncs; i++)
    {
        unsigned int nIndex = Hash(i, vKey);
        // Sets bit nIndex of vData
        vData[nIndex >> 3] |= (1 << (7 & nIndex));
    }
    isEmpty = false;
}

In the statement vData[nIndex >> 3] |= (1 << (7 & nIndex)); above, each hash of the key yields the index of a single bit in the bit array. vData is a vector of unsigned char, and each char holds 8 bits, so nIndex >> 3 (nIndex divided by 8) finds the index of the char, and 1 << (7 & nIndex) (nIndex modulo 8) selects which of the 8 bits in that char the index refers to.

class CBloomFilter
{
private:
    std::vector<unsigned char> vData;
    unsigned int nHashFuncs;
    unsigned int nTweak;
};

nHashFuncs is just an unsigned int, so where are the promised different hash functions?

inline unsigned int CBloomFilter::Hash(unsigned int nHashNum, const std::vector<unsigned char>& vDataToHash) const
{
    // 0xFBA4C795 chosen as it guarantees a reasonable bit difference between nHashNum values.
    return MurmurHash3(nHashNum * 0xFBA4C795 + nTweak, vDataToHash) % (vData.size() * 8);
}

This shows that n different hash functions can indeed be obtained from n different integers: here 'nHashNum * 0xFBA4C795 + nTweak' is passed to MurmurHash3 as the seed, which gives the effect of n different hashes.

3) filter use case

Take MSG_FILTERED_BLOCK as an example: this getdata inventory type asks for the contents of the block with the given block hash that match the bloom filter.

        else if (inv.type == MSG_FILTERED_BLOCK)
        {
            bool sendMerkleBlock = false;
            CMerkleBlock merkleBlock;
            {
                LOCK(pfrom->cs_filter);
                if (pfrom->pfilter) {
                    sendMerkleBlock = true;
                    // merkleBlock contains only the block header, the matching tx hashes and a partial merkle path;
                    // it is a filtered form of the block content
                    merkleBlock = CMerkleBlock(*pblock, *pfrom->pfilter);
                }
            }
            if (sendMerkleBlock) {
                // send back the merkleBlock
                connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::MERKLEBLOCK, merkleBlock));
                // CMerkleBlock just contains hashes, so also push any transactions in the block the client did not see
                // This avoids hurting performance by pointlessly requiring a round-trip
                // Note that there is currently no way for a node to request any single transactions we didn't send here -
                // they must either disconnect and retry or request the full block.
                // Thus, the protocol spec specified allows for us to provide duplicate txn here,
                // however we MUST always provide at least what the remote peer needs
                typedef std::pair<unsigned int, uint256> PairType;
                for (PairType& pair : merkleBlock.vMatchedTxn)
                    // send the transaction data that matches the filter
                    connman->PushMessage(pfrom, msgMaker.Make(SERIALIZE_TRANSACTION_NO_WITNESS, NetMsgType::TX, *pblock->vtx[pair.first]));
            }
            // else
                // no response
        }
}

The filtering process in detail

CMerkleBlock::CMerkleBlock(const CBlock& block, CBloomFilter* filter, const std::set<uint256>* txids)
{
    header = block.GetBlockHeader();

    std::vector<bool> vMatch;
    std::vector<uint256> vHashes;

    vMatch.reserve(block.vtx.size());
    vHashes.reserve(block.vtx.size());

    for (unsigned int i = 0; i < block.vtx.size(); i++)
    {
        const uint256& hash = block.vtx[i]->GetHash();
        if (txids && txids->count(hash)) {
            vMatch.push_back(true);
        } else if (filter && filter->IsRelevantAndUpdate(*block.vtx[i])) {
            vMatch.push_back(true);
            vMatchedTxn.emplace_back(i, hash);
        } else {
            vMatch.push_back(false);
        }
        vHashes.push_back(hash);
    }

    txn = CPartialMerkleTree(vHashes, vMatch);
}

bool CBloomFilter::IsRelevantAndUpdate(const CTransaction& tx)
{
    bool fFound = false;
    // Match if the filter contains the hash of tx
    //  for finding tx when they appear in a block
    if (isFull)
        return true;
    if (isEmpty)
        return false;
    // get the tx hash and check whether it is in the bloom filter set
    const uint256& hash = tx.GetHash();
    if (contains(hash))
        fFound = true;

    for (unsigned int i = 0; i < tx.vout.size(); i++)
    {
        const CTxOut& txout = tx.vout[i];
        // Match if the filter contains any arbitrary script data element in any scriptPubKey in tx
        // If this matches, also add the specific output that was matched.
        // This means clients don't have to update the filter themselves when a new relevant tx
        // is discovered in order to find spending transactions, which avoids round-tripping and race conditions.
        CScript::const_iterator pc = txout.scriptPubKey.begin();
        std::vector<unsigned char> data;
        while (pc < txout.scriptPubKey.end())
        {
            opcodetype opcode;
            // extract the data pushed by the locking script so it can be checked against the bloom filter set
            if (!txout.scriptPubKey.GetOp(pc, opcode, data))
                break;
            // check whether the data is in the bloom filter set
            if (data.size() != 0 && contains(data))
            {
                fFound = true;
                break;
            }
        }
    }

    if (fFound)
        return true;

    for (const CTxIn& txin : tx.vin)
    {
        // Match if the filter contains an outpoint tx spends
        // (is txin.prevout in the bloom filter set?)
        if (contains(txin.prevout))
            return true;

        // Match if the filter contains any arbitrary script data element in any scriptSig in tx
        CScript::const_iterator pc = txin.scriptSig.begin();
        std::vector<unsigned char> data;
        while (pc < txin.scriptSig.end())
        {
            opcodetype opcode;
            // extract the data pushed by the unlocking script so it can be checked against the bloom filter set
            if (!txin.scriptSig.GetOp(pc, opcode, data))
                break;
            // check whether the data is in the bloom filter set
            if (data.size() != 0 && contains(data))
                return true;
        }
    }

    return false;
}

bool CBloomFilter::contains(const std::vector<unsigned char>& vKey) const
{
    for (unsigned int i = 0; i < nHashFuncs; i++)
    {
        unsigned int nIndex = Hash(i, vKey);
        // Checks bit nIndex of vData
        if (!(vData[nIndex >> 3] & (1 << (7 & nIndex))))
            return false;
    }
    return true;
}

In summary, the data matched against the filter can be the tx hash, a data element in txout.scriptPubKey, the outpoint txin.prevout, or a data element in txin.scriptSig.
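For example, to filter transactions by public key, one can work with the txin and txout of the transaction. In P2PK the locking script contains the pubKey, and in P2PKH the unlocking script contains the pubKey (while the locking script contains its hash), so these data elements can be used for filtering.

/********************************* This article is from CSDN blogger "愛踢門" ******************************************/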
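As a supplement to the insert and contains walkthrough above, here is a small standalone sketch of the bit addressing and of the per-hash seed derivation. The helper names `SetBit`, `TestBit` and `SeedFor` are hypothetical and do not appear in Bitcoin Core; only the expressions inside them are taken from the code quoted in this article.

```cpp
#include <cstdint>
#include <vector>

// Set bit nIndex of a byte vector: nIndex >> 3 picks the byte (nIndex / 8),
// 7 & nIndex picks the bit inside that byte (nIndex % 8).
inline void SetBit(std::vector<unsigned char>& vData, unsigned int nIndex) {
    vData[nIndex >> 3] |= (1 << (7 & nIndex));
}

// Test bit nIndex using the same addressing.
inline bool TestBit(const std::vector<unsigned char>& vData, unsigned int nIndex) {
    return (vData[nIndex >> 3] & (1 << (7 & nIndex))) != 0;
}

// Seed handed to the hash for the nHashNum-th "hash function".
// The multiplication wraps modulo 2^32, as unsigned arithmetic does.
inline std::uint32_t SeedFor(std::uint32_t nHashNum, std::uint32_t nTweak) {
    return nHashNum * 0xFBA4C795u + nTweak;
}
```

For a 3-byte vector (24 bits), `SetBit(v, 10)` touches byte 1 (10 >> 3) and bit 2 (10 & 7), so `v[1]` becomes 0x04 while the other bytes stay 0. With nTweak = 0, the seeds for hash numbers 0, 1 and 2 are 0, 0xFBA4C795 and 0xF7498F2A.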