
Blockchain "Fish and Bear's Paw" Series (7): Bloom Filter (SPV, Continued)

A Bloom filter is a space-efficient randomized data structure for set-membership queries, which makes it well suited to portable devices with limited storage. Its main drawback is that the probability of error (a false positive) cannot be eliminated entirely, although it can be made small enough for practical applications. Three quantities trade off against one another in a Bloom filter: computation time, the size of the look-up table, and the probability of error.

Q1: What is a Bloom filter, and within what range should its parameters be controlled?


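First, suppose the set S to be mapped contains n elements, the target look-up table has a capacity of m bits, and there are k hash functions. Each element of S is hashed by each of the k hash functions in turn, and the bit at every resulting position in the look-up table is set to 1, so the set bits accumulate as elements are added. Then, given a new element n4, how does the look-up table decide whether n4 belongs to S? Hash n4 with the same k hash functions and check whether all k hash values map correctly into the look-up table, where "correctly" means that none of them points at a bit that is 0. If that holds, n4 is reported as being in the set; otherwise it is not. This is the basic principle of the Bloom filter.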
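The insert-and-query procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the class name `BloomFilter`, the method names, the element names n1 to n3, and the choice of deriving the k hash functions by salting SHA-256 with an index are all assumptions made for the example and do not come from the original post.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit look-up table and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m            # number of bits in the look-up table
        self.k = k            # number of hash functions
        self.bits = [0] * m   # every bit starts at 0

    def _positions(self, item: str):
        # Assumption: simulate k hash functions by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set the bit at each of the k hashed positions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True only if none of the k positions points at a 0 bit.
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical set S = {n1, n2, n3}; n4 is the element being queried.
bf = BloomFilter(m=64, k=3)
for element in ("n1", "n2", "n3"):
    bf.add(element)

print(bf.might_contain("n1"))  # inserted elements are always reported present
print(bf.might_contain("n4"))  # usually False; True would be a false positive
```

A query that finds a 0 bit proves the element was never inserted, whereas a query that finds only 1 bits may still be wrong, because those bits could have been set by other elements.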
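If the look-up table is small and the set is large, hash collisions become likely: the Bloom filter wrongly tells us that element n4 is in the set S. How can we avoid such errors, or at least minimize them? Parameter control is crucial. After all of S has been mapped through the k hash functions, and assuming each hash function selects each of the m bits with equal probability, the probability that a given bit in the look-up table is still 0 is:

$$\left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}$$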

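It follows that the probability of a hash collision, that is, of a false positive for an element not in S, is:

$$\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}$$

Clearly, this probability must be kept within a small, negligible range. That requires 0 bits to make up a substantial share of the look-up table: once every bit in the table is 1, every query collides and the table becomes meaningless.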
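To see how the parameters interact, the sketch below evaluates the standard expressions for a Bloom filter with n elements, m bits, and k hash functions: (1 - 1/m)^(kn) for the probability that a bit is still 0, and (1 - (1 - 1/m)^(kn))^k for the false-positive probability. It assumes independent, uniformly distributed hash functions; the function names and the sample values of n, m, and k are hypothetical and chosen only for illustration.

```python
import math


def zero_bit_probability(n: int, m: int, k: int) -> float:
    """Probability that a given bit is still 0 after inserting n elements."""
    return (1.0 - 1.0 / m) ** (k * n)


def false_positive_probability(n: int, m: int, k: int) -> float:
    """Probability that all k bits probed for a non-member are already 1."""
    return (1.0 - zero_bit_probability(n, m, k)) ** k


def approx_false_positive_probability(n: int, m: int, k: int) -> float:
    """Same quantity using the approximation (1 - 1/m)^(kn) ~ exp(-kn/m)."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Hypothetical parameters: 1000 elements, 3 hash functions, growing table size.
n, k = 1000, 3
for m in (2000, 8000, 32000):
    zero_share = zero_bit_probability(n, m, k)
    fp = false_positive_probability(n, m, k)
    print(f"m={m:6d}  share of 0 bits={zero_share:.3f}  false positive={fp:.5f}")
```

With n and k fixed, enlarging the table raises the expected share of 0 bits and lowers the false-positive probability, which is the trade-off between table size and error probability described above.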

