Ethereum Source Code Analysis (20): core-bloombits Source Code Analysis
阿新 • Published: 2019-02-15
# scheduler.go
The scheduler schedules the retrieval of a single bit of the section-based bloom filter. Besides scheduling retrieval operations, this struct also deduplicates requests and caches results, keeping network/database overhead to a minimum even in complex filtering scenarios.

### Data structures

A request represents a bloom retrieval task, to be prioritized and pulled either from the local database or from the network. section is the index of the block section (4096 blocks per section), and bit says which bit of the bloom filter is being retrieved (there are 2048 bits in total). This was introduced earlier in (eth-bloombits和filter原始碼分析.md).

    // request represents a bloom retrieval task to prioritize and pull from the local
    // database or remotely from the network.
    type request struct {
        section uint64 // Section index to retrieve the a bit-vector from
        bit     uint   // Bit index within the section to retrieve the vector of
    }

A response is the state of a request that is currently being scheduled. Every time a request is sent, a response object is created to track the final state of that request. cached holds the result for the section.

    // response represents the state of a requested bit-vector through a scheduler.
    type response struct {
        cached []byte        // Cached bits to dedup multiple requests
        done   chan struct{} // Channel to allow waiting for completion
    }

scheduler

    // scheduler handles the scheduling of bloom-filter retrieval operations for
    // entire section-batches belonging to a single bloom bit. Beside scheduling the
    // retrieval operations, this struct also deduplicates the requests and caches
    // the results to minimize network/database overhead even in complex filtering
    // scenarios.
    type scheduler struct {
        bit       uint                 // Index of the bit in the bloom filter this scheduler is responsible for (0-2047)
        responses map[uint64]*response // Currently pending retrieval requests or already cached responses
        lock      sync.Mutex           // Lock protecting the responses from concurrent access
    }
### Constructor newScheduler and the reset method

    // newScheduler creates a new bloom-filter retrieval scheduler for a specific
    // bit index.
    func newScheduler(idx uint) *scheduler {
        return &scheduler{
            bit:       idx,
            responses: make(map[uint64]*response),
        }
    }

    // reset cleans up any leftovers from previous runs. This is required before a
    // restart to ensure the no previously requested but never delivered state will
    // cause a lockup.
    // In other words, reset drops every request that never received a result;
    // results that are already cached are kept.
    func (s *scheduler) reset() {
        s.lock.Lock()
        defer s.lock.Unlock()

        for section, res := range s.responses {
            if res.cached == nil {
                delete(s.responses, section)
            }
        }
    }
### Running: the run method

run builds a pipeline: it receives the sections to retrieve from the sections channel and returns the results through the done channel in the order they were requested. Running the same scheduler concurrently is allowed; the retrieval tasks are then deduplicated rather than repeated.

    // run creates a retrieval pipeline, receiving section indexes from sections and
    // returning the results in the same order through the done channel. Concurrent
    // runs of the same scheduler are allowed, leading to retrieval task deduplication.
    func (s *scheduler) run(sections chan uint64, dist chan *request, done chan []byte, quit chan struct{}, wg *sync.WaitGroup) {
        // sections: input channel carrying the sections that need to be retrieved.
        // dist:     output channel (served by a network fetch or a local lookup);
        //           requests are sent on it and the replies are picked up on done.
        // done:     channel carrying the retrieval results; think of it as the
        //           return-value channel.

        // Create a forwarder channel between requests and responses of the same size as
        // the distribution channel (since that will block the pipeline anyway).
        pend := make(chan uint64, cap(dist))

        // Start the pipeline schedulers to forward between user -> distributor -> user
        wg.Add(2)
        go s.scheduleRequests(sections, dist, pend, quit, wg)
        go s.scheduleDeliveries(pend, done, quit, wg)
    }
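### Scheduler flow diagram

![image](picture/chainindexer_2.png)

In the diagram, ellipses are goroutines, rectangles are channels, and triangles are external method calls.

1. The scheduleRequests goroutine receives a section message from sections.
2. scheduleRequests wraps the received section in a request, sends it to the dist channel, and creates the responses[section] object.
3. scheduleRequests sends that same section to the pend queue. scheduleDeliveries receives the pend message and blocks on responses[section].done.
4. An external caller invokes the deliver method, which writes the result of the section's request into responses[section].cached and closes the responses[section].done channel.
5. scheduleDeliveries receives the responses[section].done signal and sends responses[section].cached to the done channel.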
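The deduplicate-then-wait pattern behind steps 2 to 5 can be sketched in isolation. This is a minimal illustration, not the bloombits code: `miniScheduler`, `request`, `deliver` and `wait` are hypothetical names, and the channel plumbing (`dist`, `pend`, `done`) is left out so that only the map, the lock and the `done` channel remain.

```go
package main

import (
	"fmt"
	"sync"
)

// response mirrors the idea of the scheduler's response: cached data plus a
// channel that is closed once the data has arrived.
type response struct {
	cached []byte
	done   chan struct{}
}

// miniScheduler is a hypothetical, stripped-down stand-in for the scheduler.
type miniScheduler struct {
	lock      sync.Mutex
	responses map[uint64]*response
}

func newMiniScheduler() *miniScheduler {
	return &miniScheduler{responses: make(map[uint64]*response)}
}

// request registers interest in a section and reports whether this is the
// first (unique) request, i.e. whether a retrieval must actually be issued.
func (s *miniScheduler) request(section uint64) bool {
	s.lock.Lock()
	defer s.lock.Unlock()
	if s.responses[section] != nil {
		return false
	}
	s.responses[section] = &response{done: make(chan struct{})}
	return true
}

// deliver stores the retrieved data and wakes every waiter by closing done.
// Sections that were never requested or already delivered are ignored.
func (s *miniScheduler) deliver(section uint64, data []byte) {
	s.lock.Lock()
	defer s.lock.Unlock()
	if res := s.responses[section]; res != nil && res.cached == nil {
		res.cached = data
		close(res.done)
	}
}

// wait blocks until the section has been delivered and returns the cached
// data. It assumes request was called for this section beforehand.
func (s *miniScheduler) wait(section uint64) []byte {
	s.lock.Lock()
	res := s.responses[section]
	s.lock.Unlock()
	<-res.done
	return res.cached
}

func main() {
	s := newMiniScheduler()
	fmt.Println(s.request(7)) // first request: a retrieval is needed
	fmt.Println(s.request(7)) // duplicate: shares the pending response
	go s.deliver(7, []byte{0xf0})
	fmt.Println(s.wait(7))
}
```

Closing `done` instead of sending on it is what lets any number of waiters for the same section wake up from a single delivery.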
### scheduleRequests

    // scheduleRequests reads section retrieval requests from the input channel,
    // deduplicates the stream and pushes unique retrieval tasks into the distribution
    // channel for a database or network layer to honour.
    func (s *scheduler) scheduleRequests(reqs chan uint64, dist chan *request, pend chan uint64, quit chan struct{}, wg *sync.WaitGroup) {
        // Clean up the goroutine and pipeline when done
        defer wg.Done()
        defer close(pend)

        // Keep reading and scheduling section requests
        for {
            select {
            case <-quit:
                return

            case section, ok := <-reqs:
                // New section retrieval requested
                if !ok {
                    return
                }
                // Deduplicate retrieval requests
                unique := false

                s.lock.Lock()
                if s.responses[section] == nil {
                    s.responses[section] = &response{
                        done: make(chan struct{}),
                    }
                    unique = true
                }
                s.lock.Unlock()

                // Schedule the section for retrieval and notify the deliverer to expect this section
                if unique {
                    select {
                    case <-quit:
                        return
                    case dist <- &request{bit: s.bit, section: section}:
                    }
                }
                select {
                case <-quit:
                    return
                case pend <- section:
                }
            }
        }
    }
## generator.go

The Generator produces the section-based bloom filter index data. Its main internal data structure is bloom[2048][4096]bit. The input is 4096 header.logBloom values; for example, the logBloom of the 20th header is stored in bloom[0:2048][20].

Data structure:

    // Generator takes a number of bloom filters and generates the rotated bloom bits
    // to be used for batched filtering.
    type Generator struct {
        blooms   [types.BloomBitLength][]byte // Rotated blooms for per-bit matching
        sections uint                         // Number of sections to batch together, i.e. the number of block headers in one section (4096 by default)
        nextBit  uint                         // Next bit to set when adding a bloom
    }

Constructor:

    // NewGenerator creates a rotated bloom generator that can iteratively fill a
    // batched bloom filter's bits.
    func NewGenerator(sections uint) (*Generator, error) {
        if sections%8 != 0 {
            return nil, errors.New("section count not multiple of 8")
        }
        b := &Generator{sections: sections}
        for i := 0; i < types.BloomBitLength; i++ { // BloomBitLength = 2048
            b.blooms[i] = make([]byte, sections/8) // divided by 8 because one byte holds 8 bits
        }
        return b, nil
    }
AddBloom adds the logsBloom of one block header.

    // AddBloom takes a single bloom filter and sets the corresponding bit column
    // in memory accordingly.
    func (b *Generator) AddBloom(index uint, bloom types.Bloom) error {
        // Make sure we're not adding more bloom filters than our capacity
        if b.nextBit >= b.sections { // the section is already full
            return errSectionOutOfBounds
        }
        if b.nextBit != index { // index is the position of the bloom within the section
            return errors.New("bloom filter with unexpected index")
        }
        // Rotate the bloom and insert into our collection
        byteIndex := b.nextBit / 8                // the byte that holds this block's bit
        bitMask := byte(1) << byte(7-b.nextBit%8) // the position of this block's bit within that byte

        for i := 0; i < types.BloomBitLength; i++ {
            bloomByteIndex := types.BloomByteLength - 1 - i/8
            bloomBitMask := byte(1) << byte(i%8)

            if (bloom[bloomByteIndex] & bloomBitMask) != 0 {
                b.blooms[i][byteIndex] |= bitMask
            }
        }
        b.nextBit++

        return nil
    }
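The rotation is easier to follow at toy scale. The sketch below reuses the index arithmetic from AddBloom, but with assumed toy sizes (a 16-bit bloom and 8 blocks per section instead of 2048 bits and 4096 blocks); `rotate` and the constants are illustrative names, not part of the package.

```go
package main

import "fmt"

const (
	bloomBytes = 2              // toy bloom filter: 2 bytes instead of 256
	bloomBits  = bloomBytes * 8 // 16 bits instead of 2048
	blocks     = 8              // toy section: 8 blocks instead of 4096
)

// rotate transposes per-block blooms into per-bit vectors, using the same
// index arithmetic as AddBloom: bloom bit i lives in byte bloomBytes-1-i/8,
// and block n is stored MSB-first in the output vector.
func rotate(blooms [blocks][bloomBytes]byte) [bloomBits][]byte {
	var out [bloomBits][]byte
	for i := range out {
		out[i] = make([]byte, blocks/8)
	}
	for n, bloom := range blooms {
		byteIndex := n / 8
		bitMask := byte(1) << uint(7-n%8)
		for i := 0; i < bloomBits; i++ {
			if bloom[bloomBytes-1-i/8]&(byte(1)<<uint(i%8)) != 0 {
				out[i][byteIndex] |= bitMask
			}
		}
	}
	return out
}

func main() {
	var blooms [blocks][bloomBytes]byte
	blooms[0][1] = 0x01 // block 0 sets bloom bit 0
	blooms[3][1] = 0x01 // block 3 sets bloom bit 0
	blooms[3][0] = 0x80 // block 3 also sets bloom bit 15

	rot := rotate(blooms)
	fmt.Printf("bit 0:  %08b\n", rot[0][0])  // blocks 0 and 3 -> 10010000
	fmt.Printf("bit 15: %08b\n", rot[15][0]) // block 3 only   -> 00010000
}
```

After the rotation, answering "which blocks of this section have bloom bit 15 set?" is a single lookup of one short vector, instead of a scan over every header's bloom.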
Bitset returns the result.

    // Bitset returns the bit vector belonging to the given bit index after all
    // blooms have been added.
    func (b *Generator) Bitset(idx uint) ([]byte, error) {
        if b.nextBit != b.sections {
            return nil, errors.New("bloom not fully generated yet")
        }
        if idx >= b.sections {
            return nil, errSectionOutOfBounds
        }
        return b.blooms[idx], nil
    }
## matcher.go

Matcher is a pipelined system of schedulers and logic matchers that perform binary AND/OR operations on the bit streams, producing a stream of potential blocks whose contents should then be inspected.

Data structures

    // partialMatches with a non-nil vector represents a section in which some sub-
    // matchers have already found potential matches. Subsequent sub-matchers will
    // binary AND their matches with this vector. If vector is nil, it represents a
    // section to be processed by the first sub-matcher.
    //
    // partialMatches is a partial match result. Suppose there are three filter
    // conditions addr1, addr2, addr3 and we need the data that matches all three.
    // We start a pipeline with one matcher per condition. The result of the first
    // matcher is sent to the second; the second ANDs that result with its own and
    // sends the outcome to the third as the match result so far.
    type partialMatches struct {
        section uint64
        bitset  []byte
    }

    // Retrieval represents a request for retrieval task assignments for a given
    // bit with the given number of fetch elements, or a response for such a request.
    // It can also have the actual results set to be used as a delivery data struct.
    //
    // A Retrieval is one retrieval of block bloom filter index data. The object is
    // sent to startBloomHandlers in eth/bloombits.go, which loads the bloom filter
    // index from the database and returns it in Bitsets.
    type Retrieval struct {
        Bit      uint
        Sections []uint64
        Bitsets  [][]byte
    }

    // Matcher is a pipelined system of schedulers and logic matchers which perform
    // binary AND/OR operations on the bit-streams, creating a stream of potential
    // blocks to inspect for data content.
    type Matcher struct {
        sectionSize uint64 // Size of the data batches to filter on

        filters    [][]bloomIndexes    // Filter the system is matching for
        schedulers map[uint]*scheduler // Retrieval schedulers for loading bloom bits

        retrievers chan chan uint       // Retriever processes waiting for bit allocations (hands out retrieval tasks)
        counters   chan chan uint       // Retriever processes waiting for task count reports (returns the current number of tasks)
        retrievals chan chan *Retrieval // Retriever processes waiting for task allocations (hands out the sections of a task)
        deliveries chan *Retrieval      // Retriever processes waiting for task response deliveries (completed results arrive here)

        running uint32 // Atomic flag whether a session is live or not
    }
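The overall flow of the matcher is shown below. In the diagram, ellipses are goroutines, rectangles are channels, and triangles are method calls.

![image](picture/matcher_1.png)

1. The Matcher first creates one subMatch per filter passed in; each subMatch corresponds to one filter object. Each subMatch ANDs its own lookup result with the previous result to get a new result. If any bit of the new result is set, the result is passed on to the next subMatch. This is a short-circuit evaluation of the AND over all filters: once the earlier stages can no longer match anything, the later conditions do not need to be checked.
2. The Matcher starts one scheduler for each bloom bit index used by the filters.
3. subMatch sends its requests to the corresponding scheduler.
4. The scheduler schedules the request and sends it through dist to the distributor, which manages it.
5. Multiple (16) Multiplex goroutines are started. They fetch requests from the distributor and send them to the bloomRequests queue; startBloomHandlers accesses the database, fetches the data and returns it to Multiplex.
6. Multiplex reports the answer to the distributor through the deliveries channel.
7. The distributor calls the scheduler's deliver method, handing the result to the scheduler.
8. The scheduler returns the result to subMatch.
9. subMatch processes the result and sends it to the next subMatch. If it is the last subMatch, the result is processed and then sent to the results channel.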
matcher

    filter := New(backend, 0, -1, []common.Address{addr}, [][]common.Hash{{hash1, hash2, hash3, hash4}})

Groups are combined with AND, and the items within a group are combined with OR: (addr && hash1) || (addr && hash2) || (addr && hash3) || (addr && hash4)

For the constructor, pay special attention to the filters parameter. It is a three-dimensional array: [][]bloomIndexes === [first dimension][second dimension][3].

    // This is code from filter.go, which is helpful for understanding the filters
    // parameter. filter.go is the caller of Matcher.
    //
    // No matter how many addresses there are, they occupy a single slot in filters:
    // filters[0] = addresses
    // filters[1] = topics[0] = several topics
    // filters[2] = topics[1] = several topics
    // filters[n] = topics[n] = several topics
    //
    // The filtering rule for the addresses and topics arguments is:
    // (contains any one address in addresses) AND (contains any one topic in topics[0])
    // AND (contains any one topic in topics[1]) AND ... AND (contains any one topic in topics[n])
    //
    // So the filter ANDs along the first dimension and ORs along the second dimension.
    // NewMatcher then converts the concrete data of the third dimension into three
    // positions in the bloom filter, so `var filters [][][]byte` in filter.go becomes
    // [][][3] inside the Matcher.
    func New(backend Backend, begin, end int64, addresses []common.Address, topics [][]common.Hash) *Filter {
        // Flatten the address and topic filter clauses into a single bloombits filter
        // system. Since the bloombits are not positional, nil topics are permitted,
        // which get flattened into a nil byte slice.
        var filters [][][]byte
        if len(addresses) > 0 {
            filter := make([][]byte, len(addresses))
            for i, address := range addresses {
                filter[i] = address.Bytes()
            }
            filters = append(filters, filter)
        }
        for _, topicList := range topics {
            filter := make([][]byte, len(topicList))
            for i, topic := range topicList {
                filter[i] = topic.Bytes()
            }
            filters = append(filters, filter)
        }
        // (the excerpt from filter.go ends here)
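The AND-between-groups, OR-within-a-group rule can be checked with a few lines of plain Go. This is only a sketch of the logical rule: strings stand in for addresses and topic hashes, `matches` is a hypothetical helper, and no bloom filter is involved, so unlike the real matcher it never produces false positives.

```go
package main

import "fmt"

// matches reports whether a log carrying the given items satisfies the
// filter: every non-empty group must contain at least one item that is
// present in the log (AND across groups, OR within a group).
func matches(filters [][]string, logItems map[string]bool) bool {
	for _, group := range filters {
		if len(group) == 0 {
			continue // an empty group is skipped, i.e. it acts as a wildcard
		}
		hit := false
		for _, item := range group {
			if logItems[item] {
				hit = true
				break
			}
		}
		if !hit {
			return false
		}
	}
	return true
}

func main() {
	// filters[0] = addresses, filters[1] = topics[0]
	filters := [][]string{{"addr"}, {"hash1", "hash2", "hash3", "hash4"}}

	fmt.Println(matches(filters, map[string]bool{"addr": true, "hash3": true})) // true
	fmt.Println(matches(filters, map[string]bool{"addr": true}))                // false
	fmt.Println(matches(filters, map[string]bool{"hash1": true}))               // false
}
```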
NewMatcher:

    // NewMatcher creates a new pipeline for retrieving bloom bit streams and doing
    // address and topic filtering on them. Setting a filter component to `nil` is
    // allowed and will result in that filter rule being skipped (OR 0x11...1).
    func NewMatcher(sectionSize uint64, filters [][][]byte) *Matcher {
        // Create the matcher instance
        m := &Matcher{
            sectionSize: sectionSize,
            schedulers:  make(map[uint]*scheduler),
            retrievers:  make(chan chan uint),
            counters:    make(chan chan uint),
            retrievals:  make(chan chan *Retrieval),
            deliveries:  make(chan *Retrieval),
        }
        // Calculate the bloom bit indexes for the groups we're interested in
        m.filters = nil

        for _, filter := range filters {
            // Gather the bit indexes of the filter rule, special casing the nil filter
            if len(filter) == 0 {
                continue
            }
            bloomBits := make([]bloomIndexes, len(filter))
            for i, clause := range filter {
                if clause == nil {
                    bloomBits = nil
                    break
                }
                // clause is one item of the third dimension of the input: either an
                // address or a topic. calcBloomIndexes computes the three indexes
                // (each in 0-2047) this item maps to in the bloom filter. If all
                // three bits are 1 in a bloom filter, the item may be present.
                bloomBits[i] = calcBloomIndexes(clause)
            }
            // Accumulate the filter rules if no nil rule was within.
            // Within one bloomBits group, a single matching entry is enough for
            // the group to match.
            if bloomBits != nil {
                // Different bloomBits groups must all match for the overall result to match.
                m.filters = append(m.filters, bloomBits)
            }
        }
        // For every bit, create a scheduler to load/download the bit vectors
        for _, bloomIndexLists := range m.filters {
            for _, bloomIndexList := range bloomIndexLists {
                for _, bloomIndex := range bloomIndexList {
                    // For every index that can occur, create a scheduler that
                    // retrieves the bloom data at that position.
                    m.addScheduler(bloomIndex)
                }
            }
        }
        return m
    }
Start

    // Start starts the matching process and returns a stream of bloom matches in
    // a given range of blocks. If there are no more matches in the range, the result
    // channel is closed.
    func (m *Matcher) Start(begin, end uint64, results chan uint64) (*MatcherSession, error) {
        // Make sure we're not creating concurrent sessions
        if atomic.SwapUint32(&m.running, 1) == 1 {
            return nil, errors.New("matcher already running")
        }
        defer atomic.StoreUint32(&m.running, 0)

        // Initiate a new matching round.
        // A session is created and returned; it manages the lifetime of the query.
        session := &MatcherSession{
            matcher: m,
            quit:    make(chan struct{}),
            kill:    make(chan struct{}),
        }
        for _, scheduler := range m.schedulers {
            scheduler.reset()
        }
        // run sets up the pipeline and returns a channel of partialMatches that
        // carries the partial results of the query.
        sink := m.run(begin, end, cap(results), session)

        // Read the output from the result sink and deliver to the user
        session.pend.Add(1)
        go func() {
            defer session.pend.Done()
            defer close(results)

            for {
                select {
                case <-session.quit:
                    return

                case res, ok := <-sink:
                    // New match result found.
                    // Each result is a section plus a bitmap of which blocks in the
                    // section may match, so we walk the bitmap, find the bits that
                    // are set, and return the corresponding block numbers.
                    if !ok {
                        return
                    }
                    // Calculate the first and last blocks of the section
                    sectionStart := res.section * m.sectionSize

                    first := sectionStart
                    if begin > first {
                        first = begin
                    }
                    last := sectionStart + m.sectionSize - 1
                    if end < last {
                        last = end
                    }
                    // Iterate over all the blocks in the section and return the matching ones
                    for i := first; i <= last; i++ {
                        // Skip the entire byte if no matches are found inside
                        next := res.bitset[(i-sectionStart)/8]
                        if next == 0 {
                            i += 7
                            continue
                        }
                        // Some bit it set, do the actual submatching
                        if bit := 7 - i%8; next&(1<<bit) != 0 {
                            select {
                            case <-session.quit:
                                return
                            case results <- i:
                            }
                        }
                    }
                }
            }
        }()
        return session, nil
    }
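The inner loop of Start turns a section bitmap back into block numbers. A minimal sketch of that conversion follows; `blocksInSection` is a hypothetical helper, and the toy section size of 16 is an assumption made to keep the example readable. The quoted loop also skips a whole byte when it is zero; the sketch leaves that shortcut out, since as far as I can tell jumping ahead by 7 is only safe when `i` sits on a byte boundary.

```go
package main

import "fmt"

// blocksInSection returns the block numbers whose bit is set in bitset,
// restricted to the inclusive range [begin, end]. Bits are stored MSB-first
// within each byte, matching the layout produced by the Generator.
func blocksInSection(section, sectionSize, begin, end uint64, bitset []byte) []uint64 {
	sectionStart := section * sectionSize
	first, last := sectionStart, sectionStart+sectionSize-1
	if begin > first {
		first = begin
	}
	if end < last {
		last = end
	}
	var out []uint64
	for i := first; i <= last; i++ {
		offset := i - sectionStart // position of block i inside the section
		if bitset[offset/8]&(1<<(7-offset%8)) != 0 {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Section 1 with 16 blocks per section covers blocks 16..31.
	// 0x81 marks offsets 0 and 7, 0x40 marks offset 9.
	bitset := []byte{0x81, 0x40}
	fmt.Println(blocksInSection(1, 16, 0, 100, bitset))  // [16 23 25]
	fmt.Println(blocksInSection(1, 16, 17, 100, bitset)) // [23 25]
}
```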
The run method

    // run creates a daisy-chain of sub-matchers, one for the address set and one
    // for each topic set, each sub-matcher receiving a section only if the previous
    // ones have all found a potential match in one of the blocks of the section,
    // then binary AND-ing its own matches and forwarding the result to the next one.
    //
    // The method starts feeding the section indexes into the first sub-matcher on a
    // new goroutine and returns a sink channel receiving the results.
    func (m *Matcher) run(begin, end uint64, buffer int, session *MatcherSession) chan *partialMatches {
        // Create the source channel and feed section indexes into
        source := make(chan *partialMatches, buffer)

        session.pend.Add(1)
        go func() {
            defer session.pend.Done()
            defer close(source)

            for i := begin / m.sectionSize; i <= end/m.sectionSize; i++ {
                // This loop builds the input source of the first subMatch; every
                // later subMatch uses the previous one's result as its source.
                // The bitset of this source is all 0xff, meaning a full match; it
                // is ANDed with the matches of the current step to give that
                // step's result.
                select {
                case <-session.quit:
                    return
                case source <- &partialMatches{i, bytes.Repeat([]byte{0xff}, int(m.sectionSize/8))}:
                }
            }
        }()
        // Assemble the daisy-chained filtering pipeline
        next := source
        dist := make(chan *request, buffer)

        for _, bloom := range m.filters { // build the pipeline: each subMatch's output feeds the next subMatch
            next = m.subMatch(next, dist, bloom, session)
        }
        // Start the request distribution
        session.pend.Add(1)
        go m.distributor(dist, session) // start the distributor goroutine

        return next
    }
The subMatch function

    // subMatch creates a sub-matcher that filters for a set of addresses or topics, binary OR-s those matches, then
    // binary AND-s the result to the daisy-chain input (source) and forwards it to the daisy-chain output.
    // The matches of each address/topic are calculated by fetching the given sections of the three bloom bit indexes belonging to
    // that address/topic, and binary AND-ing those vectors together.
    //
    // The result is only forwarded to the next sub-matcher if it is not all zeros.
    // subMatch is the most important function: it combines the AND over the first
    // dimension of filters [][][3], the OR over the second dimension, and the AND
    // over the third dimension.
    func (m *Matcher) subMatch(source chan *partialMatches, dist chan *request, bloom []bloomIndexes, session *MatcherSession) chan *partialMatches {
        // Start the concurrent schedulers for each bit required by the bloom filter.
        // The bloom []bloomIndexes argument holds the second and third dimensions of filters: [][3]
        sectionSources := make([][3]chan uint64, len(bloom))
        sectionSinks := make([][3]chan []byte, len(bloom))
        for i, bits := range bloom { // i iterates over the second dimension
            for j, bit := range bits { // j indexes the bloom bit positions (always exactly three); bit is in 0-2047
                sectionSources[i][j] = make(chan uint64, cap(source)) // input channel of the scheduler
                sectionSinks[i][j] = make(chan []byte, cap(source))   // output channel of the scheduler

                // Start scheduling for this bit. The sections to look up are passed
                // in through sectionSources[i][j] and the results come back through
                // sectionSinks[i][j]. dist is the channel on which the scheduler
                // forwards its requests, as described in the scheduler section.
                m.schedulers[bit].run(sectionSources[i][j], dist, sectionSinks[i][j], session.quit, &session.pend)
            }
        }

        process := make(chan *partialMatches, cap(source)) // entries from source are forwarded here after fetches have been initiated (intermediate channel)
        results := make(chan *partialMatches, cap(source)) // return-value channel

        session.pend.Add(2)
        go func() {
            // Tear down the goroutine and terminate all source channels
            defer session.pend.Done()
            defer close(process)

            defer func() {
                for _, bloomSources := range sectionSources {
                    for _, bitSource := range bloomSources {
                        close(bitSource)
                    }
                }
            }()
            // Read sections from the source channel and multiplex into all bit-schedulers
            for {
                select {
                case <-session.quit:
                    return

                case subres, ok := <-source:
                    // New subresult from previous link
                    if !ok {
                        return
                    }
                    // Multiplex the section index to all bit-schedulers
                    for _, bloomSources := range sectionSources {
                        for _, bitSource := range bloomSources {
                            // Send the section to the input channel of every
                            // scheduler above, requesting the given bit of that
                            // section. Results arrive on sectionSinks[i][j].
                            select {
                            case <-session.quit:
                                return
                            case bitSource <- subres.section:
                            }
                        }
                    }
                    // Notify the processor that this section will become available
                    select {
                    case <-session.quit:
                        return
                    case process <- subres: // sent once every request has been handed to the schedulers
                    }
                }
            }
        }()

        go func() {
            // Tear down the goroutine and terminate the final sink channel
            defer session.pend.Done()
            defer close(results)

            // Read the source notifications and collect the delivered results
            for {
                select {
                case <-session.quit:
                    return

                case subres, ok := <-process:
                    // One question here: can results arrive out of order, given that
                    // the channels are buffered and lookups may finish at different
                    // speeds? Having checked the scheduler implementation: the
                    // scheduler preserves order, so sections leave in the order
                    // they entered.
                    // Notified of a section being retrieved
                    if !ok {
                        return
                    }
                    // Gather all the sub-results and merge them together
                    var orVector []byte
                    for _, bloomSinks := range sectionSinks {
                        var andVector []byte
                        for _, bitSink := range bloomSinks {
                            // Three values are received here, one per bloom bit
                            // index. ANDing the three tells us which blocks may
                            // contain the item.
                            var data []byte
                            select {
                            case <-session.quit:
                                return
                            case data = <-bitSink:
                            }
                            if andVector == nil {
                                andVector = make([]byte, int(m.sectionSize/8))
                                copy(andVector, data)
                            } else {
                                bitutil.ANDBytes(andVector, andVector, data)
                            }
                        }
                        // OR across the second dimension (the items within one filter group)
                        if orVector == nil {
                            orVector = andVector
                        } else {
                            bitutil.ORBytes(orVector, orVector, andVector)
                        }
                    }

                    if orVector == nil { // the channels may have been closed; nothing was retrieved
                        orVector = make([]byte, int(m.sectionSize/8))
                    }
                    if subres.bitset != nil {
                        // AND with the previous result passed in. Remember that this
                        // value was initialised to all ones at the very start.
                        bitutil.ANDBytes(orVector, orVector, subres.bitset)
                    }
                    if bitutil.TestBytes(orVector) { // not all zero: forward it, either to the next matcher or as the final result
                        select {
                        case <-session.quit:
                            return
                        case results <- &partialMatches{subres.section, orVector}:
                        }
                    }
                }
            }
        }()
        return results
    }
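The three-level combination in subMatch (AND of the three bit vectors of one item, OR across the items of a group, AND with the incoming partial result) can be reproduced on single bytes. In this sketch `andBytes`, `orBytes`, `anySet` and `combine` are hypothetical plain-loop helpers standing in for the `bitutil` calls; they assume all vectors have equal length.

```go
package main

import "fmt"

// andBytes returns the bytewise AND of two equal-length vectors.
func andBytes(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] & b[i]
	}
	return out
}

// orBytes returns the bytewise OR of two equal-length vectors.
func orBytes(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] | b[i]
	}
	return out
}

// anySet reports whether at least one bit is set in v.
func anySet(v []byte) bool {
	for _, b := range v {
		if b != 0 {
			return true
		}
	}
	return false
}

// combine merges one filter group into the running partial result:
// AND the three vectors of each item, OR the items, AND with prev.
func combine(prev []byte, items [][3][]byte) []byte {
	result := make([]byte, len(prev)) // all zero: nothing matched yet
	for _, bits := range items {
		item := andBytes(andBytes(bits[0], bits[1]), bits[2])
		result = orBytes(result, item)
	}
	return andBytes(result, prev)
}

func main() {
	items := [][3][]byte{
		{{0xf0}, {0xcc}, {0xaa}}, // AND -> 10000000
		{{0x0f}, {0x03}, {0x01}}, // AND -> 00000001
	}
	first := combine([]byte{0xff}, items)
	fmt.Printf("%08b %v\n", first[0], anySet(first)) // 10000001 true

	second := combine([]byte{0x0f}, items)
	fmt.Printf("%08b %v\n", second[0], anySet(second)) // 00000001 true
}
```

A section is forwarded to the next stage only when `anySet` holds, which is the short-circuit behaviour the pipeline relies on.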
The distributor accepts requests from the schedulers and puts them into a set of pending requests, then assigns those tasks to retrievers that will fulfil them.

    // distributor receives requests from the schedulers and queues them into a set
    // of pending requests, which are assigned to retrievers wanting to fulfil them.
    func (m *Matcher) distributor(dist chan *request, session *MatcherSession) {
        defer session.pend.Done()

        var (
            requests   = make(map[uint][]uint64) // Per-bit list of section requests, ordered by section number
            unallocs   = make(map[uint]struct{}) // Bits with pending requests but not allocated to any retriever
            retrievers chan chan uint            // Waiting retrievers (toggled to nil if unallocs is empty)
        )
        var (
            allocs   int            // Number of active allocations to handle graceful shutdown requests
            shutdown = session.quit // Shutdown request channel, will gracefully wait for pending requests
        )

        // assign is a helper method to try to assign a pending bit to an actively
        // listening servicer, or schedule it up for later when one arrives.
        assign := func(bit uint) {
            select {
            case fetcher := <-m.retrievers:
                allocs++
                fetcher <- bit
            default:
                // No retrievers active, start listening for new ones
                retrievers = m.retrievers
                unallocs[bit] = struct{}{}
            }
        }

        for {
            select {
            case <-shutdown:
                // Graceful shutdown requested, wait until all pending requests are honoured
                if allocs == 0 {
                    return
                }
                shutdown = nil

            case <-session.kill:
                // Pending requests not honoured in time, hard terminate
                return

            case req := <-dist:
                // A request sent by a scheduler; add it to the queue of its bit.
                // New retrieval request arrived to be distributed to some fetcher process
                queue := requests[req.bit]
                index := sort.Search(len(queue), func(i int) bool { return queue[i] >= req.section })
                requests[req.bit] = append(queue[:index], append([]uint64{req.section}, queue[index:]...)...)

                // If it's a new bit and we have waiting fetchers, allocate to them
                if len(queue) == 0 {
                    assign(req.bit)
                }

            case fetcher := <-retrievers:
                // New retriever arrived, find the lowest section-ed bit to assign
                bit, best := uint(0), uint64(math.MaxUint64)
                for idx := range unallocs {
                    if requests[idx][0] < best {
                        bit, best = idx, requests[idx][0]
                    }
                }
                // Stop tracking this bit (and alloc notifications if no more work is available)
                delete(unallocs, bit)
                if len(unallocs) == 0 { // every task is assigned, stop listening for retrievers
                    retrievers = nil
                }
                allocs++
                fetcher <- bit

            case fetcher := <-m.counters:
                // New task count request arrives, return number of items
                // queued for the requested bit.
                fetcher <- uint(len(requests[<-fetcher]))

            case fetcher := <-m.retrievals:
                // New fetcher waiting for tasks to retrieve, assign
                task := <-fetcher
                if want := len(task.Sections); want >= len(requests[task.Bit]) {
                    task.Sections = requests[task.Bit]
                    delete(requests, task.Bit)
                } else {
                    task.Sections = append(task.Sections[:0], requests[task.Bit][:want]...)
                    requests[task.Bit] = append(requests[task.Bit][:0], requests[task.Bit][want:]...)
                }
                fetcher <- task

                // If anything was left unallocated, try to assign to someone else
                if len(requests[task.Bit]) > 0 {
                    assign(task.Bit)
                }

            case result := <-m.deliveries:
                // New retrieval task response from fetcher, split out missing sections and
                // deliver complete ones
                var (
                    sections = make([]uint64, 0, len(result.Sections))
                    bitsets  = make([][]byte, 0, len(result.Bitsets))
                    missing  = make([]uint64, 0, len(result.Sections))
                )
                for i, bitset := range result.Bitsets {
                    if len(bitset) == 0 { // the result for this section is missing, remember it
                        missing = append(missing, result.Sections[i])
                        continue
                    }
                    sections = append(sections, result.Sections[i])
                    bitsets = append(bitsets, bitset)
                }
                // Deliver the results
                m.schedulers[result.Bit].deliver(sections, bitsets)
                allocs--

                // Reschedule missing sections and allocate bit if newly available
                if len(missing) > 0 { // something was missing, queue it again as new work
                    queue := requests[result.Bit]
                    for _, section := range missing {
                        index := sort.Search(len(queue), func(i int) bool { return queue[i] >= section })
                        queue = append(queue[:index], append([]uint64{section}, queue[index:]...)...)
                    }
                    requests[result.Bit] = queue

                    if len(queue) == len(missing) {
                        assign(result.Bit)
                    }
                }
                // If we're in the process of shutting down, terminate
                if allocs == 0 && shutdown == nil {
                    return
                }
            }
        }
    }
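Two slice idioms carry most of the distributor's bookkeeping: inserting a section into a per-bit queue that stays sorted, and handing out the first `want` sections of that queue. The sketch below isolates both; `insertSorted` and `take` are hypothetical helper names wrapping the same expressions that appear inline in the quoted code.

```go
package main

import (
	"fmt"
	"sort"
)

// insertSorted inserts section into the ascending queue, keeping it sorted.
// sort.Search finds the first position whose element is >= section.
func insertSorted(queue []uint64, section uint64) []uint64 {
	index := sort.Search(len(queue), func(i int) bool { return queue[i] >= section })
	return append(queue[:index], append([]uint64{section}, queue[index:]...)...)
}

// take splits off up to want sections from the front of the queue and
// returns them together with whatever remains queued.
func take(queue []uint64, want int) (batch, rest []uint64) {
	if want >= len(queue) {
		return queue, nil
	}
	batch = append([]uint64(nil), queue[:want]...) // copy before compacting
	rest = append(queue[:0], queue[want:]...)      // shift the tail to the front
	return batch, rest
}

func main() {
	var queue []uint64
	for _, s := range []uint64{5, 2, 9, 2} {
		queue = insertSorted(queue, s)
	}
	fmt.Println(queue) // [2 2 5 9]

	batch, rest := take(queue, 3)
	fmt.Println(batch, rest) // [2 2 5] [9]
}
```

Keeping each queue sorted is what lets the distributor pick "the lowest section-ed bit" by looking only at element 0 of every unallocated queue.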
Claiming a task: AllocateRetrieval. It claims one task and returns the bit index whose retrieval has been assigned to the caller.

    // AllocateRetrieval assigns a bloom bit index to a client process that can either
    // immediately request and fetch the section contents assigned to this bit or wait
    // a little while for more sections to be requested.
    func (s *MatcherSession) AllocateRetrieval() (uint, bool) {
        fetcher := make(chan uint)

        select {
        case <-s.quit:
            return 0, false
        case s.matcher.retrievers <- fetcher:
            bit, ok := <-fetcher
            return bit, ok
        }
    }
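AllocateRetrieval relies on a "channel of channels" handshake: the caller sends a private reply channel and then waits on it for the answer. A minimal sketch of that handshake follows, with a hypothetical `serve` function playing the distributor's part and `allocate` mirroring the shape of AllocateRetrieval; the fixed list of pending bits is an assumption made for the example.

```go
package main

import "fmt"

// serve plays the distributor's role: it receives reply channels from
// waiting retrievers and answers each one with the next pending bit index.
func serve(retrievers chan chan uint, pending []uint, quit chan struct{}) {
	for _, bit := range pending {
		select {
		case <-quit:
			return
		case fetcher := <-retrievers:
			fetcher <- bit
		}
	}
}

// allocate mirrors the shape of AllocateRetrieval: offer a private reply
// channel, then read the assigned bit from it, unless the session has quit.
func allocate(retrievers chan chan uint, quit chan struct{}) (uint, bool) {
	fetcher := make(chan uint)
	select {
	case <-quit:
		return 0, false
	case retrievers <- fetcher:
		bit, ok := <-fetcher
		return bit, ok
	}
}

func main() {
	retrievers := make(chan chan uint)
	quit := make(chan struct{})
	go serve(retrievers, []uint{17, 42}, quit)

	a, _ := allocate(retrievers, quit)
	b, _ := allocate(retrievers, quit)
	fmt.Println(a, b) // 17 42

	close(quit)
	_, ok := allocate(retrievers, quit)
	fmt.Println(ok) // false
}
```

Because every caller brings its own reply channel, many retrievers can wait on the same `retrievers` channel without their answers getting mixed up.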
AllocateSections claims the section lookup tasks queued for the given bit.

    // AllocateSections assigns all or part of an already allocated bit-task queue
    // to the requesting process.
    func (s *MatcherSession) AllocateSections(bit uint, count int) []uint64 {
        fetcher := make(chan *Retrieval)

        select {
        case <-s.quit:
            return nil
        case s.matcher.retrievals <- fetcher:
            task := &Retrieval{
                Bit:      bit,
                Sections: make([]uint64, count),
            }
            fetcher <- task
            return (<-fetcher).Sections
        }
    }
DeliverSections delivers the results to the deliveries channel.

    // DeliverSections delivers a batch of section bit-vectors for a specific bloom
    // bit index to be injected into the processing pipeline.
    func (s *MatcherSession) DeliverSections(bit uint, sections []uint64, bitsets [][]byte) {
        select {
        case <-s.kill:
            return
        case s.matcher.deliveries <- &Retrieval{Bit: bit, Sections: sections, Bitsets: bitsets}:
        }
    }
Executing tasks: Multiplex. The Multiplex function keeps claiming tasks, submits each task to the bloomRequest queue, reads the result back from the queue, and delivers it to the distributor, which completes the whole cycle.

    // Multiplex polls the matcher session for retrieval tasks and multiplexes it into
    // the requested retrieval queue to be serviced together with other sessions.
    //
    // This method will block for the lifetime of the session. Even after termination
    // of the session, any request in-flight need to be responded to! Empty responses
    // are fine though in that case.
    func (s *MatcherSession) Multiplex(batch int, wait time.Duration, mux chan chan *Retrieval) {
        for {
            // Allocate a new bloom bit index to retrieve data for, stopping when done
            bit, ok := s.AllocateRetrieval()
            if !ok {
                return
            }
            // Bit allocated, throttle a bit if we're below our batch limit
            if s.PendingSections(bit) < batch {
                select {
                case <-s.quit:
                    // Session terminating, we can't meaningfully service, abort
                    s.AllocateSections(bit, 0)
                    s.DeliverSections(bit, []uint64{}, [][]byte{})
                    return

                case <-time.After(wait):
                    // Throttling up, fetch whatever's available
                }
            }
            // Allocate as much as we can handle and request servicing
            sections := s.AllocateSections(bit, batch)
            request := make(chan *Retrieval)

            select {
            case <-s.quit:
                // Session terminating, we can't meaningfully service, abort
                s.DeliverSections(bit, sections, make([][]byte, len(sections)))
                return

            case mux <- request:
                // Retrieval accepted, something must arrive before we're aborting
                request <- &Retrieval{Bit: bit, Sections: sections}

                result := <-request
                s.DeliverSections(result.Bit, result.Sections, result.Bitsets)
            }
        }
    }
scheduler是基於section的布隆過濾器的單個bit值檢索的排程。 除了排程檢索操作之外,這個結構還可以對請求進行重複資料刪除並快取結果,從而即使在複雜的過濾情況下也可以將網路/資料庫開銷降至最低。
### 資料結構request表示一個bloom檢索任務,以便優先從本地資料庫中或從網路中剪檢索。 section 表示區塊段號,每段4096個區塊, bit代表檢索的是布隆過濾器的哪一位(一共有2048位)。這個在之前的(eth-bloombits和filter原始碼分析.md)中有介紹。 // request represents a bloom retrieval task to prioritize and pull from the local // database or remotely from the network. type request struct { section uint64 // Section index to retrieve the a bit-vector from bit uint // Bit index within the section to retrieve the vector of }
response當前排程的請求的狀態。 沒傳送一個請求,會生成一個response物件來最終這個請求的狀態。cached用來快取這個section的結果。
// response represents the state of a requested bit-vector through a scheduler. type response struct { cached []byte // Cached bits to dedup multiple requests done chan struct{} // Channel to allow waiting for completion }
scheduler
// scheduler handles the scheduling of bloom-filter retrieval operations for // entire section-batches belonging to a single bloom bit. Beside scheduling the // retrieval operations, this struct also deduplicates the requests and caches // the results to minimize network/database overhead even in complex filtering // scenarios. type scheduler struct { bit uint // Index of the bit in the bloom filter this scheduler is responsible for 布隆過濾器的哪一個bit位(0-2047) responses map[uint64]*response // Currently pending retrieval requests or already cached responses 當前正在進行的請求或者是已經快取的結果。 lock sync.Mutex // Lock protecting the responses from concurrent access }
### 建構函式
// newScheduler creates a new bloom-filter retrieval scheduler for a specific // bit index. func newScheduler(idx uint) *scheduler { return &scheduler{ bit: idx, responses: make(map[uint64]*response), } } // reset cleans up any leftovers from previous runs. This is required before a // restart to ensure the no previously requested but never delivered state will // cause a lockup. reset用法用來清理之前的所有任何請求。 func (s *scheduler) reset() { s.lock.Lock() defer s.lock.Unlock() for section, res := range s.responses { if res.cached == nil { delete(s.responses, section) } } }
### 執行 run方法
// run creates a retrieval pipeline, receiving section indexes from sections and // returning the results in the same order through the done channel. Concurrent // runs of the same scheduler are allowed, leading to retrieval task deduplication. func (s *scheduler) run(sections chan uint64, dist chan *request, done chan []byte, quit chan struct{}, wg *sync.WaitGroup) { // sections 通道型別 這個是用來傳遞需要檢索的section的通道,輸入引數 // dist 通道型別, 屬於輸出通道(可能是網路傳送或者是本地檢索),往這個通道上傳送請求, 然後在done上獲取迴應。 // done 用來傳遞檢索結果的通道, 可以理解為返回值通道。 // Create a forwarder channel between requests and responses of the same size as // the distribution channel (since that will block the pipeline anyway). 在請求和響應之間建立一個與分發通道大小相同的轉發器通道(因為這樣會阻塞管道) pend := make(chan uint64, cap(dist)) // Start the pipeline schedulers to forward between user -> distributor -> user wg.Add(2) go s.scheduleRequests(sections, dist, pend, quit, wg) go s.scheduleDeliveries(pend, done, quit, wg) }
### scheduler的流程圖
![image](picture/chainindexer_2.png)圖中橢圓代表了goroutine. 矩形代表了channel. 三角形代表外部的方法呼叫。
1. scheduleRequests goroutine從sections接收到section訊息2. scheduleRequests把接收到的section組裝成requtest傳送到dist channel,並構建物件response[section]3. scheduleRequests把上一部的section傳送給pend佇列。scheduleDelivers接收到pend訊息,阻塞在response[section].done上面4. 外部呼叫deliver方法,把seciton的request請求結果寫入response[section].cached.並關閉response[section].done channel5. scheduleDelivers接收到response[section].done 資訊。 把response[section].cached 傳送到done channel
### scheduleRequests // scheduleRequests reads section retrieval requests from the input channel, // deduplicates the stream and pushes unique retrieval tasks into the distribution // channel for a database or network layer to honour. func (s *scheduler) scheduleRequests(reqs chan uint64, dist chan *request, pend chan uint64, quit chan struct{}, wg *sync.WaitGroup) { // Clean up the goroutine and pipeline when done defer wg.Done() defer close(pend) // Keep reading and scheduling section requests for { select { case <-quit: return case section, ok := <-reqs: // New section retrieval requested if !ok { return } // Deduplicate retrieval requests unique := false s.lock.Lock() if s.responses[section] == nil { s.responses[section] = &response{ done: make(chan struct{}), } unique = true } s.lock.Unlock() // Schedule the section for retrieval and notify the deliverer to expect this section if unique { select { case <-quit: return case dist <- &request{bit: s.bit, section: section}: } } select { case <-quit: return case pend <- section: } } } }
## generator.go

The generator produces the section-based bloom filter index data. Its main internal data structure is a bit matrix, bloom[2048][4096]. The input is the logBloom of 4096 headers; for example, the logBloom of the 20th header is stored in bloom[0:2048][20].

Data structure:

    // Generator takes a number of bloom filters and generates the rotated bloom bits
    // to be used for batched filtering.
    type Generator struct {
        blooms   [types.BloomBitLength][]byte // Rotated blooms for per-bit matching
        sections uint                         // Number of sections to batch together (the number of block headers in one section, 4096 by default)
        nextBit  uint                         // Next bit to set when adding a bloom
    }

Constructor:

    // NewGenerator creates a rotated bloom generator that can iteratively fill a
    // batched bloom filter's bits.
    func NewGenerator(sections uint) (*Generator, error) {
        if sections%8 != 0 {
            return nil, errors.New("section count not multiple of 8")
        }
        b := &Generator{sections: sections}
        for i := 0; i < types.BloomBitLength; i++ { // BloomBitLength = 2048
            b.blooms[i] = make([]byte, sections/8) // divided by 8 because one byte holds 8 bits
        }
        return b, nil
    }

AddBloom adds the logsBloom of one block header.
    // AddBloom takes a single bloom filter and sets the corresponding bit column
    // in memory accordingly.
    func (b *Generator) AddBloom(index uint, bloom types.Bloom) error {
        // Make sure we're not adding more bloom filters than our capacity
        if b.nextBit >= b.sections { // the section is already full
            return errSectionOutOfBounds
        }
        if b.nextBit != index { // index is the position of this bloom within the section
            return errors.New("bloom filter with unexpected index")
        }
        // Rotate the bloom and insert into our collection
        byteIndex := b.nextBit / 8                // byte of every bit vector that holds this block's bit
        bitMask := byte(1) << byte(7-b.nextBit%8) // mask selecting this block's bit inside that byte

        for i := 0; i < types.BloomBitLength; i++ {
            bloomByteIndex := types.BloomByteLength - 1 - i/8
            bloomBitMask := byte(1) << byte(i%8)

            if (bloom[bloomByteIndex] & bloomBitMask) != 0 {
                b.blooms[i][byteIndex] |= bitMask
            }
        }
        b.nextBit++

        return nil
    }
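To make the rotation easier to follow, here is a scaled-down sketch that uses the same index arithmetic as AddBloom. The sizes are assumptions chosen for readability (a 16-bit bloom and a section of 8 blocks instead of 2048 bits and 4096 blocks), and `rotate` is a hypothetical helper, not part of go-ethereum.

```go
package main

import "fmt"

const (
	bloomBits  = 16            // the real bloom filter has 2048 bits
	bloomBytes = bloomBits / 8 // the real bloom filter has 256 bytes
)

// rotate turns per-block blooms (one row per block) into per-bit vectors
// (one row per bloom bit, one bit per block), like repeated AddBloom calls.
func rotate(blooms [][bloomBytes]byte) [bloomBits][]byte {
	var out [bloomBits][]byte
	for i := range out {
		out[i] = make([]byte, (len(blooms)+7)/8)
	}
	for block, bloom := range blooms {
		byteIndex := block / 8                // byte holding this block's bit
		bitMask := byte(1) << uint(7-block%8) // blocks are stored most significant bit first
		for i := 0; i < bloomBits; i++ {
			bloomByteIndex := bloomBytes - 1 - i/8 // bloom bit 0 lives in the last byte
			bloomBitMask := byte(1) << uint(i%8)
			if bloom[bloomByteIndex]&bloomBitMask != 0 {
				out[i][byteIndex] |= bitMask
			}
		}
	}
	return out
}

func main() {
	blooms := make([][bloomBytes]byte, 8)
	blooms[0] = [bloomBytes]byte{0x00, 0x01} // block 0 sets bloom bit 0
	blooms[2] = [bloomBytes]byte{0x80, 0x01} // block 2 sets bloom bits 15 and 0

	vectors := rotate(blooms)
	fmt.Printf("bit 0:  %08b\n", vectors[0][0])  // 10100000: blocks 0 and 2
	fmt.Printf("bit 15: %08b\n", vectors[15][0]) // 00100000: block 2 only
}
```

After the rotation, answering "which blocks of this section have bloom bit 15 set?" means reading a single short vector instead of touching every header's bloom.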
Bitset returns the bit vector that belongs to a given bloom bit index.

    // Bitset returns the bit vector belonging to the given bit index after all
    // blooms have been added.
    func (b *Generator) Bitset(idx uint) ([]byte, error) {
        if b.nextBit != b.sections {
            return nil, errors.New("bloom not fully generated yet")
        }
        if idx >= b.sections { // note: idx selects one of the types.BloomBitLength vectors, yet it is checked against the section size here
            return nil, errSectionOutOfBounds
        }
        return b.blooms[idx], nil
    }
## matcher.go

Matcher is a pipelined system of schedulers and logic matchers that performs binary AND/OR operations on the bit streams, producing a stream of potential blocks whose contents should be inspected.

Data structures:

    // partialMatches with a non-nil vector represents a section in which some sub-
    // matchers have already found potential matches. Subsequent sub-matchers will
    // binary AND their matches with this vector. If vector is nil, it represents a
    // section to be processed by the first sub-matcher.
    //
    // For example, with three filter conditions addr1, addr2 and addr3, we need the
    // data that matches all three, so a pipeline with one matcher per condition is
    // started. The first matcher passes its result to the second, which ANDs it
    // bitwise with its own result and passes that on to the third.
    type partialMatches struct {
        section uint64
        bitset  []byte
    }

    // Retrieval represents a request for retrieval task assignments for a given
    // bit with the given number of fetch elements, or a response for such a request.
    // It can also have the actual results set to be used as a delivery data struct.
    //
    // A Retrieval is one lookup against the block bloom filter index. It is sent to
    // startBloomHandlers in eth/bloombits.go, which loads the index data from the
    // database and returns it in Bitsets.
    type Retrieval struct {
        Bit      uint
        Sections []uint64
        Bitsets  [][]byte
    }

    // Matcher is a pipelined system of schedulers and logic matchers which perform
    // binary AND/OR operations on the bit-streams, creating a stream of potential
    // blocks to inspect for data content.
    type Matcher struct {
        sectionSize uint64 // Size of the data batches to filter on

        filters    [][]bloomIndexes    // Filter the system is matching for
        schedulers map[uint]*scheduler // Retrieval schedulers for loading bloom bits

        retrievers chan chan uint       // Retriever processes waiting for bit allocations
        counters   chan chan uint       // Retriever processes waiting for task count reports
        retrievals chan chan *Retrieval // Retriever processes waiting for task allocations
        deliveries chan *Retrieval      // Retriever processes waiting for task response deliveries (finished results arrive here)

        running uint32 // Atomic flag whether a session is live or not
    }

The overall flow of the matcher is shown below. Ellipses represent goroutines, rectangles represent channels, and triangles represent method calls.
![image](picture/matcher_1.png)
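1. Matcher first creates one subMatch per incoming filter, so each subMatch corresponds to one filter. Each subMatch ANDs its own lookup result bitwise with the previous result. If the new result has at least one bit set, it is passed on to the next subMatch. This is short-circuit evaluation of the AND across all filters: once the earlier stages can no longer match anything, the later conditions are not evaluated.
2. Matcher starts one scheduler for each bloom bit index used by the filters.
3. subMatch sends its requests to the corresponding schedulers.
4. Each scheduler schedules the requests and sends them through dist to the distributor, which manages them.
5. Multiple (16) Multiplex goroutines are started. They fetch requests from the distributor and push them into the bloomRequests queue; startBloomHandlers reads the database and returns the data to Multiplex.
6. Multiplex reports the answers to the distributor through the deliveries channel.
7. The distributor calls the scheduler's deliver method to hand the results to the scheduler.
8. The scheduler returns the results to subMatch.
9. subMatch combines the results and sends them to the next subMatch. The output of the last subMatch is processed and sent to the results channel.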
matcher
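    filter := New(backend, 0, -1, []common.Address{addr}, [][]common.Hash{{hash1, hash2, hash3, hash4}})

Groups are combined with AND, and the entries inside a group with OR: (addr && hash1) || (addr && hash2) || (addr && hash3) || (addr && hash4).

The constructor. Pay special attention to the filters parameter, which is a three-dimensional array: [][]bloomIndexes === [first dimension][second dimension][3].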
    // This code is from filter.go, the caller of Matcher; it helps in understanding
    // the filters parameter. However many addresses there are, they occupy a single
    // slot in filters:
    //   filters[0]   = addresses
    //   filters[1]   = topics[0] = several topics
    //   filters[2]   = topics[1] = several topics
    //   filters[n+1] = topics[n] = several topics
    //
    // The filtering rule for addresses and topics is: (contains any address in
    // addresses) AND (contains any topic in topics[0]) AND (contains any topic in
    // topics[1]) AND ... AND (contains any topic in topics[n]).
    //
    // So the first dimension is combined with AND and the second dimension with OR.
    // NewMatcher converts the concrete data of the third dimension into three bit
    // positions of the bloom filter, so var filters [][][]byte in filter.go becomes
    // filters [][][3] inside Matcher.
    func New(backend Backend, begin, end int64, addresses []common.Address, topics [][]common.Hash) *Filter {
        // Flatten the address and topic filter clauses into a single bloombits filter
        // system. Since the bloombits are not positional, nil topics are permitted,
        // which get flattened into a nil byte slice.
        var filters [][][]byte
        if len(addresses) > 0 {
            filter := make([][]byte, len(addresses))
            for i, address := range addresses {
                filter[i] = address.Bytes()
            }
            filters = append(filters, filter)
        }
        for _, topicList := range topics {
            filter := make([][]byte, len(topicList))
            for i, topic := range topicList {
                filter[i] = topic.Bytes()
            }
            filters = append(filters, filter)
        }
        // (the excerpt ends here)
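The AND-between-groups, OR-within-a-group rule can be sketched without any bloom filters at all. In this minimal illustration, plain strings stand in for addresses and topic hashes, and `matches` is a hypothetical helper; the real matcher applies the same logic to bit vectors rather than to exact values.

```go
package main

import "fmt"

// matches reports whether the items found in a log satisfy filters:
// every non-empty group must contain at least one item present in the log.
func matches(filters [][]string, logItems map[string]bool) bool {
	for _, group := range filters {
		if len(group) == 0 {
			continue // an empty group acts as a wildcard and is skipped
		}
		hit := false
		for _, clause := range group { // OR within the group
			if logItems[clause] {
				hit = true
				break
			}
		}
		if !hit { // AND between groups: one failing group rejects the log
			return false
		}
	}
	return true
}

func main() {
	filters := [][]string{
		{"addr"},                             // filters[0] = addresses
		{"hash1", "hash2", "hash3", "hash4"}, // filters[1] = topics[0]
	}
	fmt.Println(matches(filters, map[string]bool{"addr": true, "hash3": true}))  // true
	fmt.Println(matches(filters, map[string]bool{"addr": true, "other": true}))  // false
	fmt.Println(matches(filters, map[string]bool{"hash1": true, "hash2": true})) // false
}
```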
    // NewMatcher creates a new pipeline for retrieving bloom bit streams and doing
    // address and topic filtering on them. Setting a filter component to `nil` is
    // allowed and will result in that filter rule being skipped (OR 0x11...1).
    func NewMatcher(sectionSize uint64, filters [][][]byte) *Matcher {
        // Create the matcher instance
        m := &Matcher{
            sectionSize: sectionSize,
            schedulers:  make(map[uint]*scheduler),
            retrievers:  make(chan chan uint),
            counters:    make(chan chan uint),
            retrievals:  make(chan chan *Retrieval),
            deliveries:  make(chan *Retrieval),
        }
        // Calculate the bloom bit indexes for the groups we're interested in
        m.filters = nil

        for _, filter := range filters {
            // Gather the bit indexes of the filter rule, special casing the nil filter
            if len(filter) == 0 {
                continue
            }
            bloomBits := make([]bloomIndexes, len(filter))
            for i, clause := range filter {
                if clause == nil {
                    bloomBits = nil
                    break
                }
                // clause is an entry of the third dimension of the input: an address or a topic.
                // calcBloomIndexes computes its three bit indexes (0-2047) in the bloom filter;
                // if all three bits are set in a bloom filter, the clause may be present there.
                bloomBits[i] = calcBloomIndexes(clause)
            }
            // Accumulate the filter rules if no nil rule was within.
            // Within one bloomBits group a single matching clause is enough (OR).
            if bloomBits != nil {
                // The different bloomBits groups must all match for the overall result to hold (AND).
                m.filters = append(m.filters, bloomBits)
            }
        }
        // For every bit, create a scheduler to load/download the bit vectors
        for _, bloomIndexLists := range m.filters {
            for _, bloomIndexList := range bloomIndexLists {
                for _, bloomIndex := range bloomIndexList {
                    // Every bit index that can occur gets its own scheduler, which
                    // retrieves the bloom data for that bit position.
                    m.addScheduler(bloomIndex)
                }
            }
        }
        return m
    }

Start launches the matching process.
    // Start starts the matching process and returns a stream of bloom matches in
    // a given range of blocks. If there are no more matches in the range, the result
    // channel is closed.
    func (m *Matcher) Start(begin, end uint64, results chan uint64) (*MatcherSession, error) {
        // Make sure we're not creating concurrent sessions
        if atomic.SwapUint32(&m.running, 1) == 1 {
            return nil, errors.New("matcher already running")
        }
        defer atomic.StoreUint32(&m.running, 0)

        // Initiate a new matching round.
        // The session is the return value and manages the lifetime of the query.
        session := &MatcherSession{
            matcher: m,
            quit:    make(chan struct{}),
            kill:    make(chan struct{}),
        }
        for _, scheduler := range m.schedulers {
            scheduler.reset()
        }
        // run builds the pipeline and returns a channel of partialMatches carrying the partial results.
        sink := m.run(begin, end, cap(results), session)

        // Read the output from the result sink and deliver to the user
        session.pend.Add(1)
        go func() {
            defer session.pend.Done()
            defer close(results)

            for {
                select {
                case <-session.quit:
                    return

                case res, ok := <-sink:
                    // New match result found. A result is a section plus a bitmap of the
                    // blocks in that section that may match, so walk the bitmap, find the
                    // set bits and return the corresponding block numbers.
                    if !ok {
                        return
                    }
                    // Calculate the first and last blocks of the section
                    sectionStart := res.section * m.sectionSize

                    first := sectionStart
                    if begin > first {
                        first = begin
                    }
                    last := sectionStart + m.sectionSize - 1
                    if end < last {
                        last = end
                    }
                    // Iterate over all the blocks in the section and return the matching ones
                    for i := first; i <= last; i++ {
                        // Skip the entire byte if no matches are found inside
                        next := res.bitset[(i-sectionStart)/8]
                        if next == 0 {
                            i += 7
                            continue
                        }
                        // Some bit is set, do the actual submatching
                        if bit := 7 - i%8; next&(1<<bit) != 0 {
                            select {
                            case <-session.quit:
                                return
                            case results <- i:
                            }
                        }
                    }
                }
            }
        }()
        return session, nil
    }

The run method
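The conversion from a section bitmap to block numbers can be tried in isolation. The sketch below is a simplified variant, under the assumption that readability matters more than speed: it tests every bit individually instead of skipping zero bytes, and `blocksInSection` is a hypothetical helper name.

```go
package main

import "fmt"

// blocksInSection lists the block numbers within [begin, end] whose bit is
// set in the bitset of the given section. Bits are stored most significant
// bit first, so block offset 0 is the 0x80 bit of byte 0.
func blocksInSection(section, sectionSize, begin, end uint64, bitset []byte) []uint64 {
	sectionStart := section * sectionSize

	// Clamp the section to the requested block range.
	first, last := sectionStart, sectionStart+sectionSize-1
	if begin > first {
		first = begin
	}
	if end < last {
		last = end
	}
	var blocks []uint64
	for i := first; i <= last; i++ {
		offset := i - sectionStart
		if bitset[offset/8]&(1<<(7-offset%8)) != 0 {
			blocks = append(blocks, i)
		}
	}
	return blocks
}

func main() {
	// Section 1 of size 16 covers blocks 16..31. The bitmap marks the
	// offsets 0, 7 and 11, i.e. blocks 16, 23 and 27.
	bitset := []byte{0x81, 0x10}
	fmt.Println(blocksInSection(1, 16, 0, 100, bitset))  // [16 23 27]
	fmt.Println(blocksInSection(1, 16, 17, 100, bitset)) // [23 27]
}
```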
    // run creates a daisy-chain of sub-matchers, one for the address set and one
    // for each topic set, each sub-matcher receiving a section only if the previous
    // ones have all found a potential match in one of the blocks of the section,
    // then binary AND-ing its own matches and forwarding the result to the next one.
    //
    // The method starts feeding the section indexes into the first sub-matcher on a
    // new goroutine and returns a sink channel receiving the results.
    func (m *Matcher) run(begin, end uint64, buffer int, session *MatcherSession) chan *partialMatches {
        // Create the source channel and feed section indexes into
        source := make(chan *partialMatches, buffer)

        session.pend.Add(1)
        go func() {
            defer session.pend.Done()
            defer close(source)

            for i := begin / m.sectionSize; i <= end/m.sectionSize; i++ {
                // This loop builds the input of the first subMatch; every later subMatch
                // uses the output of its predecessor as input. The bitset of this source
                // is all 0xff, meaning "everything matches"; it is ANDed with the matches
                // of the current step to give that step's result.
                select {
                case <-session.quit:
                    return
                case source <- &partialMatches{i, bytes.Repeat([]byte{0xff}, int(m.sectionSize/8))}:
                }
            }
        }()
        // Assemble the daisy-chained filtering pipeline
        next := source
        dist := make(chan *request, buffer)

        for _, bloom := range m.filters { // build the pipeline: the output of one subMatch is the input of the next
            next = m.subMatch(next, dist, bloom, session)
        }
        // Start the request distribution
        session.pend.Add(1)
        // Launch the distributor goroutine.
        go m.distributor(dist, session)

        return next
    }

The subMatch function
    // subMatch creates a sub-matcher that filters for a set of addresses or topics, binary OR-s those matches, then
    // binary AND-s the result to the daisy-chain input (source) and forwards it to the daisy-chain output.
    // The matches of each address/topic are calculated by fetching the given sections of the three bloom bit indexes belonging to
    // that address/topic, and binary AND-ing those vectors together.
    //
    // subMatch is the most important function: it ties together the AND over the first dimension of
    // filters [][][3], the OR over the second dimension, and the AND over the third dimension. A result
    // is forwarded to the next sub-matcher only if it is not all zero.
    func (m *Matcher) subMatch(source chan *partialMatches, dist chan *request, bloom []bloomIndexes, session *MatcherSession) chan *partialMatches {
        // Start the concurrent schedulers for each bit required by the bloom filter.
        // The bloom []bloomIndexes parameter is the second and third dimension of filters: [][3].
        sectionSources := make([][3]chan uint64, len(bloom))
        sectionSinks := make([][3]chan []byte, len(bloom))
        for i, bits := range bloom { // i indexes the second dimension
            for j, bit := range bits { // j indexes the three bloom bit positions of this clause; each bit is in 0-2047
                sectionSources[i][j] = make(chan uint64, cap(source)) // input channel of the scheduler
                sectionSinks[i][j] = make(chan []byte, cap(source))   // output channel of the scheduler
                // Start a scheduling run for this bit: the sections to look up are passed in
                // through sectionSources[i][j], the results come back through sectionSinks[i][j],
                // and dist is the channel on which the scheduler emits requests (see the scheduler section).
                m.schedulers[bit].run(sectionSources[i][j], dist, sectionSinks[i][j], session.quit, &session.pend)
            }
        }

        process := make(chan *partialMatches, cap(source)) // entries from source are forwarded here after fetches have been initiated
        results := make(chan *partialMatches, cap(source)) // output channel

        session.pend.Add(2)
        go func() {
            // Tear down the goroutine and terminate all source channels
            defer session.pend.Done()
            defer close(process)

            defer func() {
                for _, bloomSources := range sectionSources {
                    for _, bitSource := range bloomSources {
                        close(bitSource)
                    }
                }
            }()
            // Read sections from the source channel and multiplex into all bit-schedulers
            for {
                select {
                case <-session.quit:
                    return

                case subres, ok := <-source:
                    // New subresult from previous link
                    if !ok {
                        return
                    }
                    // Multiplex the section index to all bit-schedulers
                    for _, bloomSources := range sectionSources {
                        for _, bitSource := range bloomSources {
                            // Send the section to the input channel of every scheduler above, asking
                            // for the given bit of that section. The result arrives on sectionSinks[i][j].
                            select {
                            case <-session.quit:
                                return
                            case bitSource <- subres.section:
                            }
                        }
                    }
                    // Notify the processor that this section will become available
                    select {
                    case <-session.quit:
                        return
                    case process <- subres: // sent only after all requests have been handed to the schedulers
                    }
                }
            }
        }()

        go func() {
            // Tear down the goroutine and terminate the final sink channel
            defer session.pend.Done()
            defer close(results)

            // Read the source notifications and collect the delivered results
            for {
                select {
                case <-session.quit:
                    return

                case subres, ok := <-process:
                    // Notified of a section being retrieved.
                    // Could results arrive out of order here, since every channel is buffered and
                    // lookups may finish at different speeds? No: the scheduler preserves order,
                    // so sections come out in the order they went in.
                    if !ok {
                        return
                    }
                    // Gather all the sub-results and merge them together
                    var orVector []byte
                    for _, bloomSinks := range sectionSinks {
                        var andVector []byte
                        for _, bitSink := range bloomSinks {
                            // Three vectors are received here, one per bloom bit index of the clause.
                            // AND-ing them yields the blocks that may contain the clause.
                            var data []byte
                            select {
                            case <-session.quit:
                                return
                            case data = <-bitSink:
                            }
                            if andVector == nil {
                                andVector = make([]byte, int(m.sectionSize/8))
                                copy(andVector, data)
                            } else {
                                bitutil.ANDBytes(andVector, andVector, data)
                            }
                        }
                        // OR together the clauses of this group (the second dimension of filters).
                        if orVector == nil {
                            orVector = andVector
                        } else {
                            bitutil.ORBytes(orVector, orVector, andVector)
                        }
                    }

                    if orVector == nil { // the channels may have been closed; nothing was retrieved
                        orVector = make([]byte, int(m.sectionSize/8))
                    }
                    if subres.bitset != nil {
                        // AND with the result of the previous stage; remember that it starts out as all ones.
                        bitutil.ANDBytes(orVector, orVector, subres.bitset)
                    }
                    if bitutil.TestBytes(orVector) { // not all zero: pass the result on to the next stage or to the caller
                        select {
                        case <-session.quit:
                            return
                        case results <- &partialMatches{subres.section, orVector}:
                        }
                    }
                }
            }
        }()
        return results
    }
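The three-level combination inside subMatch can be reproduced on plain byte slices. The following sketch is illustrative only: `combine` is a hypothetical helper, it uses simple loops in place of the bitutil package, and the one-byte vectors represent a section of just 8 blocks.

```go
package main

import "fmt"

// combine merges the bit vectors of one filter group. For every clause the
// three vectors are ANDed, the clauses are ORed, and the outcome is ANDed
// with the result of the previous pipeline stage. The boolean reports
// whether any bit is still set, i.e. whether the section is worth forwarding.
func combine(prev []byte, clauses [][3][]byte) ([]byte, bool) {
	orVector := make([]byte, len(prev))
	for _, bits := range clauses {
		// AND over the three bloom bit vectors of this clause.
		andVector := make([]byte, len(prev))
		copy(andVector, bits[0])
		for _, vec := range bits[1:] {
			for i := range andVector {
				andVector[i] &= vec[i]
			}
		}
		// OR over the clauses of the group.
		for i := range orVector {
			orVector[i] |= andVector[i]
		}
	}
	// AND with the previous stage and test for remaining candidates.
	nonZero := false
	for i := range orVector {
		orVector[i] &= prev[i]
		if orVector[i] != 0 {
			nonZero = true
		}
	}
	return orVector, nonZero
}

func main() {
	clauses := [][3][]byte{
		{{0xF0}, {0xCC}, {0xAA}}, // AND gives 0x80: block 0 may contain clause A
		{{0x0F}, {0x0F}, {0x03}}, // AND gives 0x03: blocks 6 and 7 may contain clause B
	}
	first, ok := combine([]byte{0xFF}, clauses) // first stage: previous result is all ones
	fmt.Printf("%08b %v\n", first[0], ok)       // 10000011 true

	second, ok := combine([]byte{0x0C}, clauses) // an earlier stage only kept blocks 4 and 5
	fmt.Printf("%08b %v\n", second[0], ok)       // 00000000 false
}
```

The second call shows the short-circuit case: no bit survives, so the section would not be forwarded to the next sub-matcher.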
The distributor receives the requests from the schedulers and queues them into a set of pending requests, which are then assigned to the retrievers that fulfil them.

    // distributor receives requests from the schedulers and queues them into a set
    // of pending requests, which are assigned to retrievers wanting to fulfil them.
    func (m *Matcher) distributor(dist chan *request, session *MatcherSession) {
        defer session.pend.Done()

        var (
            requests   = make(map[uint][]uint64) // Per-bit list of section requests, ordered by section number
            unallocs   = make(map[uint]struct{}) // Bits with pending requests but not allocated to any retriever
            retrievers chan chan uint            // Waiting retrievers (toggled to nil if unallocs is empty)
        )
        var (
            allocs   int            // Number of active allocations to handle graceful shutdown requests
            shutdown = session.quit // Shutdown request channel, will gracefully wait for pending requests
        )

        // assign is a helper method to try to assign a pending bit to an actively
        // listening servicer, or schedule it up for later when one arrives.
        assign := func(bit uint) {
            select {
            case fetcher := <-m.retrievers:
                allocs++
                fetcher <- bit
            default:
                // No retrievers active, start listening for new ones
                retrievers = m.retrievers
                unallocs[bit] = struct{}{}
            }
        }

        for {
            select {
            case <-shutdown:
                // Graceful shutdown requested, wait until all pending requests are honoured
                if allocs == 0 {
                    return
                }
                shutdown = nil

            case <-session.kill:
                // Pending requests not honoured in time, hard terminate
                return

            case req := <-dist:
                // New retrieval request arrived to be distributed to some fetcher process.
                // A request sent by a scheduler is added to the queue of its bit.
                queue := requests[req.bit]
                index := sort.Search(len(queue), func(i int) bool { return queue[i] >= req.section })
                requests[req.bit] = append(queue[:index], append([]uint64{req.section}, queue[index:]...)...)

                // If it's a new bit and we have waiting fetchers, allocate to them
                if len(queue) == 0 {
                    assign(req.bit)
                }

            case fetcher := <-retrievers:
                // New retriever arrived, find the lowest section-ed bit to assign
                // (i.e. check whether some bit is still unassigned).
                bit, best := uint(0), uint64(math.MaxUint64)
                for idx := range unallocs {
                    if requests[idx][0] < best {
                        bit, best = idx, requests[idx][0]
                    }
                }
                // Stop tracking this bit (and alloc notifications if no more work is available)
                delete(unallocs, bit)
                if len(unallocs) == 0 { // everything is assigned, stop listening for retrievers
                    retrievers = nil
                }
                allocs++
                fetcher <- bit

            case fetcher := <-m.counters:
                // New task count request arrives, return number of items
                // (the number of pending sections for the requested bit).
                fetcher <- uint(len(requests[<-fetcher]))

            case fetcher := <-m.retrievals:
                // New fetcher waiting for tasks to retrieve, assign
                task := <-fetcher
                if want := len(task.Sections); want >= len(requests[task.Bit]) {
                    task.Sections = requests[task.Bit]
                    delete(requests, task.Bit)
                } else {
                    task.Sections = append(task.Sections[:0], requests[task.Bit][:want]...)
                    requests[task.Bit] = append(requests[task.Bit][:0], requests[task.Bit][want:]...)
                }
                fetcher <- task

                // If anything was left unallocated, try to assign to someone else
                if len(requests[task.Bit]) > 0 {
                    assign(task.Bit)
                }

            case result := <-m.deliveries:
                // New retrieval task response from fetcher, split out missing sections and
                // deliver complete ones
                var (
                    sections = make([]uint64, 0, len(result.Sections))
                    bitsets  = make([][]byte, 0, len(result.Bitsets))
                    missing  = make([]uint64, 0, len(result.Sections))
                )
                for i, bitset := range result.Bitsets {
                    if len(bitset) == 0 { // remember the sections whose result is missing
                        missing = append(missing, result.Sections[i])
                        continue
                    }
                    sections = append(sections, result.Sections[i])
                    bitsets = append(bitsets, bitset)
                }
                // Deliver the complete results to the scheduler
                m.schedulers[result.Bit].deliver(sections, bitsets)
                allocs--

                // Reschedule missing sections and allocate bit if newly available
                if len(missing) > 0 { // queue the missing sections again as new tasks
                    queue := requests[result.Bit]
                    for _, section := range missing {
                        index := sort.Search(len(queue), func(i int) bool { return queue[i] >= section })
                        queue = append(queue[:index], append([]uint64{section}, queue[index:]...)...)
                    }
                    requests[result.Bit] = queue

                    if len(queue) == len(missing) {
                        assign(result.Bit)
                    }
                }
                // If we're in the process of shutting down, terminate
                if allocs == 0 && shutdown == nil {
                    return
                }
            }
        }
    }

Claiming work with AllocateRetrieval: a retriever asks for a task and gets back the bit whose retrieval it should service.
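The distributor keeps every per-bit queue sorted by section number and hands out work from the front. The two slice idioms involved can be sketched separately; `insertSorted` and `take` are hypothetical helper names, and `take` copies the batch instead of reusing the caller's buffer as the distributor does.

```go
package main

import (
	"fmt"
	"sort"
)

// insertSorted inserts section into an ascending queue using the same
// sort.Search idiom as the distributor. Duplicates are kept.
func insertSorted(queue []uint64, section uint64) []uint64 {
	index := sort.Search(len(queue), func(i int) bool { return queue[i] >= section })
	return append(queue[:index], append([]uint64{section}, queue[index:]...)...)
}

// take hands out up to want sections from the front of the queue and
// returns the batch together with what is left for later retrievers.
func take(queue []uint64, want int) (batch, rest []uint64) {
	if want >= len(queue) {
		return queue, nil
	}
	batch = append([]uint64(nil), queue[:want]...) // copy so batch and rest do not share memory
	return batch, queue[want:]
}

func main() {
	var queue []uint64
	for _, section := range []uint64{7, 3, 5, 3} {
		queue = insertSorted(queue, section)
	}
	fmt.Println(queue) // [3 3 5 7]

	batch, rest := take(queue, 2)
	fmt.Println(batch, rest) // [3 3] [5 7]
}
```

Keeping the queues sorted is what lets the distributor pick "the lowest section-ed bit" by looking only at the first element of each queue.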
    // AllocateRetrieval assigns a bloom bit index to a client process that can either
    // immediately request and fetch the section contents assigned to this bit or wait
    // a little while for more sections to be requested.
    func (s *MatcherSession) AllocateRetrieval() (uint, bool) {
        fetcher := make(chan uint)

        select {
        case <-s.quit:
            return 0, false
        case s.matcher.retrievers <- fetcher:
            bit, ok := <-fetcher
            return bit, ok
        }
    }

AllocateSections claims the section lookup tasks queued for a given bit.
    // AllocateSections assigns all or part of an already allocated bit-task queue
    // to the requesting process.
    func (s *MatcherSession) AllocateSections(bit uint, count int) []uint64 {
        fetcher := make(chan *Retrieval)

        select {
        case <-s.quit:
            return nil
        case s.matcher.retrievals <- fetcher:
            task := &Retrieval{
                Bit:      bit,
                Sections: make([]uint64, count),
            }
            fetcher <- task
            return (<-fetcher).Sections
        }
    }

DeliverSections hands the results to the deliveries channel.
    // DeliverSections delivers a batch of section bit-vectors for a specific bloom
    // bit index to be injected into the processing pipeline.
    func (s *MatcherSession) DeliverSections(bit uint, sections []uint64, bitsets [][]byte) {
        select {
        case <-s.kill:
            return
        case s.matcher.deliveries <- &Retrieval{Bit: bit, Sections: sections, Bitsets: bitsets}:
        }
    }

Executing the tasks with Multiplex: the Multiplex function keeps claiming tasks, pushes them into the bloomRequest queue, reads the results back from it, and delivers them to the distributor. This completes the cycle.
    // Multiplex polls the matcher session for retrieval tasks and multiplexes it into
    // the requested retrieval queue to be serviced together with other sessions.
    //
    // This method will block for the lifetime of the session. Even after termination
    // of the session, any request in-flight need to be responded to! Empty responses
    // are fine though in that case.
    func (s *MatcherSession) Multiplex(batch int, wait time.Duration, mux chan chan *Retrieval) {
        for {
            // Allocate a new bloom bit index to retrieve data for, stopping when done
            bit, ok := s.AllocateRetrieval()
            if !ok {
                return
            }
            // Bit allocated, throttle a bit if we're below our batch limit
            if s.PendingSections(bit) < batch {
                select {
                case <-s.quit:
                    // Session terminating, we can't meaningfully service, abort
                    s.AllocateSections(bit, 0)
                    s.DeliverSections(bit, []uint64{}, [][]byte{})
                    return

                case <-time.After(wait):
                    // Throttling up, fetch whatever's available
                }
            }
            // Allocate as much as we can handle and request servicing
            sections := s.AllocateSections(bit, batch)
            request := make(chan *Retrieval)

            select {
            case <-s.quit:
                // Session terminating, we can't meaningfully service, abort
                s.DeliverSections(bit, sections, make([][]byte, len(sections)))
                return

            case mux <- request:
                // Retrieval accepted, something must arrive before we're aborting
                request <- &Retrieval{Bit: bit, Sections: sections}

                result := <-request
                s.DeliverSections(result.Bit, result.Sections, result.Bitsets)
            }
        }
    }
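The `chan chan *Retrieval` handshake used by Multiplex is easier to see with both ends side by side. The sketch below is a hedged model of that protocol, not the go-ethereum service code: the local `Retrieval` type only copies the field names from the struct shown earlier, `serve` and `fetch` are hypothetical names, and an in-memory map stands in for the database.

```go
package main

import "fmt"

// Retrieval copies the field layout of the Retrieval struct shown earlier.
type Retrieval struct {
	Bit      uint
	Sections []uint64
	Bitsets  [][]byte
}

// serve plays the handler side: it accepts a request channel from mux, reads
// the task from it, fills in the bit vectors and sends the task back on the
// same channel. db is a stand-in for the bloom bits database (an assumption).
func serve(mux chan chan *Retrieval, quit chan struct{}, db map[uint]map[uint64][]byte) {
	for {
		select {
		case <-quit:
			return
		case request := <-mux:
			task := <-request
			task.Bitsets = make([][]byte, len(task.Sections))
			for i, section := range task.Sections {
				task.Bitsets[i] = db[task.Bit][section] // stays nil when the section is unknown
			}
			request <- task
		}
	}
}

// fetch plays the Multiplex side: announce a request channel on mux, send
// the task through it and wait for the filled-in answer.
func fetch(mux chan chan *Retrieval, bit uint, sections []uint64) *Retrieval {
	request := make(chan *Retrieval)
	mux <- request
	request <- &Retrieval{Bit: bit, Sections: sections}
	return <-request
}

func main() {
	mux := make(chan chan *Retrieval)
	quit := make(chan struct{})
	db := map[uint]map[uint64][]byte{5: {0: {0xAA}}}

	go serve(mux, quit, db)
	result := fetch(mux, 5, []uint64{0, 1})
	close(quit)

	// Section 0 is found, section 1 is missing and comes back empty.
	fmt.Println(result.Bit, result.Sections, result.Bitsets)
}
```

An empty bit vector, such as the one for section 1 here, is what the distributor treats as a missing section and reschedules.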