Near-Real-Time Search in Lucene with SearcherManager

Near-real-time (NRT) search makes it possible to search content that an IndexWriter has not yet committed.

How the index is refreshed:

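Only a commit on the IndexWriter causes the data held in memory to be fully synced to the index files.
IndexWriter also provides an API for obtaining a real-time reader. This call triggers a flush, which produces a new segment but does not commit (no fsync), and therefore reduces I/O. The new segment is added to the newly created reader, so the updates are visible through the returned reader.
So as long as every new search obtains a fresh reader from the IndexWriter, it can see the latest content. The cost of this operation is only a flush, which is small compared with a commit.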
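The difference between handing data to the operating system and forcing it to disk can be sketched with plain `java.nio`, without Lucene. This is only an analogy under stated assumptions: the class `FlushVsCommit` and its methods are hypothetical names, and the temporary file stands in for a segment file. It is not how IndexWriter is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushVsCommit {

    /** "Flush": hand the bytes to the operating system; they may still sit in its cache. */
    static void flush(FileChannel channel, String data) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    /** "Commit": also force content and metadata to the storage device (fsync). */
    static void commit(FileChannel channel) throws IOException {
        channel.force(true);
    }

    /** Returns what a second reader of the file sees after two flushes but before the commit. */
    static String visibleBeforeCommit() {
        try {
            Path file = Files.createTempFile("segment", ".bin");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                flush(channel, "doc1;");
                flush(channel, "doc2;");
                String seen = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                commit(channel); // only now is the data expected to survive a crash
                return seen;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visibleBeforeCommit());
    }
}
```

The flushed data is already readable before `force` is called, which mirrors why a reader obtained from the writer can see uncommitted documents at the price of a flush only.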

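Lucene organizes an index as multiple segments under one index directory. New documents go into new segments, and these small new segments are merged together at intervals. Thanks to merging, the total number of segments stays small, so overall search speed remains high.
To avoid read/write conflicts, Lucene only ever creates new segments, and it deletes an old segment only once no active reader is still using it.
A flush writes data to the operating system's buffer; as long as that buffer is not full, no disk operation takes place.
A commit writes all data in the memory buffers to disk. It is a pure disk operation and a heavyweight one. Merging is costly as well, because the main structure in a Lucene index, the posting list (inverted list), is stored tightly packed in VInt and delta format. A merge has to merge-sort the postings of the same term, which means reading them out, merging them, and writing them again.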
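To make the "delta plus VInt" storage and the read, merge, rewrite cycle concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's codec: the class `PostingListSketch` is a hypothetical name, the posting list holds only doc IDs, and both lists are assumed to share one doc ID space already (real merges also renumber documents).

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class PostingListSketch {

    /** Encodes ascending doc IDs as gaps (deltas), each gap as a variable-length integer. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            previous = docId;
            // 7 payload bits per byte; a set high bit means "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Reverses encode(): reads each VInt gap and adds it to the running doc ID. */
    static int[] decode(byte[] bytes) {
        List<Integer> docIds = new ArrayList<>();
        int position = 0;
        int previous = 0;
        while (position < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[position++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds.add(previous);
        }
        return docIds.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Merges the postings of one term from two segments: decode, merge, encode again. */
    static byte[] merge(byte[] first, byte[] second) {
        int[] a = decode(first);
        int[] b = decode(second);
        int[] merged = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            merged[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) merged[k++] = a[i++];
        while (j < b.length) merged[k++] = b[j++];
        return encode(merged);
    }
}
```

Because every entry depends on the one before it, the encoded bytes cannot be spliced together in place. The merged list has to be rebuilt from scratch, which is the cost the paragraph above describes.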

How SearcherManager near-real-time search works:

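Lucene implements near-real-time search through the NRTManager class. Near-real-time search means that when the index changes, a background thread tracks the change and makes it visible to the calling program within a relatively short time. NRTManager manages an IndexWriter object and exposes the IndexWriter's add, delete, and update methods (for example addDocument and deleteDocument) to callers. All of these operations take place in memory, so unless you call the IndexWriter's commit method, the index on disk does not change. Remember to commit after each round of updates so that the changes are written to disk together. The code in this post uses TrackingIndexWriter and ControlledRealTimeReopenThread, which take over these roles in later Lucene 4.x releases.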

There are two ways to make sure that, after the index has been updated, callers always get the latest index (IndexSearcher):

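The first is to use an NRTManagerReopenThread object. This thread tracks changes to the in-memory index in real time and calls maybeReopen on every change, which keeps the newest index generation available by opening a new IndexSearcher. The IndexSearcher that callers need is obtained as follows: NRTManager returns a SearcherManager from its getSearcherManager method, and the SearcherManager hands out an IndexSearcher to the client. When finished, the caller releases the IndexSearcher with SearcherManager's release method. Finally, remember to close the NRTManagerReopenThread.
The second way does without NRTManagerReopenThread and calls NRTManager's maybeReopen method directly to get the latest IndexSearcher, and with it the latest index.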
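The acquire/release discipline described above rests on reference counting: a refresh swaps in a new searcher, while the old one stays usable until the last borrower hands it back. The following is a minimal sketch of that idea in plain Java. `RefCountedManagerSketch` and `Snapshot` are hypothetical names for illustration, and the code is a simplified model, not SearcherManager's actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedManagerSketch {

    /** Stand-in for an IndexSearcher over one point-in-time view of the index. */
    static final class Snapshot {
        final int version;
        // Starts at 1: the manager itself holds one reference.
        private final AtomicInteger refCount = new AtomicInteger(1);
        private volatile boolean closed;

        Snapshot(int version) {
            this.version = version;
        }

        /** Takes a reference unless the snapshot has already been closed. */
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        /** Drops a reference; the last one out closes the snapshot. */
        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // a real searcher would free its files here
            }
        }

        boolean isClosed() {
            return closed;
        }
    }

    private volatile Snapshot current = new Snapshot(0);

    /** Borrows the current snapshot; every call must be paired with release(). */
    Snapshot acquire() {
        Snapshot snapshot;
        do {
            snapshot = current; // retry if it was closed between read and incRef
        } while (!snapshot.tryIncRef());
        return snapshot;
    }

    void release(Snapshot snapshot) {
        snapshot.decRef();
    }

    /** Swaps in a newer snapshot and drops the manager's reference to the old one. */
    synchronized void refresh() {
        Snapshot old = current;
        current = new Snapshot(old.version + 1);
        old.decRef();
    }
}
```

A caller that acquired version 0 before a refresh keeps a consistent view until it calls `release`, while callers arriving afterwards get version 1. This is also why forgetting to release a searcher keeps old segments alive.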

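    // Imports used by the test methods in this post (Lucene 4.x):
    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.*;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.util.Version;

    // Searches the existing on-disk index through a SearcherManager.
    public void testSearch() throws IOException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));
        SearcherManager sm = new SearcherManager(directory, null);
        IndexSearcher searcher = sm.acquire();
        try {
            Query query = new TermQuery(new Term("title", "test"));
            TopDocs results = searcher.search(query, null, 100);
            System.out.println(results.totalHits);
            for (ScoreDoc scoreDoc : results.scoreDocs) {
                System.out.println("doc internal id:" + scoreDoc.doc + " ,doc score:" + scoreDoc.score);
                Document document = searcher.doc(scoreDoc.doc);
                System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
            }
        } finally {
            // Always hand the searcher back, even if the search throws.
            sm.release(searcher);
            sm.close();
        }
    }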
    // Updates a document and searches for it right away, before any explicit commit.
    public void testUpdateAndSearch() throws IOException, InterruptedException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter writer = new IndexWriter(directory, config);
        // TrackingIndexWriter returns a generation for every change made through it.
        TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
        SearcherManager sm = new SearcherManager(writer, true, null);
        // With no caller waiting, reopen at most 60 s apart;
        // with a caller waiting, reopen no sooner than 1 s apart.
        ControlledRealTimeReopenThread<IndexSearcher> thread =
                new ControlledRealTimeReopenThread<>(trackingWriter, sm, 60, 1);
        thread.setDaemon(true);
        thread.setName("NRT Index Manager Thread");
        thread.start();

        // Replace the document whose id is "2" with a new document whose id is "3".
        Document doc = new Document();
        doc.add(new StringField("id", "3", Store.YES));
        doc.add(new TextField("title", "test for 3", Store.YES));
        long generation = trackingWriter.updateDocument(new Term("id", "2"), doc);

        // Block until a searcher that includes this generation is available.
        thread.waitForGeneration(generation);
        IndexSearcher searcher = sm.acquire();
        try {
            Query query = new TermQuery(new Term("title", "test"));
            TopDocs results = searcher.search(query, null, 100);
            System.out.println(results.totalHits);
            for (ScoreDoc scoreDoc : results.scoreDocs) {
                System.out.println("doc internal id:" + scoreDoc.doc + " ,doc score:" + scoreDoc.score);
                Document document = searcher.doc(scoreDoc.doc);
                System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
            }
        } finally {
            sm.release(searcher);
        }

        // Shut down in reverse order of creation.
        thread.close();
        sm.close();
        // close() also commits the update; call writer.rollback() instead to discard it.
        writer.close();
    }
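The interplay between the generation returned by `updateDocument` and the `waitForGeneration` call in the test above can be modeled with a counter and a monitor. The sketch below is a simplified, hypothetical model (`GenerationSketch` and its methods are made-up names), intended only to show why the caller blocks until a refresh has caught up. It does not reproduce ControlledRealTimeReopenThread's scheduling.

```java
public class GenerationSketch {
    private long indexingGeneration = 0;   // bumped by every write
    private long searchingGeneration = 0;  // newest generation visible to searchers

    /** Models a tracked write: returns the generation that contains the change. */
    synchronized long update() {
        return ++indexingGeneration;
    }

    /** Models a reopen: everything written so far becomes searchable. */
    synchronized void refresh() {
        searchingGeneration = indexingGeneration;
        notifyAll(); // wake up callers blocked in waitForGeneration
    }

    /** Blocks until the target generation is searchable or the timeout elapses. */
    synchronized boolean waitForGeneration(long target, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (searchingGeneration < target) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }

    /** One write followed by a background refresh; reports whether it became visible. */
    static boolean demo() {
        GenerationSketch index = new GenerationSketch();
        long generation = index.update();
        Thread reopener = new Thread(index::refresh);
        reopener.setDaemon(true);
        reopener.start();
        try {
            return index.waitForGeneration(generation, 5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Waiting on a generation rather than sleeping for a fixed time means the caller resumes as soon as its own change is searchable, and callers that do not need their latest write never have to wait.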

Building the index:

    // Creates a fresh index with two documents and commits it to disk.
    public void testBuildIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));
        // Directory directory = new RAMDirectory(); // in-memory alternative
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        // CREATE overwrites any index that already exists in the directory.
        config.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(directory, config);

        // "id" is indexed as a single token; "title" is analyzed into terms.
        Document doc1 = new Document();
        doc1.add(new StringField("id", "1", Store.YES));
        doc1.add(new TextField("title", "test for 1", Store.YES));
        writer.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new StringField("id", "2", Store.YES));
        doc2.add(new TextField("title", "test for 2", Store.YES));
        writer.addDocument(doc2);

        writer.commit();
        writer.close();
    }