Lucene Notes 12 - Lucene Search: Review and "Re-query" Pagination

阿新 · Published: 2018-11-04

1. Paginated Search in Lucene

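Lucene does not paginate the way a database does with LIMIT. Instead it uses a "re-query" approach. What does "re-query" mean? The first step runs the search and collects every hit up to the end of the requested page (the top pageIndex × pageSize results). The second step picks out only the range that belongs to the requested page, from result number (pageIndex - 1) × pageSize + 1 to result number pageIndex × pageSize. Because the work is split into these two steps, and because every new page runs the search again from scratch, the approach is called "re-query".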
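The two steps can be modelled without Lucene at all. The sketch below is a hypothetical, Lucene-free simulation: the class ReQueryPager and its nested Hit type (a stand-in for Lucene's ScoreDoc) are illustrative names, not part of Lucene's API. topN mimics what indexSearcher.search(query, n) hands back (at most n hits, highest score first, with ties assumed to be broken by ascending doc id), and page then slices the requested 1-based page out of that array, clamping the end index so a short final page does not overflow.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of "re-query" paging; not Lucene's API.
final class ReQueryPager {

    // Hypothetical stand-in for Lucene's ScoreDoc: a doc id plus its score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Step 1: keep only the top n hits, highest score first
    // (ties broken by ascending doc id, an assumed ordering).
    static List<Hit> topN(List<Hit> allHits, int n) {
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()
                .thenComparingInt(h -> h.doc));
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    // Step 2: slice page pageIndex (1-based) out of the top-n list.
    // The end index is clamped so a short last page cannot overflow.
    static List<Integer> page(List<Hit> allHits, int pageIndex, int pageSize) {
        List<Hit> top = topN(allHits, pageIndex * pageSize);
        int start = (pageIndex - 1) * pageSize;
        int end = Math.min(pageIndex * pageSize, top.size());
        List<Integer> docs = new ArrayList<>();
        for (int i = start; i < end; i++) {
            docs.add(top.get(i).doc);
        }
        return docs;
    }
}
```

For example, with seven hits and pageSize = 3, page 2 collects the top 6 hits in step 1 and returns entries 4 to 6 in step 2, while page 3 collects all 7 hits and returns only the last one. Note that deeper pages make step 1 collect more hits, which is the price of this approach.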

2. Test Code

package com.wsy;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class FileIndexUtils {
    private static Directory directory;
    private static IndexReader indexReader;

    static {
        try {
            directory = FSDirectory.open(new File("E:\\Lucene\\IndexLibrary"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public FileIndexUtils() {
        try {
            indexReader = IndexReader.open(directory);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void index(boolean update) {
        IndexWriter indexWriter = null;
        try {
            indexWriter = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            if (update) {
                indexWriter.deleteAll();
            }
            File[] files = new File("E:\\Lucene\\SearchSource").listFiles();
            for (File file : files) {
                Document document = new Document();
                document.add(new Field("content", new FileReader(file)));
                document.add(new Field("fileName", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                document.add(new Field("path", file.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                document.add(new NumericField("date", Field.Store.YES, true).setLongValue(file.lastModified()));
                document.add(new NumericField("size", Field.Store.YES, true).setIntValue((int) (file.length() / 1024)));
                indexWriter.addDocument(document);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (indexWriter != null) {
                try {
                    indexWriter.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public void searchPage(String queryString, int pageIndex, int pageSize) {
        try {
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            QueryParser queryParser = new QueryParser(Version.LUCENE_35, "content", new StandardAnalyzer(Version.LUCENE_35));
            Query query = queryParser.parse(queryString);
            // Step 1: fetch the top pageIndex * pageSize hits (everything up to the end of the requested page)
            TopDocs topDocs = indexSearcher.search(query, pageIndex * pageSize);
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            // Step 2 (paged): print only the requested page.
            // Clamp the end index: fewer hits than pageIndex * pageSize would otherwise throw ArrayIndexOutOfBoundsException
            int start = (pageIndex - 1) * pageSize;
            int end = Math.min(pageIndex * pageSize, scoreDocs.length);
            for (int i = start; i < end; i++) {
                Document document = indexSearcher.doc(scoreDocs[i].doc);
                System.out.println(scoreDocs[i].doc + ":" + document.get("path") + " " + document.get("fileName"));
            }
            System.out.println("-------------------------------------------------");
            // Non-paged, for comparison: print every hit returned by step 1
            for (int i = 0; i < scoreDocs.length; i++) {
                Document document = indexSearcher.doc(scoreDocs[i].doc);
                System.out.println(scoreDocs[i].doc + ":" + document.get("path") + " " + document.get("fileName"));
            }
            indexSearcher.close();
        } catch (ParseException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        FileIndexUtils.index(true);
        FileIndexUtils fileIndexUtils = new FileIndexUtils();
        fileIndexUtils.searchPage("java", 2, 3);
    }
}

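In searchPage() we print both a paged listing and a non-paged listing so the two can be compared. With pageIndex = 2 and pageSize = 3, the search returns at most the top 6 hits; the paged loop prints entries 4 to 6 of that list and the non-paged loop prints all of them, so you can confirm that the paged entries match the corresponding positions in the full listing.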
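The visual comparison above can also be automated. The following is a minimal sketch under stated assumptions: PageCheck, expectedPage and pageMatches are hypothetical helper names, and "ranked" is assumed to be the list the non-paged loop prints (doc ids in result order). The helper computes which slice of that list a given 1-based page should show, clamping both ends to the list size, and reports whether the paged output matches it.

```java
import java.util.List;

// Hypothetical helper for checking paged output against the full ranked list.
final class PageCheck {

    // The slice that page pageIndex (1-based) should show, taken from the
    // complete ranked list. Both ends are clamped to the list size.
    static <T> List<T> expectedPage(List<T> ranked, int pageIndex, int pageSize) {
        if (pageIndex < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageIndex and pageSize must be >= 1");
        }
        int start = Math.min((pageIndex - 1) * pageSize, ranked.size());
        int end = Math.min(pageIndex * pageSize, ranked.size());
        return ranked.subList(start, end);
    }

    // True when the paged output equals the matching slice of the ranked list.
    static <T> boolean pageMatches(List<T> paged, List<T> ranked, int pageIndex, int pageSize) {
        return paged.equals(expectedPage(ranked, pageIndex, pageSize));
    }
}
```

Because List.equals compares elements in order, this check also catches a page whose entries are correct but shuffled.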
