Lucene 7.2.1 Series (2): Using Luke and Basic Operations on Index Documents

阿新 • Published: 2019-01-03


Getting Started with Luke

Introduction:

Download: https://github.com/DmitryKey/luke/releases
Luke is a GUI tool for development and diagnostics with the Lucene, Solr, and Elasticsearch search engines.

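It offers the following features:

View documents and analyze their contents (for stored fields)
Search within an index
Perform index maintenance: index health checks and index optimization (back up the index before running these)
Read an index from HDFS
Export an index, or part of one, to XML
Test custom Lucene analyzers
Create your own plugins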

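Search engines Luke works with

Apache Lucene. In most cases Luke can open a Lucene index generated by plain Lucene. Do people still build plain Lucene indexes these days?
Apache Solr. Solr and Lucene share the same code base, so Luke can naturally open a Lucene index generated by Solr.
Elasticsearch. Elasticsearch uses Lucene as its lowest-level search engine foundation, so Luke can open its indexes too.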

Download, Installation, and Basic Usage

Download and installation

[Screenshots for installation steps 1 to 5 omitted.]

CRUD Operations on Index Documents

Create a project and add the Maven dependencies

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <!-- Lucene core library -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>7.2.1</version>
        </dependency>
        <!-- Lucene query parser library -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>7.2.1</version>
        </dependency>
        <!-- Additional Lucene analyzers -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>7.2.1</version>
        </dependency>

The examples below are written as unit tests, so the JUnit dependency is included as well (version 4.12, the latest as of 2018/3/30).

Test code

Main class:

package lucene_index_crud;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

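public class Txt1 {
    // Test data used below
    private String[] ids = { "1", "2", "3" };
    private String[] citys = { "qingdao", "nanjing", "shanghai" };
    private String[] descs = { "Qingdao is a beautiful city.", "Nanjing is a city of culture.",
            "Shanghai is a bustling city." };
    // Directory object pointing at the index
    private Directory dir;

    // NOTE: the tests below call getWriter(), which the original listing does not show.
    // This is an assumed reconstruction based on the imports above.
    private IndexWriter getWriter() throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        return new IndexWriter(dir, config);
    }
}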

Writing the test methods:

1) Test creating the index

    /**
     * Create the index
     * @throws Exception
     */
    @Test
    public void testWriteIndex() throws Exception {
        // Path of the index directory
        dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
        IndexWriter writer = getWriter();
        for (int i = 0; i < ids.length; i++) {
            // Create a Document; a document is the unit of indexing and search.
            Document doc = new Document();
            doc.add(new StringField("id", ids[i], Field.Store.YES));
            doc.add(new StringField("city", citys[i], Field.Store.YES));
            doc.add(new TextField("desc", descs[i], Field.Store.NO));
            // Add the document to the index
            writer.addDocument(doc);
        }
        writer.close();
    }
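
To make the difference between the field types in testWriteIndex() concrete, the following sketch models, in plain Java without Lucene, roughly which terms end up in the index for each field. It assumes that StringField indexes the whole value as a single term, and that TextField with the StandardAnalyzer of Lucene 7.x lowercases the text, splits it into words, and drops the default English stop words. The class FieldTermsSketch and its methods are hypothetical names, and the splitting rule is only an approximation of the real tokenizer that happens to be adequate for the ASCII sample data used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: a rough, Lucene-free approximation of the
// terms produced for StringField and TextField values.
final class FieldTermsSketch {

    // Default English stop words of StandardAnalyzer in Lucene 7.x (assumption:
    // the no-argument constructor is used, as in a typical getWriter()).
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
            "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"));

    // StringField: the whole value is indexed verbatim as one term.
    static List<String> stringFieldTerms(String value) {
        return Arrays.asList(value);
    }

    // TextField (approximation): split on non-alphanumerics, lowercase, drop stop words.
    static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^A-Za-z0-9]+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(stringFieldTerms("qingdao"));                      // [qingdao]
        System.out.println(textFieldTerms("Qingdao is a beautiful city."));   // [qingdao, beautiful, city]
    }
}
```

Under these assumptions, Luke should list individual words such as beautiful and city for the desc field, and one whole-value term per document for city. Because desc is added with Field.Store.NO, its original text is searchable but cannot be read back from the index.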

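Inspecting the result in Luke:
[Luke screenshots omitted: desc, city]

Note: the index must be created first; the later test methods only run correctly after that.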

2) Test how many documents were written:

    /**
     * Test how many documents were written
     * 
     * @throws Exception
     */
    @Test
    public void testIndexWriter() throws Exception {
        // Path of the index directory
        dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
        IndexWriter writer = getWriter();
        System.out.println("Wrote " + writer.numDocs() + " documents");
        writer.close();
    }

3) Test how many documents were read:

    /**
     * Test how many documents were read
     * 
     * @throws Exception
     */
    @Test
    public void testIndexReader() throws Exception {
        // Path of the index directory
        dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
        IndexReader reader = DirectoryReader.open(dir);
        System.out.println("Max docs: " + reader.maxDoc());
        System.out.println("Actual docs: " + reader.numDocs());
        reader.close();
    }

4) Test deletion before merging:

    /**
     * Test deletion, before merging
     * 
     * @throws Exception
     */
    @Test
    public void testDeleteBeforeMerge() throws Exception {
        // Path of the index directory
        dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
        IndexWriter writer = getWriter();
        System.out.println("Before delete: " + writer.numDocs());
        writer.deleteDocuments(new Term("id", "1"));
        writer.commit();
        System.out.println("writer.maxDoc(): " + writer.maxDoc());
        System.out.println("writer.numDocs(): " + writer.numDocs());
        writer.close();
    }

5) Test deletion after merging:

Before running this test, delete the files under the indexdata directory and run testWriteIndex() again to rebuild the index.

    /**
     * Test deletion, after merging
     * 
     * @throws Exception
     */
    @Test
    public void testDeleteAfterMerge() throws Exception {
        // Path of the index directory
        dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
        IndexWriter writer = getWriter();
        System.out.println("Before delete: " + writer.numDocs());
        writer.deleteDocuments(new Term("id", "1"));
        writer.forceMergeDeletes(); // force a merge that purges deleted documents
        writer.commit();
        System.out.println("writer.maxDoc(): " + writer.maxDoc());
        System.out.println("writer.numDocs(): " + writer.numDocs());
        writer.close();
    }
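
The two deletion tests differ only in whether the deleted document has been physically removed yet. As a simplified, Lucene-free model of that difference: a delete only marks a document as no longer live, so maxDoc() still counts it while numDocs() does not, and merging away the deletes rewrites the data without it. The class SegmentSketch and its methods are hypothetical; a real index has multiple segments and a merge policy, which this single-segment sketch ignores.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical single-segment model; not Lucene's actual implementation.
final class SegmentSketch {
    private List<String> ids = new ArrayList<>(); // one slot per document in the segment
    private BitSet live = new BitSet();            // bit set = document is not deleted

    void addDocument(String id) {
        live.set(ids.size());
        ids.add(id);
    }

    // Analogous to deleteDocuments(new Term("id", id)): mark matches, keep the slot.
    void deleteById(String id) {
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i).equals(id)) {
                live.clear(i);
            }
        }
    }

    // Analogous to forceMergeDeletes(): rewrite the segment with live documents only.
    void mergeDeletes() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (live.get(i)) {
                kept.add(ids.get(i));
            }
        }
        ids = kept;
        live = new BitSet();
        live.set(0, kept.size());
    }

    int maxDoc()  { return ids.size(); }          // includes deleted but unmerged documents
    int numDocs() { return live.cardinality(); }  // live documents only

    public static void main(String[] args) {
        SegmentSketch seg = new SegmentSketch();
        seg.addDocument("1");
        seg.addDocument("2");
        seg.addDocument("3");
        seg.deleteById("1");
        System.out.println("before merge: maxDoc=" + seg.maxDoc() + ", numDocs=" + seg.numDocs());
        seg.mergeDeletes();
        System.out.println("after merge:  maxDoc=" + seg.maxDoc() + ", numDocs=" + seg.numDocs());
    }
}
```

For the three sample documents, the model gives maxDoc 3 and numDocs 2 before the merge, and 2 and 2 after it. The real tests would be expected to behave the same way, although in an actual index the merge policy decides whether forceMergeDeletes() rewrites a given segment.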

6) Test the update operation:

Before running this test, delete the files under the indexdata directory and run testWriteIndex() again to rebuild the index.

    /**
     * Test updating a document
     * 
     * @throws Exception
     */
    @Test
    public void testUpdate() throws Exception {
        // Path of the index directory
        dir = FSDirectory.open(Paths.get("D:\\lucene\\index_crud\\indexdata"));
        IndexWriter writer = getWriter();
        Document doc = new Document();
        doc.add(new StringField("id", "1", Field.Store.YES));
        doc.add(new StringField("city", "beijing", Field.Store.YES));
        doc.add(new TextField("desc", "beijing is a city.", Field.Store.NO));
        writer.updateDocument(new Term("id", "1"), doc);
        writer.close();
    }
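
IndexWriter.updateDocument(term, doc) is documented as first deleting every document that contains the term and then adding the new document, so it behaves like an upsert rather than an in-place edit. A minimal Lucene-free sketch of that behavior follows; UpdateSketch and its map-based documents are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of updateDocument semantics; documents are plain field maps.
final class UpdateSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Delete every document whose field equals value, then add the replacement.
    void updateDocument(String field, String value, Map<String, String> replacement) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(replacement);
    }

    int numDocs() {
        return docs.size();
    }

    // Returns the city of the first document with the given id, or null if none.
    String cityOf(String id) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get("id"))) {
                return d.get("city");
            }
        }
        return null;
    }

    static Map<String, String> doc(String id, String city) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("city", city);
        return d;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.addDocument(doc("1", "qingdao"));
        index.addDocument(doc("2", "nanjing"));
        index.updateDocument("id", "1", doc("1", "beijing")); // replaces document 1
        index.updateDocument("id", "9", doc("9", "xian"));    // no match: behaves as a plain add
        System.out.println(index.numDocs() + " docs, id 1 -> " + index.cityOf("1"));
    }
}
```

Two consequences carry over to testUpdate(): the replacement document has to supply every field again, since nothing from the old document is kept, and the term has to match an indexed term exactly, which is one reason to index id as an untokenized StringField.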

[Luke screenshots omitted: desc, city]


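I will pause the Lucene series here for now; these three articles alone are far from enough to master Lucene. All three use the latest jars. Lucene changes quickly, and versions after the 5.x series differ considerably from earlier ones in places. For example, the setBoost method for setting a boost on a document field was removed after 6.6. Because time was limited, I only skimmed the official Lucene documentation and learned most of this material from a video course on the java1234 website, then updated the version and parts of the code. As of 2018/4/1, the jars used in the code above were the latest available.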

Finally, some Lucene learning sites and blogs that I have found useful:

Official website: Welcome to Apache Lucene
