30分鐘，快速上手Lucene v7.4.0

阿新 • • 發佈：2019-02-06

Lucene截止目前，最新版本為v7.4.0。它是一個開放原始碼的全文檢索引擎工具包，但它不是一個完整的全文檢索引擎，而是一個全文檢索引擎的架構，提供了完整的查詢引擎和索引引擎，部分文字分析引擎（英文與德文兩種西方語言）。

1 Lucene環境相關

$ uname -a

Linux ubuntu 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ java -version

java version “1.8.0_181”
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

補充：對應的安裝包名稱為jdk-8u181-linux-x64.tar.gz，Ubuntu 18.04.1 LTS，MyEclipse CI 2018.8.0

2 Add External JARs…(右擊專案->Build Path)

3 配置git

$ sudo apt install git
$ git config --global user.name "qingdujun"
$ git config --global user.email "[email protected]"
$ ssh-keygen -t rsa -C "[email protected] 
"

Generating public/private rsa key pair.
Enter file in which to save the key (/home/qingdujun/.ssh/id_rsa):
Created directory ‘/home/qingdujun/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/qingdujun/.ssh/id_rsa.
Your public key has been saved in /home/qingdujun/.ssh/id_rsa.pub.

$ vim /home/qingdujun/.ssh/id_rsa.pub

拷貝里面的內容至github.com->settings->SSH and GPG keys->New SSH key

4 Lucene v7.4.0 source code

git clone https://github.com/qingdujun/LuceneRes/releases/tag/v0.1

4.1 索引目錄

~/Documents/LuceneRes/Index$ ls

_0.cfe _0.cfs _0.si segments_2 write.lock

4.2 執行結果

Console
Indexing /home/qingdujun/Documents/LuceneRes/Data/LuceneConstants.java
Indexing /home/qingdujun/Documents/LuceneRes/Data/Searcher.java
Indexing /home/qingdujun/Documents/LuceneRes/Data/LuceneTester.java
Indexing /home/qingdujun/Documents/LuceneRes/Data/TextFileFilter.java
Indexing /home/qingdujun/Documents/LuceneRes/Data/Indexer.java
5 File indexed, time taken: 128 ms
3 documents found. Time: 40 ms
File: /home/qingdujun/Documents/LuceneRes/Data/Indexer.java
File: /home/qingdujun/Documents/LuceneRes/Data/Searcher.java
File: /home/qingdujun/Documents/LuceneRes/Data/LuceneTester.java

5 附錄（可執行原始碼）：

5.1 TextFileFilter.java

package com.cfs.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter{
    public static final String TEXT_TYPE = ".java";

    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(TEXT_TYPE);
    }
}

5.2 LuceneConstants.java

package com.cfs.lucene;

public class LuceneConstants {
    public static final String CONTENTS = "contents";
    public static final String FILE_NAME = "filename";
    public static final String FILE_PATH = "filepath";
    public static final int MAX_SEARCH = 10;
}

5.3 Indexer.java

package com.cfs.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

    private IndexWriter indexWriter = null;
    /**
     * Construct index
     * @param directoryPath The directory where the index is located
     * @throws IOException
     */
    public Indexer(String directoryPath) throws IOException{
        Directory directory = FSDirectory.open(Paths.get(directoryPath));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        indexWriter = new IndexWriter (directory, indexWriterConfig);
    }
    /**
     * Close write index
     * @throws IOException
     */
    public void close() throws IOException{
        indexWriter.close();
    }
    /**
     * Convert a file to an index object(document)
     * @param file
     * @return
     * @throws IOException
     */
    private Document getDocument(File file) throws IOException{
        Document document = new Document();
        //TextField, A field that is indexed and tokenized
        Field contentsField = new TextField(LuceneConstants.CONTENTS, new FileReader(file));
        document.add(contentsField);
        //StringField, A field that is indexed but not tokenized
        Field fileNameField = new StringField(LuceneConstants.FILE_NAME, file.getName(),Store.YES);
        document.add(fileNameField);
        Field filePathField = new StringField(LuceneConstants.FILE_PATH, file.getCanonicalPath(), Store.YES);
        document.add(filePathField);
        return document;
    }
    /**
     * Index file
     * @param file
     * @throws IOException
     */
    private void indexFile(File file) throws IOException{
        System.out.println("Indexing "+file.getCanonicalPath());
        Document document = getDocument(file);
        indexWriter.addDocument(document);
    }
    /**
     * Index all files
     * @param dataDirPath Indexed data storage directory
     * @param fileFilter
     * @return
     * @throws IOException
     */
    public int createIndex(String dataDirPath, FileFilter fileFilter) throws IOException{
        File[] files = new File(dataDirPath).listFiles();
        for (File file : files) {
            if (!file.isDirectory() && !file.isHidden() && file.exists() 
                    && file.canRead() && fileFilter.accept(file)) {
                indexFile(file);
            }
        }
        return indexWriter.numDocs();
    }

    /**
     * Read the entire contents of the file and return the bytes
     * @param file
     * @return
     * @throws IOException
     */
//  private byte[] read2Bytes(File file) throws IOException {
//      Long fileLength = file.length();
//      byte[] contents = new byte[fileLength.intValue()];
//      FileInputStream fileInputStream = new FileInputStream(file);
//      fileInputStream.read(contents);
//      fileInputStream.close();
//      return contents;
//  } 
//

5.4 Searcher.java

package com.cfs.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {
    IndexSearcher indexSearcher = null;
    QueryParser queryParser = null;
    Query query = null;
    Directory directory = null;
    DirectoryReader directoryReader = null;

    public Searcher(String directoryPath) throws IOException{
        //Store the index in memory
        //Directory directory = new RAMDirectory();
        //Store an index on disk
        directory = FSDirectory.open(Paths.get(directoryPath));
        directoryReader = DirectoryReader.open(directory);
        indexSearcher = new IndexSearcher(directoryReader);
        queryParser = new QueryParser(LuceneConstants.CONTENTS, new StandardAnalyzer());
    }

    public TopDocs search(String searchQuery) throws ParseException, IOException {
        query = queryParser.parse(searchQuery);
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    public Document getDocument(ScoreDoc scoreDoc) throws IOException {
        return indexSearcher.doc(scoreDoc.doc);
    }

    public void close() throws IOException {
        directoryReader.close();
        directory.close();
    }
}

5.5 LuceneTester.java

package com.cfs.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
    /*
     *basePath Need to be modified to your root dorectory 
     */
    public static final String basePath = "/home/qingdujun/Documents/LuceneRes/";
    public static final String indexDir = basePath+"Index";
    public static final String dataDir = basePath+"Data";

    private Indexer indexer = null;
    private Searcher searcher = null;
    /**
     * Test the function of create index
     * @throws IOException
     */
    private void createIndex() throws IOException {
        indexer = new Indexer(indexDir);
        int numIndexed = 0;
        long startTime = System.currentTimeMillis();
        numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
        long endTime = System.currentTimeMillis();
        indexer.close();
        System.out.println(numIndexed+" File indexed, time taken: "+(endTime-startTime)+" ms");
    }
    /**
     * Test the function of search result
     * @param searchQuery What you want to search for
     * @throws IOException
     * @throws ParseException
     */
    private void search(String searchQuery) throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();
        TopDocs hits = searcher.search(searchQuery);
        long endTime = System.currentTimeMillis();
        System.out.println(hits.totalHits+" documents found. Time: "+(endTime-startTime)+" ms");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document document = searcher.getDocument(scoreDoc);
            System.out.println("File: "+document.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }

    public static void main(String[] args) {
        LuceneTester luceneTester = new LuceneTester();
        try {
            luceneTester.createIndex();
            luceneTester.search("throws IOException");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

30分鐘，快速上手Lucene v7.4.0

1 Lucene環境相關

2 Add External JARs…(右擊專案->Build Path)

3 配置git

4 Lucene v7.4.0 source code

4.1 索引目錄

4.2 執行結果

5 附錄（可執行原始碼）：

5.1 TextFileFilter.java

5.2 LuceneConstants.java

5.3 Indexer.java

5.4 Searcher.java

5.5 LuceneTester.java

30分鐘，快速上手Lucene v7.4.0

學習Git---20分鐘git快速上手

Laravel入門及實踐，快速上手社交系統ThinkSNS+二次開發

JUnit 5 快速上手(從 JUnit 4 到 JUnit 5)

從零開始學 Web 之 Ajax（三）Ajax 概述，快速上手

UNIPAY.AI--系統商30分鐘即可快速接入聚合支付

不懂Git,別說自己是程式猿–20分鐘git快速上手

充分理解Kotlin，快速上手寫業務

如何把照片做成視訊？抖音爆款的圖片視訊切換教程，快速上手！

Solidity 語言簡易入門，快速上手！

30分鐘--Spark快速入門指南

RabbitMQ教程，快速上手

今天第一次面試別人，大概聊了近30分鐘，呵呵。

Fresco 的封裝，快速上手，影象後處理，超大圖高清預覽，縮小放大，雙擊放大等一一俱全

30分鐘，學會經典小遊戲編程!

OpenCV各版本差異與演化，從1.x到4.0

《linux裝置驅動開發》，基於最新的linux 4.0核心-----筆記

修改和編譯spring原始碼，構建jar(spring-context-4.0.2.RELEASE)

保姆級教程——Ubuntu16.04 Server下深度學習環境搭建：安裝CUDA8.0，cuDNN6.0，Bazel0.5.4，原始碼編譯安裝TensorFlow1.4.0(GPU版)

VC6.0使用教程-30分鐘玩轉Microsoft Visual C++ 6.0

30分鐘，快速上手Lucene v7.4.0

1 Lucene環境相關

2 Add External JARs…(右擊專案->Build Path)

3 配置git

4 Lucene v7.4.0 source code

4.1 索引目錄

4.2 執行結果

5 附錄（可執行原始碼）：

5.1 TextFileFilter.java

5.2 LuceneConstants.java

5.3 Indexer.java

5.4 Searcher.java

5.5 LuceneTester.java

相關推薦