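[Hands-On Full-Text Search] Lucene Index Operations: Add, Delete, Update, Query

阿新 • Published: 2018-12-30

Preface

Anyone who works on search has probably come across Lucene. It is open source and easy to get started with, and the official API is enough to write small demos. It is built on an inverted index, which is what makes retrieval fast. This post walks through simple implementations of building an index, adding to it incrementally, deleting from it, querying it by keyword, and updating it.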
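
To make the inverted-index idea concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene's implementation: the class `ToyInvertedIndex` and its `tokenize` helper are hypothetical names, and the tokenizer (lowercase, split on non-alphanumeric characters) is only a rough stand-in for a real analyzer. The sample strings are the ones indexed later in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: maps each term to the sorted set of document ids containing it. */
final class ToyInvertedIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Lowercases the text and splits it on non-alphanumeric characters (assumed tokenizer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds a document and returns its id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Looks up one term directly in the postings; the documents are never scanned. */
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        for (String text : Arrays.asList(
                "hello,man!", "goodbye,man!", "hello,woman!", "goodbye,woman!")) {
            index.add(text);
        }
        System.out.println(index.search("man"));   // prints [0, 1]
        System.out.println(index.search("hello")); // prints [0, 2]
    }
}
```

Note that "woman" is a different term from "man", so a lookup for "man" only returns the first two documents. The cost of a lookup depends on the number of distinct terms, not on the total amount of text.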

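What I currently find inconvenient is that, to run full-text search over file contents, you have to write the file-reading code yourself (Solr does this for you for free). Building the index is also fairly slow. There is a lot of room for optimization, which will take some careful study.

Building the index

The previous post already covered the overall flow of building an index with Lucene. Here is a brief recap: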

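    // The analyzer must exist before it is passed to IndexWriterConfig.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    // Without addDocument the document never reaches the index.
    iwriter.addDocument(doc);
    iwriter.close();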

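1. Create the Directory, which points at the index directory.

2. Create the analyzer and the IndexWriter.

3. Create a Document and put the data into it.

4. Close the IndexWriter, which commits the changes.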

    /**
     * Build the index.
     *
     * analyzer, directory, indexWriter and INDEX_DIR are static fields of the
     * enclosing class (see the full code at the end).
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void index() throws Exception {

        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // Each document stores a file name and its content.
        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index build time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

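Adding to the index incrementally

Lucene supports incremental indexing: new documents can be added without affecting what is already indexed, and Lucene merges the index files automatically at an appropriate time.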

    /**
     * Add a document to the existing index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Insert time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
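
The "add without touching what is there, merge later" behavior can be pictured with a toy model in plain Java. This sketch is an assumption-laden simplification, not Lucene's merge policy: the class `ToySegmentedIndex`, the `maxSegments` threshold, and the "merge everything into one segment" rule are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of append-only segments with a naive merge rule (not Lucene's merge policy). */
final class ToySegmentedIndex {
    private final int maxSegments; // hypothetical threshold that triggers a merge
    private final List<List<String>> segments = new ArrayList<>();
    private List<String> buffer = new ArrayList<>();

    ToySegmentedIndex(int maxSegments) {
        this.maxSegments = maxSegments;
    }

    /** Buffers a document in memory until the next commit. */
    void addDocument(String doc) {
        buffer.add(doc);
    }

    /** Writes the buffered documents as a new read-only segment; older segments are untouched. */
    void commit() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(Collections.unmodifiableList(buffer));
        buffer = new ArrayList<>();
        if (segments.size() > maxSegments) {
            mergeAll();
        }
    }

    /** Naive merge: copies every segment into a single new one. */
    private void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(Collections.unmodifiableList(merged));
    }

    int segmentCount() {
        return segments.size();
    }

    int docCount() {
        int n = 0;
        for (List<String> segment : segments) {
            n += segment.size();
        }
        return n;
    }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex(2);
        index.addDocument("hello,man!");
        index.addDocument("goodbye,man!");
        index.commit();                                 // first segment
        index.addDocument("hello,goodbye,man,woman");
        index.commit();                                 // second segment, first one untouched
        System.out.println(index.segmentCount());       // prints 2
    }
}
```

The point of the sketch is only that an insert writes new data next to the old data, and that consolidation happens as a separate step decided by the index itself rather than by the caller.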

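Deleting from the index

Deletion also goes through IndexWriter, via its deleteDocuments method. You can delete by keyword (a term), which removes every document that contains that term. If you only want to delete a single document, it is best to define a unique ID field and delete by that field.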

    /**
     * Delete documents from the index.
     *
     * @param str the term to delete by (matched against the "filename" field)
     * @throws Exception if the index cannot be opened or written
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // Removes every document whose "filename" field contains this term.
        indexWriter.deleteDocuments(new Term("filename", str));

        // close() also commits the pending deletion.
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Delete time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
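
The difference between deleting by a content keyword and deleting by a unique ID can be shown with a small plain-Java model. This is a sketch under stated assumptions, not Lucene code: `ToyDeleteByTerm` is a hypothetical class, the "filename" field plays the role of the unique ID and is kept as one exact term, and the "content" field is split with a crude tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of delete-by-term: a document goes away if the given field contains the term. */
final class ToyDeleteByTerm {
    /** One document: field name -> indexed terms of that field. */
    private final List<Map<String, List<String>>> docs = new ArrayList<>();

    void addDocument(String filename, String content) {
        Map<String, List<String>> doc = new HashMap<>();
        // The ID-like field is kept as a single exact term (assumption of this sketch).
        doc.put("filename", Collections.singletonList(filename));
        // The content field is lowercased and split on non-alphanumeric characters.
        doc.put("content",
                Arrays.asList(content.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        docs.add(doc);
    }

    /** Removes every document whose field contains the term; returns how many were removed. */
    int deleteDocuments(String field, String term) {
        int before = docs.size();
        docs.removeIf(doc ->
                doc.getOrDefault(field, Collections.emptyList()).contains(term));
        return before - docs.size();
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyDeleteByTerm index = new ToyDeleteByTerm();
        index.addDocument("text1", "hello,man!");
        index.addDocument("text2", "goodbye,man!");
        index.addDocument("text3", "hello,woman!");
        index.addDocument("text4", "goodbye,woman!");
        System.out.println(index.deleteDocuments("content", "man"));    // prints 2
        System.out.println(index.deleteDocuments("filename", "text3")); // prints 1
        System.out.println(index.size());                               // prints 1
    }
}
```

Deleting by a content term can remove several documents at once, while deleting by the ID-like field removes exactly the one intended. In real Lucene code an ID field is commonly indexed with StringField, which is not tokenized, so that the term passed to deleteDocuments matches the stored value exactly.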

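Updating the index

Lucene has no true in-place update. You can update the documents matched by a term in a given field, but under the hood Lucene deletes the matching documents first and then adds the new document.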

    /**
     * Update a document in the index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        // Deletes the documents matching filename:text1, then adds doc1.
        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        // close() also commits the pending update.
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Update time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
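
The "delete first, then add" behavior can be sketched in a few lines of plain Java. This is a toy model with hypothetical names (`ToyUpdatableIndex`, `Doc`), not Lucene's implementation. It only illustrates the observable consequences: the replacement is appended as a new document, and it is added even when nothing matched.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of update-as-delete-then-add. */
final class ToyUpdatableIndex {
    static final class Doc {
        final String filename;
        final String content;

        Doc(String filename, String content) {
            this.filename = filename;
            this.content = content;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void addDocument(Doc doc) {
        docs.add(doc);
    }

    /** Removes every document with the given filename; returns how many were removed. */
    int deleteDocuments(String filename) {
        int before = docs.size();
        docs.removeIf(d -> d.filename.equals(filename));
        return before - docs.size();
    }

    /** There is no in-place edit: matching documents are dropped, then the new one is appended. */
    void updateDocument(String filename, Doc replacement) {
        deleteDocuments(filename);
        addDocument(replacement);
    }

    /** Filenames in current document order. */
    List<String> filenames() {
        List<String> names = new ArrayList<>();
        for (Doc d : docs) {
            names.add(d.filename);
        }
        return names;
    }

    /** Content of the first document with this filename, or null if there is none. */
    String contentOf(String filename) {
        for (Doc d : docs) {
            if (d.filename.equals(filename)) {
                return d.content;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyUpdatableIndex index = new ToyUpdatableIndex();
        index.addDocument(new Doc("text1", "hello,man!"));
        index.addDocument(new Doc("text2", "goodbye,man!"));
        index.addDocument(new Doc("text3", "hello,woman!"));
        index.updateDocument("text1", new Doc("text1", "update,hello,man!"));
        System.out.println(index.filenames());        // prints [text2, text3, text1]
        System.out.println(index.contentOf("text1")); // prints update,hello,man!
    }
}
```

Because the updated document is a brand-new entry, anything that depended on its old position, such as a cached document id, should not be assumed to remain valid after an update.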

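Querying the index by keyword

Lucene offers many kinds of queries, which are not covered in detail here. A search returns a collection of ScoreDoc objects, somewhat like a ResultSet, and for each hit we can fetch the content we want by field name.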

    /**
     * Keyword query.
     *
     * @param str the query string, parsed against the "content" field
     * @throws Exception if the index cannot be read or the query cannot be parsed
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        // Fetch at most 1000 hits, with no filter.
        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            // A ScoreDoc only carries the document id; load the stored fields by id.
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }
        ireader.close();
        directory.close();
    }
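
The two-step pattern in the loop above (a hit holds only a document id and a score, and the stored fields are loaded by id afterwards) can be mimicked in plain Java. This is a rough sketch with hypothetical names (`ToySearcher`, `Hit`). The score here is just the number of times the term occurs, which is an assumption of the sketch and not Lucene's scoring formula. The document "text6" is made up for the demo so that the ranking is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy searcher: ranks documents by how often the query term occurs (not Lucene's scoring). */
final class ToySearcher {
    /** A hit carries only a document id and a score, loosely like ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Stored fields per document: [0] = filename, [1] = content.
    private final List<String[]> stored = new ArrayList<>();

    int add(String filename, String content) {
        stored.add(new String[] {filename, content});
        return stored.size() - 1;
    }

    /** Returns at most n hits, best score first; ties keep document-id order. */
    List<Hit> search(String term, int n) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (int id = 0; id < stored.size(); id++) {
            int tf = 0;
            String content = stored.get(id)[1].toLowerCase(Locale.ROOT);
            for (String t : content.split("[^a-z0-9]+")) {
                if (t.equals(wanted)) {
                    tf++;
                }
            }
            if (tf > 0) {
                hits.add(new Hit(id, tf));
            }
        }
        hits.sort((a, b) -> Float.compare(b.score, a.score)); // stable sort
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    /** Fetches the stored fields of a hit by document id. */
    String[] doc(int id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        ToySearcher searcher = new ToySearcher();
        searcher.add("text1", "hello,man!");
        searcher.add("text2", "goodbye,man!");
        searcher.add("text3", "hello,woman!");
        searcher.add("text4", "goodbye,woman!");
        searcher.add("text6", "man,oh,man"); // hypothetical document, not from this post
        for (Hit hit : searcher.search("man", 2)) {
            String[] fields = searcher.doc(hit.doc);
            System.out.println(fields[0] + " " + fields[1] + " score=" + hit.score);
        }
    }
}
```

With a limit of 2, the made-up document comes first because the term occurs in it twice, followed by text1. The limit plays the same role as the 1000 passed to isearcher.search in the method above.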

Full code

package test;

import java.io.File;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TestLucene {
    // Where the index is stored
    private static String INDEX_DIR = "D:\\luceneIndex";
    private static Analyzer analyzer = null;
    private static Directory directory = null;
    private static IndexWriter indexWriter = null;

    public static void main(String[] args) {
        try {
            // Uncomment the operation you want to run.
//            index();
            search("man");
//            insert();
//            delete("text5");
//            update();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Update a document in the index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        // Deletes the documents matching filename:text1, then adds doc1.
        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Update time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Delete documents from the index.
     *
     * @param str the term to delete by (matched against the "filename" field)
     * @throws Exception if the index cannot be opened or written
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        indexWriter.deleteDocuments(new Term("filename", str));

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Delete time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Add a document to the existing index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Insert time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Build the index.
     *
     * @throws Exception if the index cannot be opened or written
     */
    public static void index() throws Exception {

        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index build time: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Keyword query.
     *
     * @param str the query string, parsed against the "content" field
     * @throws Exception if the index cannot be read or the query cannot be parsed
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }
        ireader.close();
        directory.close();
    }
}


References

　　http://www.cnblogs.com/xing901022/p/3933675.html
