
Using Single-Character Tokenization and PhraseQuery in Lucene to Emulate Full Fuzzy Search

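Full fuzzy queries in Lucene (matching a keyword anywhere inside a field, e.g. a WildcardQuery of the form *keyword*) are slow. This post gets a similar effect by indexing text one character per token and searching with a phrase query, which greatly improves query performance.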

Expected tokenization result:

中華人員共和國Chinese,Come On → 中/華/人/員/共/和/國/C/h/i/n/e/s/e/,/C/o/m/e/ /O/n
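The expected result is simply one token per character, with the comma and the space kept as tokens too. The stdlib-only Java sketch below reproduces that split without Lucene, as a way to check the expectation; the class name UnigramSplit is hypothetical. It iterates by Unicode code point, on the assumption (not verified here) that the tokenizer treats a character outside the BMP as one character rather than two UTF-16 units.

```java
import java.util.ArrayList;
import java.util.List;

public class UnigramSplit {
    // Split text into one token per Unicode code point, keeping spaces and punctuation,
    // as the expected output above shows for minGram = maxGram = 1.
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            tokens.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // advance 1 or 2 chars (surrogate pair)
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Join with "/" to match the notation used above
        System.out.println(String.join("/", split("中華人員共和國Chinese,Come On")));
        // prints 中/華/人/員/共/和/國/C/h/i/n/e/s/e/,/C/o/m/e/ /O/n
    }
}
```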

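Drawback: when the index holds a large amount of numeric or English text, queries made of digits or English letters are slow. Digits and Latin letters form a tiny alphabet, so each of those single-character terms occurs in a huge number of documents, and the phrase query has to walk through their very long posting lists.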
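A toy illustration of that reasoning: the sketch below (plain Java, no Lucene) counts, for each single-character term, how many documents contain it, which is the length of that term's posting list. The four sample records are made up for the illustration, and DocFreqDemo is a hypothetical name. Even in this tiny corpus the digit 2 appears in every record while 華 appears in only one; at scale, that gap is the likely source of the slowdown described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DocFreqDemo {
    // Document frequency of each single-character term: how many documents contain it,
    // i.e. the length of that term's posting list in an inverted index.
    public static Map<String, Integer> docFreq(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            // Count each character at most once per document
            Set<String> seen = new HashSet<>();
            doc.codePoints().forEach(cp -> seen.add(new String(Character.toChars(cp))));
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        // Hypothetical sample records mixing numbers, English and Chinese
        List<String> docs = Arrays.asList("order 2023", "invoice 2024", "中華 2023", "共和國 2022");
        Map<String, Integer> df = docFreq(docs);
        System.out.println("'2' df=" + df.get("2") + ", '華' df=" + df.get("華")); // '2' df=4, '華' df=1
    }
}
```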

I. Create a MyNGramAnalyzer class that splits text into single-character tokens
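import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.util.Version;

public final class MyNGramAnalyzer extends Analyzer {
    private final Version version;

    public MyNGramAnalyzer(Version version) {
        this.version = version;
    }

    @Override
    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        /* new NGramTokenizer(version, reader, minGram, maxGram) splits the input into n-grams;
         * minGram is the minimum gram length and maxGram the maximum.
         * Both are 1 here, so every character becomes a token of its own. */
        return new TokenStreamComponents(new NGramTokenizer(version, reader, 1, 1));
    }
}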
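To make minGram and maxGram concrete, here is a stdlib-only sketch of n-gram generation (hypothetical class NGrams, not Lucene's implementation). It returns every run of n consecutive characters for minGram <= n <= maxGram, grouped by start position; that output order is this sketch's own choice and is not claimed to match NGramTokenizer. With minGram = maxGram = 1 it reduces to one token per character, the setting MyNGramAnalyzer uses.

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    // All n-grams of text with minGram <= n <= maxGram, counted in code points,
    // ordered by start position and then by length.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        int[] cps = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < cps.length; start++) {
            for (int n = minGram; n <= maxGram && start + n <= cps.length; n++) {
                grams.add(new String(cps, start, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("和國", 1, 1));  // [和, 國]
        System.out.println(ngrams("和國C", 1, 2)); // [和, 和國, 國, 國C, C]
    }
}
```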

II. Build the index with MyNGramAnalyzer

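import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class CreateIndex { // the wrapper class name is arbitrary
    public static void main(String[] args) {
        try {
            // Directory where the index is stored
            Directory dir = FSDirectory.open(new File("d:/tool/index"));
            // Lucene version; 4.5 is used here
            Version version = Version.LUCENE_45;
            IndexWriterConfig iwc = new IndexWriterConfig(version, new MyNGramAnalyzer(version));
            /* Open mode. CREATE: delete the existing index and build a new one; APPEND: add to the
               existing index; CREATE_OR_APPEND: append if an index exists, otherwise create one */
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            IndexWriter writer = new IndexWriter(dir, iwc);
            // Each Document represents one record
            Document document1 = new Document();
            document1.add(new TextField("aaa", "中華人員共和國Chinese,Come On", Field.Store.YES));
            Document document2 = new Document();
            document2.add(new TextField("bbb", "中國Chinese", Field.Store.YES));
            writer.addDocument(document1);
            writer.addDocument(document2);
            writer.commit();
            writer.close();
            dir.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}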
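Conceptually, indexing with this analyzer records, for each character term, the documents that contain it and the positions at which it occurs. The toy positional index below (plain Java; PositionalIndex is a hypothetical name, and Lucene's real postings format is far more compact) shows that structure. It ignores field names, so unlike the code above, where the second document goes into field bbb, both documents land in one shared index here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PositionalIndex {
    // term -> postings written as "docId:position", in indexing order
    private final Map<String, List<String>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Index one document: every code point is a term, positions increase by 1 per term
    public int addDocument(String text) {
        int docId = nextDocId++;
        int[] cps = text.codePoints().toArray();
        for (int pos = 0; pos < cps.length; pos++) {
            String term = new String(cps, pos, 1);
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId + ":" + pos);
        }
        return docId;
    }

    // Posting list of a term (empty if the term was never indexed)
    public List<String> postings(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.addDocument("中華人員共和國Chinese,Come On");
        index.addDocument("中國Chinese");
        System.out.println(index.postings("國")); // [0:6, 1:1]
    }
}
```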
III. Query the index with PhraseQuery
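import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SearchIndex { // the wrapper class name is arbitrary
    public static void main(String[] args) {
        // Phrase query over single-character terms
        PhraseQuery pq = new PhraseQuery();
        // Query string
        String queryParam = "和國Chinese,";
        // Add each character of the query string as a term; each add(Term) call
        // places the term one position after the previously added one
        for (int i = 0; i < queryParam.length(); i++) {
            pq.add(new Term("aaa", queryParam.substring(i, i + 1)));
        }
        // Run the search
        try {
            DirectoryReader directoryReader = DirectoryReader.open(FSDirectory.open(new File("d:/tool/index")));
            IndexSearcher indexSearcher = new IndexSearcher(directoryReader);
            // First argument: the query; second: the maximum number of hits to return
            TopDocs hits = indexSearcher.search(pq, 10);
            // Iterate over scoreDocs, not totalHits: totalHits can exceed the number of hits returned
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                Document doc = indexSearcher.doc(hits.scoreDocs[i].doc);
                System.out.println(doc.get("aaa"));
            }
            directoryReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}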
Output: 中華人員共和國Chinese,Come On
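Why this behaves like a full fuzzy search: an exact PhraseQuery (slop 0) matches when its terms occur at consecutive positions, and with one character per position that is the same as the query string occurring somewhere in the field. The stdlib-only sketch below checks that condition directly and compares it with String.contains. It models slop-0 phrase semantics only, not Lucene's PhraseQuery implementation or scoring, and PhraseMatch is a hypothetical name. It splits the query by code point, whereas the substring(i, i + 1) loop above splits by UTF-16 char; the two agree unless the query contains characters outside the BMP.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseMatch {
    // One term per code point, the same per-character split assumed at index time
    static List<String> terms(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    // Slop-0 phrase match: some start position p with query term i at position p + i
    public static boolean phraseMatches(String docText, String query) {
        List<String> docTerms = terms(docText);
        List<String> q = terms(query);
        if (q.isEmpty()) return false; // treat an empty query as no match
        // Positions of each term within this document, as a positional index stores them
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int pos = 0; pos < docTerms.size(); pos++) {
            positions.computeIfAbsent(docTerms.get(pos), k -> new ArrayList<>()).add(pos);
        }
        // Candidate starts are the positions of the first query term
        for (int p : positions.getOrDefault(q.get(0), new ArrayList<>())) {
            boolean all = true;
            for (int i = 1; i < q.size() && all; i++) {
                List<Integer> ps = positions.get(q.get(i));
                all = ps != null && ps.contains(p + i);
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "中華人員共和國Chinese,Come On";
        System.out.println(phraseMatches(doc, "和國Chinese,")); // true
        System.out.println(phraseMatches(doc, "國和"));         // false
        System.out.println(doc.contains("和國Chinese,"));       // true
    }
}
```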