Lucene Study Notes, Part 7: An Analysis of the Lucene Search Process (3)

2.3 QueryParser parses the query string into a Query object

The code is:

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "contents", new StandardAnalyzer(Version.LUCENE_CURRENT));

Query query = parser.parse("+(+apple* -boy) (cat* dog) -(eat~ foods)");

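This step is fairly involved: it touches JavaCC, QueryParser, analyzers, and the query syntax. This chapter does not cover those in detail; later chapters explain each in turn.

The one point to make here is that parsing the query string produces a tree of Query objects. This tree is important: other trees are derived from it, and it runs through the entire search process.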

query    BooleanQuery  (id=96)   
  |  boost    1.0   
  |  clauses    ArrayList<E>  (id=98)   
  |      elementData    Object[10]  (id=100)   
  |------[0]    BooleanClause  (id=102)   
  |          |   occur    BooleanClause$Occur$1  (id=106)   
  |          |        name    "MUST" //AND  
  |          |        ordinal    0   
  |          |---query    BooleanQuery  (id=108)   
  |                  |   boost    1.0   
  |                  |   clauses    ArrayList<E>  (id=112)   
  |                  |      elementData    Object[10]  (id=113)   
  |                  |------[0]    BooleanClause  (id=114)   
  |                  |          |   occur    BooleanClause$Occur$1  (id=106)   
  |                  |          |      name    "MUST"  //AND
  |                  |          |      ordinal    0   
  |                  |          |--query    PrefixQuery  (id=116)   
  |                  |                 boost    1.0   
  |                  |                 numberOfTerms    0   
  |                  |                 prefix    Term  (id=117)   
  |                  |                     field    "contents"    
  |                  |                     text    "apple"   
  |                  |                 rewriteMethod    MultiTermQuery$1  (id=119)   
  |                  |                     docCountPercent    0.1   
  |                  |                     termCountCutoff    350   
  |                  |------[1]    BooleanClause  (id=115)    
  |                             |   occur    BooleanClause$Occur$3  (id=123)   
  |                             |       name    "MUST_NOT"   //NOT
  |                             |       ordinal    2   
  |                             |--query    TermQuery  (id=125)   
  |                                    boost    1.0   
  |                                    term    Term  (id=127)   
  |                                        field    "contents"   
  |                                        text    "boy"    
  |                      size    2   
  |                  disableCoord    false   
  |                  minNrShouldMatch    0   
  |------[1]    BooleanClause  (id=104)   
  |          |   occur    BooleanClause$Occur$2  (id=129)   
  |          |        name    "SHOULD"  //OR
  |          |        ordinal    1   
  |          |---query    BooleanQuery  (id=131)   
  |                  |   boost    1.0   
  |                  |   clauses    ArrayList<E>  (id=133)   
  |                  |      elementData    Object[10]  (id=134)   
  |                  |------[0]    BooleanClause  (id=135)   
  |                  |          |  occur    BooleanClause$Occur$2  (id=129)   
  |                  |          |      name    "SHOULD"  //OR  
  |                  |          |      ordinal    1   
  |                  |          |--query    PrefixQuery  (id=137)   
  |                  |                 boost    1.0   
  |                  |                 numberOfTerms    0   
  |                  |                 prefix    Term  (id=138)   
  |                  |                     field    "contents"   
  |                  |                     text    "cat"   
  |                  |                 rewriteMethod    MultiTermQuery$1  (id=119)   
  |                  |                     docCountPercent    0.1   
  |                  |                     termCountCutoff    350   
  |                  |------[1]    BooleanClause  (id=136)   
  |                             |  occur    BooleanClause$Occur$2  (id=129)   
  |                             |      name    "SHOULD"  //OR   
  |                             |      ordinal    1   
  |                             |--query    TermQuery  (id=140)   
  |                                   boost    1.0   
  |                                   term    Term  (id=141)   
  |                                       field    "contents"   
  |                                       text    "dog"    
  |                      size    2   
  |                  disableCoord    false   
  |                  minNrShouldMatch    0   
  |------[2]    BooleanClause  (id=105)   
             |   occur    BooleanClause$Occur$3  (id=123)   
             |       name    "MUST_NOT"   //NOT
             |       ordinal    2   
             |---query    BooleanQuery  (id=143)   
                     |   boost    1.0   
                     |   clauses    ArrayList<E>  (id=146)   
                     |     elementData    Object[10]  (id=147)   
                     |------[0]    BooleanClause  (id=148)   
                     |          |    occur    BooleanClause$Occur$2  (id=129)   
                     |          |       name    "SHOULD"   //OR
                     |          |       ordinal    1   
                     |          |--query    FuzzyQuery  (id=150)   
                     |                boost    1.0   
                     |                minimumSimilarity    0.5   
                     |                numberOfTerms    0   
                     |                prefixLength    0   
                     |                rewriteMethod MultiTermQuery$ScoringBooleanQueryRewrite  (id=152)   
                     |                term    Term  (id=153)   
                     |                   field    "contents"   
                     |                   text    "eat"   
                     |                termLongEnough    true   
                     |------[1]    BooleanClause  (id=149)    
                                |    occur    BooleanClause$Occur$2  (id=129)   
                                |       name    "SHOULD"  //OR 
                                |       ordinal    1   
                                |--query    TermQuery  (id=155)   
                                      boost    1.0   
                                      term    Term  (id=156)   
                                          field    "contents"   
                                          text    "foods"
                        size    2   
                    disableCoord    false   
                    minNrShouldMatch    0    
        size    3   
    disableCoord    false   
    minNrShouldMatch    0   


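A few notes on the Query object:

  • BooleanQuery combines all of its clauses according to their boolean relationships:
    • + (MUST) marks a clause that must match.
    • SHOULD marks a clause that may match. minNrShouldMatch is the minimum number of SHOULD clauses that must match. It defaults to 0: since SHOULD expresses an OR relationship, it is acceptable for none of them to match (except when the query has no MUST clause, in which case at least one SHOULD clause has to match).
    • - (MUST_NOT) marks a clause that must not match.
  • At the leaf nodes of the tree:
    • The most basic leaf is TermQuery, which represents a single term.
    • A leaf can also be a PrefixQuery or a FuzzyQuery. Because of their special syntax, these queries may correspond to many terms rather than one, so each holds a rewriteMethod object that refers to an inner class of MultiTermQuery. It marks the query as standing for multiple terms, and such queries receive special handling during the search.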
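The MUST / SHOULD / MUST_NOT rules above can be sketched with plain Java predicates over a document's set of terms. This is only an illustration under stated assumptions, not Lucene code: the class BooleanClauseSketch and its helpers are hypothetical names, documents are modelled as term sets, and the fuzzy clause eat~ is simplified to the exact term "eat".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

final class BooleanClauseSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** A clause pairs a matcher over a document's term set with its Occur flag. */
    static final class Clause {
        final Occur occur;
        final Predicate<Set<String>> matcher;

        Clause(Occur occur, Predicate<Set<String>> matcher) {
            this.occur = occur;
            this.matcher = matcher;
        }
    }

    static Predicate<Set<String>> term(String text) {
        return doc -> doc.contains(text);
    }

    static Predicate<Set<String>> prefix(String prefix) {
        return doc -> doc.stream().anyMatch(t -> t.startsWith(prefix));
    }

    /** Combines clauses: every MUST has to hit, no MUST_NOT may hit. */
    static Predicate<Set<String>> bool(Clause... clauses) {
        return doc -> {
            boolean hasMust = false;
            boolean anyShouldMatched = false;
            for (Clause c : clauses) {
                boolean hit = c.matcher.test(doc);
                if (c.occur == Occur.MUST) {
                    hasMust = true;
                    if (!hit) return false;
                } else if (c.occur == Occur.MUST_NOT) {
                    if (hit) return false;
                } else if (hit) {
                    anyShouldMatched = true;
                }
            }
            // SHOULD clauses are optional only when a MUST clause is present.
            return hasMust || anyShouldMatched;
        };
    }

    static Clause must(Predicate<Set<String>> q) { return new Clause(Occur.MUST, q); }
    static Clause should(Predicate<Set<String>> q) { return new Clause(Occur.SHOULD, q); }
    static Clause mustNot(Predicate<Set<String>> q) { return new Clause(Occur.MUST_NOT, q); }

    /** +(+apple* -boy) (cat* dog) -(eat foods); eat~ is simplified to "eat". */
    static Predicate<Set<String>> exampleQuery() {
        return bool(
            must(bool(must(prefix("apple")), mustNot(term("boy")))),
            should(bool(should(prefix("cat")), should(term("dog")))),
            mustNot(bool(should(term("eat")), should(term("foods")))));
    }

    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        System.out.println(exampleQuery().test(doc("apple", "dog")));
        System.out.println(exampleQuery().test(doc("apple", "foods")));
    }
}
```

In this model a document containing only "apple" still matches, because the SHOULD group is optional once a MUST clause is satisfied, while a document containing "apple" and "foods" is rejected by the MUST_NOT group.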

2.4 Searching with the Query object

The code is:

TopDocs docs = searcher.search(query, 50);

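It ultimately calls search(createWeight(query), filter, n);

The search process consists of the following sub-steps:

  • Build the Weight tree and compute the term weights
  • Build the Scorer and SumScorer trees in preparation for merging the posting lists
  • Merge the posting lists with the SumScorer
  • Collect the result documents and compute their scores

2.4.1 Building the Weight object tree and computing term weights

The code of IndexSearcher(Searcher).createWeight(Query) is as follows: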

protected Weight createWeight(Query query) throws IOException {

  return query.weight(this);

}

The code of BooleanQuery(Query).weight(Searcher) is:

public Weight weight(Searcher searcher) throws IOException {

  //Rewrite the Query object tree

  Query query = searcher.rewrite(this);

  //Build the Weight object tree

  Weight weight = query.createWeight(searcher);

  //Compute the term weight scores

  float sum = weight.sumOfSquaredWeights();

  float norm = getSimilarity(searcher).queryNorm(sum);

  weight.normalize(norm);

  return weight;

}

This process in turn consists of:

  • Rewriting the Query object tree
  • Building the Weight object tree
  • Computing the term weight scores
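The last three lines of weight(Searcher) can be illustrated with a small numeric sketch. It assumes the behaviour of Lucene's DefaultSimilarity, where queryNorm(sum) is 1/sqrt(sum), and it models each term weight as a bare float; the class QueryNormSketch is a hypothetical name, and the real Weight objects carry more state than this.

```java
final class QueryNormSketch {
    /** Assumed DefaultSimilarity behaviour: 1 / sqrt(sumOfSquaredWeights). */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    static float sumOfSquaredWeights(float[] weights) {
        float sum = 0f;
        for (float w : weights) {
            sum += w * w;
        }
        return sum;
    }

    /** Scales every weight by the query norm, as weight.normalize(norm) would. */
    static float[] normalize(float[] weights) {
        float norm = queryNorm(sumOfSquaredWeights(weights));
        float[] result = new float[weights.length];
        for (int i = 0; i < weights.length; i++) {
            result[i] = weights[i] * norm;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] normalized = normalize(new float[] {3f, 4f});
        System.out.println(normalized[0] + " " + normalized[1]);
    }
}
```

After normalization the squared weights sum to roughly 1, which is what makes scores from different queries comparable in magnitude.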
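2.4.1.1 Rewriting the Query object tree

As BooleanQuery's rewrite function shows, rewriting is also a recursive process that descends all the way to the leaf nodes of the Query object tree.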

The code of BooleanQuery.rewrite(IndexReader) is as follows:

BooleanQuery clone = null;

for (int i = 0 ; i < clauses.size(); i++) {

  BooleanClause c = clauses.get(i);

  //Rewrite the Query object of each clause

  Query query = c.getQuery().rewrite(reader);

  if (query != c.getQuery()) {

    if (clone == null)

      clone = (BooleanQuery)this.clone();

    //Put the rewritten Query object into the cloned Query object tree

    clone.clauses.set(i, new BooleanClause(query, c.getOccur()));

  }

}

if (clone != null) {

  return clone; //If any clause was rewritten, return the cloned Query object tree.

} else

  return this; //Otherwise return the original Query object tree.
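The clone-only-when-something-changed pattern in this rewrite function can be reproduced on a toy tree. The sketch below is an assumption-laden model, not Lucene's classes: RewriteSketch, TermNode, PrefixNode and BoolNode are hypothetical names, the term dictionary is a plain list, and the prefix node is expanded into an OR of terms purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RewriteSketch {
    interface Node {
        Node rewrite(List<String> dictionary);
    }

    /** Leaf for a single term: rewriting returns the node itself. */
    static final class TermNode implements Node {
        final String text;

        TermNode(String text) { this.text = text; }

        @Override public Node rewrite(List<String> dictionary) { return this; }
    }

    /** Leaf standing for many terms: rewriting expands it against the dictionary. */
    static final class PrefixNode implements Node {
        final String prefix;

        PrefixNode(String prefix) { this.prefix = prefix; }

        @Override public Node rewrite(List<String> dictionary) {
            BoolNode expanded = new BoolNode();
            for (String term : dictionary) {
                if (term.startsWith(prefix)) {
                    expanded.children.add(new TermNode(term));
                }
            }
            return expanded;
        }
    }

    /** Inner node: rewrites children recursively, cloning itself only on change. */
    static final class BoolNode implements Node {
        final List<Node> children = new ArrayList<>();

        BoolNode(Node... nodes) { children.addAll(Arrays.asList(nodes)); }

        @Override public Node rewrite(List<String> dictionary) {
            BoolNode clone = null;
            for (int i = 0; i < children.size(); i++) {
                Node child = children.get(i);
                Node rewritten = child.rewrite(dictionary);
                if (rewritten != child) {
                    if (clone == null) {
                        clone = new BoolNode(children.toArray(new Node[0]));
                    }
                    clone.children.set(i, rewritten);
                }
            }
            return clone != null ? clone : this;
        }
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("apple", "apples", "boy");
        BoolNode query = new BoolNode(new PrefixNode("apple"), new TermNode("boy"));
        System.out.println(query.rewrite(dictionary) != query);
    }
}
```

A tree made only of TermNode leaves comes back as the same instance, while a tree containing a PrefixNode comes back as a copy and leaves the original untouched.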

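Now focus on the leaf nodes. A leaf is essentially one of two kinds: a TermQuery or a MultiTermQuery. The Lucene source shows that TermQuery's rewrite function simply returns the object itself, so what really needs rewriting is MultiTermQuery, that is, a single Query that stands for multiple terms, such as the PrefixQuery and FuzzyQuery in this example.

Lucene cannot search with such a Query directly; it has to be rewritten first:

  • First, all matching terms are looked up in the term dictionary of the index. For "appl*", for example, the dictionary might contain the terms "apple", "apples", and "apply". These terms take part in the search instead of the original "appl*", because the dictionary contains no such term as "appl*".
  • Then the terms are reorganized into a new Query object for searching, in one of two basic ways:
    • Approach one: treat the terms as a single term, gather the ids of the documents that contain any of them into one set (a DocId set), and let that set take part in the posting-list merge as one unified posting list.
    • Approach two: combine the terms into a BooleanQuery in which they are ORed together.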
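The two approaches can be contrasted on a toy inverted index. This is a minimal sketch under stated assumptions: the index is a TreeMap from term to a sorted array of document ids, the sample terms and ids are made up for illustration, and MultiTermRewriteSketch is a hypothetical class rather than anything in Lucene.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MultiTermRewriteSketch {
    /** Approach one: union the postings of all matching terms into one doc-id set. */
    static BitSet asDocIdSet(TreeMap<String, int[]> index, String prefix) {
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> entry : index.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                for (int docId : entry.getValue()) {
                    docs.set(docId);
                }
            }
        }
        return docs;
    }

    /** Approach two: keep one OR clause per matching term. */
    static List<String> asOrClauses(TreeMap<String, int[]> index, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : index.keySet()) {
            if (term.startsWith(prefix)) {
                clauses.add(term);
            }
        }
        return clauses;
    }

    /** Made-up sample data: term -> ids of the documents containing it. */
    static TreeMap<String, int[]> sampleIndex() {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("apple", new int[] {0, 2});
        index.put("apples", new int[] {1});
        index.put("apply", new int[] {2, 5});
        index.put("banana", new int[] {3});
        return index;
    }

    public static void main(String[] args) {
        System.out.println(asDocIdSet(sampleIndex(), "appl"));
        System.out.println(asOrClauses(sampleIndex(), "appl"));
    }
}
```

With approach one, the number of matching terms no longer matters downstream because only one set is merged; with approach two, every matching term becomes its own clause and keeps its own postings, which is what allows per-term scoring.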

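As the Query object tree above shows, every MultiTermQuery has a RewriteMethod member variable, which is what rewrites the Query object. There are several kinds:

  • ConstantScoreFilterRewrite takes approach one. Its rewrite function is implemented as follows: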

public Query rewrite(IndexReader reader, MultiTermQuery query) {

  Query result = new ConstantScoreQuery(new MultiTermQueryWrapperFilter<MultiTermQuery>(query));

  result.setBoost(query.getBoost());

  return result;

}

The getDocIdSet function of MultiTermQueryWrapperFilter is implemented as follows:

public DocIdSet getDocIdSet(IndexReader reader) throws IOException {

  //Get the term enumerator of the MultiTermQuery

  final TermEnum enumerator = query.getEnum(reader);

  try {

    if (enumerator.term() == null)

      return DocIdSet.EMPTY_DOCIDSET;

    //Create the set of document ids covering all matching terms

    final OpenBitSet bitSet = new OpenBitSet(reader.maxDoc());

    final int[] docs = new int[32];

    final int[] freqs = new int[32];

    TermDocs termDocs = reader.termDocs();

    try {

      int termCount = 0;

      //Loop over every term matched by the MultiTermQuery and add its document ids to the set

      do {

        Term term = enumerator.term();

        if (term == null)

          break;

        termCount++;

        termDocs.seek(term);

        while (true) {

          final int count = termDocs.read(docs, freqs);

          if (count != 0) {

            for(int i=0;i<count;i++) {

              bitSet.set(docs[i]);

            }

          } else {

            break;

          }

        }

      } while (enumerator.next());

      query.incTotalNumberOfTerms(termCount);

    } finally {

      termDocs.close();

    }

    return bitSet;

  } finally {

    enumerator.close();

  }

}

  • ScoringBooleanQueryRewrite and its subclass ConstantScoreBooleanQueryRewrite take approach two. The code of the rewrite function is as follows:

public Query rewrite(IndexReader reader, MultiTermQuery query) throws IOException {

  //Get the term enumerator of the MultiTermQuery

  FilteredTermEnum enumerator = query.getEnum(reader);

  BooleanQuery result = new BooleanQuery(true);

  int count = 0;

  try {

    //Loop over every term matched by the MultiTermQuery and add it to the BooleanQuery

    do {

      Term t = enumerator.term();

      if (t != null) {

        TermQuery tq = new TermQuery(t);

        tq.setBoost(query.getBoost() * enumerator.difference());

        result.add(tq, BooleanClause.Occur.SHOULD);

        count++;

      }

    } while (enumerator.next());   

  } finally {

    enumerator.close();

  }

  query.incTotalNumberOfTerms(count);

  return result;

}

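  • Each of the two approaches has strengths and weaknesses:
    • Approach one treats all terms matched by the MultiTermQuery as a single term: their documents form one docid set that takes part in the posting-list merge as a single posting list. However many such terms the index contains, only one posting list is merged, so no TooManyClauses exception is thrown and performance improves. The differences in tf, idf and so on among the terms are ignored, however, which is why the RewriteMethods that take approach one are named ConstantScoreXXX: apart from the user-specified query boost, all other scoring computation is skipped.
    • Approach two expands the whole Query object tree so that every leaf is a TermQuery, and the terms of the MultiTermQuery can take part in scoring according to their tf, idf and so on in the index. We do not know in advance how many index terms the MultiTermQuery matches, though, so a TooManyClauses exception can occur, meaning that one BooleanQuery has too many sub-queries. A large number of posting lists then has to be merged, which hurts performance.
    • Lucene takes the view that ignoring scoring is reasonable for a MultiTermQuery: a user who types "appl*" does not know which related terms the index contains and has no preference among them, so computing the differences between those terms means nothing to the user. Lucene therefore also provides a ConstantScoreXXX variant for approach two to speed up the search. As the experiment later in this section shows, this does affect document scores, and it can still be a problem in real applications.
    • To balance the two approaches, Lucene provides ConstantScoreAutoRewrite, which picks an approach depending on the situation.

The code of ConstantScoreAutoRewrite.rewrite is as follows: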

public Query rewrite(IndexReader reader, MultiTermQuery query) throws IOException {

  final Collection<Term> pendingTerms = new ArrayList<Term>();

  //Compute the document-count cutoff. docCountPercent defaults to 0.1, i.e. 0.1% of the documents in the index

  final int docCountCutoff = (int) ((docCountPercent / 100.) * reader.maxDoc());

  //Compute the term-count limit; termCountCutoff defaults to 350

  final int termCountLimit = Math.min(BooleanQuery.getMaxClauseCount(), termCountCutoff);

  int docVisitCount = 0;

  FilteredTermEnum enumerator = query.getEnum(reader);

  try {

    //Loop over all terms matched by the MultiTermQuery.

    while(true) {

      Term t = enumerator.term();

      if (t != null) {

        pendingTerms.add(t);

        docVisitCount += reader.docFreq(t);

      }

      //If the term count or the document count reaches its limit, merging the posting lists could become very expensive, so use approach one, i.e. the ConstantScoreFilterRewrite way

      if (pendingTerms.size() >= termCountLimit || docVisitCount >= docCountCutoff) {

        Query result = new ConstantScoreQuery(new MultiTermQueryWrapperFilter<MultiTermQuery>(query));

        result.setBoost(query.getBoost());

        return result;

      } else  if (!enumerator.next()) {

        //If neither the term count nor the document count is large, merging the posting lists stays cheap, so use approach two, i.e. the ConstantScoreBooleanQueryRewrite way.

        BooleanQuery bq = new BooleanQuery(true);

        for (final Term term: pendingTerms) {

          TermQuery tq = new TermQuery(term);

          bq.add(tq, BooleanClause.Occur.SHOULD);

        }

        Query result = new ConstantScoreQuery(new QueryWrapperFilter(bq));

        result.setBoost(query.getBoost());

        query.incTotalNumberOfTerms(pendingTerms.size());

        return result;

      }

    }

  } finally {

    enumerator.close();

  }

}
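The decision logic of this function can be isolated into a small sketch that takes only the document frequencies of the matching terms. It mirrors the loop structure of the code above but is not Lucene code: AutoRewriteSketch, Strategy and choose are hypothetical names, and the default values 0.1, 350 and 1024 are the ones quoted in this article.

```java
final class AutoRewriteSketch {
    enum Strategy { FILTER, BOOLEAN }

    /**
     * Decides between approach one (FILTER) and approach two (BOOLEAN).
     * docFreqs holds the document frequency of each matching term, in enumeration order.
     */
    static Strategy choose(int[] docFreqs, int maxDoc, double docCountPercent,
                           int termCountCutoff, int maxClauseCount) {
        int docCountCutoff = (int) ((docCountPercent / 100.0) * maxDoc);
        int termCountLimit = Math.min(maxClauseCount, termCountCutoff);
        int pendingTerms = 0;
        int docVisitCount = 0;
        int i = 0;
        while (true) {
            if (i < docFreqs.length) {
                pendingTerms++;
                docVisitCount += docFreqs[i];
            }
            if (pendingTerms >= termCountLimit || docVisitCount >= docCountCutoff) {
                return Strategy.FILTER;
            }
            i++;
            if (i >= docFreqs.length) {
                return Strategy.BOOLEAN; // enumeration exhausted below both limits
            }
        }
    }

    public static void main(String[] args) {
        // A four-document index: the cutoff truncates to 0, so the filter path is taken.
        System.out.println(choose(new int[] {4}, 4, 0.1, 350, 1024));
    }
}
```

One consequence worth noting: for a very small index the document cutoff truncates to 0, so the condition docVisitCount >= docCountCutoff holds immediately and approach one is chosen regardless of how few terms match.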

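As described above, the TermEnum obtained from the MultiTermQuery plays a key role when the Query object tree is rewritten: it yields every term that the MultiTermQuery matches. How does it do that?

MultiTermQuery's getEnum returns a FilteredTermEnum, which has two member variables: TermEnum actualEnum enumerates all terms in the index, and Term currentTerm points to the current term that satisfies the condition. The next() function of FilteredTermEnum is as follows: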

public boolean next() throws IOException {

    if (actualEnum == null) return false;

    currentTerm = null;

    //Keep advancing to the next term in the index

    while (currentTerm == null) {

        if (endEnum()) return false;

        if (actualEnum.next()) {

            Term term = actualEnum.term();

            //If the current index term satisfies the condition, make it the current term

            if (termCompare(term)) {

                currentTerm = term;

                return true;

            }

        }

        else return false;

    }

    currentTerm = null;

    return false;

}

Each MultiTermQuery has its own termCompare:

  • For PrefixQuery, getEnum(IndexReader reader) returns a PrefixTermEnum, whose termCompare is implemented as follows:

protected boolean termCompare(Term term) {

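  //The term qualifies as long as it starts with the prefix (field names are interned in Lucene, so == works for comparing them)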

  if (term.field() == prefix.field() && term.text().startsWith(prefix.text())){                                                                             

    return true;

  }

  endEnum = true;

  return false;

}
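Setting endEnum to true on the first mismatch works because the term dictionary is sorted: once a term no longer starts with the prefix, no later term can. A minimal sketch of that idea follows, assuming the dictionary is modelled as a sorted set of strings and that enumeration starts at the first term not smaller than the prefix; PrefixEnumSketch and termsWithPrefix are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PrefixEnumSketch {
    /** Collects all dictionary terms with the given prefix, stopping at the first mismatch. */
    static List<String> termsWithPrefix(NavigableSet<String> dictionary, String prefix) {
        List<String> result = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward in sorted order.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match (the endEnum case)
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(
            Arrays.asList("ant", "app", "apple", "apples", "apply", "banana", "cat"));
        System.out.println(termsWithPrefix(dictionary, "appl"));
    }
}
```

The walk touches only the matching terms plus one extra term, rather than the whole dictionary.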

  • For FuzzyQuery, getEnum returns a FuzzyTermEnum, whose termCompare is implemented as follows:

protected final boolean termCompare(Term term) {

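  //For this FuzzyQuery the prefix is the empty string "" (prefixLength is 0), so the startsWith condition always holds; the real work is computing the similarity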

  if (field == term.field() && term.text().startsWith(prefix)) {

      final String target = term.text().substring(prefix.length());

      this.similarity = similarity(target);

      return (similarity > minimumSimilarity);

  }

  endEnum = true;

  return false;

}

//Compute the Levenshtein distance, i.e. the edit distance: the minimum number of basic operations (insertion, deletion, substitution) needed to turn one string into the other.

private synchronized final float similarity(final String target) {

    final int m = target.length();

    final int n = text.length();

    // init matrix d

    for (int i = 0; i<=n; ++i) {

      p[i] = i;

    }

    // start computing edit distance

    for (int j = 1; j<=m; ++j) { // iterates through target

      int bestPossibleEditDistance = m;

      final char t_j = target.charAt(j-1); // jth character of t

      d[0] = j;

      for (int i=1; i<=n; ++i) { // iterates through text

        // minimum of cell to the left+1, to the top+1, diagonally left and up +(0|1)

        if (t_j != text.charAt(i-1)) {

          d[i] = Math.min(Math.min(d[i-1], p[i]),  p[i-1]) + 1;

        } else {

          d[i] = Math.min(Math.min(d[i-1]+1, p[i]+1),  p[i-1]);

        }

        bestPossibleEditDistance = Math.min(bestPossibleEditDistance, d[i]);

      }

      // copy current distance counts to 'previous row' distance counts: swap p and d

      int _d[] = p;

      p = d;

      d = _d;

    }

    return 1.0f - ((float)p[n] / (float) (Math.min(n, m)));

  }

The algorithm for computing the edit distance between two strings s and t is as follows:

Step 1:
Set n to be the length of s.
Set m to be the length of t.
If n = 0, return m and exit.
If m = 0, return n and exit.
Construct a matrix containing 0..m rows and 0..n columns.

Step 2:
Initialize the first row to 0..n.
Initialize the first column to 0..m.

Step 3:
Examine each character of s (i from 1 to n).

Step 4:
Examine each character of t (j from 1 to m).

Step 5:
If s[i] equals t[j], the cost is 0.
If s[i] doesn't equal t[j], the cost is 1.

Step 6:
Set cell d[i,j] of the matrix equal to the minimum of:
a. The cell immediately to the left plus 1: d[i-1,j] + 1.
b. The cell immediately above plus 1: d[i,j-1] + 1.
c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.

Step 7:
After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].

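The following example illustrates the process:

The two strings being compared are "GUMBO" and "GAMBOL".

[Figure: edit-distance matrix for "GUMBO" and "GAMBOL"]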
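The steps above translate into a short two-row implementation. The sketch below is an illustration, not the FuzzyTermEnum source: EditDistanceSketch is a hypothetical class, similarity follows the formula quoted earlier with an empty prefix, and the boost formula (similarity - minimumSimilarity) / (1 - minimumSimilarity) is an assumption about how enumerator.difference() scales the similarity.

```java
final class EditDistanceSketch {
    /** Levenshtein distance between s and t, keeping only two rows of the matrix. */
    static int distance(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int i = 0; i <= n; i++) {
            prev[i] = i; // distance from a prefix of s to the empty string
        }
        for (int j = 1; j <= m; j++) {
            curr[0] = j;
            for (int i = 1; i <= n; i++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[i] = Math.min(Math.min(curr[i - 1] + 1, prev[i] + 1), prev[i - 1] + cost);
            }
            int[] tmp = prev; // swap rows: current becomes previous
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }

    /** 1 - distance / min(length); assumes both strings are non-empty. */
    static float similarity(String text, String target) {
        int shorter = Math.min(text.length(), target.length());
        return 1.0f - ((float) distance(text, target) / (float) shorter);
    }

    /** Assumed scaling of similarity into a boost in the range (0, 1]. */
    static float boost(float similarity, float minimumSimilarity) {
        return (similarity - minimumSimilarity) * (1.0f / (1.0f - minimumSimilarity));
    }

    public static void main(String[] args) {
        System.out.println(distance("GUMBO", "GAMBOL"));
        System.out.println(boost(similarity("eat", "cat"), 0.5f));
    }
}
```

"GUMBO" and "GAMBOL" are two edits apart (substitute U with A, append L). For "eat" against "cat" the distance is 1, the similarity is about 0.667, which passes the minimumSimilarity of 0.5, and the scaled boost comes out at about 0.333.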

The following experiment shows how ConstantScoreXXX affects scoring:

The index contains the following four documents:

file01.txt : apple other other other other

file02.txt : apple apple other other other

file03.txt : apple apple apple other other

file04.txt : apple apple apple apple other

Searching for "apple" gives:

docid : 3 score : 0.67974937
docid : 2 score : 0.58868027
docid : 1 score : 0.4806554
docid : 0 score : 0.33987468

The documents are ranked by how many times they contain "apple".
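The ratios between these scores can be checked numerically. Relative to the lowest score they are sqrt(2), sqrt(3) and 2, which is what a term-frequency factor of sqrt(freq) gives for frequencies 1 to 4 when every document has the same length. The sketch assumes that factor, as in Lucene's DefaultSimilarity, and that all other scoring factors are identical across the four documents; TfScoreSketch is a hypothetical class.

```java
final class TfScoreSketch {
    /** Assumed DefaultSimilarity term-frequency factor. */
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /** Predicts a score from the score of a document with frequency 1. */
    static float predicted(float baseScore, int freq) {
        return baseScore * tf(freq);
    }

    public static void main(String[] args) {
        float base = 0.33987468f; // score of docid 0 in the result list
        for (int freq = 1; freq <= 4; freq++) {
            System.out.println(freq + " -> " + predicted(base, freq));
        }
    }
}
```

The predicted values agree with the listed scores to within float rounding, so the ranking is driven purely by term frequency here.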

Searching for "apple*", however, gives:

docid : 0 score : 1.0
docid : 1 score : 1.0
docid : 2 score : 1.0
docid : 3 score : 1.0

In other words, Lucene has given up computing a score.

After rewrite, the new Query object tree is as follows:

query    BooleanQuery  (id=89)   
   |  boost    1.0   
   |  clauses    ArrayList<E>  (id=90)   
   |     elementData    Object[3]  (id=97)   
   |------[0]    BooleanClause  (id=99)   
   |          |   occur    BooleanClause$Occur$1  (id=103)   
   |          |       name    "MUST"   
   |          |       ordinal    0   
   |          |---query    BooleanQuery  (id=105)   
   |                  |  boost    1.0   
   |                  |  clauses    ArrayList<E>  (id=115)   
   |                  |    elementData    Object[2]  (id=120)   

   |                  |       //"apple*" is rewritten with approach one into a ConstantScoreQuery
   |                  |---[0]    BooleanClause  (id=121)   
   |                  |      |     occur    BooleanClause$Occur$1  (id=103)   
   |                  |      |         name    "MUST"   
   |                  |      |         ordinal    0   
   |                  |      |---query    ConstantScoreQuery  (id=123)   
   |                  |               boost    1.0   
   |                  |               filter    MultiTermQueryWrapperFilter<Q>  (id=125)   
   |                  |                   query    PrefixQuery  (id=48)   
   |                  |                       boost    1.0   
   |                  |                       numberOfTerms    0   
   |                  |                       prefix    Term  (id=127)   
   |                  |                           field    "contents"   
   |                  |                           text    "apple"   
   |                  |                       rewriteMethod    MultiTermQuery$1  (id=50)    
   |                  |---[1]    BooleanClause  (id=122)   
   |                         |    occur    BooleanClause$Occur$3  (id=111)   
   |                         |        name    "MUST_NOT"   
   |                         |        ordinal    2   
   |                         |---query    TermQuery  (id=124)   
   |                                  boost    1.0   
   |                                  term    Term  (id=130)   
   |                                      field    "contents"   
   |                                      text    "boy"   
   |                     modCount    0   
   |                     size    2   
   |                 disableCoord    false   
   |                 minNrShouldMatch    0   
   |------[1]    BooleanClause  (id=101)   
   |          |   occur    BooleanClause$Occur$2  (id=108)   
   |          |       name    "SHOULD"   
   |          |       ordinal    1   
   |          |---query    BooleanQuery  (id=110)   
   |                  |  boost    1.0   
   |                  |  clauses    ArrayList<E>  (id=117)   
   |                  |    elementData    Object[2]  (id=132)   

   |                  |       //"cat*" is rewritten with approach one into a ConstantScoreQuery
   |                  |------[0]    BooleanClause  (id=133)   
   |                  |          |   occur    BooleanClause$Occur$2  (id=108)   
   |                  |          |       name    "SHOULD"   
   |                  |          |       ordinal    1   
   |                  |          |---query    ConstantScoreQuery  (id=135)   
   |                  |                   boost    1.0   
   |                  |                   filter    MultiTermQueryWrapperFilter<Q>  (id=137)   
   |                  |                     query    PrefixQuery  (id=63)   
   |                  |                        boost    1.0   
   |                  |                        numberOfTerms    0   
   |                  |                        prefix    Term  (id=138)   
   |                  |                            field    "contents"   
   |                  |                            text    "cat"   
   |                  |                       rewriteMethod    MultiTermQuery$1  (id=50)   
   |                  |------[1]    BooleanClause  (id=134)   
   |                             |   occur    BooleanClause$Occur$2  (id=108)   
   |                             |        name    "SHOULD"   
   |                             |        ordinal    1   
   |                             |---query    TermQuery  (id=136)   
   |                                      boost    1.0   
   |                                      term    Term  (id=140)   
   |                                          field    "contents"   
   |                                          text    "dog"   
   |                     modCount    0   
   |                     size    2   
   |                 disableCoord    false   
   |                 minNrShouldMatch    0   
   |------[2]    BooleanClause  (id=102)   
              |    occur    BooleanClause$Occur$3  (id=111)   
              |        name    "MUST_NOT"   
              |        ordinal    2   
              |---query    BooleanQuery  (id=113)   
                      |  boost    1.0   
                      |  clauses    ArrayList<E>  (id=119)   
                      |     elementData    Object[2]  (id=142)   
                      |------[0]    BooleanClause  (id=143)   
                      |          |   occur    BooleanClause$Occur$2  (id=108)   
                      |          |       name    "SHOULD"   
                      |          |       ordinal    1   

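                      |          |    //"eat~", a FuzzyQuery, is rewritten into a BooleanQuery;
                      |          |     the index terms that satisfy it are "eat" and "cat". FuzzyQuery
                      |          |     uses none of the RewriteMethods above; it implements its own
                      |          |     rewrite function along the lines of approach two, building a BooleanQuery
                      |          |     from at most maxClauseCount (1024 by default) terms closest to "eat" in edit distance.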
                      |          |---query    BooleanQuery  (id=145)   
                      |                   |  boost    1.0   
                      |                   |  clauses    ArrayList<E>  (id=146)   
                      |                   |     elementData    Object[10]  (id=147)   
                      |                   |------[0]    BooleanClause  (id=148)   
                      |                   |          |    occur    BooleanClause$Occur$2  (id=108)   
                      |                   |          |       name    "SHOULD"   
                      |                   |          |       ordinal    1   
                      |                   |          |---query    TermQuery  (id=150)   
                      |                   |                  boost    1.0   
                      |                   |                  term    Term  (id=152)   
                      |                   |                      field    "contents"   
                      |                   |                      text    "eat"   
                      |                   |------[1]    BooleanClause  (id=149)   
                      |                              |    occur    BooleanClause$Occur$2  (id=108)   
                      |                              |       name    "SHOULD"   
                      |                              |       ordinal    1   
                      |                              |---query    TermQuery  (id=151)   
                      |                                       boost    0.33333325   
                      |                                       term    Term  (id=153)   
                      |                                           field    "contents"   
                      |                                           text    "cat"       
                      |                  modCount    2   
                      |                  size    2   
                      |              disableCoord    true   
                      |              minNrShouldMatch    0   
                      |------[1]    BooleanClause  (id=144)   
                                  |   occur    BooleanClause$Occur$2  (id=108)   
                                  |       name    "SHOULD"   
                                  |       ordinal    1   
                                  |---query    TermQuery  (id=154)   
                                          boost    1.0   
                                          term    Term  (id=155)   
                                             field    "contents"   
                                             text    "foods"
                        modCount    0   
                        size    2   
                    disableCoord    false   
                    minNrShouldMatch    0   
        modCount    0   
        size    3   
    disableCoord    false   
    minNrShouldMatch    0   
