Elasticsearch Getting-Started Best Practices (61): Modifying Analyzers and Building Your Own Custom Analyzer

阿新 • Published: 2018-11-19

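1. The default analyzer

standard

It is made up of the following components:
standard tokenizer: splits text on word boundaries
standard token filter: does nothing
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, it, and so on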
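
To see these defaults in action, one option is to run a sentence through the _analyze API. This is only an illustrative sketch; the sample text is made up for the example and does not come from the original walkthrough.

GET /_analyze
{
  "analyzer": "standard",
  "text": "The Quick Brown Fox"
}

Because the stop token filter is disabled by default, the response should contain the lowercased tokens the, quick, brown, and fox, with "the" still present.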

2. An English-based filter

Modify the analyzer settings to enable the english stop-word token filter:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
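
A quick way to check the new setting is to run the es_std analyzer through the index's _analyze API. The request below is a sketch with a made-up sample sentence.

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}

With the _english_ stop words enabled, a, is, in, and the should be dropped, leaving only dog and house.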


3. Customizing your own analyzer

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
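
The custom analyzer can be tried out with the _analyze API. The request below is a sketch that assumes my_index was created with exactly the settings above. If an index named my_index already exists, the PUT fails until that index is deleted, for example with DELETE /my_index. The sample text is invented for illustration.

GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Tom & Jerry are <b>a</b> pair in THE house"
}

In the response, the HTML tags should be stripped, & should appear as and, every token should be lowercase, and neither the nor a should remain. Words such as are and in stay, because my_stopwords lists only the and a.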


4. Using the custom analyzer in one of your own types

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
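
To confirm that the content field picked up the analyzer, the _analyze API also accepts a field name in place of an analyzer name. This is a sketch that assumes the mapping above was applied successfully; the sample text is invented.

GET /my_index/_analyze
{
  "field": "content",
  "text": "The quick & lazy fox"
}

If the mapping is in place, the tokens should match what my_analyzer produces for the same text. GET /my_index/_mapping can also be used to inspect the stored field definition.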
