1. 程式人生 > >ES:修改分詞器以及定製自己的分詞器

ES:修改分詞器以及定製自己的分詞器

1、預設的分詞器

standard

standard tokenizer:以單詞邊界進行切分 standard token filter:什麼都不做 lowercase token filter:將所有字母轉換為小寫 stop token filer(預設被禁用):移除停用詞,比如a the it等等

2、修改分詞器的設定

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

3、定製化自己的分詞器

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and"
          ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [
            "the",
            "a"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stopwords"
          ]
        }
      }
    }
  }
}

在這裡插入圖片描述

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}