
62. Modifying an Analyzer and Creating a Custom Analyzer


Key points

  • Modifying an analyzer
  • Creating a custom analyzer

I. Modifying an analyzer

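1. The default analyzer, standard, is made up of the following four components:

  • standard tokenizer: splits text on word boundaries
  • standard token filter: does nothing (a no-op placeholder, removed in Elasticsearch 7.0)
  • lowercase token filter: converts all letters to lowercase
  • stop token filter (disabled by default): removes stop words such as a, the and it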
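To make the chain concrete, here is a minimal Python sketch of the standard analyzer's pipeline. It is an approximation, not Lucene's code: the `standard_analyze` function and its regex tokenizer are hypothetical stand-ins, and the real standard tokenizer follows Unicode word-boundary rules (UAX #29), so punctuation and non-Latin text will be handled differently.

```python
import re
from typing import Iterable, List, Optional

# Simplified stand-in for the standard tokenizer: runs of letters/digits,
# optionally joined by apostrophes (so "don't" stays one token).
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def standard_analyze(text: str, stopwords: Optional[Iterable[str]] = None) -> List[str]:
    """Approximate the standard analyzer: tokenizer, then token filters."""
    tokens = WORD_RE.findall(text)           # standard tokenizer (simplified)
    # standard token filter: a no-op, so there is nothing to do here
    tokens = [t.lower() for t in tokens]     # lowercase token filter
    if stopwords is not None:                # stop filter, disabled by default
        stop = {w.lower() for w in stopwords}
        tokens = [t for t in tokens if t not in stop]
    return tokens


if __name__ == "__main__":
    print(standard_analyze("The Quick Brown Fox"))                    # ['the', 'quick', 'brown', 'fox']
    print(standard_analyze("The Quick Brown Fox", stopwords=["the"]))  # ['quick', 'brown', 'fox']
```

Passing `stopwords` plays the role of enabling the stop filter, which is what the settings change below does on the Elasticsearch side.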

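2. Changing the analyzer settings

Enable the English stop-word list by defining an analyzer named es_std of type standard with "stopwords": "_english_":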

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
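The Kibana console shorthand above is an ordinary HTTP request. As a sketch, the same call can be built with Python's standard library; the cluster address `http://localhost:9200` and the `build_put_index` helper are assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster; adjust host, port and auth as needed.
ES_URL = "http://localhost:9200"

# Same body as the PUT /my_index request above.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "es_std": {"type": "standard", "stopwords": "_english_"}
            }
        }
    }
}


def build_put_index(index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index> request with a JSON body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        # Elasticsearch expects an explicit JSON content type for request bodies.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_put_index("my_index", settings)
    print(req.get_method(), req.full_url)
    # To actually create the index against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```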

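Test the modified analyzer by comparing it with the built-in standard analyzer. The standard analyzer keeps all six words, while es_std drops the stop words a, is, in and the, leaving only dog and house: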

GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
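The difference between the two requests can be modeled in Python. This sketch assumes `_english_` expands to Lucene's default English stop-word set; the `ENGLISH_STOPWORDS` list below is assumed to match it and should be checked against your Elasticsearch version. The `analyze` helper and its regex tokenizer are hypothetical simplifications.

```python
import re
from typing import Iterable, List, Optional

# Assumed to match Lucene's default English stop-word set (what _english_ refers to).
ENGLISH_STOPWORDS = frozenset("""
a an and are as at be but by for if in into is it no not of on or such
that the their then there these they this to was will with
""".split())

# Simplified word-boundary tokenizer (not Lucene's UAX #29 implementation).
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def analyze(text: str, stopwords: Optional[Iterable[str]] = None) -> List[str]:
    """Tokenize and lowercase; drop stop words only when a list is given."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if stopwords is None:
        return tokens
    stop = frozenset(stopwords)
    return [t for t in tokens if t not in stop]


text = "a dog is in the house"
print(analyze(text))                     # like "analyzer": "standard" -> ['a', 'dog', 'is', 'in', 'the', 'house']
print(analyze(text, ENGLISH_STOPWORDS))  # like "analyzer": "es_std"   -> ['dog', 'house']
```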

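II. Creating a custom analyzer

If my_index already exists from the previous section, delete it first (DELETE /my_index): PUT on an existing index fails, and analysis settings cannot be changed on an open index. The analyzer below chains two char filters (html_strip and a mapping filter that turns & into and), the standard tokenizer, and two token filters (lowercase and a custom stop filter).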

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
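The order of the pieces matters: char filters rewrite the raw string before the tokenizer sees it, and token filters then run in the order listed. Below is a simplified Python model of my_analyzer; the regex-based `html_strip`, `amp_to_and` and `my_analyzer` functions are hypothetical approximations of the real filters, not their implementations.

```python
import re
from typing import Callable, List


def html_strip(text: str) -> str:
    # Crude stand-in for the html_strip char filter: drop anything tag-like.
    # The real filter also decodes HTML entities such as &amp;.
    return re.sub(r"<[^>]*>", "", text)


def amp_to_and(text: str) -> str:
    # Mirrors the "&_to_and" mapping char filter: "&" => "and".
    return text.replace("&", "and")


WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")  # simplified standard tokenizer
MY_STOPWORDS = {"the", "a"}                                # same list as my_stopwords


def my_analyzer(text: str) -> List[str]:
    # Char filters run first, on the whole string, in the configured order.
    char_filters: List[Callable[[str], str]] = [html_strip, amp_to_and]
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = WORD_RE.findall(text)                      # tokenizer
    tokens = [t.lower() for t in tokens]                # lowercase filter
    return [t for t in tokens if t not in MY_STOPWORDS]  # my_stopwords filter


print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# ['tomandjerry', 'are', 'friend', 'in', 'house', 'haha']
```

In this model tom&jerry becomes the single token tomandjerry, because the mapping filter inserts and without surrounding spaces before the tokenizer runs.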

Test the custom analyzer:

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}

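Finally, apply my_analyzer to a field. The request below uses the typed-mapping syntax of Elasticsearch 5.x/6.x; in 7.0 and later, where mapping types are removed, send it to PUT /my_index/_mapping instead.

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}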
