Installing a Chinese Analyzer for Elasticsearch
By 阿新 · Published 2018-12-04
1. Installing the analyzer plugin
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.3/elasticsearch-analysis-ik-6.2.3.zip
NOTE: replace 6.2.3 with your own Elasticsearch version.
GitHub repository:
https://github.com/medcl/elasticsearch-analysis-ik
Note that the plugin version you install must match your Elasticsearch version.
Usage:
1> Add the parameter index.analysis.analyzer.default.type: ik as the last line of the Elasticsearch config file config/elasticsearch.yml; this makes ik the default analyzer for all indices.
2> Alternatively, apply the ik analyzer through an index mapping.
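A minimal sketch of the mapping approach, assuming Elasticsearch 6.x is reachable at localhost:9200; the index name news, type _doc, and field content are hypothetical placeholders:

```shell
# Create an index whose "content" field is indexed with ik_max_word
# (fine-grained at index time) and searched with ik_smart (coarse-grained
# at query time) -- a common pairing, not a required one.
curl -X PUT "localhost:9200/news" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}'
```

Setting the analyzer per field in the mapping is usually preferable to changing the cluster-wide default, since it leaves other indices unaffected.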
2. The two analysis modes of the IK analyzer
1> ik_max_word: splits the text at the finest granularity, exhausting every plausible word combination. For example, it splits "北京郵電大學" as follows:
{
  "tokens": [
    { "token": "北京郵電", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "北京",     "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "郵電大學", "start_offset": 2, "end_offset": 6, "type": "CN_WORD", "position": 2 },
    { "token": "郵電",     "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 3 },
    { "token": "電大",     "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 4 },
    { "token": "大學",     "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 5 }
  ]
}
2> ik_smart: performs the coarsest-grained split:
{
  "tokens": [
    { "token": "北京",     "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "郵電大學", "start_offset": 2, "end_offset": 6, "type": "CN_WORD", "position": 1 }
  ]
}
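Token lists like the ones above can be produced with the _analyze API; a sketch assuming Elasticsearch is running at localhost:9200 with the plugin installed:

```shell
# Ask Elasticsearch to tokenize the sample text with ik_smart;
# swap in "ik_max_word" to see the fine-grained split instead.
curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_smart",
  "text": "北京郵電大學"
}'
```

Running the same request against both analyzers is a quick way to decide which granularity fits your data.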