
Installing a Chinese analyzer (IK) for Elasticsearch

1. Installing the analyzer plugin

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.3/elasticsearch-analysis-ik-6.2.3.zip

NOTE: replace 6.2.3 with your own Elasticsearch version.
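After the install finishes, you can confirm the plugin is registered and then restart the node so it gets loaded. A minimal check (the name analysis-ik is what recent releases of the plugin register as):

./bin/elasticsearch-plugin list
# should print: analysis-ik
# restart Elasticsearch afterwards so the plugin is picked up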

The plugin's GitHub repository:

https://github.com/medcl/elasticsearch-analysis-ik

Make sure the plugin version you install matches your Elasticsearch version.

Usage:

1> On old versions you could add the setting index.analysis.analyzer.default.type: ik as the last line of the config file config/elasticsearch.yml, which made ik the default analyzer for every index. Note that this no longer works on Elasticsearch 5.x and later: index-level settings may not be placed in elasticsearch.yml, and the analyzer name ik was removed from the plugin in favor of ik_max_word and ik_smart. On current versions, set the default analyzer per index instead, as in the sketch below.
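A minimal sketch of setting the default analyzer at index-creation time; the index name my_index is a placeholder, and a local node on the default port is assumed:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "ik_max_word"
                }
            }
        }
    }
}'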

2> Alternatively, select the ik analyzer per field through the index mapping, as shown below.
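A sketch of a mapping that analyzes one field with ik, adapted from the plugin's README; the index name my_index, the 6.x mapping type _doc, and the field name content are placeholders:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "_doc": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart"
                }
            }
        }
    }
}'

Indexing with ik_max_word while searching with ik_smart is the combination the plugin's README suggests: fine-grained tokens at index time improve recall, while coarse tokens at query time keep matches precise.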

2. The two analysis modes of the IK analyzer

1> ik_max_word: splits the text at the finest granularity, exhausting every plausible combination; for example, "北京郵電大學" (Beijing University of Posts and Telecommunications) is broken up as follows:
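The token list below can be reproduced with the _analyze API, assuming a local node on the default port:

curl -X GET "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
    "analyzer": "ik_max_word",
    "text": "北京郵電大學"
}'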

{
    "tokens":[
        {
            "token":"北京郵電",
            "start_offset":0,
            "end_offset":4,
            "type":"CN_WORD",
            "position":0
        },
        {
            "token":"北京",
            "start_offset":0,
            "end_offset":2,
            "type":"CN_WORD",
            "position":1
        },
        {
            "token":"郵電大學",
            "start_offset":2,
            "end_offset":6,
            "type":"CN_WORD",
            "position":2
        },
        {
            "token":"郵電",
            "start_offset":2,
            "end_offset":4,
            "type":"CN_WORD",
            "position":3
        },
        {
            "token":"電大",
            "start_offset":3,
            "end_offset":5,
            "type":"CN_WORD",
            "position":4
        },
        {
            "token":"大學",
            "start_offset":4,
            "end_offset":6,
            "type":"CN_WORD",
            "position":5
        }
    ]
}

2> ik_smart: splits the text at the coarsest granularity:
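The same _analyze call, with only the analyzer name changed, yields the coarse result:

curl -X GET "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
    "analyzer": "ik_smart",
    "text": "北京郵電大學"
}'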

{
    "tokens":[
        {
            "token":"北京",
            "start_offset":0,
            "end_offset":2,
            "type":"CN_WORD",
            "position":0
        },
        {
            "token":"郵電大學",
            "start_offset":2,
            "end_offset":6,
            "type":"CN_WORD",
            "position":1
        }
    ]
}