62. Modifying the Analyzer and Creating a Custom Analyzer
Key points
- Modifying the analyzer
- Creating a custom analyzer
I. Modifying the analyzer
1. The default analyzer is standard, which is built from the following four components:
- standard tokenizer: splits text on word boundaries
- standard token filter: does nothing
- lowercase token filter: converts all letters to lowercase
- stop token filter (disabled by default): removes stopwords such as a, the, it, etc.
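The pipeline above can be sketched in Python. This is a minimal illustration of the component order (tokenize, then lowercase, then optionally remove stopwords), not Lucene's real implementation; the function name `standard_analyze` is made up for this sketch.

```python
import re

def standard_analyze(text, stopwords=None):
    # standard tokenizer: split on word boundaries (simplified as \w+ runs)
    tokens = re.findall(r"\w+", text)
    # lowercase token filter: convert every token to lowercase
    tokens = [t.lower() for t in tokens]
    # stop token filter: disabled by default, so only applied when a
    # stopword set is explicitly passed in
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens

print(standard_analyze("A Dog is in the House"))
# → ['a', 'dog', 'is', 'in', 'the', 'house']
```

Because the stop filter is off by default, stopwords like "a" and "the" survive analysis unless you enable a stopword list, which is exactly what the next setting does.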
2. Changing the analyzer settings
Enable the English stopword token filter:
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"es_std": {
"type": "standard",
"stopwords": "_english_"
}
}
}
}
}
Test the modified analyzer (the first request uses the default standard analyzer for comparison):
GET /my_index/_analyze
{
"analyzer": "standard",
"text": "a dog is in the house"
}
GET /my_index/_analyze
{
"analyzer": "es_std",
"text": "a dog is in the house"
}
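The expected difference between the two requests can be sketched with the same simplified pipeline. Note that `_english_` is Elasticsearch's predefined English stopword list; the small `ENGLISH_STOPWORDS` set below is only a hand-picked subset for illustration.

```python
import re

# Illustrative subset of the _english_ stopword list, not the full set
ENGLISH_STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "the"}

def analyze(text, stopwords=frozenset()):
    # tokenize on word boundaries, lowercase, then drop stopwords
    return [t.lower() for t in re.findall(r"\w+", text)
            if t.lower() not in stopwords]

# "standard": stop filter disabled, all tokens kept
print(analyze("a dog is in the house"))
# → ['a', 'dog', 'is', 'in', 'the', 'house']

# "es_std": English stopwords removed
print(analyze("a dog is in the house", ENGLISH_STOPWORDS))
# → ['dog', 'house']
```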
II. Creating a custom analyzer
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"&_to_and": {
"type": "mapping",
"mappings": ["&=> and"]
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": ["the", "a"]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["html_strip", "&_to_and"],
"tokenizer": "standard",
"filter": ["lowercase", "my_stopwords"]
}
}
}
}
}
Test the custom analyzer:
GET /my_index/_analyze
{
"text": "tom&jerry are a friend in the house, <a>, HAHA!!",
"analyzer": "my_analyzer"
}
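What `my_analyzer` does to this text can be traced step by step with a simplified simulation: char filters run first on the raw text (html_strip, then the & mapping), the standard tokenizer splits it, and the token filters run last (lowercase, then the custom stopword list). The regex-based `html_strip` below is a rough stand-in for the real char filter.

```python
import re

def my_analyzer(text):
    # char_filter "html_strip": remove HTML tags (simplified)
    text = re.sub(r"<[^>]*>", "", text)
    # char_filter "&_to_and": map "&" to " and"
    text = text.replace("&", " and ")
    # standard tokenizer + lowercase token filter
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    # token filter "my_stopwords": remove "the" and "a"
    return [t for t in tokens if t not in {"the", "a"}]

print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# → ['tom', 'and', 'jerry', 'are', 'friend', 'in', 'house', 'haha']
```

The mapping request that follows then assigns my_analyzer to the content field, so this pipeline runs on every value indexed into that field.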
Apply the custom analyzer to a field in the mapping:
PUT /my_index/_mapping/my_type
{
"properties": {
"content": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}