How to implement filtering in Elasticsearch (keeping a field un-analyzed and running aggregations)
0 Background
A commonly used analyzer for Chinese word segmentation is es-ik. An index that uses it is created as follows.
Here we create two fields, name and district, for the index person_list (note that index names must be lowercase).
(All of the requests below are run in the Kibana console.)
PUT /person_list { "mappings": { "info": { "properties": { "name": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" }, "district": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" } } } } }'
View the index details and individual index properties:
GET person_list
GET /person_list/_settings
GET /person_list/_mapping
Add some data for testing.
You can add documents in bulk (recommended):
POST /person_list/info/_bulk {"index":{"_id":"1"}} {"name":"李明","district":"上海市"} {"index":{"_id":"2"}} {"name":"李明","district":"上海市"} {"index":{"_id":"3"}} {"name":"李明","district":"北京市"} {"index":{"_id":"4"}} {"name":"張偉","district":"上海市"} {"index":{"_id":"5"}} {"name":"張偉","district":"北京市"} {"index":{"_id":"6"}} {"name":"張偉","district":"北京市"}
Or add them one at a time:
POST /person_list/info
{
"name": "李明",
"district":"上海"
}
Now let's look at the requirements.
0.1 Requirement 1: search on name
This one is simple: both fuzzy and exact search can be done, and we can also set the offset (from) and size.
GET person_list/info/_search
{
"query": {
"match_phrase_prefix": {"name": "張偉"}
},
"size": 10,
"from": 0
}
Search result:
{ "took": 0, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.35667494, "hits": [ { "_index": "person_list", "_type": "info", "_id": "4", "_score": 0.35667494, "_source": { "name": "張偉", "district": "北京" } } ] } }
0.2 Requirement 2: aggregate on name, so that when we search for a person's name we see how many people with that name are in each district
The aggregation query is as follows; we want the number of 張偉 in each district.
GET person_list/info/_search
{
"query":{
"match_phrase_prefix":{"name":"張偉"}
},
"aggs":{
"result":{
"terms":{"field":"district"}
}
}
}
The response this time is:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [district] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "person_list",
"node": "SOK5mAntQ8SYv6BuOGYuMg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [district] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
]
},
"status": 400
}
We got an error telling us to set fielddata=true. How do we change that?
1 Configuring an index that supports filtering
We can modify the mapping directly with a request:
POST /person_list/_mapping/info
{
"properties": {
"district": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word",
"fielddata": true
}
}
}
It succeeds:
{
"acknowledged": true
}
Now run the aggregation again:
GET person_list/info/_search
{
"query":{
"match_phrase_prefix":{"name":"張偉"}
},
"aggs":{
"result":{
"terms":{"field":"district"}
}
}
}
And the result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.47000363,
"hits": [
{
"_index": "person_list",
"_type": "info",
"_id": "4",
"_score": 0.47000363,
"_source": {
"name": "張偉",
"district": "上海市"
}
},
{
"_index": "person_list",
"_type": "info",
"_id": "6",
"_score": 0.47000363,
"_source": {
"name": "張偉",
"district": "北京市"
}
},
{
"_index": "person_list",
"_type": "info",
"_id": "5",
"_score": 0.2876821,
"_source": {
"name": "張偉",
"district": "北京市"
}
}
]
},
"aggregations": {
"result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "北京",
"doc_count": 2
},
{
"key": "北京市",
"doc_count": 2
},
{
"key": "市",
"doc_count": 2
},
{
"key": "上海",
"doc_count": 1
},
{
"key": "上海市",
"doc_count": 1
},
{
"key": "海市",
"doc_count": 1
}
]
}
}
}
Something is wrong: the district values have been split into tokens!
Thinking about it, this makes sense: because district is analyzed with ik, the aggregation first segments district into terms and then groups and counts those terms, not the whole values.
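This is easy to verify with the _analyze API: asking ik_max_word to analyze one of the district values produces exactly the terms that showed up as buckets above (北京市, 北京, 市), and each of those terms is counted separately. A quick check, assuming the ik plugin is installed as in the mapping above:
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "北京市"
}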
Now the direction is clear: for district, the analyzer should not split the value at all.
Following solutions found online, we modify the mapping once more:
POST /person_list/_mapping/info
{
"properties": {
"district": {
"type": "text",
"fielddata": true,
"fields": {"raw": {"type": "keyword"}}
}
}
}
No luck: this fails with an analyzer conflict error. What now?
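The conflict arises because the analyzer of an existing field cannot simply be changed by a mapping update. For the record, the alternative suggested by the error message itself ("use a keyword field instead") would also work if we were willing to rebuild the index: declare a keyword sub-field and aggregate on that sub-field, which needs no fielddata at all. A sketch under that assumption, with the sub-field name raw chosen arbitrarily:
# recreate the index with a keyword sub-field on district
PUT /person_list
{
  "mappings": {
    "info": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "district": {
          "type": "text",
          "analyzer": "ik_max_word",
          "fields": {"raw": {"type": "keyword"}}
        }
      }
    }
  }
}
# then aggregate on the un-analyzed sub-field
GET person_list/info/_search
{
  "query": {"match_phrase_prefix": {"name": "張偉"}},
  "aggs": {
    "result": {"terms": {"field": "district.raw"}}
  }
}
In this article, however, we take a different route and keep the field itself un-analyzed.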
Either way, the core problem is the same: stop ik-analyzing district and the aggregation should behave.
To recap, there are two kinds of analyzers to choose from:
- Built-in analyzers: standard, simple, whitespace, the language analyzers and so on; see the official documentation for details
- Custom / third-party analyzers
After some thought: none of the district values contain spaces, so the whitespace analyzer will keep each value as a single token, which is exactly the "no segmentation" behaviour we need.
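The same _analyze check as before can confirm this; with the whitespace analyzer, a value without spaces comes back as a single token:
GET _analyze
{
  "analyzer": "whitespace",
  "text": "上海市"
}
# expected: a single token, 上海市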
So we rebuild the index from scratch:
# Delete the documents
POST person_list/_delete_by_query
{
"query": {
"match_all": {
}
}
}
# Delete the index
DELETE /person_list
# Create the index
PUT /person_list
{
"mappings": {
"info": {
"properties": {
"name": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
},
"district": {
"type": "text",
"analyzer": "whitespace",
"search_analyzer": "whitespace",
"fielddata": true
}
}
}
}
}
# Import the data
POST /person_list/info/_bulk
{"index":{"_id":"1"}}
{"name":"李明","district":"上海市"}
{"index":{"_id":"2"}}
{"name":"李明","district":"上海市"}
{"index":{"_id":"3"}}
{"name":"李明","district":"北京市"}
{"index":{"_id":"4"}}
{"name":"張偉","district":"上海市"}
{"index":{"_id":"5"}}
{"name":"張偉","district":"北京市"}
{"index":{"_id":"6"}}
{"name":"張偉","district":"北京市"}
Query the aggregation result:
GET person_list/info/_search
{
"query":{
"match_phrase_prefix":{"name":"張偉"}
},
"aggs":{
"result":{
"terms":{"field":"district"}
}
}
}
The result is exactly what we want:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.47000363,
"hits": [
{
"_index": "person_list",
"_type": "info",
"_id": "4",
"_score": 0.47000363,
"_source": {
"name": "張偉",
"district": "上海市"
}
},
{
"_index": "person_list",
"_type": "info",
"_id": "6",
"_score": 0.47000363,
"_source": {
"name": "張偉",
"district": "北京市"
}
},
{
"_index": "person_list",
"_type": "info",
"_id": "5",
"_score": 0.2876821,
"_source": {
"name": "張偉",
"district": "北京市"
}
}
]
},
"aggregations": {
"result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "北京市",
"doc_count": 2
},
{
"key": "上海市",
"doc_count": 1
}
]
}
}
}
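With district now indexed as whole values, filtering on an exact district also behaves as expected. A minimal sketch that combines the name search with a district filter (values taken from the test data above; because whitespace leaves the whole string as a single term, a term query matches it directly):
GET person_list/info/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_phrase_prefix": {"name": "張偉"}}
      ],
      "filter": [
        {"term": {"district": "北京市"}}
      ]
    }
  }
}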
With that, we have covered how to implement basic filtering with Elasticsearch.