Common Elasticsearch Operations: Queries and Aggregations

阿新 • Published: 2018-10-22



0 Notes

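Based on ES 5.4 and ES 5.6, this post lists the queries I use most often at work (at work I actually use the Java API, but the examples here use the REST query DSL). For the complete picture, see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search.html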

1 Queries

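We start with a quick-start section; the queries listed after it are the ones I use most (at least in my work environment). Others that I rarely use are not covered here.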

1.1 Quick Start

1.1.1 Matching All Documents

GET index/type/_search
{
    "query":{
        "match_all":{}
    }
}

Or, without a request body:

GET index/type/_search

1.1.2 Pagination (using term as an example)

GET index/type/_search
{
    "from":0,
    "size":100,
    "query":{
        "term":{
            "area":"GuangZhou"
        }
    }
}
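To fetch page n (1-based) with page size s, from is (n - 1) × s. The minimal Python sketch below builds the request body as a plain dict for the same term query on area; the helper name page_body is hypothetical. By default Elasticsearch rejects requests where from + size exceeds index.max_result_window (10000), so the sketch checks that as well.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def page_body(page, page_size, area):
    """Build a search body for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        # Paging this deep needs scroll or search_after instead of from/size.
        raise ValueError("from + size exceeds index.max_result_window")
    return {
        "from": offset,
        "size": page_size,
        "query": {"term": {"area": area}},
    }


print(page_body(3, 100, "GuangZhou"))
# {'from': 200, 'size': 100, 'query': {'term': {'area': 'GuangZhou'}}}
```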

1.1.3 Returning Only Specified Fields (using term as an example)

GET index/type/_search
{
    "_source":["hobby", "name"],
    "query":{
        "term":{
            "area":"GuangZhou"
        }
    }
}

1.1.4 Sorting (using term as an example)

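Sorting by multiple fields (results are ordered by user_id ascending, and documents with the same user_id are then ordered by salary descending):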

GET index/type/_search
{
    "query":{
        "term":{
            "area":"GuangZhou"
        }
    },
    "sort":[
        {"user_id":{"order":"asc"}},
        {"salary":{"order":"desc"}}
    ]
}

1.2 Full-Text Queries

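Full-text queries are meant for analyzed (text) fields: before the query runs, the field's analyzer (or its search analyzer, if one is configured) is applied to the query string.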

1.2.1 match query

{
  "query": {
    "match": {
      "content": {
        "query": "裏皮恒大",
        "operator": "and"
      }
    }
  }
}

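operator defaults to or. Assuming a Chinese word-segmentation analyzer tokenizes "裏皮恒大" into "裏皮" (Lippi) and "恒大" (Evergrande), with or any document whose content contains either term is returned; with operator set to and, only documents containing both terms are returned.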
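The difference between or and and can be simulated locally. This is a sketch in plain Python, not Elasticsearch itself: it assumes the analyzer has already produced the term lists, and the function name match_query is hypothetical.

```python
def match_query(doc_terms, query_terms, operator="or"):
    """Simulate match semantics on already-analyzed terms (hypothetical helper)."""
    if operator == "or":
        return any(t in doc_terms for t in query_terms)  # at least one term
    if operator == "and":
        return all(t in doc_terms for t in query_terms)  # every term
    raise ValueError("operator must be 'or' or 'and'")


query = ["裏皮", "恒大"]          # assumed analyzer output for "裏皮恒大"
only_lippi = {"裏皮"}
both = {"裏皮", "恒大", "中超"}

print(match_query(only_lippi, query))         # True
print(match_query(only_lippi, query, "and"))  # False
print(match_query(both, query, "and"))        # True
```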

1.2.2 match_phrase query

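A document matches only if both of the following conditions hold:

(1) every term produced by analyzing the query appears in the field;
(2) the terms appear in the same order and (with the default slop of 0) adjacent to each other.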

{
  "query": {
    "match_phrase": {
      "content": "裏皮恒大"
    }
  }
}
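A local Python sketch of the two conditions above, assuming slop 0 and token lists that an analyzer has already produced; match_phrase here is a hypothetical helper, not the Elasticsearch implementation.

```python
def match_phrase(doc_tokens, phrase_tokens):
    """Simulate match_phrase with slop 0 on already-analyzed token lists."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    # The phrase must occur as a contiguous, in-order run of tokens.
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


phrase = ["裏皮", "恒大"]  # assumed analyzer output for "裏皮恒大"
print(match_phrase(["裏皮", "恒大", "奪冠"], phrase))  # True
print(match_phrase(["恒大", "裏皮"], phrase))          # False: wrong order
print(match_phrase(["裏皮", "執教", "恒大"], phrase))  # False: not adjacent
```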

1.3 Term-Level Queries

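Term-level queries match exact terms stored in the inverted index; the query value is not analyzed. They are usually used for structured data such as numbers, dates, and enumerations.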

1.3.1 term query

{
  "query": {
    "term": {
      "postdate": "2015-12-10 00:41:00"
    }
  }
}

1.3.2 terms query

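An extension of term that accepts several values: for the postdate field queried above, you can list multiple values, and a document matches if the field equals any one of them.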

{
  "query": {
    "terms": {
      "postdate": [
        "2015-12-10 00:41:00",
        "2016-02-01 01:39:00"
      ]
    }
  }
}

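The values in a terms query are combined with OR, and there is no setting that turns them into AND. For a single-valued field such as postdate, AND would make no sense anyway, since one value cannot exactly equal two different values. For a multi-valued (array) field, requiring all of the values is done with a bool query that has one term clause per value under must.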
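A local sketch contrasting the two behaviors on a multi-valued field, using the keywords data from section 1.6. The function names are hypothetical; must_all_terms_query only builds the request body described above, and the other two functions simulate the matching in plain Python.

```python
def field_values(doc, field):
    """Normalize a field to a list of values (missing or null -> empty list)."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def terms_matches(doc, field, values):
    """terms query: match if any field value equals any of the given values."""
    return any(v in values for v in field_values(doc, field))


def all_terms_match(doc, field, values):
    """bool/must with one term clause per value: every value must be present."""
    present = field_values(doc, field)
    return all(v in present for v in values)


def must_all_terms_query(field, values):
    """Build the bool query that requires all values (one term clause each)."""
    return {"query": {"bool": {"must": [{"term": {field: v}} for v in values]}}}


doc = {"keywords": ["美女", "動漫", "電影"]}
print(terms_matches(doc, "keywords", ["電影", "廣告"]))    # True
print(all_terms_match(doc, "keywords", ["電影", "廣告"]))  # False
print(all_terms_match(doc, "keywords", ["電影", "動漫"]))  # True
```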

1.3.3 range query

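Matches documents whose field falls within a given range; it works on numeric, date, and string fields. Note that a range query targets exactly one field and cannot be applied to several fields at once.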

Numeric:

{
  "query": {
    "range": {
      "reply": {
        "gte": 245,
        "lte": 250
      }
    }
  }
}

Supported operators:

gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to
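The four operators map directly onto comparison functions. A local Python sketch (in_range is a hypothetical helper; it only evaluates the bound keys, not options such as format):

```python
import operator

# Range operator name -> comparison between the field value and the bound.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 245, "lte": 250}."""
    for name, limit in bounds.items():
        if name not in OPS:
            raise ValueError(f"unsupported operator: {name}")
        if not OPS[name](value, limit):
            return False
    return True


print(in_range(245, {"gte": 245, "lte": 250}))  # True: both bounds inclusive
print(in_range(250, {"gt": 245, "lt": 250}))    # False: lt excludes 250
```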

Dates:

{
  "query": {
    "range": {
      "postdate": {
        "gte": "2016-09-01 00:00:00",
        "lte": "2016-09-30 23:59:59",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

format can be omitted as long as the dates are written in a format that the field's mapping already accepts.
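When querying a whole month, one option is to use lt with the first instant of the next month instead of lte ... 23:59:59, so timestamps with fractional seconds after 23:59:59 are not missed. A minimal sketch that builds such a body; month_range is a hypothetical helper and the choice of lt is a design assumption, not something the original example uses.

```python
from datetime import datetime

PY_FMT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of "yyyy-MM-dd HH:mm:ss"


def month_range(field, year, month):
    """Build a range query covering one calendar month (hypothetical helper)."""
    start = datetime(year, month, 1)
    # First instant of the following month, used as an exclusive upper bound.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.strftime(PY_FMT),
                    "lt": end.strftime(PY_FMT),
                    "format": "yyyy-MM-dd HH:mm:ss",
                }
            }
        }
    }


print(month_range("postdate", 2016, 9)["query"]["range"]["postdate"])
```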

1.3.4 exists query

Returns documents in which the field contains at least one non-null value, i.e. the field "has a value" (this notion is explained below).

{
  "query": {
    "exists": {
      "field": "user"
    }
  }
}

The explanation below follows the book 《從Lucene到Elasticsearch：全文檢索實戰》 (From Lucene to Elasticsearch: Full-Text Search in Practice).

The following documents match the query above:

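Document	Notes
{"user":"jane"}	has a user field with a non-empty value
{"user":""}	has a user field whose value is an empty string (an empty string still counts as a value)
{"user":"-"}	has a user field with a non-null value
{"user":["jane"]}	has a user field with a non-null value
{"user":["jane",null]}	has a user field; at least one non-null value is enough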

The following documents do not match:

Document	Notes
{"user":null}	has a user field, but its value is null
{"user":[]}	has a user field, but it contains no values
{"user":[null]}	has a user field, but its only value is null
{"foo":"bar"}	has no user field
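The rule in the two tables can be written as a short predicate. This is a local Python simulation over JSON-like dicts (None stands for null), not Elasticsearch; it ignores mapping options that change how nulls are indexed.

```python
def exists(doc, field):
    """Simulate exists: the field has at least one non-null value."""
    value = doc.get(field)  # a missing field and an explicit null both give None
    if value is None:
        return False
    if isinstance(value, list):
        return any(v is not None for v in value)
    return True  # any non-null scalar, including "" and "-", counts


matching = [{"user": "jane"}, {"user": ""}, {"user": "-"},
            {"user": ["jane"]}, {"user": ["jane", None]}]
non_matching = [{"user": None}, {"user": []}, {"user": [None]}, {"foo": "bar"}]

print([exists(d, "user") for d in matching])      # [True, True, True, True, True]
print([exists(d, "user") for d in non_matching])  # [False, False, False, False]
```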

1.3.5 ids query

Returns documents with the given IDs.

{
  "query": {
    "ids": {
      "type": "news",
      "values": "2101"
    }
  }
}

The type is optional, and multiple IDs can be specified as an array:

{
  "query": {
    "ids": {
      "values": [
        "2101",
        "2301"
      ]
    }
  }
}

1.4 Compound Queries

1.4.1 bool query

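My ES work is mostly on projects that do aggregation, statistics, and classification, so I often need complex queries with many conditions, and bool queries are used constantly. Because the number of conditions varies from request to request, my approach is to wrap all of them in one large outer bool query. (In the project this is of course done through the Java API.)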
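The "one outer bool, only the conditions that were supplied" approach can be sketched as a function that assembles the request body. The parameters and the helper name build_search are hypothetical; the field names content, area, and salary are borrowed from the examples in this post. Text matching goes under must (scored), exact conditions under filter (not scored).

```python
def build_search(keyword=None, area=None, min_salary=None, exclude_words=()):
    """Assemble one outer bool query from whichever conditions were supplied."""
    must, filters, must_not = [], [], []
    if keyword:
        must.append({"match": {"content": keyword}})       # scored
    if area:
        filters.append({"term": {"area": area}})           # not scored
    if min_salary is not None:
        filters.append({"range": {"salary": {"gte": min_salary}}})
    for word in exclude_words:
        must_not.append({"match": {"content": word}})

    clauses = {"must": must, "filter": filters, "must_not": must_not}
    # Leave out empty clause lists so the body only contains what was asked for.
    bool_query = {name: items for name, items in clauses.items() if items}
    if not bool_query:
        return {"query": {"match_all": {}}}  # no conditions: match everything
    return {"query": {"bool": bool_query}}


print(build_search(keyword="裏皮", exclude_words=["中超"]))
```

In the Java API the same pattern is usually written by creating a BoolQueryBuilder with QueryBuilders.boolQuery() and calling must, filter, or mustNot only for the conditions that are present.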

A bool query can combine any number of simple queries. The clause types have the following logic:

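Clause	Description
must	Documents must match the queries under must; equivalent to logical AND
should	Documents may or may not match the queries under should; roughly logical OR. If the bool query has no must or filter clause, at least one should clause must match; otherwise should clauses only raise the score (unless minimum_should_match is set)
must_not	The opposite of must: documents matching the queries under must_not are not returned
filter	Like must, only documents matching the queries under filter are returned, but filter clauses are not scored and act purely as filters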

An example:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": "裏皮"
        }
      },
      "must_not": {
        "match": {
          "content": "中超"
        }
      }
    }
  }
}

Note that within a single bool, each of must, must_not, should, and filter can appear only once as a key.

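What if you want several must clauses? For example, suppose you want documents that match both "裏皮" and "恒大" but deliberately keep the two keywords in separate clauses (in practice, a single must containing a match query with operator set to and achieves the same thing). Make must an array and put several match objects inside it: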

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "裏皮"
          }
        },
        {
          "match": {
            "content": "恒大"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}

The array under must can also contain further bool queries, which lets you build more complex logic.

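Ignoring "size": 1, the query above matches the same documents as the following (assuming "裏皮恒大" is analyzed into exactly the terms "裏皮" and "恒大"):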

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": {
            "query": "裏皮恒大",
            "operator": "and"
          }
        }
      }
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}

1.5 Nested Queries

First create the following index:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user":{
          "type": "nested",
          "properties": {
            "first":{"type":"keyword"},
            "last":{"type":"keyword"}
          }
        },
        "group":{
          "type": "keyword"
        }
      }
    }
  }
}

Add some data:

PUT my_index/my_type/1
{
  "group":"GuangZhou",
  "user":[
    {
      "first":"John",
      "last":"Smith"
    },
    {
      "first":"Alice",
      "last":"White"
    }
  ]
}

PUT my_index/my_type/2
{
  "group":"QingYuan",
  "user":[
    {
      "first":"Li",
      "last":"Wang"
    },
    {
      "first":"Yonghao",
      "last":"Ye"
    }
  ]
}

Queries:

A simple query:

{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "term": {
          "user.first": "John"
        }
      }
    }
  }
}

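A more complex query. Note that the two conditions sit in separate nested clauses, so each can be satisfied by a different user object in the same document. To require first = "Li" and last = "Wang" in the same object, put both term queries inside a bool within a single nested query.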

{
  "query": {
    "bool": {
      "must": [
        {"nested": {
          "path": "user",
          "query": {
            "term": {
              "user.first": {
                "value": "Li"
              }
            }
          }
        }},
        {
          "nested": {
            "path": "user",
            "query": {
              "term": {
                "user.last": {
                  "value": "Wang"
                }
              }
            }
          }
        }
      ]
    }
  }
}
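The difference between separate nested clauses and a single nested clause shows up with document 1: John and White both occur, but in different user objects. A local Python simulation over the two sample documents (helper names are hypothetical), plus the request body for the single-clause variant:

```python
doc1 = {"group": "GuangZhou",
        "user": [{"first": "John", "last": "Smith"}, {"first": "Alice", "last": "White"}]}
doc2 = {"group": "QingYuan",
        "user": [{"first": "Li", "last": "Wang"}, {"first": "Yonghao", "last": "Ye"}]}


def nested_any(doc, path, predicate):
    """True if at least one nested object under path satisfies predicate."""
    return any(predicate(obj) for obj in doc.get(path, []))


def separate_clauses(doc, first, last):
    """Two nested clauses under must: each may match a different object."""
    return (nested_any(doc, "user", lambda u: u["first"] == first)
            and nested_any(doc, "user", lambda u: u["last"] == last))


def same_object(doc, first, last):
    """One nested clause with both terms: both must hold on the same object."""
    return nested_any(doc, "user", lambda u: u["first"] == first and u["last"] == last)


# Request body for the same-object variant.
same_object_query = {"query": {"nested": {"path": "user", "query": {"bool": {"must": [
    {"term": {"user.first": "Li"}},
    {"term": {"user.last": "Wang"}},
]}}}}}

print(separate_clauses(doc1, "John", "White"))  # True
print(same_object(doc1, "John", "White"))       # False
```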

1.6 Supplement: Querying and Testing Arrays

Create an index:

PUT my_index2
{
  "mappings": {
    "my_type2":{
      "properties": {
        "message":{
          "type": "text"
        },
        "keywords":{
          "type": "keyword"
        }
      }
    }
  }
}

Add data:

PUT /my_index2/my_type2/1
{
  "message":"keywords test1",
  "keywords":["美女","動漫","電影"]
}

PUT /my_index2/my_type2/2
{
  "message":"keywords test2",
  "keywords":["電影","美妝","廣告"]
}

Search:

{
  "query": {
    "term": {
      "keywords": "廣告"
    }
  }
}

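Note 1: keywords is mapped as keyword, so a term query matches it exactly. If it were mapped as text, the result would depend on the analyzer: with an analyzer that keeps "廣告" as a single token (such as a Chinese word-segmentation analyzer), the document would be found; with the default standard analyzer, which splits Chinese text into individual characters, it would not. Pay particular attention to this.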

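Note 2: array fields can also be used in bucket aggregations. Each value in the array is bucketed separately, rather than the whole array forming one group; you can verify this with the data above. Note, however, that the field must not be of type text, otherwise the aggregation fails (fielddata is disabled on text fields by default).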
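A local Python simulation of how a terms aggregation counts array values, using the two documents above. Each distinct value counts once per document, which is what doc_count reflects; terms_buckets is a hypothetical helper, not Elasticsearch code.

```python
from collections import Counter

docs = [
    {"message": "keywords test1", "keywords": ["美女", "動漫", "電影"]},
    {"message": "keywords test2", "keywords": ["電影", "美妝", "廣告"]},
]


def terms_buckets(docs, field):
    """Count documents per value; every value of an array gets its own bucket."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))  # a value counts at most once per document
    return counts


buckets = terms_buckets(docs, "keywords")
print(buckets.most_common(1))  # [('電影', 2)]
```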

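Note 3: following from the above, plain arrays are well suited to storing tag-like data, as in this example. Map the field as keyword rather than text and use exact matching when searching.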

2 Aggregations

2.1 Metric Aggregations

The equivalent of MySQL's aggregate functions.

max

{
  "size": 0,
  "aggs": {
    "max_id": {
      "max": {
        "field": "id"
      }
    }
  }
}

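If size is not set to 0, the response contains the search hits (the top 10 by default) in addition to the aggregation result; with size 0, only the aggregation result is returned.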

min

{
  "size": 0,
  "aggs": {
    "min_id": {
      "min": {
        "field": "id"
      }
    }
  }
}

avg

{
  "size": 0,
  "aggs": {
    "avg_id": {
      "avg": {
        "field": "id"
      }
    }
  }
}

sum

{
  "size": 0,
  "aggs": {
    "sum_id": {
      "sum": {
        "field": "id"
      }
    }
  }
}

stats

{
  "size": 0,
  "aggs": {
    "stats_id": {
      "stats": {
        "field": "id"
      }
    }
  }
}
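stats returns count, min, max, avg, and sum in one go, roughly like SELECT COUNT(id), MIN(id), MAX(id), AVG(id), SUM(id) in MySQL. A local Python sketch of those five numbers (the helper name is hypothetical; it only mimics the fields of the result, returning None for min, max, and avg when there are no values):

```python
def stats(values):
    """Compute the five numbers a stats aggregation reports."""
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


print(stats([1, 2, 3, 4]))
# {'count': 4, 'min': 1.0, 'max': 4.0, 'avg': 2.5, 'sum': 10.0}
```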

2.2 Bucket Aggregations

The equivalent of MySQL's GROUP BY, so do not try to run a bucket aggregation on a text field in ES; it will fail.

Terms

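Groups documents by the values of a field. In the example below, size is the maximum number of buckets returned (10 by default) and min_doc_count is the minimum number of documents a bucket needs in order to be returned.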

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {
        "size":100,
        "field": "vtype",
        "min_doc_count":1
      }
    }
  }
}

Metric aggregations can also be nested inside a bucket aggregation, just as in MySQL you can compute max, min, avg, sum, and similar statistics after a GROUP BY:

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {
        "field": "vtype"
      },
      "aggs": {
        "stats_follower": {
          "stats": {
            "field": "realFollowerCount"
          }
        }
      }
    }
  }
}
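A local Python simulation of terms plus a stats sub-aggregation, i.e. GROUP BY vtype followed by aggregate functions on realFollowerCount. The sample documents and the helper name are hypothetical; the sketch handles single-valued grouping fields only and ignores the terms size and ordering.

```python
from collections import defaultdict


def terms_with_stats(docs, group_field, metric_field):
    """Bucket docs by group_field, then compute stats of metric_field per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        key = doc.get(group_field)
        if key is None:
            continue  # documents without the grouping field fall into no bucket
        groups[key].append(doc.get(metric_field))

    result = {}
    for key, raw in groups.items():
        values = [v for v in raw if v is not None]  # stats ignores missing values
        total = float(sum(values))
        result[key] = {
            "doc_count": len(raw),
            "count": len(values),
            "min": float(min(values)) if values else None,
            "max": float(max(values)) if values else None,
            "avg": total / len(values) if values else None,
            "sum": total,
        }
    return result


sample = [  # hypothetical documents
    {"vtype": "a", "realFollowerCount": 100},
    {"vtype": "a", "realFollowerCount": 200},
    {"vtype": "b", "realFollowerCount": 50},
    {"vtype": "b"},  # no follower count: in doc_count, not in the stats
]
print(terms_with_stats(sample, "vtype", "realFollowerCount")["a"]["avg"])  # 150.0
```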

Filter

The equivalent of filtering rows with a WHERE clause in MySQL and then computing max, min, avg, sum, stats, and so on over the result.

{
  "size": 0,
  "aggs": {
    "gender_1_follower": {
      "filter": {
        "term": {
          "gender": 1
        }
      },
      "aggs": {
        "stats_follower": {
          "stats": {
            "field": "realFollowerCount"
          }
        }
      }
    }
  }
}

The aggregation above computes the statistics for documents whose gender is 1.

Filters

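An extension of filter: several filters are defined, each producing its own bucket, and the sub-aggregations are computed independently for each bucket.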

{
  "size": 0,
  "aggs": {
    "gender_1_2_follower": {
      "filters": {
        "filters": [
          {
            "term": {
              "gender": 1
            }
          },
          {
            "term": {
              "gender": 2
            }
          }
        ]
      },
      "aggs": {
        "stats_follower": {
          "stats": {
            "field": "realFollowerCount"
          }
        }
      }
    }
  }
}

Range

{
  "size": 0,
  "aggs": {
    "follower_ranges": {
      "range": {
        "field": "realFollowerCount",
        "ranges": [
          {
            "to": 500
          },
          {
            "from": 500,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 1500
          },
          {
            "from": 1500,
            "to": 2000
          },
          {
            "from": 2000
          }
        ]
      }
    }
  }
}

from is inclusive (greater than or equal to) and to is exclusive (less than).
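A local Python simulation of the range buckets above, showing that a value equal to a boundary lands in the bucket that starts there. The helper name and sample values are hypothetical; the bucket keys mimic the "from-to" style with * for an open end.

```python
def range_buckets(values, ranges):
    """Count values per range; from is inclusive, to is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": count})
    return buckets


ranges = [{"to": 500}, {"from": 500, "to": 1000}, {"from": 1000, "to": 1500},
          {"from": 1500, "to": 2000}, {"from": 2000}]
followers = [499, 500, 999, 1000, 2000, 2500]  # hypothetical realFollowerCount values
for b in range_buckets(followers, ranges):
    print(b)
```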

Date Range

Similar to range, except that the field is a date field and the range boundaries are dates.
