1. 程式人生 > >ElasticSearch學習筆記之二十三 桶聚合

ElasticSearch學習筆記之二十三 桶聚合

ElasticSearch學習筆記之二十三 桶聚合

桶聚合

桶聚合不同於指標聚合,它不對文件欄位進行計算,而是對我們的文件進行分組, 每一個分組都關聯一個標準 (依賴於聚合型別),這個 標準決定文件是否會劃分到分組. 換句話說,桶就是一個文件的集合,除了桶本身,桶計算還計算並返回劃分到每個桶的文件數量。

與指標聚合不用,桶聚合支援子聚合, 這些子聚合可以聚合由它們的父聚合建立的分組。

桶聚合有很多種, 每一個都有不同的 “bucketing” 策略. 有的策略定義一個分組(單分組聚合),有的策略定義固定數量的多個分組(多分組聚合),還有的策略在聚合執行的過程中動態的分組。

注意:
一次響應返回的分組的最大數被elasticsearch叢集設定的search.max_buckets屬性限制。一般來說它被設定為-1不作限制。但是當結果超過10,000(版本支援的預設最大值)分組的時候會得到一個棄用警告。

Children Aggregation(子聚合)

Children Aggregation 是下面 join欄位定義的選擇有特定type欄位的子文件的的單分組聚合。

這類聚合有一個引數:

  • type - 應該被選擇的子文件的型別

舉例來說, 我們有一個有questions 和 answers的索引. 有join欄位的answer型別文件對映:

PUT child_example
{
  "mappings": {
    "_doc": {
      "properties": {
        "join": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
} } } }

問題文件包含一個 tag 欄位,答案文件包含一個owner 欄位。 children aggregation可以把問題文件的 tag 欄位分組對映到答案文件的owner 分組。

問題文件

PUT child_example/_doc/1
{
  "join": {
    "name": "question"
  },
  "body": "<p>I have Windows 2003 server and i bought a new Windows 2008 server...",
  "title": "Whats the best way to file transfer my site from server to a newer one?",
  "tags": [
    "windows-server-2003",
    "windows-server-2008",
    "file-transfer"
  ]
}

答案如下:

PUT child_example/_doc/2?routing=1
{
  "join": {
    "name": "answer",
    "parent": "1"
  },
  "owner": {
    "location": "Norfolk, United Kingdom",
    "display_name": "Sam",
    "id": 48
  },
  "body": "<p>Unfortunately you're pretty much limited to FTP...",
  "creation_date": "2009-05-04T13:45:37.030"
}
PUT child_example/_doc/3?routing=1&refresh
{
  "join": {
    "name": "answer",
    "parent": "1"
  },
  "owner": {
    "location": "Norfolk, United Kingdom",
    "display_name": "Troll",
    "id": 49
  },
  "body": "<p>Use Linux...",
  "creation_date": "2009-05-05T13:45:37.030"
}

下面的請求可以把2者聯合在一起:

POST child_example/_search?size=0
{
  "aggs": {
    "top-tags": {
      "terms": {
        "field": "tags.keyword",
        "size": 10
      },
      "aggs": {
        "to-answers": {
          "children": {
            "type" : "answer" 
          },
          "aggs": {
            "top-names": {
              "terms": {
                "field": "owner.display_name.keyword",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

type 指向名為answer 的型別/ 對映 .

上面的案例返回置頂的問題標籤和每個標籤下置頂答案的所有者。

返回如下:

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "top-tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "file-transfer",
          "doc_count": 1, 
          "to-answers": {
            "doc_count": 2, 
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "windows-server-2003",
          "doc_count": 1, 
          "to-answers": {
            "doc_count": 2, 
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "windows-server-2008",
          "doc_count": 1, 
          "to-answers": {
            "doc_count": 2, 
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Range Aggregation(範圍聚合)

Range Aggregation是一個可以使用者自定義一系列範圍,每個範圍代表一個分組的多值分組聚合. 在聚合的過程中,從每個文件提取出值然後檢查每個分組的範圍並且正確的分組。 注意,聚合的每個範圍會包含from但是排除to
例如:

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 100.0 },
                    { "from" : 100.0, "to" : 200.0 },
                    { "from" : 200.0 }
                ]
            }
        }
    }
}

結果如下:

{
    ...
    "aggregations": {
        "price_ranges" : {
            "buckets": [
                {
                    "key": "*-100.0",
                    "to": 100.0,
                    "doc_count": 2
                },
                {
                    "key": "100.0-200.0",
                    "from": 100.0,
                    "to": 200.0,
                    "doc_count": 2
                },
                {
                    "key": "200.0-*",
                    "from": 200.0,
                    "doc_count": 3
                }
            ]
        }
    }
}

Keyed Response

設定 keyedtrue 會將每個分組和一個獨一無二的key關聯並將返回作為hash返回而不是array:

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "keyed" : true,
                "ranges" : [
                    { "to" : 100 },
                    { "from" : 100, "to" : 200 },
                    { "from" : 200 }
                ]
            }
        }
    }
}

結果如下:

{
    ...
    "aggregations": {
        "price_ranges" : {
            "buckets": {
                "*-100.0": {
                    "to": 100.0,
                    "doc_count": 2
                },
                "100.0-200.0": {
                    "from": 100.0,
                    "to": 200.0,
                    "doc_count": 2
                },
                "200.0-*": {
                    "from": 200.0,
                    "doc_count": 3
                }
            }
        }
    }
}

也支援為每個範圍範圍自定義key:

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "keyed" : true,
                "ranges" : [
                    { "key" : "cheap", "to" : 100 },
                    { "key" : "average", "from" : 100, "to" : 200 },
                    { "key" : "expensive", "from" : 200 }
                ]
            }
        }
    }
}

結果如下:

{
    ...
    "aggregations": {
        "price_ranges" : {
            "buckets": {
                "cheap": {
                    "to": 100.0,
                    "doc_count": 2
                },
                "average": {
                    "from": 100.0,
                    "to": 200.0,
                    "doc_count": 2
                },
                "expensive": {
                    "from": 200.0,
                    "doc_count": 3
                }
            }
        }
    }
}