ElasticSearch Tutorial: Kibana Nested Aggregations, Drill-Down Analysis, and Aggregation Analysis

Two core concepts: bucket and metric

city      name
Beijing   Xiao Li (小李)
Beijing   Xiao Wang (小王)
Shanghai  Xiao Zhang (小張)
Shanghai  Xiao Li (小麗)
Shanghai  Xiao Chen (小陳)

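Splitting into buckets by city
This yields two buckets: a Beijing bucket and a Shanghai bucket

Beijing bucket: contains 2 people, Xiao Li (小李) and Xiao Wang (小王)
Shanghai bucket: contains 3 people, Xiao Zhang (小張), Xiao Li (小麗), and Xiao Chen (小陳)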

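When you bucket on a field, all documents that share the same value for that field are placed in the same bucket
If you know some MySQL SQL: the first step of an aggregation is grouping, and then the data inside each group is aggregated. That group is our bucket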

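metric: a statistic computed over one group of data
Once we have a set of buckets, we can run aggregation analysis on the data in each bucket, for example counting all the documents in a bucket, or computing the average, maximum, or minimum over them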

bucket: group by user_id --> documents with the same user_id are placed in the same bucket
metric: an aggregation operation executed on one bucket, such as average, maximum, or minimum
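
As a rough illustration of the two concepts (plain Python, not Elasticsearch itself), the sketch below buckets the city/name rows from the table above and then applies a count metric to each bucket. The function name bucket_by is made up for this example.

```python
from collections import defaultdict

# The rows from the table above: (city, name).
rows = [
    ("Beijing", "Xiao Li (小李)"),
    ("Beijing", "Xiao Wang (小王)"),
    ("Shanghai", "Xiao Zhang (小張)"),
    ("Shanghai", "Xiao Li (小麗)"),
    ("Shanghai", "Xiao Chen (小陳)"),
]


def bucket_by(rows, key_index):
    """Bucket step: rows with the same value at key_index share a bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key_index]].append(row)
    return dict(buckets)


buckets = bucket_by(rows, 0)

# Metric step: a statistic computed per bucket, here a simple count.
counts = {city: len(members) for city, members in buckets.items()}
print(counts)
```

The bucket step only decides which group each row belongs to; the metric step is what turns each group into a number.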

Counting: compute the number of products under each tag
 

GET /ecommerce/product/_search
{
  "size" : 0,  
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}

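size: set to 0 to return only the aggregation results, without the original documents the aggregation ran over
aggs: fixed syntax, declaring that a grouping/aggregation operation is to be run on the data
group_by_tags: every aggregation needs a name; the name is arbitrary, so pick whatever you like
terms: group by the values of a field
field: the field whose values determine the groups

Before aggregating on a text field, set that field's fielddata property to true (this builds the forward index, 正排索引, that nested aggregation queries rely on; for details see the article "A First Look at How fielddata Works" (fielddata原理初探))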

PUT /ecommerce/_mapping/product
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}
 
{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fangzhu",
          "doc_count": 2
        },
        {
          "key": "meibai",
          "doc_count": 2
        },
        {
          "key": "qingxin",
          "doc_count": 1
        }
      ]
    }
  }
}

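hits.hits: we set size to 0, so hits.hits is empty; otherwise the original documents the aggregation ran over would be returned as well
aggregations: the aggregation results
group_by_tags: the name we gave to the aggregation
buckets: the buckets produced by splitting on the field we specified
key: the field value that the bucket corresponds to
doc_count: how many documents fall into this bucket
Default ordering of the buckets: descending by doc_count, that is, by the number of documents in each tag's bucket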
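
To make the response layout concrete, here is a small Python sketch that walks the structure described above and flattens the buckets into a tag-to-count mapping. The dictionary is a trimmed copy of the response shown earlier, and bucket_counts is a hypothetical helper name.

```python
# Trimmed copy of the aggregation response shown above.
response = {
    "hits": {"total": 4, "max_score": 0, "hits": []},
    "aggregations": {
        "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "fangzhu", "doc_count": 2},
                {"key": "meibai", "doc_count": 2},
                {"key": "qingxin", "doc_count": 1},
            ],
        }
    },
}


def bucket_counts(response, agg_name):
    """Return {bucket key: doc_count} for one named terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


tag_counts = bucket_counts(response, "group_by_tags")
print(tag_counts)
```

Because Python dictionaries keep insertion order, the mapping preserves the order in which the buckets were returned.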

Aggregating over search results

For products whose name contains yagao, compute the number of products under each tag
 

GET /ecommerce/product/_search
{
  "size": 0,
  "query": {
    "match": {
      "name": "yagao"
    }
  },
  "aggs": {
    "all_tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}
{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fangzhu",
          "doc_count": 2
        },
        {
          "key": "meibai",
          "doc_count": 1
        },
        {
          "key": "qingxin",
          "doc_count": 1
        }
      ]
    }
  }
}
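
The effect of adding a query can be imitated locally: filter first, then count tags over what is left. The sketch below is plain Python under stated assumptions. The four products are hypothetical, chosen only so that the counts line up with the two responses above, and match_name is a crude whitespace-token stand-in for a real match query.

```python
from collections import Counter

# Hypothetical products; only the names and tags matter here.
products = [
    {"name": "gaolujie yagao", "tags": ["meibai", "fangzhu"]},
    {"name": "jiajieshi yagao", "tags": ["fangzhu"]},
    {"name": "zhonghua yagao", "tags": ["qingxin"]},
    {"name": "gaolujie yashua", "tags": ["meibai"]},
]


def count_by_tag(docs):
    """Terms-style bucketing on tags, with doc_count as the metric."""
    return Counter(tag for doc in docs for tag in doc["tags"])


def match_name(docs, term):
    """Keep documents whose name contains term as a whole word."""
    return [doc for doc in docs if term in doc["name"].split()]


all_counts = count_by_tag(products)
yagao_counts = count_by_tag(match_name(products, "yagao"))
print(dict(all_counts), dict(yagao_counts))
```

The aggregation code is identical in both calls; only the set of documents handed to it changes, which is what the query does in the request above.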

top_hits: fetch the top few docs in each bucket

_source: return only the specified fields

GET /ecommerce/product/_search
{
    "size": 0,
    "aggs" : {
        "group_by_tags" : {
            "terms" : { "field" : "tags" },
            "aggs" : {
                "top_tags": {
                  "top_hits": { 
                    "_source": {
                      "include": "name"
                    }, 
                    "size": 1
                  }
                } 
            }
        }
    }
}

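Compute the average, minimum, maximum, and total price of the products under each tag

count: a terms bucket automatically carries a doc_count, which is equivalent to count
avg: the avg aggregation, computes the mean
max: the largest value of the specified field within a bucket
min: the smallest value of the specified field within a bucket
sum: the total of the specified field's values within a bucket

Group first, then compute the average (and the other metrics) for each group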
 

GET /ecommerce/product/_search
{
    "size": 0,
    "aggs" : {
        "group_by_tags" : {
            "terms" : { "field" : "tags" },
            "aggs" : {
                "avg_price": { "avg": { "field": "price" } },
                "min_price" : { "min": { "field": "price"} }, 
                "max_price" : { "max": { "field": "price"} },
                "sum_price" : { "sum": { "field": "price" } } 
            }
        }
    }
}

avg_price: the name we chose for the metric aggregation
value: the result of the metric, here the average of the price field over the documents in each bucket

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fangzhu",
          "doc_count": 2,
          "max_price": {
            "value": 30
          },
          "min_price": {
            "value": 25
          },
          "avg_price": {
            "value": 27.5
          },
          "sum_price": {
            "value": 55
          }
        },
        {
          "key": "meibai",
          "doc_count": 1,
          "max_price": {
            "value": 30
          },
          "min_price": {
            "value": 30
          },
          "avg_price": {
            "value": 30
          },
          "sum_price": {
            "value": 30
          }
        },
        {
          "key": "qingxin",
          "doc_count": 1,
          "max_price": {
            "value": 40
          },
          "min_price": {
            "value": 40
          },
          "avg_price": {
            "value": 40
          },
          "sum_price": {
            "value": 40
          }
        }
      ]
    }
  }
}
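
The numbers in this response can be reproduced by hand. The following Python sketch assumes three hypothetical products whose prices and tags were picked to match the response above; it groups by tag and then computes the same four metrics per bucket.

```python
from collections import defaultdict

# Hypothetical products chosen to reproduce the response values above.
products = [
    {"name": "gaolujie yagao", "price": 30, "tags": ["meibai", "fangzhu"]},
    {"name": "jiajieshi yagao", "price": 25, "tags": ["fangzhu"]},
    {"name": "zhonghua yagao", "price": 40, "tags": ["qingxin"]},
]


def price_metrics_by_tag(products):
    """Bucket by tag, then compute count/avg/min/max/sum of price per bucket."""
    prices = defaultdict(list)
    for product in products:
        for tag in product["tags"]:
            prices[tag].append(product["price"])
    return {
        tag: {
            "doc_count": len(values),
            "avg_price": sum(values) / len(values),
            "min_price": min(values),
            "max_price": max(values),
            "sum_price": sum(values),
        }
        for tag, values in prices.items()
    }


metrics = price_metrics_by_tag(products)
print(metrics["fangzhu"])
```

A product with two tags lands in two buckets, which is why the bucket doc_counts add up to more than the number of products.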

collect_mode

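There are two ways to compute sub-aggregations:

  • depth_first: compute the sub-aggregations directly, for every bucket, as the buckets are built
  • breadth_first: first compute the result of the current aggregation, then compute the sub-aggregations only against that result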

"order": { "avg_price": "desc" }

Compute the average price of the products under each tag, sorted by average price in descending order
 

GET /ecommerce/product/_search
{
    "size": 0,
    "aggs" : {
        "all_tags" : {
            "terms" : { "field" : "tags", "collect_mode" : "breadth_first", "order": { "avg_price": "desc" } },
            "aggs" : {
                "avg_price" : {
                    "avg" : { "field" : "price" }
                }
            }
        }
    }
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "qingxin",
          "doc_count": 1,
          "avg_price": {
            "value": 40
          }
        },
        {
          "key": "meibai",
          "doc_count": 1,
          "avg_price": {
            "value": 30
          }
        },
        {
          "key": "fangzhu",
          "doc_count": 2,
          "avg_price": {
            "value": 27.5
          }
        }
      ]
    }
  }
}

"ranges": [{},{}]

Group by the specified price ranges, then group by tag within each range, and finally compute the average price of each group

GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 20
          },
          {
            "from": 20,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": {
            "field": "tags"
          },
          "aggs": {
            "average_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
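
A range bucket includes its from value and excludes its to value, so a price of exactly 40 belongs to the 40-50 range, not to 20-40. The Python sketch below imitates the nested request above on three hypothetical products; the function name and the tuple-based range keys are choices made for this example, not what Elasticsearch returns.

```python
# Hypothetical products; ranges mirror the request above as (from, to) pairs.
products = [
    {"name": "gaolujie yagao", "price": 30, "tags": ["meibai", "fangzhu"]},
    {"name": "jiajieshi yagao", "price": 25, "tags": ["fangzhu"]},
    {"name": "zhonghua yagao", "price": 40, "tags": ["qingxin"]},
]
ranges = [(0, 20), (20, 40), (40, 50)]


def avg_price_by_range_and_tag(products, ranges):
    """Bucket by price range (from inclusive, to exclusive), then by tag."""
    grouped = {price_range: {} for price_range in ranges}
    for product in products:
        for low, high in ranges:
            if low <= product["price"] < high:
                for tag in product["tags"]:
                    grouped[(low, high)].setdefault(tag, []).append(product["price"])
    # Innermost level: the avg metric for each (range, tag) bucket.
    return {
        price_range: {tag: sum(v) / len(v) for tag, v in tags.items()}
        for price_range, tags in grouped.items()
    }


result = avg_price_by_range_and_tag(products, ranges)
print(result)
```

Each level of nesting narrows the set of documents handed to the next level, which is the drill-down pattern in the title.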

histogram

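Similar to terms, histogram is also a bucketing operation: it takes a field and splits documents into buckets by fixed-width ranges of that field's values

interval: 10 splits the values into the ranges 0~10, 10~20, 20~30, and so on; each bucket's key is the lower bound of its range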

GET /ecommerce/product/_search
{
   "size" : 0,
   "aggs":{
      "price":{
         "histogram":{ 
            "field": "price",
            "interval": 10
         },
         "aggs":{
            "revenue": {
               "sum": { 
                 "field" : "price"
               }
             }
         }
      }
   }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price": {
      "buckets": [
        {
          "key": 20,
          "doc_count": 1,
          "revenue": {
            "value": 25
          }
        },
        {
          "key": 30,
          "doc_count": 1,
          "revenue": {
            "value": 30
          }
        },
        {
          "key": 40,
          "doc_count": 1,
          "revenue": {
            "value": 40
          }
        }
      ]
    }
  }
}
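
The bucket keys in the response follow from rounding each value down to a multiple of the interval. A minimal Python sketch of that rule, using three hypothetical prices that match the response above:

```python
import math


def histogram_key(value, interval, offset=0):
    """Round value down to the start of its fixed-width bucket."""
    return math.floor((value - offset) / interval) * interval + offset


# Hypothetical prices matching the response above.
prices = [25, 30, 40]

# Sub-aggregation: sum of price per histogram bucket ("revenue").
revenue = {}
for price in prices:
    key = histogram_key(price, 10)
    revenue[key] = revenue.get(key, 0) + price

print(revenue)
```

A price of 30 sits exactly on a boundary and goes into the bucket that starts at 30, since each bucket includes its lower bound.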

date histogram

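Buckets are split on a date-type field that we specify, at a fixed calendar interval
With a date interval of one month (interval: month),
2017-01-01~2017-01-31 is one bucket
2017-02-01~2017-02-28 is one bucket
Each document's date field is then scanned to decide which bucket the date falls into, and the document is placed in that bucket

min_doc_count: when set to 0, an interval such as 2017-01-01~2017-01-31 is still returned even if it contains no documents at all; with a value of 1 or more, empty intervals are dropped
extended_bounds, min, max: forces buckets to be generated across the whole span from the start date to the end date, even where there is no data; it widens the bucket range and does not filter documents out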
 

GET /tvs/sales/_search
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold_date",
            "interval": "month", 
            "format": "yyyy-MM-dd",
            "min_doc_count" : 0, 
            "extended_bounds" : { 
                "min" : "2016-01-01",
                "max" : "2017-12-31"
            }
         }
      }
   }
}
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sales": {
      "buckets": [
        {
          "key_as_string": "2016-01-01",
          "key": 1451606400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-02-01",
          "key": 1454284800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-03-01",
          "key": 1456790400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-04-01",
          "key": 1459468800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2016-05-01",
          "key": 1462060800000,
          "doc_count": 1
        },
        .....
      ]
    }
  }
}
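
How min_doc_count: 0 and extended_bounds work together can be sketched in plain Python: generate every month between the bounds, then fill in counts, leaving 0 where nothing was sold. The sold dates below are hypothetical (only the first one, in May 2016, is suggested by the response above), and the helper names are made up for this example.

```python
from collections import Counter
from datetime import date


def month_start(d):
    """Map a date to the first day of its month (the bucket key)."""
    return d.replace(day=1)


def next_month(d):
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_histogram(dates, bounds_min, bounds_max):
    """Monthly buckets covering the bounds and the data; empty months get 0."""
    counts = Counter(month_start(d) for d in dates)
    current = min([month_start(bounds_min)] + list(counts))
    end = max([month_start(bounds_max)] + list(counts))
    buckets = []
    while current <= end:
        buckets.append((current.isoformat(), counts.get(current, 0)))
        current = next_month(current)
    return buckets


# Hypothetical sold_date values.
sold_dates = [date(2016, 5, 18), date(2016, 7, 2), date(2016, 11, 5), date(2017, 2, 12)]
buckets = monthly_histogram(sold_dates, date(2016, 1, 1), date(2017, 12, 31))
print(buckets[:5])
```

Without the bounds, the first bucket would be the month of the earliest document; with them, the series starts at 2016-01-01 and runs through 2017-12-01.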

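Aggregation scope: by default, an aggregation runs only within the scope of the query's search results
Here we want two results from one request: one aggregated over the query's search results, and one aggregated over all the data

global

This is the global bucket: it brings all the data into the aggregation's scope, regardless of the query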

GET /tvs/sales/_search 
{
  "size": 0, 
  "query": {
    "term": {
      "brand": {
        "value": "長虹"
      }
    }
  },
  "aggs": {
    "single_brand_avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "all_brand_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all": {
      "doc_count": 8,
      "all_brand_avg_price": {
        "value": 2650
      }
    },
    "single_brand_avg_price": {
      "value": 1666.6666666666667
    }
  }
}

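single_brand_avg_price: runs against the query's search results, so it returns the average price of the Changhong (長虹) brand
all.all_brand_avg_price: returns the average price across all brands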
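
The two scopes can be compared side by side in plain Python. The eight sales below are hypothetical, with prices picked so that both averages match the response above; only the 長虹 brand name comes from the request itself.

```python
# Hypothetical (brand, price) sales chosen to reproduce the averages above.
sales = [
    ("長虹", 1000), ("長虹", 2000), ("小米", 3000), ("TCL", 1500),
    ("TCL", 1200), ("長虹", 2000), ("三星", 8000), ("小米", 2500),
]


def avg(values):
    return sum(values) / len(values)


# Query scope: only the documents matched by the term query on brand.
single_brand_avg_price = avg([price for brand, price in sales if brand == "長虹"])

# Global scope: every document, ignoring the query.
all_brand_avg_price = avg([price for _, price in sales])

print(single_brand_avg_price, all_brand_avg_price)
```

Having both numbers in one response makes it easy to compare one brand against the whole catalogue without sending a second request.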

Compute the average price of a brand over the last thirty days

GET /tvs/sales/_search 
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "長虹"
      }
    }
  },
  "aggs": {
    "recent_30d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-30d"
          }
        }
      },
      "aggs": {
        "recent_30d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

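aggs.filter applies only to the aggregation it belongs to

A filter placed inside the query is global: it affects all of the data, and therefore every aggregation in the request

That matters when, for example, you want the average price of Changhong TVs over the last 1 month, the last 3 months, and the last 6 months in one request

bucket filter: give each bucket its own filter, so that each filter applies only to the aggs nested under that bucket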
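
One way to express the 1, 3, and 6 month case is a request with three sibling filter aggregations, each wrapping its own avg. The Python sketch below only builds and prints such a request body; it sends nothing. The helper name, the aggregation names, and the use of 30/90/180 days as stand-ins for 1/3/6 months are assumptions made for this example.

```python
import json


def recent_avg_price_aggs(price_field, date_field, day_windows):
    """Build one filter aggregation per time window, each with its own avg."""
    aggs = {}
    for days in day_windows:
        aggs[f"recent_{days}d"] = {
            "filter": {"range": {date_field: {"gte": f"now-{days}d"}}},
            "aggs": {f"recent_{days}d_avg_price": {"avg": {"field": price_field}}},
        }
    return aggs


body = {
    "size": 0,
    "query": {"term": {"brand": {"value": "長虹"}}},
    "aggs": recent_avg_price_aggs("price", "sold_date", [30, 90, 180]),
}

print(json.dumps(body, ensure_ascii=False, indent=2))
```

The brand restriction stays in the query because it applies to every window, while each date restriction lives in its own filter bucket so the windows do not interfere with each other.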