Elasticsearch Study Notes, Part 23: Bucket Aggregations
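Bucket Aggregations
Unlike metrics aggregations, bucket aggregations do not compute values over document fields. Instead, they group documents into buckets. Each bucket is associated with a criterion (which depends on the aggregation type), and that criterion decides whether a document falls into the bucket. In other words, a bucket is simply a set of documents. Besides the buckets themselves, a bucket aggregation also computes and returns the number of documents that fell into each bucket.
Unlike metrics aggregations, bucket aggregations can hold sub-aggregations. These sub-aggregations are computed over the buckets created by their parent aggregation.
There are many kinds of bucket aggregations, each with its own "bucketing" strategy. Some define a single bucket (single-bucket aggregations), some define a fixed number of buckets (multi-bucket aggregations), and others create buckets dynamically while the aggregation runs.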
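To make the idea of a bucket concrete, here is a minimal Python sketch of what a bucket aggregation with one metric sub-aggregation does conceptually. It is not how Elasticsearch implements aggregations; the sample documents and the names `bucket_by` and `avg` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are invented for this sketch.
docs = [
    {"category": "book", "price": 12.0},
    {"category": "book", "price": 18.0},
    {"category": "toy", "price": 30.0},
]


def bucket_by(documents, field):
    """Group documents into buckets keyed by the value of `field`."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return buckets


def avg(documents, field):
    """A metric computed over the documents of a single bucket."""
    return sum(doc[field] for doc in documents) / len(documents)


# Each bucket reports its document count plus the result of the sub-aggregation.
result = {
    key: {"doc_count": len(members), "avg_price": avg(members, "price")}
    for key, members in bucket_by(docs, "category").items()
}
print(result)
```

The buckets here are created dynamically from the values that occur in the data, which corresponds to the third strategy described above.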
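Note:
The maximum number of buckets returned in a single response is limited by the cluster setting search.max_buckets. By default it is -1, meaning no limit, but a response with more than 10,000 buckets (the default maximum planned for future versions) triggers a deprecation warning.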
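Children Aggregation
The children aggregation is a special single-bucket aggregation that selects child documents of a specified type, as defined in a join field.
This aggregation has one parameter:
- type - the type of the child documents that should be selected
For example, suppose we have an index of questions and answers. The mapping below defines a join field in which answer is the child of question: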
PUT child_example
{
  "mappings": {
    "_doc": {
      "properties": {
        "join": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}
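Question documents contain a tags field and answer documents contain an owner field. With the children aggregation, the tag buckets of the question documents can be mapped to the owner buckets of the answer documents in a single request.
An example question document: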
PUT child_example/_doc/1
{
  "join": {
    "name": "question"
  },
  "body": "<p>I have Windows 2003 server and i bought a new Windows 2008 server...",
  "title": "Whats the best way to file transfer my site from server to a newer one?",
  "tags": [
    "windows-server-2003",
    "windows-server-2008",
    "file-transfer"
  ]
}
Example answer documents:
PUT child_example/_doc/2?routing=1
{
  "join": {
    "name": "answer",
    "parent": "1"
  },
  "owner": {
    "location": "Norfolk, United Kingdom",
    "display_name": "Sam",
    "id": 48
  },
  "body": "<p>Unfortunately you're pretty much limited to FTP...",
  "creation_date": "2009-05-04T13:45:37.030"
}

PUT child_example/_doc/3?routing=1&refresh
{
  "join": {
    "name": "answer",
    "parent": "1"
  },
  "owner": {
    "location": "Norfolk, United Kingdom",
    "display_name": "Troll",
    "id": 49
  },
  "body": "<p>Use Linux...",
  "creation_date": "2009-05-05T13:45:37.030"
}
The following request connects the two:
POST child_example/_search?size=0
{
  "aggs": {
    "top-tags": {
      "terms": {
        "field": "tags.keyword",
        "size": 10
      },
      "aggs": {
        "to-answers": {
          "children": {
            "type": "answer"
          },
          "aggs": {
            "top-names": {
              "terms": {
                "field": "owner.display_name.keyword",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
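The type parameter refers to the child relation named answer, as declared in the join field mapping.
The example above returns the top question tags and, for each tag, the top answer owners.
Response: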
{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "top-tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "file-transfer",
          "doc_count": 1,
          "to-answers": {
            "doc_count": 2,
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "windows-server-2003",
          "doc_count": 1,
          "to-answers": {
            "doc_count": 2,
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "windows-server-2008",
          "doc_count": 1,
          "to-answers": {
            "doc_count": 2,
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        }
      ]
    }
  }
}
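To see where the numbers in this response come from, the following Python sketch replays the same three steps over the documents indexed above: bucket the questions by tag, hop from each bucket to its answer children, then bucket those answers by owner name. It is a conceptual model only, not Elasticsearch's implementation, and the helper name `children_of` is made up for illustration.

```python
from collections import Counter, defaultdict

# The three documents indexed above, reduced to the fields the aggregation reads.
docs = [
    {"_id": "1", "join": {"name": "question"},
     "tags": ["windows-server-2003", "windows-server-2008", "file-transfer"]},
    {"_id": "2", "join": {"name": "answer", "parent": "1"},
     "owner": {"display_name": "Sam"}},
    {"_id": "3", "join": {"name": "answer", "parent": "1"},
     "owner": {"display_name": "Troll"}},
]


def children_of(parent_ids, child_type):
    """Select documents of `child_type` whose parent id is in `parent_ids`."""
    return [d for d in docs
            if d["join"]["name"] == child_type
            and d["join"].get("parent") in parent_ids]


# Step 1: bucket the questions by tag (the "top-tags" terms aggregation).
top_tags = defaultdict(set)
for d in docs:
    for tag in d.get("tags", []):
        top_tags[tag].add(d["_id"])

# Step 2: for each tag bucket, move to the answers ("to-answers"),
# then bucket those answers by owner name ("top-names").
result = {}
for tag, question_ids in sorted(top_tags.items()):
    answers = children_of(question_ids, "answer")
    result[tag] = {
        "doc_count": len(question_ids),
        "to-answers": {
            "doc_count": len(answers),
            "top-names": dict(Counter(a["owner"]["display_name"] for a in answers)),
        },
    }
print(result["file-transfer"])
```

Each tag bucket holds one question (doc_count 1), while its to-answers bucket holds the two answers attached to that question, which is why the counts differ between the two levels.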
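Range Aggregation
The range aggregation is a multi-bucket aggregation that lets the user define a set of ranges, each of which represents a bucket. During aggregation, the value extracted from each document is checked against every range, and the document is placed in the buckets whose ranges match. Note that each range includes its from value but excludes its to value, so a value of exactly 100 falls into the 100-200 bucket, not the bucket that ends at 100.
Example: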
GET /_search
{
  "aggs" : {
    "price_ranges" : {
      "range" : {
        "field" : "price",
        "ranges" : [
          { "to" : 100.0 },
          { "from" : 100.0, "to" : 200.0 },
          { "from" : 200.0 }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges" : {
      "buckets": [
        {
          "key": "*-100.0",
          "to": 100.0,
          "doc_count": 2
        },
        {
          "key": "100.0-200.0",
          "from": 100.0,
          "to": 200.0,
          "doc_count": 2
        },
        {
          "key": "200.0-*",
          "from": 200.0,
          "doc_count": 3
        }
      ]
    }
  }
}
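The boundary rule (from is inclusive, to is exclusive) is easy to get wrong, so here is a small Python sketch of the bucketing logic. It only models the behavior described above; the function names and the price list are hypothetical, with the prices chosen so that the counts match the response.

```python
def default_key(rng):
    """Build a key such as '*-100.0' or '100.0-200.0' from a range definition."""
    lo = str(float(rng["from"])) if "from" in rng else "*"
    hi = str(float(rng["to"])) if "to" in rng else "*"
    return f"{lo}-{hi}"


def range_buckets(values, ranges):
    """Count the values in each range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for rng in ranges:
        lo = rng.get("from", float("-inf"))
        hi = rng.get("to", float("inf"))
        buckets.append({
            "key": default_key(rng),
            "doc_count": sum(1 for v in values if lo <= v < hi),
        })
    return buckets


# Hypothetical prices, chosen so the counts match the response above.
prices = [10, 50, 100, 150, 200, 250, 300]
ranges = [{"to": 100.0}, {"from": 100.0, "to": 200.0}, {"from": 200.0}]
buckets = range_buckets(prices, ranges)
for bucket in buckets:
    print(bucket)
```

The prices 100 and 200 sit exactly on the boundaries: 100 is counted in 100.0-200.0 and 200 in 200.0-*, never in the range that ends at that value.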
Keyed Response
Setting keyed to true associates a unique string key with each bucket and returns the buckets as a hash rather than an array:
GET /_search
{
  "aggs" : {
    "price_ranges" : {
      "range" : {
        "field" : "price",
        "keyed" : true,
        "ranges" : [
          { "to" : 100 },
          { "from" : 100, "to" : 200 },
          { "from" : 200 }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges" : {
      "buckets": {
        "*-100.0": {
          "to": 100.0,
          "doc_count": 2
        },
        "100.0-200.0": {
          "from": 100.0,
          "to": 200.0,
          "doc_count": 2
        },
        "200.0-*": {
          "from": 200.0,
          "doc_count": 3
        }
      }
    }
  }
}
It is also possible to define a custom key for each range:
GET /_search
{
  "aggs" : {
    "price_ranges" : {
      "range" : {
        "field" : "price",
        "keyed" : true,
        "ranges" : [
          { "key" : "cheap", "to" : 100 },
          { "key" : "average", "from" : 100, "to" : 200 },
          { "key" : "expensive", "from" : 200 }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges" : {
      "buckets": {
        "cheap": {
          "to": 100.0,
          "doc_count": 2
        },
        "average": {
          "from": 100.0,
          "to": 200.0,
          "doc_count": 2
        },
        "expensive": {
          "from": 200.0,
          "doc_count": 3
        }
      }
    }
  }
}
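The keyed and non-keyed responses carry the same information in two shapes. The Python sketch below shows the relationship by converting a bucket array into the keyed hash form. The helper `to_keyed` is hypothetical, and the array form with custom keys is an assumption about what the request would return without keyed set to true; it is not shown in these notes.

```python
def to_keyed(buckets):
    """Turn a list of buckets into a dict keyed by each bucket's `key`."""
    return {
        b["key"]: {name: value for name, value in b.items() if name != "key"}
        for b in buckets
    }


# Assumed array-form buckets for the custom-key request above.
array_form = [
    {"key": "cheap", "to": 100.0, "doc_count": 2},
    {"key": "average", "from": 100.0, "to": 200.0, "doc_count": 2},
    {"key": "expensive", "from": 200.0, "doc_count": 3},
]

keyed_form = to_keyed(array_form)

# With the keyed form a client can look a bucket up by name
# instead of scanning the array for a matching key.
print(keyed_form["average"])
```

The keyed form is convenient when client code already knows the names of the ranges it cares about; the array form is simpler to iterate over in order.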