[ElasticSearch] (5) Weighing the trade-offs of "Result window is too large" and deep paging

阿新 • Published: 2018-12-21

As the title says, I ran into the following problem while running a DSL query against Elasticsearch:

{
	"error": {
		"root_cause": [{
			"type": "query_phase_execution_exception",
			"reason": "Result window is too large, from + size must be less than or equal to: [200] but was [1000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
		}],
		"type": "search_phase_execution_exception",
		"reason": "all shards failed",
		"phase": "query",
		"grouped": true,
		"failed_shards": [{
			"shard": 0,
			"index": "fcar_city",
			"node": "7EtAlFI7QEOpQD3rHvTm0g",
			"reason": {
				"type": "query_phase_execution_exception",
				"reason": "Result window is too large, from + size must be less than or equal to: [200] but was [1000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
			}
		}]
	},
	"status": 500
}

This puzzled me, because my DSL query was simply:

{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "from": 0,
  "size": 1000
}

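All I did was run a match_all query against a single index, fcar_city, and yet the result was "Result window is too large". According to the error message, from + size (here 0 + 1000) exceeded index.max_result_window, which was 200 on this index (the Elasticsearch default is 10,000). The fix most commonly suggested online is to raise max_result_window so that it is at least as large as from + size, for example: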

PUT fcar_city/_settings
{
  "index":{
    "max_result_window":1000000
  }
}

After I reset max_result_window on the fcar_city index, the DSL query succeeded.
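To make the rule behind the error concrete, here is a minimal Python sketch of the from + size check. It only mirrors the wording of the error message above; the function name `check_result_window` is hypothetical and this is not Elasticsearch's actual implementation.

```python
def check_result_window(from_, size, max_result_window=10_000):
    """Hypothetical re-creation of the from + size guard (not Elasticsearch code)."""
    requested = from_ + size
    if requested > max_result_window:
        raise ValueError(
            "Result window is too large, from + size must be less than or "
            f"equal to: [{max_result_window}] but was [{requested}]"
        )
    return requested


if __name__ == "__main__":
    # The request from this post is rejected while the limit is 200 ...
    try:
        check_result_window(0, 1000, max_result_window=200)
    except ValueError as err:
        print(err)
    # ... and accepted once the limit has been raised to 1,000,000.
    print(check_result_window(0, 1000, max_result_window=1_000_000))
```

The sketch makes one point visible: it is the sum from + size that counts, so a request with from = 0 can still fail if size alone exceeds the limit.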

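Along the way I came across a Stack Overflow post pointing out that changing this setting directly can cause problems such as high memory consumption. This brings in the concept of "deep paging", which the official Elasticsearch documentation describes as follows.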

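First, pagination:

To get the effect of LIMIT in MySQL, Elasticsearch uses the from and size parameters.

size: the number of results to return, default 10

from: the number of initial results to skip, default 0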

For example, to show 5 results per page and fetch pages 1 to 3:

GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
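The mapping from a page number to these parameters can be sketched in a few lines of Python. The helper name `page_params` is an assumption made for illustration and is not part of any Elasticsearch client.

```python
def page_params(page, size=10):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # Skip every result that belongs to the pages before this one.
    return {"from": (page - 1) * size, "size": size}


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page, page_params(page, size=5))
```

With size = 5, pages 1 to 3 map to from = 0, 5 and 10, which matches the three requests above.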

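The root reason that raising max_result_window leads to high memory consumption is that a search request usually spans multiple shards. Each shard produces its own sorted results, which must then be sorted centrally to ensure that the overall order is correct.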

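Consider what happens when you page too deep or request too many results at once (which a larger max_result_window allows). Suppose we search an index with five primary shards. When we request the first page (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results to select the overall top 10. Now imagine that we ask for results 10,001 to 10,010 (page 1,001 at 10 results per page). Everything works the same way, except that each shard has to produce its top 10,010 results. The coordinating node then sorts all 50,050 results and discards 50,040 of them! In a distributed system, the cost of sorting results therefore keeps growing the deeper we page: it is proportional to number_of_shards * (from + size).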
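The arithmetic in this example can be checked with a small Python sketch. The function name `coordinator_load` is hypothetical; it only counts entries and does not talk to a cluster.

```python
def coordinator_load(num_shards, from_, size):
    """Count the entries involved in one paged search.

    Returns (per_shard, collected, discarded):
      per_shard - entries each shard must produce (from + size)
      collected - entries the coordinating node has to sort
      discarded - entries thrown away after sorting
    """
    per_shard = from_ + size
    collected = num_shards * per_shard
    discarded = collected - size
    return per_shard, collected, discarded


if __name__ == "__main__":
    print(coordinator_load(5, 0, 10))       # first page
    print(coordinator_load(5, 10_000, 10))  # results 10,001 to 10,010
```

For five shards, the first page involves 50 entries, while results 10,001 to 10,010 involve 50,050 entries, of which 50,040 are discarded.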

Beyond that, when a search is executed in a distributed system, the fetch phase proceeds as follows:

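1. The coordinating node identifies which documents need to be fetched and issues a multi GET request to the relevant shards.
2. Each shard loads the documents, enriches them if required, and returns them to the coordinating node.
3. Once all documents have been fetched, the coordinating node returns the results to the client.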

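The coordinating node first decides which documents actually need to be fetched. For example, if our query specifies { "from": 90, "size": 10 }, the first 90 results are discarded and only the next 10 need to be retrieved. These documents may come from one, some, or all of the shards involved in the original search request. Once the coordinating node has received all of the results, it assembles them into a single response that is returned to the client.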
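A rough Python sketch of that decision step follows, assuming the query phase has already produced a globally sorted list of (shard_id, doc_id) pairs. The name `plan_fetch` and the data layout are illustrative assumptions, not Elasticsearch internals.

```python
from collections import defaultdict


def plan_fetch(sorted_hits, from_, size):
    """Pick the requested window and group its document ids by shard.

    sorted_hits: globally sorted list of (shard_id, doc_id) tuples.
    Returns {shard_id: [doc_id, ...]}, i.e. one multi-get per shard.
    """
    window = sorted_hits[from_:from_ + size]  # the first `from_` hits are dropped
    by_shard = defaultdict(list)
    for shard_id, doc_id in window:
        by_shard[shard_id].append(doc_id)
    return dict(by_shard)


if __name__ == "__main__":
    # 100 made-up hits spread round-robin over 3 shards.
    hits = [(i % 3, f"doc-{i}") for i in range(100)]
    for shard_id, doc_ids in sorted(plan_fetch(hits, 90, 10).items()):
        print(shard_id, doc_ids)
```

Only the 10 documents inside the window are fetched, and each shard receives a single request listing just the ids it owns.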

When this process runs across multiple shards, deep paging becomes an issue:

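The query-then-fetch process supports pagination with the from and size parameters, but only within limits. Remember that each shard must build a priority queue of length from + size, and all of these queues have to be passed back to the coordinating node. The coordinating node then needs to sort through number_of_shards * (from + size) documents to find the correct size documents. Depending on the size of your documents, the number of shards, and the hardware you are using, paging 10,000 to 50,000 results (1,000 to 5,000 pages) deep should be perfectly feasible. With a large enough from value, however, the sorting process becomes very heavy and consumes large amounts of CPU, memory, and bandwidth.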
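The per-shard priority queues and the final merge can be simulated in plain Python. This is a sketch under simplifying assumptions: shards are in-memory lists of (score, doc_id) pairs, scores are random, and the names `shard_top_k` and `search` are made up for illustration.

```python
import heapq
import random


def shard_top_k(hits, k):
    """Per-shard priority queue: keep only the k best (score, doc_id) pairs."""
    return heapq.nlargest(k, hits)


def search(shards, from_, size):
    """Simulate the query phase: every shard returns from + size candidates."""
    k = from_ + size
    candidates = []
    for hits in shards:
        candidates.extend(shard_top_k(hits, k))
    candidates.sort(reverse=True)  # central sort on the coordinating node
    return candidates[from_:from_ + size], len(candidates)


if __name__ == "__main__":
    rng = random.Random(42)
    # 5 shards with 1,000 made-up documents each; doc ids are unique integers.
    shards = [[(rng.random(), s * 1000 + i) for i in range(1000)] for s in range(5)]
    for from_ in (0, 90, 900):
        page, sorted_count = search(shards, from_, 10)
        print(f"from={from_:>3} returned={len(page)} sorted={sorted_count}")
```

The page that comes back always has 10 hits, but the number of entries the coordinating node sorts grows with from, which is the cost the paragraph above describes.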

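So the lazy fix for an error like "Result window is too large, from + size must be less than or equal to: [200] but was [1000]" is to set max_result_window high enough to meet the business requirement, at the cost of cluster performance. To avoid the high memory consumption caused by deep paging, see the next post, which covers the scroll API.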
