[ElasticSearch] (5) "Result window is too large" and deep paging: weighing the trade-offs

    As the title says, while running a DSL query against Elasticsearch I hit the following error:

{
	"error": {
		"root_cause": [{
			"type": "query_phase_execution_exception",
			"reason": "Result window is too large, from + size must be less than or equal to: [200] but was [1000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
		}],
		"type": "search_phase_execution_exception",
		"reason": "all shards failed",
		"phase": "query",
		"grouped": true,
		"failed_shards": [{
			"shard": 0,
			"index": "fcar_city",
			"node": "7EtAlFI7QEOpQD3rHvTm0g",
			"reason": {
				"type": "query_phase_execution_exception",
				"reason": "Result window is too large, from + size must be less than or equal to: [200] but was [1000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
			}
		}]
	},
	"status": 500
}

     This was puzzling, because my DSL query was simply:

{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ]
    }
  },
  "from": 0,
  "size": 1000
}

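     All I did was run a "match_all" query against a single index, "fcar_city", and the result was "Result window is too large". Searching online, the usual fix is to raise "max_result_window" on the index so that it is at least from + size, for example: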

PUT fcar_city/_settings
{
  "index":{
    "max_result_window":1000000
  }
}

     After I reset the max_result_window setting on the fcar_city index, the DSL query succeeded.
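
     The failure is plain arithmetic: the request asked for from + size = 0 + 1000 results, while the error message shows this index was capped at 200 (an index with no explicit setting defaults to 10000). Below is a minimal Python sketch of that check, usable as a client-side guard before sending a query. The function name `exceeds_result_window` is my own and is not part of any Elasticsearch client.

```python
def exceeds_result_window(from_: int, size: int, max_result_window: int) -> bool:
    """Return True if from + size is larger than the index's result window."""
    # Elasticsearch requires from + size <= index.max_result_window.
    return from_ + size > max_result_window


# The request from this post: from=0, size=1000 against a limit of 200.
print(exceeds_result_window(0, 1000, 200))        # True: the query is rejected
# The same request after raising the limit to 1000000.
print(exceeds_result_window(0, 1000, 1_000_000))  # False: the query is accepted
```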

      

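     Along the way I came across a Stack Overflow post warning that changing this setting directly can cause problems such as high memory consumption. This involves the concept of "deep paging", which the official Elasticsearch guide covers here: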

     

https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html

     https://www.elastic.co/guide/en/elasticsearch/guide/current/_fetch_phase.html

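    A quick introduction to pagination:

    1. To get the effect of MySQL's LIMIT, Elasticsearch uses from and size.

  size: the number of results to return; defaults to 10

  from: the number of initial results to skip; defaults to 0

  For example, with 5 records per page, fetching pages 1 through 3 looks like this: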

GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
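
       The mapping from a page number to from and size in the three requests above can be sketched in Python as follows. The helper name `page_params` and the 1-based page numbering are assumptions made for illustration.

```python
def page_params(page: int, page_size: int) -> dict:
    """Map a 1-based page number to Elasticsearch from/size values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Skip all results that belong to the earlier pages.
    return {"from": (page - 1) * page_size, "size": page_size}


# Pages 1 to 3 with 5 records per page, matching the requests above.
for page in (1, 2, 3):
    print(page, page_params(page, 5))
```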

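       The root reason that raising max_result_window can lead to high memory consumption is that a search request usually spans multiple shards. Each shard generates its own sorted results, which must then be sorted centrally to ensure that the overall order is correct.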

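       Consider what happens when we page too deep or request too many results at once, which a larger max_result_window permits. Suppose we are searching an index with five primary shards. When we request the first page (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results to select the overall top 10. Now imagine we ask for page 1,001, that is, results 10,001 to 10,010. Everything works the same way, except that each shard must produce its top 10,010 results. The coordinating node then sorts all 50,050 results and discards 50,040 of them! In a distributed system, then, the cost of sorting results keeps growing the deeper we page: every request makes the coordinating node handle (from + size) × number of shards candidates.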
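
       To make those numbers concrete, here is a small Python simulation of the scatter/gather step. It is only a sketch under simplifying assumptions: random floats stand in for document scores, `deep_page` is a name I made up, and real Elasticsearch uses per-shard priority queues rather than this exact code. The candidate counts it prints match the arithmetic above.

```python
import heapq
import random


def deep_page(shards, from_, size):
    """Simulate the query phase; return (page, number_of_candidates_merged)."""
    window = from_ + size
    # Each shard sorts its own hits and returns its top `window` scores.
    per_shard = [heapq.nlargest(window, shard) for shard in shards]
    # The coordinating node merges every candidate, then keeps only one page.
    merged = list(heapq.merge(*per_shard, reverse=True))
    return merged[from_:from_ + size], len(merged)


random.seed(0)
# Five shards holding 20,000 hypothetical document scores each.
shards = [[random.random() for _ in range(20_000)] for _ in range(5)]

page, merged = deep_page(shards, from_=0, size=10)
print(merged, merged - len(page))   # 50 candidates merged, 40 discarded

page, merged = deep_page(shards, from_=10_000, size=10)
print(merged, merged - len(page))   # 50050 candidates merged, 50040 discarded
```

       The page returned is the same one a single global sort would give; what changes with depth is how much data every shard has to ship and the coordinating node has to hold in memory.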

 

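       So, for an error such as "Result window is too large, from + size must be less than or equal to: [200] but was [1000]", the lazy fix is to set max_result_window high enough for your business needs, at the cost of cluster performance. If you want to avoid the high memory consumption that deep paging causes, see the next post in this series.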