ES Query Optimization (Part 1)
1. Use term instead of match_phrase whenever you can
The Lucene nightly benchmarks show that a simple term query is about 10 times as fast as a phrase query, and about 20 times as fast as a proximity query (a phrase query with slop).
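In other words, a term query is about 10 times as fast as match_phrase, and about 20 times as fast as match_phrase with slop. Keep in mind that term does not analyze the search text, so the value must match an indexed token exactly.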
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": { "title": "quick" }
  }
}

becomes

GET /my_index/my_type/_search
{
  "query": {
    "term": { "title": "quick" }
  }
}
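2. If a query condition has nothing to do with how documents are ranked, put it in a filter. A filter skips score calculation and its results can be cached, which speeds up later queries.
For example, to find yellow Ford cars whose name contains dev, a typical query looks like this: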
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "type": "ford" } },
        { "term": { "color": "yellow" } },
        { "term": { "name": "dev" } }
      ]
    }
  }
}
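In the query above, type and color also take part in the relevance score. They only serve as filter conditions, though, so the score should depend on the name match alone. The query is therefore both unreasonable and inefficient. Moving type and color into filter gives: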
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": { "term": { "name": "dev" } },
      "filter": [
        { "term": { "type": "ford" } },
        { "term": { "color": "yellow" } }
      ]
    }
  }
}
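The split between scoring clauses and filter clauses can be sketched as a small Python helper. This is only an illustration: build_bool_query and its parameters are hypothetical names, not part of any Elasticsearch client, and the function merely assembles the dict that would be sent as the JSON body of _search.

```python
import json

def build_bool_query(scoring_terms, filter_terms):
    """Build a bool query body (hypothetical helper, assembles a dict only).

    scoring_terms: field -> value pairs that should influence the score (must).
    filter_terms:  field -> value pairs that only include/exclude documents (filter).
    """
    def term_clauses(terms):
        return [{"term": {field: value}} for field, value in terms.items()]

    bool_body = {}
    if scoring_terms:
        bool_body["must"] = term_clauses(scoring_terms)
    if filter_terms:
        bool_body["filter"] = term_clauses(filter_terms)
    return {"query": {"bool": bool_body}}

body = build_bool_query(
    scoring_terms={"name": "dev"},
    filter_terms={"type": "ford", "color": "yellow"},
)
print(json.dumps(body, indent=2))
```

Keeping the two kinds of conditions in separate arguments makes it harder to put a pure filter condition into must by accident.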
3. If you do not care about the order of the results, sort by _doc. Documents are then returned in index order.
_doc has no real use-case besides being the most efficient sort order. So if you don’t care about the order in which documents are returned, then you should sort by _doc. This especially helps when scrolling. Use _doc to sort by index order.
GET /my_index/my_type/_search
{
  "query": {
    "term": { "name": "dev" }
  },
  "sort": [ "_doc" ]
}
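One way to apply this consistently is to add the _doc sort whenever a request body does not ask for a specific order. The sketch below assumes request bodies are plain Python dicts; with_doc_sort is a hypothetical helper name, not a client API.

```python
import copy

def with_doc_sort(body):
    """Return a copy of a search body that sorts by _doc unless a sort is already set."""
    result = copy.deepcopy(body)          # leave the caller's dict untouched
    result.setdefault("sort", ["_doc"])   # only add the sort when none was requested
    return result

body = {"query": {"term": {"name": "dev"}}}
sorted_body = with_doc_sort(body)
print(sorted_body)
```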
4. Fetching n random documents (n >= 10000)
1) Use the random_score function that ships with ES. Drawback: it is slow and consumes a lot of memory.
GET /my_index/my_type/_search
{
  "size": 10000,
  "query": {
    "function_score": {
      "query": {
        "term": { "name": "dev" }
      },
      "random_score": { }
    }
  }
}
2) Use a script-based sort. It uses somewhat less memory than random_score, but it is even slower.
GET /my_index/my_type/_search
{
  "query": {
    "term": { "name": "dev" }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "Math.random()"
      },
      "order": "asc"
    }
  }
}
3) When indexing, add an extra field, mark, whose value is generated randomly. At query time, simply sort by that field.
GET /my_index/my_type/_search
{
  "query": {
    "term": { "name": "dev" }
  },
  "sort": [ "mark" ]
}
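The indexing side of option 3) might look like the following Python sketch. It assumes documents are plain dicts and that mark holds a float in [0, 1); add_random_mark is a hypothetical helper, and how the documents are actually sent to ES is left out.

```python
import random

def add_random_mark(doc, rng=random):
    """Return a copy of doc with a random float in [0, 1) stored under 'mark'."""
    marked = dict(doc)            # shallow copy so the input stays unchanged
    marked["mark"] = rng.random()
    return marked

docs = [{"name": "dev", "id": i} for i in range(5)]
rng = random.Random(42)           # seeded only to make the example repeatable
marked_docs = [add_random_mark(d, rng) for d in docs]
for d in marked_docs:
    print(d)
```

Because mark is fixed at index time, sorting by it returns the same order on every query. If a different sample is needed each time, this approach would have to be combined with something else, for example a random starting point on mark.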
5. Range aggregations take too long
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 10, "to": 50 },
          { "from": 50, "to": 70 },
          { "from": 70, "to": 100 }
        ]
      }
    }
  }
}
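As the example shows, we aggregate over the three intervals [10,50), [50,70) and [70,100). Because this involves comparison operations, it can be slow on large data sets.
Solution: at index time, write the interval to aggregate on into the index as a keyword field, then aggregate on that field at query time.
Assume every price is below 100 and the extra field is called mark, with the values 10-50, 50-70 and 70-100.
{
  "aggs": {
    "genres": {
      "terms": { "field": "mark" }
    }
  }
}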
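Computing the mark value at index time can be sketched in Python as follows. The bucket boundaries mirror the ranges in the example above and use the same half-open [from, to) semantics; price_bucket and PRICE_BUCKETS are hypothetical names chosen for illustration.

```python
PRICE_BUCKETS = [(10, 50), (50, 70), (70, 100)]

def price_bucket(price, buckets=PRICE_BUCKETS):
    """Return the label 'from-to' of the half-open interval [from, to) containing price.

    Returns None when the price falls outside every interval.
    """
    for low, high in buckets:
        if low <= price < high:
            return f"{low}-{high}"
    return None

doc = {"price": 50}
doc["mark"] = price_bucket(doc["price"])   # computed once, before indexing
print(doc)
```

A price outside all intervals yields None here, so the caller has to decide whether to skip the field or use a catch-all label.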
6. Querying for empty strings
If you only need to know whether a field exists or is missing, the Exists Query is enough (exists, must_not exists).
GET /_search
{
  "query": {
    "exists": { "field": "user" }
  }
}

GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "user" }
      }
    }
  }
}
What we mean here, however, is a field that exists and whose value is the empty string "".
curl 'localhost:9200/customer/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 5,
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "inline": "doc['\''strnickname'\''].length()<1",
            "lang": "painless"
          }
        }
      }
    }
  }
}'
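The difference between a missing field and a field that holds "" can be modeled locally in Python. This is a simplified sketch, not an ES API: field_state is a hypothetical function, and it assumes the documented exists semantics in which null and [] count as missing while the empty string counts as an existing value.

```python
def field_state(doc, field):
    """Classify a field as 'missing', 'empty' or 'present' (simplified local model).

    Absent, None and [] are treated as 'missing'; the string "" is 'empty';
    every other value is 'present'.
    """
    value = doc.get(field)
    if value is None or value == []:
        return "missing"
    if value == "":
        return "empty"
    return "present"

samples = [
    {"strnickname": "bob"},
    {"strnickname": ""},
    {"strnickname": None},
    {},
]
for doc in samples:
    print(doc, "->", field_state(doc, "strnickname"))
```

In this model only the second sample is the case this section is after: an exists query would still match it, so a separate check on the value itself is required.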