ES Query Optimization (Part 1)

1. Use term instead of match_phrase whenever you can

The Lucene nightly benchmarks show that a simple term query is about 10 times as fast as a phrase query, and about 20 times as fast as a proximity query (a phrase query with slop).

In other words, a term query is roughly 10 times faster than match_phrase, and roughly 20 times faster than match_phrase with slop.

GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick"
        }
    }
}

becomes

GET /my_index/my_type/_search
{
    "query": {
        "term": {
            "title": "quick"
        }
    }
}
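
A sketch of how an application might choose between the two forms. It uses term only when the input is a single token, since term skips analysis and has to match the indexed term exactly. The helper name build_title_query is hypothetical (it is not part of any Elasticsearch client), and the code assumes the field's analyzer lowercases and splits on whitespace, which may not hold for your mapping.

```python
def build_title_query(text, field="title"):
    """Return a term query for a single token, else a match_phrase query.

    Assumption: the field's analyzer lowercases and splits on whitespace,
    so one lowercased word corresponds to one indexed term.
    """
    tokens = text.lower().split()
    if len(tokens) == 1:
        # One token: no positions to check, so the cheaper term query is enough.
        return {"query": {"term": {field: tokens[0]}}}
    # Several tokens (or none): keep match_phrase so word order still matters.
    return {"query": {"match_phrase": {field: text}}}
```

The returned dict is only the request body; sending it to /my_index/my_type/_search is left to whichever client you use.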

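2. If a condition has no bearing on how documents are ranked, always put it in filter: it takes no part in score computation, and its results can be cached, which speeds up later queries.

For example, to find cars whose type is Ford, whose color is yellow, and whose name contains dev, a typical query looks like this: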

GET /my_index/my_type/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "type": "ford"
                    }
                },
                {
                    "term": {
                        "color": "yellow"
                    }
                },
                {
                    "term": {
                        "name": "dev"
                    }
                }
            ]
        }
    }
}

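In the query above, type and color also take part in computing the ranking score. They are only filter conditions, though, and the score should depend solely on how well name matches. The query is therefore both poorly structured and inefficient. With the two conditions moved into filter, it becomes: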

GET /my_index/my_type/_search
{
    "query": {
        "bool": {
            "must": {
                "term": {
                    "name": "dev"
                }
            },
            "filter": [
                {
                    "term": {
                        "type": "ford"
                    }
                },
                {
                    "term": {
                        "color": "yellow"
                    }
                }
            ]
        }
    }
}
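
When queries are assembled in application code, the same split can be made explicit by keeping scoring clauses and pure filters in separate lists. This is a minimal Python sketch; build_bool_query is a hypothetical helper, not a client API.

```python
def build_bool_query(scoring_clauses, filter_clauses):
    """Build a bool query body: scoring clauses go to must, the rest to filter."""
    bool_body = {}
    if scoring_clauses:
        # Clauses under "must" contribute to the relevance score.
        bool_body["must"] = list(scoring_clauses)
    if filter_clauses:
        # Clauses under "filter" only include or exclude documents.
        bool_body["filter"] = list(filter_clauses)
    return {"query": {"bool": bool_body}}


body = build_bool_query(
    scoring_clauses=[{"term": {"name": "dev"}}],
    filter_clauses=[{"term": {"type": "ford"}}, {"term": {"color": "yellow"}}],
)
```

Here body holds the request body of the Ford example, with must written as a one-element list rather than a single object.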

3. If you do not care about the order of the results, sort by _doc; documents are then returned in index order.

_doc has no real use-case besides being the most efficient sort order. So if you don’t care about the order in which documents are returned, then you should sort by _doc. This especially helps when scrolling. Sorting by _doc sorts by index order.

GET /my_index/my_type/_search
{
    "query": {
        "term": {
            "name": "dev"
        }
    },
    "sort":[
        "_doc"
    ]
}

4. Fetching n random documents (n >= 10000)

1) Use the random_score function that ships with ES. Drawbacks: it is slow and memory-hungry.

GET /my_index/my_type/_search
{
    "size": 10000,
    "query": {
        "function_score": {
            "query": {
                "term": {
                    "name": "dev"
                }
            },
            "random_score": {}
        }
    }
}

2) Use a script-based sort. It uses somewhat less memory than random_score, but it is slower.

GET /my_index/my_type/_search
{
    "query": {
        "term": {
            "name": "dev"
        }
    },
    "sort": {
        "_script": {
            "type": "number",
            "script": {
                "lang": "painless",
                "inline": "Math.random()"
            },
            "order": "asc"
        }
    }
}

3) When indexing, add an extra field, mark, whose value is generated randomly. At query time, simply sort on that field.

GET /my_index/my_type/_search
{
    "query": {
        "term": {
            "name": "dev"
        }
    },
    "sort":[
        "mark"
    ]
}
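
A minimal Python sketch of the index-time half of option 3, assuming documents are plain dicts before they are sent to ES. The helper names with_random_mark and random_sample_body are hypothetical; only the field name mark comes from the text above.

```python
import random


def with_random_mark(doc, rng=random):
    """Return a copy of doc with a random float in [0, 1) stored under "mark"."""
    marked = dict(doc)  # copy, so the caller's document is left untouched
    marked["mark"] = rng.random()
    return marked


def random_sample_body(name, size):
    """Build the search body that reads documents back ordered by "mark"."""
    return {
        "size": size,
        "query": {"term": {"name": name}},
        "sort": ["mark"],
    }
```

Because mark is fixed when the document is written, repeated queries return the same order; if a fresh sample is needed each time, the values have to be regenerated or the sort has to start from a varying offset.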

5. Range aggregations take too long

{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "from" : 10, "to" : 50 },
                    { "from" : 50, "to" : 70 },
                    { "from" : 70, "to" : 100 }
                ]
            }
        }
    }
}

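As the example shows, we aggregate over three ranges: [10,50), [50,70), and [70,100). Because every value has to be compared against the range bounds, this can be slow on large data sets. Solution: at index time, write the range each document falls into to the index as a keyword field, and at query time aggregate on that field.

Assume every price is below 100, the added field is mark, and its values are 10-50, 50-70, and 70-100.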
{
    "aggs" : {
        "genres" : {
            "terms" : { "field" : "mark" }
        }
    }
}
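
The index-time labelling could be sketched in Python as follows. The bucket bounds and the label format come from the example above; the helper names price_mark and with_price_mark are hypothetical, and leaving mark unset for prices outside every bucket is an assumption of this sketch.

```python
# Half-open ranges [low, high), matching the range aggregation above.
PRICE_BUCKETS = [(10, 50), (50, 70), (70, 100)]


def price_mark(price):
    """Return the bucket label for price, or None if it falls in no bucket."""
    for low, high in PRICE_BUCKETS:
        if low <= price < high:
            return f"{low}-{high}"
    return None


def with_price_mark(doc):
    """Return a copy of doc with "mark" set to its price bucket label."""
    marked = dict(doc)
    label = price_mark(doc["price"])
    if label is not None:
        # Store the label as a keyword field so a terms aggregation can use it.
        marked["mark"] = label
    return marked
```

The comparison cost is paid once per document at write time, so the terms aggregation only has to count precomputed labels.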

6. Querying for empty strings

If you only need to know whether a field exists or is missing, the Exists Query is enough (exists, or must_not exists).

GET /_search
{
    "query": {
        "exists" : { "field" : "user" }
    }
}

GET /_search
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "user"
                }
            }
        }
    }
}

The case meant here, however, is a field that exists and whose value is the empty string "". A script query can match it:

curl -H 'Content-Type: application/json' 'localhost:9200/customer/_search?pretty' -d'{
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "script": {
                    "script": {
                        "inline": "doc['\''strnickname'\''].size() > 0 && doc['\''strnickname'\''].value.length() < 1",
                        "lang": "painless"
                    }
                }
            }
        }
    }
}'