Elasticsearch DSL Command Reference (Part 1)
All commands below were run in the Kibana console that comes with Alibaba Cloud Elasticsearch.
Preface:
We used to query ES by running curl directly on the server. After the company moved to Alibaba Cloud Elasticsearch, the same command run on the Alibaba-provided server returned an error:
[[email protected] ~]# curl -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
Solution: Alibaba Cloud enforces access control, so every request has to carry a username and password. For details see https://help.aliyun.com/document_detail/57877.html?spm=a2c4g.11186623.6.548.AAW08d
The correct way to connect:
[[email protected] ~]# curl -u hui:hui -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{
  "name" : "huihui",
  "cluster_name" : "es-cn-huiiiiiiiiiiiii",
  "cluster_uuid" : "huiiiiiiiiiiiii_iiiii",
  "version" : {
    "number" : "5.5.3",
    "build_hash" : "930huihui",
    "build_date" : "2017-09-07T15:56:59.599Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
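For readers calling the cluster from code rather than curl, the following is a minimal sketch of what `curl -u user:password` sends: an HTTP `Authorization` header carrying the Base64-encoded credentials (HTTP Basic authentication, as the `WWW-Authenticate: Basic` header in the 401 response asks for). The class and method names are hypothetical, and `hui`/`hui` are just the placeholder credentials from the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the header value that "curl -u user:password" sends.
class BasicAuthHeader {

    // Returns "Basic " followed by base64("user:password").
    static String build(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials taken from the curl example.
        System.out.println("Authorization: " + build("hui", "hui"));
    }
}
```

Any HTTP client can attach the resulting string as the `Authorization` request header; since Base64 is an encoding and not encryption, this is best used over HTTPS.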
List all indices on the server:
GET _cat/indices?v
Get the mapping of an index:
GET /xiao-2018-6-12/Socials/_mapping
Add:
1. Add the field name with the value xiaoqiang:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script" : "ctx._source.name = \"xiaoqiang\""
}
Delete:
1. Remove a field from a document:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script" : "ctx._source.remove(\"name_of_new_field\")"
}
2. Delete a single document:
DELETE mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news
3. Bulk delete by multiple conditions:
POST mei_toutiao/News/_delete_by_query?routing=news
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "mediaNameZh" : "5time悅讀" } },
{ "term" : { "codeName" : "美髮" } }
]
}
}
}
}
}
Update:
1. Partial update:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"doc" : {
"userName": "hao" // modified if the field exists, added if it does not
}
}
2. Update a string array:
POST mei_toutiao/News/AWPN8pLjs4TGXdjfL8_b/_update?routing=news
{
"doc" : {
"littleUrls": [
"http://shishanghui.oss-cn-beijing.aliyuncs.com/700d2d2936f40fabe5a70b1449f07f9df080.jpg?x-oss-process=image/format,jpg/interlace,1",
"http://shishanghui.oss-cn-beijing.aliyuncs.com/ed7ad5d1e23441880c59abf0cfd7a89df080.jpg?x-oss-process=image/format,jpg/interlace,1"
]
}
}
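3. Full replacement:
(Whatever fields the document had before, afterwards it contains only the fields given below. The whole document is replaced, so use with caution!)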
PUT mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news
{
"counter" : 1,
"tags" : ["red"]
}
4. Bulk reset the comment count to 0 for every article whose comment count is greater than 0:
POST mei_toutiao/News/_update_by_query?routing=news
{
"query": {
"bool": {
"must": [
{
"range": {
"atdCnt": {
"gt": 0
}
}
}
]
}
},
"script": {
"inline":"ctx._source.atdCnt = 0"
}
}
5. Bulk add a field and assign it a value:
POST hui/News/_update_by_query
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"hui": "hehe"
}
}
}
}
}
},
"script": {
"inline":"ctx._source.name = \"xiaoqiang\""
}
}
6. Update with a script:
If the document exists, set its counter field to 3; if it does not exist, insert a new document whose counter field is 2.
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script":{
"inline":"ctx._source.counter = 3"
},
"upsert":{"counter":2}
}
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script" : {
"inline": "ctx._source.counter += 4"
}
}
Or, with the same request line and the increment passed as a parameter:
{
"script" : {
"inline": "ctx._source.counter += params.count",
"lang": "painless",
"params" : {
"count" : 4
}
}
}
Search:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"_id": "AWNcz4IrB-kQcwLDJ93q"
}
}
}
}
}
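Notes:
1. For what "constant_score" is used for, see https://blog.csdn.net/dm_vincent/article/details/42157577
2. For the difference between match and term, see https://www.cnblogs.com/yjf512/p/4897294.html
3. The term clause can also target an ordinary document field (e.g. "newType" : 1). Querying by a field may return many documents, whereas querying by _id returns at most one.
1. Fetch a single document: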
GET mei_toutiao/hui/AWNcz4IrB-kQcwLDJ93q?routing=hui
2. Search all documents:
GET mei_toutiao/_search
Note: the query matches every document, but only 10 hits are returned by default.
3. Search all documents whose newType field is 1:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"newType": "1"
}
}
}
}
}
}
}
Search all documents whose newType field is not 1:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must_not" : {
"term" : {
"newType": "1"
}
}
}
}
}
}
}
Note:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"match_phrase" : {
"userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
}
}
}
}
}
}
}
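The query above finds the matching document, whereas the one below does not. A likely cause is that userId is mapped as an analyzed text field, so the index holds lowercased tokens rather than the original string, while term looks up the exact, unanalyzed value: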
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
}
}
}
}
}
}
}
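To make the match_phrase versus term difference concrete, here is a rough sketch of what analysis does to such an ID. It assumes userId is a text field using the default standard analyzer, which the post does not confirm, and it only approximates that analyzer by lowercasing and splitting on anything that is not a letter or digit. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for a text analyzer: lowercase, then split on
// every run of characters that are neither letters nor digits.
class SimpleAnalyzer {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String userId = "1C210E82-21B7-4220-B267-ED3DA6635F6F";
        List<String> indexed = analyze(userId);
        System.out.println("indexed tokens: " + indexed);
        // A term query compares the raw string against the indexed tokens.
        System.out.println("term would match: " + indexed.contains(userId));
    }
}
```

Under these assumptions the raw uppercase ID is not among the indexed tokens, so term finds nothing, while match_phrase analyzes the query string the same way and matches the tokens in order. If the mapping has a keyword sub-field, running term against that sub-field is the usual way to match the exact value.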
4. Documents in which the field exists:
GET mei_toutiao/_search
{
"query":{
"exists": {
"field": "newType"
}
}
}
Documents in which the field does not exist:
GET mei_toutiao/_search
{
"query":{
"bool": {
"must_not": {
"exists": {
"field": "newType"
}
}
}
}
}
5. Query on multiple fields:
GET mei_toutiao/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "sourceType" : "FORUM" } },
{ "term" : { "flwCnt" : 0 } }
]
}
}
}
}
}
6. Sort by the pubTime field in descending order (use asc for ascending):
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"newType": "1"
}
}
}
}
}
}
, "sort": [
{
"pubTime": "desc"
}
]
}
7. Within the 視訊 (video) category, filter out 抖音 (Douyin):
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"codeName": "視訊"
}
},
"must_not" : {
"term" : {
"mediaNameZh": "抖音"
}
}
}
}
}
}
, "sort": [
{
"pubTime": "desc"
}
]
}
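The corresponding Java API:
// fqb mirrors the term clauses of the DSL above; client is assumed to be a SearchRequestBuilder
BoolQueryBuilder fqb = QueryBuilders.boolQuery()
    .must(QueryBuilders.termQuery("codeName", "視訊"))
    .mustNot(QueryBuilders.termQuery("mediaNameZh", "抖音"));
client.setQuery(fqb).addSort("pubTime", SortOrder.DESC);
Pagination with sorting: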
client.setQuery(fqb).setFrom((message.getInt("pageNo")-1)*10).setSize(10).addSort("pubTime", SortOrder.DESC);
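The setFrom argument above is the usual page-to-offset conversion: with 1-based page numbers, page n starts at offset (n - 1) * pageSize. A small sketch of that arithmetic, with a hypothetical helper name and basic argument checks added for illustration:

```java
// Hypothetical helper: converts a 1-based page number into the "from" offset.
class PageOffset {

    static int from(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        // multiplyExact throws instead of silently overflowing on huge page numbers.
        return Math.multiplyExact(pageNo - 1, pageSize);
    }

    public static void main(String[] args) {
        int pageSize = 10; // same page size as the snippet above
        for (int pageNo = 1; pageNo <= 3; pageNo++) {
            System.out.println("page " + pageNo + " -> from=" + from(pageNo, pageSize)
                    + ", size=" + pageSize);
        }
    }
}
```

Note that Elasticsearch rejects requests where from + size exceeds index.max_result_window (10000 by default), so this style of paging is only suitable for the first pages of a result set.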
8. Query by date range:
GET mei_toutiao/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"pubDay": {
"gte": "2018-05-11",
"lte": "2018-05-12"
}
}
}
]
}
}
}
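All of yesterday (from 00:00 yesterday up to, but not including, 00:00 today; date math is evaluated in UTC unless time_zone is set):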
GET mei_toutiao/_search
{
"query": {
"range" : {
"pubDay" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
}
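As a cross-check on what "now-1d/d" and "now/d" resolve to, the sketch below computes the same half-open interval with java.time. It assumes the default UTC evaluation (no time_zone parameter in the range query); the class name is hypothetical and the fixed clock in main is only there to make the output reproducible.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: the interval selected by gte "now-1d/d" and lt "now/d".
class YesterdayRange {

    // Returns {start, end}: start is inclusive, end is exclusive.
    static Instant[] bounds(Clock clock) {
        LocalDate today = LocalDate.now(clock.withZone(ZoneOffset.UTC));
        Instant start = today.minusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant(); // now-1d/d
        Instant end = today.atStartOfDay(ZoneOffset.UTC).toInstant();                // now/d
        return new Instant[] { start, end };
    }

    public static void main(String[] args) {
        // Pretend "now" is 2018-05-30 10:15 UTC.
        Clock fixed = Clock.fixed(Instant.parse("2018-05-30T10:15:00Z"), ZoneOffset.UTC);
        Instant[] range = bounds(fixed);
        System.out.println("gte " + range[0] + ", lt " + range[1]);
    }
}
```

To include today's documents up to the current moment, the upper bound would be "now" rather than "now/d".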
Query with an explicit date format:
GET mei_toutiao/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"pubDay": {
"gte": "2018-05-29 00:00:00",
"lte": "2018-05-30 00:00:00",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
]
}
}
}
Or:
GET mei_toutiao/_search
{
"query": {
"range" : {
"pubDay" : {
"gte": "30/05/2018",
"lte": "2019",
"format": "dd/MM/yyyy||yyyy"
}
}
}
}
The corresponding Java API:
QueryBuilder fqb = QueryBuilders.boolQuery().filter(new RangeQueryBuilder("pubDay").gte("2018-05-29 12:00:00").lte("2018-05-30 00:00:00").format("yyyy-MM-dd HH:mm:ss")).filter(filterQuery(message));
9. Use a script query to find WeChat documents whose url field contains __biz:
GET xiaoqiang-2018-11-6/Socials/_search
{
"query": {
"bool" : {
"must" : [
{
"term": {
"sourceType":"weixin"
}
},
{
"script" : {
"script" : "doc['url'].value.length() > 31 && doc['url'].value.substring(26,31) == '__biz'"
}
}
]
}
}
}
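Aggregations:
GET xiao-2018-4-1/Socials/_search
{
"size" : 0, // number of hits to return (0 returns only the aggregation results)
"query" : { // a query can be used first to narrow down the data set
"term" : {
"website" : "微信"
}
},
"aggs" : {
"single_sum": { // the aggregation name is arbitrary
"sum" : { "field" : "flwCnt" } // must be a numeric field; flwCnt is the follower count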
}
}
}
Note: running the command above failed with an illegal_argument_exception. The error response was:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [website] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "xiao-2018-4-1",
"node": "Vux5eT5mTg2iiiiiiiiiii",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [website] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
]
},
"status": 400
}
Solution: append .keyword to the website field name (website.keyword).
Cause: the website field is of type text. See https://www.cnblogs.com/duanxuan/p/6566744.html and https://segmentfault.com/a/1190000008897731
1. Terms aggregation by category:
GET mei_toutiao/_search
{
"size" : 0,
"aggs" : {
"per_count" : {
"terms" : {
"size" : 22, // without this, only 10 buckets are returned by default
"field" : "codeName"
}
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 52766,
"max_score": 0,
"hits": []
},
"aggregations": {
"per_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "視訊",
"doc_count": 17258
},
{
"key": "旅遊",
"doc_count": 10132
},
{
"key": "娛樂",
"doc_count": 8867
},
{
"key": "健康",
"doc_count": 4247
},
{
"key": "情感",
"doc_count": 2932
},
{
"key": "星座",
"doc_count": 2281
},
{
"key": "整形",
"doc_count": 2150
},
{
"key": "美容",
"doc_count": 2012
},
{
"key": "親子",
"doc_count": 861
},
{
"key": "國學",
"doc_count": 444
},
{
"key": "藝術",
"doc_count": 442
},
{
"key": "搭配",
"doc_count": 393
}
]
}
}
}
Note: see the official guide at https://www.elastic.co/guide/cn/elasticsearch/guide/current/cardinality.html
2. Aggregate by media name for documents whose sourceType field is FORUM:
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"per_count" : {
"terms" : {
"size" : 10000,
"field" : "website.keyword"
}
}
}
}
3. Aggregate by the name field and compute the maximum view count within each bucket:
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"aggs" : {
"per_count" : {
"terms" : {
"size" : 10000,
"field" : "name"
},
"aggs" : {
"max_count" : {
"max" : {
"field" : "view"
}
}
}
}
}
}
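4. Daily volume of 平媒 (print media) documents over recent days, plus the number of distinct sources per day (buckets sorted by the deduplicated count):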
GET xiao-2018-4-1/News/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{
"term" : {
"mediaTname": "平媒"
}
},
{
"range": {
"pubDay": {
"gt": "2018-08-31",
"lt": "2018-09-09"
}
}
}
]
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"field" : "pubDay",
"order" : { "distinct_mediaNameZh" : "desc" }
},
"aggs" : {
"distinct_mediaNameZh" : {
"cardinality" : {
"field" : "mediaNameZh"
}
}
}
}
}
}
Result:
{
"took": 1067,
"timed_out": false,
"_shards": {
"total": 350,
"successful": 350,
"failed": 0
},
"hits": {
"total": 98312,
"max_score": 0,
"hits": []
},
"aggregations": {
"all_interests": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1536278400000,
"key_as_string": "2018-09-07",
"doc_count": 20946,
"distinct_mediaNameZh": {
"value": 389
}
},
{
"key": 1535932800000,
"key_as_string": "2018-09-03",
"doc_count": 14651,
"distinct_mediaNameZh": {
"value": 383
}
},
{
"key": 1536019200000,
"key_as_string": "2018-09-04",
"doc_count": 18325,
"distinct_mediaNameZh": {
"value": 381
}
},
{
"key": 1536192000000,
"key_as_string": "2018-09-06",
"doc_count": 20659,
"distinct_mediaNameZh": {
"value": 378
}
},
{
"key": 1536105600000,
"key_as_string": "2018-09-05",
"doc_count": 12752,
"distinct_mediaNameZh": {
"value": 321
}
},
{
"key": 1536364800000,
"key_as_string": "2018-09-08",
"doc_count": 8071,
"distinct_mediaNameZh": {
"value": 246
}
},
{
"key": 1535760000000,
"key_as_string": "2018-09-01",
"doc_count": 1706,
"distinct_mediaNameZh": {
"value": 147
}
},
{
"key": 1535846400000,
"key_as_string": "2018-09-02",
"doc_count": 1202,
"distinct_mediaNameZh": {
"value": 112
}
}
]
}
}
}
Notes:
1. Sort by the number of matching documents:
"order" : { "_count" : "desc" }
API:
.order(Terms.Order.count(false));
2. Sort by the aggregated field itself (orders the buckets by pubDay, whose values look like "2018-08-24"):
"order" : { "_term" : "desc" }
API:
AggregationBuilder aggregationBuilder = AggregationBuilders.terms("timeinterval")
.script(new Script("String he=new SimpleDateFormat('HH').format(new Date(doc['timeHour'].value)); if(he.equals('01')){return he;}else{return null;}"))
.size(24).order(Terms.Order.term(false));
Note: (1) false means desc and true means asc; (2) .script can also be replaced with .field("timeHour").
3. Sort by the result of a sub-aggregation:
"order" : { "distinct_mediaNameZh" : "desc" }
API:
.order(Terms.Order.aggregation("distinct_mediaNameZh", false));
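5. Aggregate by media name for documents whose sourceType field is FORUM
(and return the URL of one article for each media name):
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"size" : 10000, // the request itself is valid, but the cluster cannot cope with this volume (the nested terms aggregation makes the amount of data to process explode) and the connection keeps timing out
"field" : "website.keyword"
},
"aggs" : {
"per_count" : { // the aggregation name is arbitrary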
"terms" : {
"size" : 1,
"field" : "url"
}
}
}
}
}
}
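Working around the performance problem above (a different approach: use top_hits to fetch one document per bucket instead of a nested terms aggregation):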
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"size" : 10000,
"field" : "website.keyword"
},
"aggs": {
"top_age": {
"top_hits": {
"_source": {
"includes": [
"url"
]
},
"size": 1
}
}
}
}
}
}
Global bucket:
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"per_count": {
"terms" : { "field" : "website.keyword" }
},
"all": {
"global" : {},
"aggs" : {
"per_count": {
"terms" : { "field" : "website.keyword" }
}
}
}
}
}
Combining queries:
{
"bool": {
"must": { "match": { "email": "business opportunity" }},
"should": [
{ "match": { "starred": true }},
{ "bool": {
"must": { "match": { "folder": "inbox" }},
"must_not": { "match": { "spam": true }}
}}
],
"minimum_should_match": 1
}
}
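Note: the logic of this query is fairly involved and worth thinking through. It finds emails whose body contains business opportunity and that are either starred, or in the inbox and not marked as spam. The example comes from the official guide: https://www.elastic.co/guide/cn/elasticsearch/guide/current/query-dsl-intro.html
Returning specific fields:
1. stored_fields: return the codeName and view values of documents that have a newType field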
GET mei_toutiao/_search
{
"stored_fields" : ["codeName", "view"],
"query":{
"exists": {
"field": "newType"
}
}
}
SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
.setRouting(routing).storedFields(new String[] {"titleZh", "uuid"});
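Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html
Prerequisite: the store parameter of the corresponding fields is set to true in the mapping.
(See https://blog.csdn.net/napoay/article/details/73100110?locationNum=9&fps=1#323-store) By default, field values are indexed and therefore searchable, but they are not stored. That is usually fine, because the _source field keeps a copy of the original document. In some cases the store parameter is useful: for example, if a document has title, date, and a very large content field and you only want title and date, you can do this: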
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": {
"type": "text",
"store": true
},
"date": {
"type": "date",
"store": true
},
"content": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"title": "Some short title",
"date": "2015-01-01",
"content": "A very long content field..."
}
GET my_index/_search
{
"stored_fields": [ "title", "date" ]
}
Query result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 1,
"fields": {
"date": [
"2015-01-01T00:00:00.000Z"
],
"title": [
"Some short title"
]
}
}
]
}
}
Stored fields are always returned as arrays. To get a field in its original form, read it from _source instead.
Note: in Java code, read the field's values list; reading value returns only the first element of the array.
JSONObject hitJson = JSONObject.fromObject(hit.getFields());
String[] fields = { "keywordsZh", "littleUrls" };
for (String field : fields) {
    if (hit.getFields().containsKey(field)) {
        if (field.equals("keywordsZh")) {
            @SuppressWarnings("unchecked")
            List<String> keywordsZh = (List<String>) hitJson.getJSONObject(field).get("values");
            json.put(field, keywordsZh);
            // json.put(field, hitJson.getJSONObject(field).get("value")); // returns only the first value of the array
        }
    }
}
2. Return a single field:
GET mei_toutiao/_search
{
"_source": "newType",
"query":{
"term": {
"uuid": "b6a0d42731c94db1a75383c192b5544a"
}
}
}
Or:
GET mei_toutiao/_search
{
"_source": {
"includes": "newType"
},
"query":{
"term": {
"uuid": "b6a0d42731c94db1a75383c192b5544a"
}
}
}
3. Return only the newType and keywordsZh fields:
GET mei_toutiao/_search
{
"_source": [ "newType", "keywordsZh" ]
}
Or:
GET mei_toutiao/_search
{
"_source": {
"includes": [ "newType", "keywordsZh" ]
}
}
4. Return fields whose names start with t:
GET mei_toutiao/_search
{
"_source": "t*"
}
5. Return all fields except newType and keywordsZh:
GET mei_toutiao/_search
{
"_source": {
"excludes": [ "newType", "keywordsZh" ]
}
}
SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
.setRouting(routing).setFetchSource(new String[] {"titleZh", "uuid"} , null);
Note: if both includes and excludes are given, a field is returned only when it matches includes and does not match excludes.
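The rule for combining includes and excludes can be sketched as a plain filter over field names. This is only a simplified model for illustration, not Elasticsearch's implementation: it handles top-level field names and a trailing * wildcard only, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of _source filtering with includes and excludes.
class SourceFilter {

    // Supports exact names and a single trailing '*' wildcard (e.g. "t*").
    static boolean matches(String pattern, String field) {
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    static boolean matchesAny(List<String> patterns, String field) {
        for (String pattern : patterns) {
            if (matches(pattern, field)) {
                return true;
            }
        }
        return false;
    }

    // Keeps a field if it matches includes (or includes is empty) and no exclude.
    static List<String> filter(List<String> fields, List<String> includes, List<String> excludes) {
        List<String> kept = new ArrayList<>();
        for (String field : fields) {
            boolean included = includes.isEmpty() || matchesAny(includes, field);
            if (included && !matchesAny(excludes, field)) {
                kept.add(field);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("titleZh", "uuid", "newType", "keywordsZh");
        System.out.println(filter(fields,
                Arrays.asList("t*", "newType"),   // includes
                Arrays.asList("newType")));       // excludes
    }
}
```

In this model an exclude always wins over an include, which is why newType is dropped even though it is listed in includes.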