Elasticsearch: retrieving multiple documents with _mget and batch operations with _bulk

阿新 • Published: 2018-12-09

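Retrieving multiple documents

Elasticsearch is already fast, but it can be even faster. Combining multiple requests into one avoids the network latency and overhead of handling each request separately. If you need to retrieve many documents from Elasticsearch, using the multi-get (mget) API to put those retrievals into a single request is faster than fetching the documents one by one.

The mget API expects a docs array as its argument. Each element contains the metadata of a document to retrieve: _index, _type, and _id. If you only want one or more specific fields, you can name them with the _source parameter: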

GET /_mget
{
   "docs" : [
      {
         "_index" : "website",
         "_type" :  "blog",
         "_id" :    2
      },
      {
         "_index" : "website",
         "_type" :  "pageviews",
         "_id" :    1,
         "_source": "views"
      }
   ]
}

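The response body also contains a docs array with one entry per requested document, in the same order as the request. Each entry is identical to the response body you would get from a single get request: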

{
   "docs" : [
      {
         "_index" :   "website",
         "_id" :      "2",
         "_type" :    "blog",
         "found" :    true,
         "_source" : {
            "text" :  "This is a piece of cake...",
            "title" : "My first external blog entry"
         },
         "_version" : 10
      },
      {
         "_index" :   "website",
         "_id" :      "1",
         "_type" :    "pageviews",
         "found" :    true,
         "_version" : 2,
         "_source" : {
            "views" : 2
         }
      }
   ]
}

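Elasticsearch reindex error: the final mapping would have more than 1 type

Indices created in Elasticsearch 6.0.0 or later can contain only one mapping type. Indices created in 5.x with multiple mapping types continue to work as before in Elasticsearch 6.x. The multi-type examples in this article, which keep blog and pageviews in the same website index, therefore only work on indices created before 6.0. The Elasticsearch reference describes the change as follows:

Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type. Indices created in 5.x with multiple mapping types will continue to function as before in Elasticsearch 6.x. Mapping types will be completely removed in Elasticsearch 7.0.0.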

https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#_index_per_document_type

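If the documents you want to retrieve all live in the same _index (or even the same _type), you can specify a default /_index or /_index/_type in the URL.

You can still override these values in individual entries: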

GET /website/blog/_mget
{
   "docs" : [
      { "_id" : 2 },
      { "_type" : "pageviews", "_id" :   1 }
   ]
}
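The way URL defaults combine with per-entry overrides can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Elasticsearch code; the function name resolve_docs and its parameters are made up for this example.

```python
def resolve_docs(docs, default_index=None, default_type=None):
    """Fill in _index/_type from the URL defaults unless an entry overrides them."""
    resolved = []
    for doc in docs:
        full = {"_index": default_index, "_type": default_type}
        full.update(doc)  # values given in the entry win over the URL defaults
        resolved.append(full)
    return resolved


# Same entries as the request above, with /website/blog as the URL default.
docs = [{"_id": 2}, {"_type": "pageviews", "_id": 1}]
resolved = resolve_docs(docs, default_index="website", default_type="blog")
for doc in resolved:
    print(doc["_index"], doc["_type"], doc["_id"])
```

The first entry inherits both defaults, while the second keeps the default index but uses its own type.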

In fact, if all the documents share the same _index and _type, you can pass just an ids array instead of the whole docs array:

GET /website/blog/_mget
{
   "ids" : [ "2", "1" ]
}
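A client could pick between the two request shapes automatically. The following Python sketch assumes the documents are given as (index, type, id) triples; the helper name build_mget_request is hypothetical, and the function only builds the path and body without sending anything.

```python
def build_mget_request(triples):
    """Return (path, body) for a list of (index, type, id) triples."""
    targets = {(index, doc_type) for index, doc_type, _ in triples}
    if len(targets) == 1:
        # Every document shares one index and type: use the ids shorthand.
        index, doc_type = next(iter(targets))
        ids = [str(doc_id) for _, _, doc_id in triples]
        return f"/{index}/{doc_type}/_mget", {"ids": ids}
    # Otherwise fall back to the full docs array on the generic endpoint.
    docs = [{"_index": i, "_type": t, "_id": str(d)} for i, t, d in triples]
    return "/_mget", {"docs": docs}


path, body = build_mget_request([("website", "blog", 2), ("website", "blog", 1)])
print(path, body)
```

With mixed indices or types the same function returns the /_mget path and a docs array with every entry fully qualified.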

Note that the second document we requested does not exist. We specified type blog, but the document with ID 1 has type pageviews. The miss is reported in the response body:

{
  "docs" : [
    {
      "_index" :   "website",
      "_type" :    "blog",
      "_id" :      "2",
      "_version" : 10,
      "found" :    true,
      "_source" : {
        "title":   "My first external blog entry",
        "text":    "This is a piece of cake..."
      }
    },
    {
      "_index" :   "website",
      "_type" :    "blog",
      "_id" :      "1",
      "found" :    false  
    }
  ]
}

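The document with ID 1 was not found (found is false).

The fact that the second document was not found does not prevent the first from being retrieved. Each document is retrieved and reported individually.

Even when a document is not found, the HTTP status code of the request above is still 200. In fact, the status code is 200 even if none of the requested documents is found, because the mget request itself completed successfully. To determine whether an individual document lookup succeeded, check its found flag.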
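Because the status code says nothing about individual documents, client code has to inspect each entry. A minimal Python sketch, assuming the response body has already been parsed into a dict; the name split_by_found is invented for this example, and any entry without found set to true is treated as missing.

```python
def split_by_found(response):
    """Separate an mget response into found and missing entries."""
    found, missing = [], []
    for doc in response.get("docs", []):
        (found if doc.get("found") else missing).append(doc)
    return found, missing


# Abbreviated copy of the response shown above.
response = {
    "docs": [
        {"_index": "website", "_type": "blog", "_id": "2", "_version": 10,
         "found": True,
         "_source": {"title": "My first external blog entry",
                     "text": "This is a piece of cake..."}},
        {"_index": "website", "_type": "blog", "_id": "1", "found": False},
    ]
}

found, missing = split_by_found(response)
print("found:", [doc["_id"] for doc in found])
print("missing:", [doc["_id"] for doc in missing])
```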

_source filtering

By default the _source field returns the entire document. You can filter it with the _source, _source_include, and _source_exclude parameters. For example:

POST _bulk
{ "create":  { "_index": "website", "_type": "blog", "_id": "1" }}
{ "text" :  "This is a piece of cake1", "title" : "My first external blog entry1","username.lastname":"lastname1","username.firstname":"firstname1"}
{ "create":  { "_index": "website", "_type": "blog", "_id": "2" }}
{ "text" :  "This is a piece of cake2", "title" : "My first external blog entry2","username.lastname":"lastname2","username.firstname":"firstname2"}
{ "create":  { "_index": "website", "_type": "blog", "_id": "3" }}
{ "text" :  "This is a piece of cake3", "title" : "My first external blog entry3","username.lastname":"lastname3","username.firstname":"firstname3"}
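A bulk body is newline-delimited JSON: each action line is followed by its document line, and the body must end with a newline. The Python sketch below generates such a body for create actions; the helper name build_bulk_body and its arguments are assumptions made for this illustration, and it only builds the string rather than sending it.

```python
import json


def build_bulk_body(index, doc_type, docs_by_id):
    """Build a _bulk body made of create actions, one document per ID."""
    lines = []
    for doc_id, source in docs_by_id.items():
        action = {"create": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(source))  # document source line
    # The bulk API requires the final line to be newline-terminated as well.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("website", "blog", {
    "1": {"text": "This is a piece of cake1", "title": "My first external blog entry1"},
    "2": {"text": "This is a piece of cake2", "title": "My first external blog entry2"},
})
print(bulk_body, end="")
```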

GET /website/blog/_mget 
{
    "docs" : [
        {
            "_id" : "1",
            "_source" : false
        },
        {
            "_id" : "2",
            "_source" : ["title", "text"]
        },
        {
            "_id" : "3",
            "_source" : {
                "include": ["username"],
                "exclude": ["username.lastname"]
            }
        }
    ]
}

{
   "docs" : [
      {
         "_index" :   "website",
         "_type" :    "blog",
         "_id" :      "1",
         "_version" : 1,
         "found" :    true
      },
      {
         "_index" :   "website",
         "_type" :    "blog",
         "_id" :      "2",
         "_version" : 1,
         "found" :    true,
         "_source" : {
            "text" :  "This is a piece of cake2",
            "title" : "My first external blog entry2"
         }
      },
      {
         "_index" :   "website",
         "_type" :    "blog",
         "_id" :      "3",
         "_version" : 1,
         "found" :    true,
         "_source" : {
            "username.firstname" : "firstname3"
         }
      }
   ]
}
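To make the include and exclude behaviour for document 3 concrete, here is a rough Python approximation. It is not how Elasticsearch implements source filtering: it assumes flat dotted keys as in the example data, ignores wildcards and nested objects, and the name filter_source is invented for this sketch.

```python
def filter_source(source, include=None, exclude=None):
    """Approximate _source filtering on a flat dict with dotted keys."""
    def matches(key, patterns):
        # A pattern matches the key itself or any key underneath it.
        return any(key == p or key.startswith(p + ".") for p in patterns)

    result = {}
    for key, value in source.items():
        if include and not matches(key, include):
            continue  # an include list is given and this key is not on it
        if exclude and matches(key, exclude):
            continue  # explicitly excluded
        result[key] = value
    return result


doc3 = {
    "text": "This is a piece of cake3",
    "title": "My first external blog entry3",
    "username.lastname": "lastname3",
    "username.firstname": "firstname3",
}
filtered = filter_source(doc3, include=["username"], exclude=["username.lastname"])
print(filtered)
```

Including username keeps both username fields, and excluding username.lastname then removes one of them, which leaves only username.firstname as in the response above.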

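Fields filtering

Like ordinary search requests, mget also supports fields filtering. (From Elasticsearch 5.0 onward, this parameter is named stored_fields.)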

GET /website/blog/_mget 
{
    "docs" : [
        {
            "_id" : "1",
            "fields" : ["title", "text"]
        },
        {
            "_id" : "2",
            "fields" : ["title", "text"]
        }
    ]
}

You can also set a default filter in the URL query string and then override it for specific documents in the body:

GET /website/blog/_mget?fields=title,text
{
    "docs" : [
        {
            "_id" : "1" 
        },
        {
            "_id" : "2",
            "fields" : ["user", "user.location"] 
        }
    ]
}

Document 1 returns the URL defaults, title and text; document 2 returns user and user.location instead.

Routing

Routing also applies to mget requests. You can set a default routing value in the URL and override it per document in the body:

GET /website/blog/_mget?routing=key1 
{
    "docs" : [
        {
            "_id" : "1",
            "_routing" : "key2"
        },
        {
            "_id" : "2"
        }
    ]
}

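In the example above, website/blog/1 is fetched from the shard selected by routing key key2, while website/blog/2 is fetched from the shard selected by the default routing key key1.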
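The precedence between the URL default and the per-document value can be written as a one-line lookup. This Python sketch only mirrors the rule described above; effective_routing is a made-up helper name and nothing here talks to a cluster.

```python
def effective_routing(docs, default_routing=None):
    """Map each document ID to the routing key that applies to it."""
    return {str(doc["_id"]): doc.get("_routing", default_routing) for doc in docs}


# Same entries as the request above, with routing=key1 in the URL.
docs = [{"_id": "1", "_routing": "key2"}, {"_id": "2"}]
routing = effective_routing(docs, default_routing="key1")
print(routing)
```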
