Elasticsearch in Java example: autocomplete with the completion suggester

ES (Elasticsearch) offers four kinds of suggester: the term suggester, phrase suggester, completion suggester, and context suggester. Of these, the completion suggester is the one most often used to power autocomplete in a search box.

This article walks through a simple Java example of using the completion suggester in Elasticsearch. The example builds autocomplete over stock names and stock codes.

A complete completion suggester setup takes three steps: create the schema, index the data, and search the data. Each step is covered below.

1. Create the schema

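For ES, a schema plays the role of a table definition in a database: it declares the name and type of each field up front.
Data to be autocompleted must be stored in a field of type completion. The mapping looks like this: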

{
  "stock_suggest" : {
    "mappings" : {
      "stock" : {
        "_id" : {
          "path" : "id"
        },
        "properties" : {
          "id" : {
            "type" : "string",
            "analyzer" : "keyword"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "nameSuggest" : {
            "type" : "completion",
            "analyzer" : "standard",
            "payloads" : true,
            "preserve_separators" : false,
            "preserve_position_increments" : false,
            "max_input_length" : 50
          }
        }
      }
    }
  }
}

Note that payloads is set to true so that each suggestion can return a payload field carrying data that was defined in advance, at index time.

2. Index the data

The following code loads the data to be autocompleted into ES:

String esHosts = "";      // set to your ES host
String clusterName = "";  // set to your ES cluster name
// Index and type names are assumed from the mapping shown earlier.
String index = "stock_suggest";
String type = "stock";
Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").create();
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true)
        .put("cluster.name", clusterName)
        .put("node.client", true)
        .build();
Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(esHosts, 9300));
BulkRequestBuilder bulkRequest = client.prepareBulk();
List<Stock> stocks = new ArrayList<Stock>();
stocks.add(new Stock("601390", "中國中鐵"));
stocks.add(new Stock("601186", "中國鐵建"));
stocks.add(new Stock("601766", "中國中車"));
stocks.add(new Stock("600115", "東方航空"));
stocks.add(new Stock("000585", "東北電氣"));
stocks.add(new Stock("000527", "美的電器"));
for (Stock stock : stocks) {
    // Both the name and the code can trigger the suggestion.
    List<String> input = new ArrayList<String>();
    input.add(stock.getName());
    input.add(stock.getId());
    // The payload is returned alongside each matching suggestion.
    Map<Object, Object> payload = new HashMap<Object, Object>();
    payload.put("id", stock.getId());
    payload.put("name", stock.getName());
    payload.put("type", "stock");
    NameSuggest nameSuggest = new NameSuggest(input, payload, stock.getId());
    stock.setNameSuggest(nameSuggest);
    // Serialize the stock to JSON and queue it in the bulk request.
    String jsonSource = gson.toJson(stock);
    bulkRequest.add(client.prepareIndex(index, type, stock.getId()).setSource(jsonSource));
}
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
// Report any documents that failed to index.
if (bulkResponse.hasFailures()) {
    System.err.println(bulkResponse.buildFailureMessage());
}

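As shown above, we insert six stock records as sample data. The esHosts and clusterName values on the first two lines must be set to match your own ES cluster configuration.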
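The indexing code relies on Stock and NameSuggest classes that the article does not show. Below is a minimal sketch of what they might look like. It assumes that Gson serializes fields by name and that the completion field accepts input, output, and payload keys, as in the ES 1.x completion suggester. The field names, and the reading of the third constructor argument as the output, are inferred from the calls above rather than taken from the original source.

```java
import java.util.List;
import java.util.Map;

// Hypothetical value object for the "nameSuggest" completion field.
class NameSuggest {
    private final List<String> input;           // strings that can trigger the suggestion
    private final Map<Object, Object> payload;  // extra data returned with each suggestion
    private final String output;                // assumed: text returned for the suggestion

    NameSuggest(List<String> input, Map<Object, Object> payload, String output) {
        this.input = input;
        this.payload = payload;
        this.output = output;
    }

    List<String> getInput() { return input; }
    Map<Object, Object> getPayload() { return payload; }
    String getOutput() { return output; }
}

// Hypothetical document class matching the "stock" mapping (id, name, nameSuggest).
class Stock {
    private final String id;
    private final String name;
    private NameSuggest nameSuggest;  // set after construction, as in the indexing loop

    Stock(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
    NameSuggest getNameSuggest() { return nameSuggest; }
    void setNameSuggest(NameSuggest nameSuggest) { this.nameSuggest = nameSuggest; }
}
```

With classes shaped like these, serializing a Stock with Gson yields a document whose nameSuggest object carries the input list, the payload map, and the output string.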

3. Search the data

Once the data is indexed, the autocomplete feature is ready to use.

String input = "60";  // the prefix typed by the user
String clusterName = "";
String esHosts = "";
// Index name is assumed from the mapping shown earlier.
String index = "stock_suggest";
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true)
        .put("cluster.name", clusterName)
        .put("node.client", true)
        .build();
Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(esHosts, 9300));
String field = "nameSuggest";

// Build a completion suggestion against the nameSuggest field, returning at most 10 options.
SuggestRequestBuilder srb = client.prepareSuggest(index);
CompletionSuggestionBuilder csfb = new CompletionSuggestionBuilder(field).field(field).text(input).size(10);
srb = srb.addSuggestion(csfb);
SuggestResponse response = srb.execute().actionGet();
System.out.println(response);

The code above produces output like the following:

{
  "nameSuggest" : [ {
    "text" : "60",
    "offset" : 0,
    "length" : 2,
    "options" : [ {
      "text" : "600115",
      "score" : 1.0,
      "payload":{"name":"東方航空","id":"600115","type":"stock"}
    }, {
      "text" : "601186",
      "score" : 1.0,
      "payload":{"name":"中國鐵建","id":"601186","type":"stock"}
    }, {
      "text" : "601390",
      "score" : 1.0,
      "payload":{"name":"中國中鐵","id":"601390","type":"stock"}
    }, {
      "text" : "601766",
      "score" : 1.0,
      "payload":{"name":"中國中車","id":"601766","type":"stock"}
    } ]
  } ]
}

As you can see, every stock whose code starts with "60" is returned.
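To make the prefix behavior concrete, here is a small in-memory sketch of the same idea using only the Java standard library. It is a stand-in for illustration, not how Elasticsearch implements the completion suggester: the PrefixSuggester class and its methods are hypothetical, matching is a plain string prefix with no analyzer, and results come back in the sorted order of the matching inputs rather than by score.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy in-memory prefix lookup; a hypothetical illustration, not the ES implementation.
class PrefixSuggester {
    // Maps each input string to the output returned when that input matches.
    private final TreeMap<String, String> inputToOutput = new TreeMap<String, String>();

    // Registers one output under any number of input strings.
    void add(String output, String... inputs) {
        for (String input : inputs) {
            inputToOutput.put(input, output);
        }
    }

    // Returns up to `size` distinct outputs whose input starts with `prefix`,
    // in the sorted order of the matching inputs.
    List<String> suggest(String prefix, int size) {
        Set<String> outputs = new LinkedHashSet<String>();
        // For ordinary text, the keys in [prefix, prefix + Character.MAX_VALUE)
        // are exactly the keys that start with prefix.
        String upperBound = prefix + Character.MAX_VALUE;
        for (String output : inputToOutput.subMap(prefix, upperBound).values()) {
            if (outputs.size() >= size) {
                break;
            }
            outputs.add(output);
        }
        return new ArrayList<String>(outputs);
    }
}
```

Registering each stock code as both the output and one of its inputs, then calling suggest("60", 10), returns the four codes that begin with "60". Adding the stock name as a second input for the same output mirrors the two-entry input list built in the indexing step.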
That completes a simple example. Some of the details and underlying principles will be covered in later articles.