Full-text search with Java Spring Boot and Elasticsearch: the steps, and the pitfalls to avoid

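Open your Spring Boot project.

I chose JestClient to work with Elasticsearch.

The other option is ElasticsearchRepository, a JPA-like repository interface. Code written against it tends to change whenever the Elasticsearch version changes, so JestClient is my first choice.

OK! Step 1: add the dependencies.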

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    <version>1.5.4.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
</dependency>
<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
</dependency>

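Note ①: the Elasticsearch version must match your Spring Boot version.

Here Spring Boot is 1.5.4, and the Elasticsearch starter dependency is also 1.5.4.

Check the Spring Boot and Elasticsearch version compatibility matrix before choosing versions.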

Step 2: configure the Elasticsearch server address in application.properties. We can only call the service once we can connect to that address.

#elasticsearch
spring.elasticsearch.jest.uris=@...@
spring.elasticsearch.jest.read-timeout=60000
spring.elasticsearch.jest.connection-timeout=60000

Note: the @...@ placeholder for the URI is read from the pom.xml file.

Next we need to connect to the service, which means obtaining a JestClient object to run queries with.

Step 3: obtaining the JestClient object

package com.webi.welive.util;

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

/**
 * Title: Obtain the JestClient object<br>
 * Description: JestClientUtil<br>
 * Company: Webi English Online Education Department<br>
 * CreateDate: 2018-06-14 14:29
 *
 * @author james.fxy
 */
@Service
public class JestClientUtil {

    private static String spring_elasticsearch_jest_uris;
    private static Integer spring_elasticsearch_jest_read_timeout;
    private static Integer Spring_elasticsearch_jest_connection_timeout;

    @Value("${spring.elasticsearch.jest.uris}")
    public void setSpring_elasticsearch_jest_uris(String spring_elasticsearch_jest_uris) {
        JestClientUtil.spring_elasticsearch_jest_uris = spring_elasticsearch_jest_uris;
    }

    @Value("${spring.elasticsearch.jest.read-timeout}")
    public void setSpring_elasticsearch_jest_read_timeout(Integer spring_elasticsearch_jest_read_timeout) {
        JestClientUtil.spring_elasticsearch_jest_read_timeout = spring_elasticsearch_jest_read_timeout;
    }

    @Value("${spring.elasticsearch.jest.connection-timeout}")
    public void setGetSpring_elasticsearch_jest_connection_timeout(Integer Spring_elasticsearch_jest_connection_timeout) {
        JestClientUtil.Spring_elasticsearch_jest_connection_timeout = Spring_elasticsearch_jest_connection_timeout;
    }

    /**
     * Title: Obtain a JestClient<br>
     * CreateDate: 2018/6/14 16:31<br>
     *
     * @return a JestClient built from the configured URI and timeouts
     * @author james.fxy
     */
    public static JestClient getJestClient() {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder(spring_elasticsearch_jest_uris)
                .connTimeout(Spring_elasticsearch_jest_connection_timeout)
                .readTimeout(spring_elasticsearch_jest_read_timeout)
                .multiThreaded(true)
                .build());
        return factory.getObject();
    }
}
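Note that getJestClient() above builds a new factory and client on every call. Jest's own documentation recommends reusing a single client rather than creating one per request. The following is a minimal sketch of one way to cache it, using plain Java only; LazyClientHolder is a hypothetical helper name, not part of Jest or Spring, and the supplier passed to it is assumed to wrap something like JestClientUtil.getJestClient().

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: creates the value on first use and returns the
 * same instance on every later call (double-checked locking).
 */
final class LazyClientHolder<T> {
    // Assumption: in this article the factory would be
    // () -> JestClientUtil.getJestClient()
    private final Supplier<T> factory;
    private volatile T instance;

    LazyClientHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // Only the first caller reaches this line
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

If the client is cached this way, it should be shut down once when the application stops, not after each search.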

Step 4: use JestClient to work with Elasticsearch

① Choose the entity class we want to work with

package com.webi.welive.lessonhomework.param;


import com.webi.welive.lessonhomework.entity.HomeworkAnswerMedia;
import com.webi.welive.lessonhomework.entity.HomeworkQuestionAnswer;
import com.webi.welive.lessonhomework.entity.HomeworkQuestionMedia;
import lombok.Data;
import org.springframework.data.elasticsearch.annotations.Document;

import java.util.Date;
import java.util.List;


/**
 * Title: LessonHomeworkParam<br>
 * Description: LessonHomeworkParam<br>
 * Company: Webi English Online Education Department<br>
 * CreateDate: 2018-06-09 11:39:42
 *
 * @author james.fxy
 */
@Data
@Document(indexName = "homework", type = "homeworktable")
public class LessonHomeworkParam {

    private Integer id;
    private String question;
    private String explain;
    private Boolean isEnabled;
    private Boolean isDeleted;
    private Integer createUserId;
    private Integer sequence;
    private Integer questionTypes;
    private HomeworkQuestionMediaParam homeworkQuestionsMediaParam;
    private List<HomeworkQuestionAnswerParam> homeworkQuestionAnswerParamList;

    public LessonHomeworkParam() {
        super();
    }

}

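Note: in @Document(indexName = "homework", type = "homeworktable"), indexName and type correspond to the index and document type you set when importing data into Elasticsearch.

When I imported the data with Logstash, I configured it as follows: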

input {
    jdbc {
      jdbc_connection_string => "jdbc:sqlserver://10.0.0.130:1433;databaseName=Webi_WeLiveDBTest;"
      jdbc_user => "speakhi_user"
      jdbc_password => "speakhi_user123"
      jdbc_driver_library => "/usr/share/logstash/mssql-jdbc-6.2.1.jre8.jar"
      jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
      jdbc_paging_enabled => "true"
      statement => "SELECT * from Lesson_Homework where CreateTime < GETDATE()"
      schedule => "* * * * *"
      type => "lesson_homework"
    }
}

output {
    stdout {
        codec => rubydebug
    }
    if[type] == "lesson_homework"{
        elasticsearch {
        hosts  => "10.0.0.13:9200"
        index => "homework"
        document_type => "homeworktable"
        document_id => "%{id}"
        }
    }
    
}

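As you can see, the index and document_type configured for the import match the indexName and type in the entity class's @Document annotation.

② Run the query with JestClient

The query body is built by string concatenation and then executed, following the official documentation.

The concatenation can be done as shown below.

There are two ways to parse the returned data.

I used manual parsing here, because I found that automatic parsing did not work well.

Note: when the data is imported into Elasticsearch, it is serialized with Jackson JSON.

When parsing the data, we need to keep each field's type consistent with what Elasticsearch returns.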

/**
   * Title: <br>
   * Description: Query course information with full-text search (matches on question, with an added filter condition)<br>
   * CreateDate: 2018/6/14 14:47<br>
   *
   * @param query the text to search for
   * @return com.mingyisoft.javabase.bean.CommonJsonObject<java.lang.Object>
   * @throws Exception
   * @author james.fxy
   */
  public CommonJsonObject<Object> findAllHomeworkByElasticsearch(String query) throws Exception {

    CommonJsonObject<Object> json = new CommonJsonObject<>();
    JestClient jestClient = null;
    try {
      jestClient = JestClientUtil.getJestClient();
      // Strip a leading and/or trailing double quote from the query text
      if (query.startsWith("\"")) {
        query = query.substring(1);
      }
      if (query.endsWith("\"")) {
        query = query.substring(0, query.length() - 1);
      }
      String queryStr = " {\"query\": { \"match\": { \"question\":\"" + query + "\" } }\n" +
          "  ,\n" +
          "  \"post_filter\": {    \n" +
          "        \"term\" : {\n" +
          "            \"isdeleted\" : \"false\"\n" +
          "        }\n" +
          "    }\n" +
          "}";
      json = search(jestClient, indexName, typeName, queryStr);
    } catch (Exception e) {
      json.setCode(ErrorCodeEnum.ELASTIC_SEARCH_HAS_ERROR.getCode());
      json.setMsg(ErrorCodeEnum.ELASTIC_SEARCH_HAS_ERROR.getDescription());
      e.printStackTrace();
    } finally {
      // Release the client even when the search throws
      if (jestClient != null) {
        jestClient.shutdownClient();
      }
    }
    return json;
  }
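Stripping only a leading and trailing double quote still leaves the request body fragile: a quote or backslash in the middle of the search text produces invalid JSON. As a hedged alternative, the sketch below escapes the text before concatenating it. QueryJson and its methods are hypothetical names introduced for illustration; the query shape is the same match plus post_filter body used in the method above.

```java
/** Hypothetical helper for building the search body with escaped user text. */
final class QueryJson {
    private QueryJson() {}

    /** Escapes a string so it can sit between double quotes in a JSON document. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a 4-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the match + post_filter body, with the user text escaped. */
    static String matchQuestion(String userText) {
        return "{\"query\":{\"match\":{\"question\":\"" + escape(userText) + "\"}},"
             + "\"post_filter\":{\"term\":{\"isdeleted\":\"false\"}}}";
    }
}
```

With a helper like this, queryStr could be produced by QueryJson.matchQuestion(query), and the quote-stripping step would no longer be needed to keep the JSON valid.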

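Pass the JestClient and the query string along together, and use the JestClient to run the query.

The result is then parsed as follows.

This is effectively the same result you get by running the queryStr string in Kibana. We parse that JSON result and return it to the front end.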

/**
   * Title: Full-text search of lesson homework<br>
   * Description: Full-text search method<br>
   * CreateDate: 2018/6/14 14:44<br>
   *
   * @param jestClient the client used to run the search
   * @param indexName  index name
   * @param typeName   document type
   * @param query      query body (JSON)
   * @return com.mingyisoft.javabase.bean.CommonJsonObject<java.lang.Object>
   * @throws Exception
   * @author james.fxy
   */
  public static CommonJsonObject<Object> search(JestClient jestClient, String indexName, String typeName, String query) throws Exception {
    CommonJsonObject<Object> json = new CommonJsonObject<>();

//        List<LessonHomeworkParam> lessonHomeworkParams = new ArrayList<>();
    Search search = new Search.Builder(query)
            .addIndex(indexName)
            .addType(typeName)
            .build();
    JestResult jr = jestClient.execute(search);
//        System.out.println("Full-text search--" + jr.getJsonString());
    // Automatic parsing
//        System.out.println("Full-text search--" + jr.getSourceAsObject(User.class));
//        List<SearchResult.Hit<LessonHomeworkParam, Void>> jrList;
//        jrList = ((SearchResult) jr).getHits(LessonHomeworkParam.class);
//        for (SearchResult.Hit<LessonHomeworkParam, Void> lessonHomeworkParamVoidHit : jrList) {
//            LessonHomeworkParam lessonHomeworkParam = lessonHomeworkParamVoidHit.source;
//            lessonHomeworkParams.add(lessonHomeworkParam);
//        }
//        json.setData(lessonHomeworkParams);
//        return json;
//    }
    // Manual parsing
    JsonObject jsonObject = jr.getJsonObject();
    JsonObject hitsobject = jsonObject.getAsJsonObject("hits");
    long took = jsonObject.get("took").getAsLong();
    long total = hitsobject.get("total").getAsLong();
    JsonArray jsonArray = hitsobject.getAsJsonArray("hits");

    System.out.println("took:" + took + "  " + "total:" + total);

    List<LessonHomeworkParam> lessonHomeworkParams = new ArrayList<LessonHomeworkParam>();

    for (int i = 0; i < jsonArray.size(); i++) {
      JsonObject jsonHitsObject = jsonArray.get(i).getAsJsonObject();

      // Get the returned fields
      JsonObject sourceObject = jsonHitsObject.get("_source").getAsJsonObject();

      // Populate the LessonHomeworkParam object
      LessonHomeworkParam lessonHomeworkParam = new LessonHomeworkParam();
      lessonHomeworkParam.setId(Integer.parseInt(sourceObject.get("id").getAsNumber().toString()));
      lessonHomeworkParam.setExplain(sourceObject.get("explain").getAsString());
      lessonHomeworkParam.setQuestion(sourceObject.get("question").getAsString());
      // lessonHomeworkParam.setCreateUserId(Integer.parseInt(sourceObject.get("createuserid").getAsNumber().toString()));
      lessonHomeworkParam.setQuestionTypes(Integer.parseInt(sourceObject.get("questiontypes")
          .getAsNumber().toString()));
      lessonHomeworkParam.setIsDeleted(sourceObject.get("isdeleted").getAsBoolean());
      lessonHomeworkParam.setIsEnabled(sourceObject.get("isenabled").getAsBoolean());
      lessonHomeworkParams.add(lessonHomeworkParam);
    }
    json.setData(lessonHomeworkParams);
    return json;
  }

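Here is what running queryStr in Kibana returns.

The result returned in Kibana is exactly the data we parse in the Java code.

One thing to note here is that the data stored in Elasticsearch has been serialized with Jackson.

Explanation of the query result returned in Kibana:

Data returned by a query in Kibana

For example: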

{
  "took": 40,
  "timed_out": false,
  "_shards": {
    "total": 27,
    "successful": 27,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1529,
    "max_score": 5.710427,
    "hits": [
      {
        "_index": "catalog",
        "_type": "catalogtable",
        "_id": "2406",
        "_score": 5.710427,
        "_source": {
          "@timestamp": "2018-06-25T01:23:00.031Z",
          "sequence": 8,
          "id": 2406,
          "isdeleted": false,
          "parentid": 2197,
          "createuserid": 10,
          "name": "2",
          "@version": "1",
          "type": "lesson_catalog",
          "createtime": "2018-06-23T08:20:35.183Z"
        }
      }
    ]
  }
}

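The took field is the time the operation took (in milliseconds), and timed_out indicates whether it timed out. The outer hits field describes the matched records: total is the number of matching records, max_score is the highest relevance score, and the inner hits is the array of returned records.

When parsing the data, we need to pay attention to type conversion.

See the following code: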


      // Get the returned fields
      JsonObject sourceObject = jsonHitsObject.get("_source").getAsJsonObject();

      // Populate the LessonHomeworkParam object
      LessonHomeworkParam lessonHomeworkParam = new LessonHomeworkParam();
      lessonHomeworkParam.setId(Integer.parseInt(sourceObject.get("id").getAsNumber().toString()));
      lessonHomeworkParam.setExplain(sourceObject.get("explain").getAsString());
      lessonHomeworkParam.setQuestion(sourceObject.get("question").getAsString());
      // lessonHomeworkParam.setCreateUserId(Integer.parseInt(sourceObject.get("createuserid").getAsNumber().toString()));
      lessonHomeworkParam.setQuestionTypes(Integer.parseInt(sourceObject.get("questiontypes")
          .getAsNumber().toString()));
      lessonHomeworkParam.setIsDeleted(sourceObject.get("isdeleted").getAsBoolean());
      lessonHomeworkParam.setIsEnabled(sourceObject.get("isenabled").getAsBoolean());

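The type conversions in the middle are the part we have to handle by hand.

Note: the field types are the ones listed under type in the index mapping. Each value has to be converted to the matching Java type before we can extract the data we want.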
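The conversions above assume every field is present and non-null; with Gson, get() returns null for a missing key, so a call such as sourceObject.get("explain").getAsString() would fail on a document without that field. The sketch below shows null-tolerant conversions. To keep it runnable without dependencies it models _source as a plain Map<String, Object> instead of a Gson JsonObject, and SourceFields is a hypothetical helper name. The keys are lowercase (isdeleted, questiontypes) because that is how they appear in the indexed documents.

```java
import java.util.Map;

/** Hypothetical null-tolerant converters for values read from _source. */
final class SourceFields {
    private SourceFields() {}

    /** Returns the field as an Integer, or null when it is absent or null. */
    static Integer asInteger(Map<String, Object> source, String key) {
        Object v = source.get(key);
        if (v == null) {
            return null;
        }
        if (v instanceof Number) {
            return ((Number) v).intValue();
        }
        // Fall back to parsing, for numbers that arrive as strings
        return Integer.valueOf(v.toString().trim());
    }

    /** Returns the field as a Boolean, or null when it is absent or null. */
    static Boolean asBoolean(Map<String, Object> source, String key) {
        Object v = source.get(key);
        if (v == null) {
            return null;
        }
        if (v instanceof Boolean) {
            return (Boolean) v;
        }
        return Boolean.valueOf(v.toString().trim());
    }

    /** Returns the field as a String, or null when it is absent or null. */
    static String asString(Map<String, Object> source, String key) {
        Object v = source.get(key);
        return v == null ? null : v.toString();
    }
}
```

The same null checks can be applied to the Gson version by testing has(key) and isJsonNull() before calling getAsInt(), getAsBoolean(), or getAsString().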

That concludes the Java code for Elasticsearch full-text search.

A separate blog post covers how Elasticsearch works and how to configure it.