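Understanding Chinese tokenization in elasticsearch search, fuzzy search similar to the "LIKE" condition in SQL statements, and case-insensitive search
elasticsearch is a search engine that takes over data searches when the database can no longer withstand the search load coming from the front end. It is one option in high-concurrency architecture design. Below are some of the rules of the elasticsearch search engine; knowing them helps us get familiar with it quickly and solve some problems in real applications.
01》Creating an index without tokenization
URL: es_index_test
{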
"settings": {
"index": {
"number_of_shards": "4",
"number_of_replicas": "1"
}
},
"mappings": {
"es_index_type_test": {
"properties": {
"productId": {
"type": "text"
},
"productName": {
"type": "keyword",
"index": "true"
}
}
}
}
}
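Note: when the index is created, the "productName" property is set to not be tokenized (type keyword), so the whole value is stored as a single term. Combined with a wildcard query, this gives the effect of LIKE in MYSQL. For example, the document {"productId":10001,"productName":"山雞圖"} can be found with {"query":{"wildcard":{"productName":"*雞*"}}}. Patterns that begin with a wildcard, like this one, have to scan many terms and can be slow on large indexes.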
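To make the LIKE comparison concrete, here is a minimal local sketch in plain Java, with no Elasticsearch dependency, of how a wildcard pattern is applied to an untokenized value. The class WildcardSketch and its methods are made up for illustration; they only mimic the `*` (any run of characters) and `?` (exactly one character) semantics and do not simulate escaping.

```java
import java.util.regex.Pattern;

class WildcardSketch {

    // Translate a wildcard pattern into a Java regex:
    // '*' matches any run of characters, '?' matches exactly one character.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        wildcard.codePoints().forEach(cp -> {
            if (cp == '*') {
                regex.append(".*");
            } else if (cp == '?') {
                regex.append('.');
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(new String(Character.toChars(cp))));
            }
        });
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // A keyword field holds the whole value as one term,
    // so the pattern has to cover the entire value.
    static boolean matches(String keywordValue, String wildcard) {
        return toRegex(wildcard).matcher(keywordValue).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("山雞圖", "*雞*")); // behaves like LIKE '%雞%'
        System.out.println(matches("山雞圖", "雞*"));  // behaves like LIKE '雞%'
    }
}
```

Because the pattern must cover the whole stored value, "雞*" does not match "山雞圖", just as LIKE '雞%' would not.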
02》Creating an index with tokenization
URL: es_index_test
{
"settings": {
"index": {
"number_of_shards": "4",
"number_of_replicas": "1"
}
},
"mappings": {
"es_index_type_test": {
"properties": {
"productId": {
"type": "text"
},
"productName": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
}
}
}
}
}
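Note: when the index is created, the "productName" property is set to be tokenized. By default elasticsearch tokenizes Chinese one character at a time, so every Chinese character becomes its own token. For example, the document {"productId":10001,"productName":"山雞圖"} is split into the three tokens "山", "雞" and "圖". To split Chinese text into words instead, install the ik analyzer. With ik, the same document is split into the two tokens "山雞" and "圖". Which tokens a Chinese phrase is split into is decided by the ik analyzer's dictionary, which can be adjusted to suit the actual situation.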
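The difference between the two behaviours can be sketched locally in plain Java. This is only an illustration under stated assumptions: perCharacter mimics the one-token-per-character default, and withDictionary uses greedy forward longest-match as a simple stand-in for dictionary-driven segmentation; it is not the ik analyzer's actual algorithm. The class name, method names, and the one-word dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class TokenizeSketch {

    // One token per code point, roughly how Chinese text is split by default.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Greedy longest-match against a dictionary. Characters that start no
    // dictionary word become single-character tokens.
    // Works on char indexes, so it assumes text without surrogate pairs.
    static List<String> withDictionary(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLength);
            // Shrink the candidate until it is a dictionary word or one character.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(perCharacter("山雞圖"));
        // Hypothetical dictionary containing only the word "山雞".
        System.out.println(withDictionary("山雞圖", Collections.singleton("山雞"), 4));
    }
}
```

Adding or removing a word in the dictionary changes the tokens that come out, which is why adjusting the ik dictionary changes what a search can match.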
03》Creating a case-insensitive index
URL: es_index_test
{
"settings": {
"index": {
"number_of_shards": "10",
"number_of_replicas": "3"
},
"analysis": {
"normalizer": {
"es_normalizer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom"
}
}
}
},
"mappings": {
"es_index_type_test": {
"properties": {
"productId": {
"type": "text"
},
"productName": {
"type": "keyword",
"normalizer": "es_normalizer",
"index": "true"
}
}
}
}
}
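Note: when the index is created, the "productName" property is set to ignore case. The custom normalizer "es_normalizer" lowercases the value and folds accented characters to their ASCII form before the keyword is indexed, so the stored term no longer depends on letter case.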
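The effect of such a normalizer can be approximated locally in plain Java. This is a rough sketch, not the Elasticsearch implementation: it lowercases the value and strips combining accent marks after Unicode decomposition, which covers only part of what the real "asciifolding" filter does. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

class NormalizerSketch {

    // Approximation of a "lowercase" + "asciifolding" normalizer:
    // lowercase first, then decompose accented letters and drop the accent marks.
    static String normalize(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Both spellings end up as the same stored term.
        System.out.println(normalize("iPhone"));
        System.out.println(normalize("IPHONE"));
        // Chinese text has no case and passes through unchanged.
        System.out.println(normalize("山雞圖"));
    }
}
```

Since both the stored value and the search input end up in the same normalized form, a comparison between them no longer depends on case.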
04》Inspecting tokenization
URL: es_index_test/es_index_type_test/_analyze
- Splitting with the ik analyzer in "ik_max_word" mode
{
"analyzer":"ik_max_word",
"text":"中華人民共和國"
}
- Result
{
"tokens": [
{
"end_offset": 7,
"start_offset": 0,
"position": 0,
"type": "CN_WORD",
"token": "中華人民共和國"
},
{
"end_offset": 4,
"start_offset": 0,
"position": 1,
"type": "CN_WORD",
"token": "中華人民"
},
{
"end_offset": 2,
"start_offset": 0,
"position": 2,
"type": "CN_WORD",
"token": "中華"
},
{
"end_offset": 3,
"start_offset": 1,
"position": 3,
"type": "CN_WORD",
"token": "華人"
},
{
"end_offset": 7,
"start_offset": 2,
"position": 4,
"type": "CN_WORD",
"token": "人民共和國"
},
{
"end_offset": 4,
"start_offset": 2,
"position": 5,
"type": "CN_WORD",
"token": "人民"
},
{
"end_offset": 7,
"start_offset": 4,
"position": 6,
"type": "CN_WORD",
"token": "共和國"
},
{
"end_offset": 6,
"start_offset": 4,
"position": 7,
"type": "CN_WORD",
"token": "共和"
},
{
"end_offset": 7,
"start_offset": 6,
"position": 8,
"type": "CN_CHAR",
"token": "國"
}
]
}
- Splitting with the ik analyzer in "ik_smart" mode
{
"analyzer":"ik_smart",
"text":"中華人民共和國"
}
- Result
{
"tokens": [
{
"end_offset": 7,
"start_offset": 0,
"position": 0,
"type": "CN_WORD",
"token": "中華人民共和國"
}
]
}
- ES default
{
"text":"中華人民共和國"
}
- Result
{
"tokens": [
{
"end_offset": 1,
"start_offset": 0,
"position": 0,
"type": "<IDEOGRAPHIC>",
"token": "中"
},
{
"end_offset": 2,
"start_offset": 1,
"position": 1,
"type": "<IDEOGRAPHIC>",
"token": "華"
},
{
"end_offset": 3,
"start_offset": 2,
"position": 2,
"type": "<IDEOGRAPHIC>",
"token": "人"
},
{
"end_offset": 4,
"start_offset": 3,
"position": 3,
"type": "<IDEOGRAPHIC>",
"token": "民"
},
{
"end_offset": 5,
"start_offset": 4,
"position": 4,
"type": "<IDEOGRAPHIC>",
"token": "共"
},
{
"end_offset": 6,
"start_offset": 5,
"position": 5,
"type": "<IDEOGRAPHIC>",
"token": "和"
},
{
"end_offset": 7,
"start_offset": 6,
"position": 6,
"type": "<IDEOGRAPHIC>",
"token": "國"
}
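]
}
Note: the three approaches above split the text differently, so the resulting tokens differ. "ik_max_word" emits every dictionary word it can find (9 tokens here), "ik_smart" emits the coarsest split (1 token), and the ES default emits one token per character (7 tokens).
05》Querying data - wildcard
URL: es_index_test/es_index_type_test/_search
{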
"query":{"wildcard":{"productName": "山雞圖" }}
}
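Note: a wildcard query only behaves like a fuzzy search when the pattern contains wildcard characters, for example *雞*, which ES then matches against the field value. The pattern "山雞圖" above contains none, so it matches only that exact value. In a JAVA program, build the query with the wildcardQuery(String name, String query) method of the QueryBuilders class.
06》Querying data - match
URL: es_index_test/es_index_type_test/_search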
{
"query":{"match":{"productName": "山雞圖" }}
}
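Note: the query text is tokenized and the tokens are matched against the index. For example, ES splits "山雞圖" into the two tokens "山雞" and "圖", looks both up in the search engine, and returns every record that matches. The returned records may therefore include 山雞湯 (contains the token "山雞") and 山虎圖 (contains the token "圖"). In a JAVA program, use the matchQuery(String name, Object text) method of the QueryBuilders class.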
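The "any token is enough" behaviour can be sketched locally in plain Java. This is only an illustration: the documents are given as already-tokenized lists, the token lists themselves are assumed rather than produced by a real analyzer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MatchSketch {

    // Returns the names of all documents that share at least one token
    // with the query (OR semantics, the default behaviour of a match query).
    static List<String> match(List<String> queryTokens, Map<String, List<String>> tokensByName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : tokensByName.entrySet()) {
            if (!Collections.disjoint(queryTokens, entry.getValue())) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed token lists for four hypothetical product names.
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("山雞圖", Arrays.asList("山雞", "圖"));
        docs.put("山雞湯", Arrays.asList("山雞", "湯"));
        docs.put("山虎圖", Arrays.asList("山虎", "圖"));
        docs.put("老虎", Arrays.asList("老虎"));

        // The query "山雞圖" is assumed to be split into "山雞" and "圖".
        System.out.println(match(Arrays.asList("山雞", "圖"), docs));
    }
}
```

If every token should be required instead of any one of them, the match query accepts an "operator" option that can be set to "and".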
07》Querying data - term
URL: es_index_test/es_index_type_test/_search
{
"query":{
"term":{
"productName":"山雞圖"
}
}
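}
Note: data is returned only when an indexed token is exactly the three characters "山雞圖"; the query value itself is not tokenized. In a JAVA program, use the termQuery(String name, Object value) method of the QueryBuilders class.
08》Querying data - terms
URL: es_index_test/es_index_type_test/_search
{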
"query":{
"terms":{
"productName":["山雞圖","山虎圖"]
}
}
}
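Note: returns the records in which an indexed token exactly equals either "山雞圖" or "山虎圖". In a JAVA program, use the termsQuery(String name, String... values) method of the QueryBuilders class.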
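The exact-token behaviour of term and terms can be sketched locally in plain Java. This is an illustration only: the indexed token lists are assumed, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

class TermSketch {

    // term: the query value is not tokenized; it must equal one indexed token exactly.
    static boolean term(String value, Collection<String> indexedTokens) {
        return indexedTokens.contains(value);
    }

    // terms: true when any one of the listed values equals an indexed token.
    static boolean terms(Collection<String> values, Collection<String> indexedTokens) {
        for (String value : values) {
            if (term(value, indexedTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Untokenized (keyword) field: the whole name is the only token.
        System.out.println(term("山雞圖", Collections.singletonList("山雞圖")));
        // Tokenized field, assumed tokens "山雞" and "圖": no token equals the full name.
        System.out.println(term("山雞圖", Arrays.asList("山雞", "圖")));
        System.out.println(terms(Arrays.asList("山雞圖", "山虎圖"), Collections.singletonList("山虎圖")));
    }
}
```

This is why term and terms queries fit the untokenized mapping from section 01 better than the tokenized mapping from section 02.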
09》Deleting the result set of a query
URL: es_index_test/es_index_type_test/_delete_by_query
{
"query":{"wildcard":{"productName": "*雞*" }}
}
Note: deletes the documents whose product name contains the character "雞".
10》JAVA examples for elasticsearch
1、ElasticSearchProperties
package com.jd.ccc.sys.biz.yb.op.notice.config;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

import lombok.Data;

/**
 * Configuration parameters for the ElasticSearch search engine.
 * The concrete values are configured in the yml file.
 *
 * @create 2018-5-10
 * @author zhangqiang200<https://blog.csdn.net/zhangqiang_accp>
 */
@Data
@Component
@ConfigurationProperties(prefix = "elasticsearch")
public class ElasticSearchProperties {
    /** Cluster name */
    private String clusterName;
    /** Index name */
    private String indexName;
    /** Type name */
    private String typeName;
    /** Master node, in the form host:port */
    private String masterNode;
    /** Slave nodes, comma-separated, each in the form host:port */
    private String slaveNodes;
}
2、ElasticSearchConfig
package com.jd.ccc.sys.biz.yb.op.notice.config;

import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Initializes the ES search engine client.
 *
 * @create 2018-5-10
 * @author zhangqiang200<https://blog.csdn.net/zhangqiang_accp>
 */
@Configuration
@EnableConfigurationProperties(ElasticSearchProperties.class)
public class ElasticSearchConfig {
    private static final Logger LOGGER = LoggerFactory.getLogger(ElasticSearchConfig.class);

    @Autowired
    private ElasticSearchProperties elasticSearchProperties;

    private static final String SYS_PROPERTY = "es.set.netty.runtime.available.processors";
    private static final String CLUSTER_NAME = "cluster.name";
    private static final String CLIENT_SNIFF = "client.transport.sniff";

    @Bean(name = "elasticSearchCluster")
    public Client getClient() {
        System.setProperty(SYS_PROPERTY, "false");
        Settings settings = Settings.builder()
                .put(CLUSTER_NAME, elasticSearchProperties.getClusterName())
                .put(CLIENT_SNIFF, false)
                .build();
        TransportClient transportClient = null;
        try {
            // The master node is configured as host:port
            String[] masters = elasticSearchProperties.getMasterNode().split(":");
            transportClient = new PreBuiltTransportClient(settings).addTransportAddress(
                    new InetSocketTransportAddress(InetAddress.getByName(masters[0]), Integer.parseInt(masters[1])));
            String[] slaveNodes = elasticSearchProperties.getSlaveNodes().split(","); // comma-separated
            // Iterate over the slave nodes
            for (String node : slaveNodes) {
                String[] ipPort = node.split(":"); // colon-separated
                if (ipPort.length == 2) {
                    transportClient.addTransportAddress(new InetSocketTransportAddress(
                            InetAddress.getByName(ipPort[0]), Integer.parseInt(ipPort[1])));
                }
            }
            return transportClient;
        } catch (UnknownHostException e) {
            // Pass the exception as the last argument, without a {} placeholder,
            // so that slf4j logs the stack trace.
            LOGGER.error("ES client connection failed.", e);
            // Release a client that was created before the failure.
            if (transportClient != null) {
                transportClient.close();
            }
            return null;
        }
    }
}
3、Service-layer operations
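// Imports used by the methods below (StringUtils is assumed to come from Apache Commons Lang 3).
// The methods also rely on members of the enclosing service class that are not shown here:
// elasticSearchCluster, elasticSearchProperties, LOGGER and the constants
// PRODUCT_STATUS, PRODUCT_NAME and INST_TYPE.
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang3.StringUtils;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;

/**
 * Queries the total number of records for a fuzzy product search.
 *
 * @param likeProductName keyword for the fuzzy search on the product name
 * @param type product type
 * @return total number of records
 *
 * @create 2018-5-9
 * @author zhangqiang200<https://blog.csdn.net/zhangqiang_accp>
 */
private Integer queryCount(String likeProductName, String type) {
    BoolQueryBuilder builder = this.builderQueryData(likeProductName, type);
    try {
        SearchResponse searchResponse = elasticSearchCluster
                .prepareSearch(elasticSearchProperties.getIndexName())
                .setTypes(elasticSearchProperties.getTypeName())
                .setQuery(builder)
                .setSearchType(SearchType.DEFAULT)
                .get();
        SearchHits hits = searchResponse.getHits();
        return (int) hits.getTotalHits();
    } catch (Exception e) {
        // Exception as last argument, no {} placeholder, so the stack trace is logged
        LOGGER.error("Server access failure", e);
        return 0;
    }
}

/**
 * Assembles the filter conditions for the fuzzy query.
 *
 * @param likeProductName keyword for the fuzzy search on the product name
 * @param type product type; several types may be given, separated by commas
 * @return the assembled query conditions
 *
 * @create 2018-5-9
 * @author zhangqiang200<https://blog.csdn.net/zhangqiang_accp>
 */
private BoolQueryBuilder builderQueryData(String likeProductName, String type) {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(QueryBuilders.matchQuery(PRODUCT_STATUS, "03"));
    if (StringUtils.isNotBlank(likeProductName)) {
        // Equivalent of LIKE '%keyword%'
        boolQueryBuilder.must(QueryBuilders.wildcardQuery(PRODUCT_NAME, "*" + likeProductName + "*"));
    }
    // Type is not empty
    if (StringUtils.isNotBlank(type)) {
        String[] types = type.split(",");
        if (types.length == 1) {
            boolQueryBuilder.must(QueryBuilders.matchQuery(INST_TYPE, type));
        } else {
            boolQueryBuilder.must(QueryBuilders.termsQuery(INST_TYPE, types));
        }
    }
    LOGGER.debug("wild card query-->{}", boolQueryBuilder);
    return boolQueryBuilder;
}

/**
 * Runs the fuzzy query and returns one page of product data.
 *
 * @param likeProductName keyword for the fuzzy search on the product name
 * @param type product type
 * @param startIndex start index
 * @param pageSize page size
 * @return the matching documents as JSON source strings
 *
 * @create 2018-5-9
 * @author zhangqiang200<https://blog.csdn.net/zhangqiang_accp>
 */
private List<String> queryData(String likeProductName, String type, int startIndex, int pageSize) {
    List<String> resultList = new ArrayList<>();
    BoolQueryBuilder builder = this.builderQueryData(likeProductName, type);
    try {
        SearchResponse searchResponse = elasticSearchCluster
                .prepareSearch(elasticSearchProperties.getIndexName())
                .setTypes(elasticSearchProperties.getTypeName())
                .setQuery(builder)
                .setSearchType(SearchType.DEFAULT)
                .setFrom(startIndex)
                .setSize(pageSize)
                .get();
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (SearchHit hit : hits) {
            resultList.add(hit.getSourceAsString());
        }
    } catch (Exception e) {
        LOGGER.error("Server access failure", e);
    }
    return resultList;
}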
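The service code builds its pattern as "*" + keyword + "*", so a keyword that itself contains `*` or `?` would be read as part of the pattern. The following plain-Java helper is a minimal sketch of escaping the keyword first. It assumes that a backslash escapes a wildcard character in the underlying wildcard syntax, which should be verified against the Elasticsearch version in use; the class and method names are hypothetical.

```java
class LikePatternSketch {

    // Prefix the characters that the wildcard syntax treats specially
    // ('*', '?' and the escape character '\') with a backslash.
    static String escape(String keyword) {
        StringBuilder sb = new StringBuilder(keyword.length() + 8);
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (c == '*' || c == '?' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build the "contains" pattern, the counterpart of LIKE '%keyword%'.
    static String contains(String keyword) {
        return "*" + escape(keyword) + "*";
    }

    public static void main(String[] args) {
        System.out.println(contains("雞"));
        System.out.println(contains("a*b"));
    }
}
```

The result of contains(likeProductName) could then be passed to wildcardQuery in place of the hand-built string.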