Elasticsearch Index Quick Start: A Real-Time Full-Text Search Engine
I. What Is ES
Search & Analyze Data in Real Time
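Its core function is search. Elasticsearch is a full-text search framework and a powerful near-real-time search engine built on Lucene; newly indexed or modified documents become searchable almost immediately.
Advantages:
1. Distributed, horizontally scalable, and highly available
2. Real-time search with built-in tokenization (text analysis)
3. A powerful RESTful API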
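II. Scenario
The data volume is at the TB level, full-text search is required, and matching results must come back in real time.
For example, a user types a combination of keywords into a single search box and gets back the best-matching result list in real time. The index holds a very large number of products (TB scale). Searching the title field of the index with the combined keywords 火鍋 辣 ("hot pot", "spicy") returns:
1. 【通州區】麻合辣重慶九宮格火鍋 ([Tongzhou District] Ma He La Chongqing nine-grid hot pot)
2. 【平谷城區】北京嗨辣激情火鍋 ([Pinggu urban area] Beijing Hi-La Passion hot pot)
The analyzer splits the title 【通州區】麻合辣重慶九宮格火鍋 into the tokens [通,州,區,麻,合,辣,重,慶,九,宮,格,火,鍋]. The query terms are then matched against these tokens, each matching document receives a relevance score, and the results are sorted by score so that the best matches come first.
For more detail on analyzers, see the official documentation.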
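The split-match-score-sort flow described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `TitleMatchSketch`, its methods, and the third title are hypothetical, and the "score" is a simple count of shared tokens, not the relevance formula Lucene actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TitleMatchSketch {

    // Split text into single-character tokens, skipping whitespace and
    // punctuation such as the 【】 brackets.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Toy relevance score (an assumption for illustration): the number of
    // distinct query tokens that also occur in the title.
    static int score(String query, String title) {
        Set<String> titleTokens = new HashSet<>(tokenize(title));
        Set<String> queryTokens = new HashSet<>(tokenize(query));
        int hits = 0;
        for (String token : queryTokens) {
            if (titleTokens.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String query = "火鍋 辣";
        List<String> titles = new ArrayList<>();
        titles.add("【通州區】麻合辣重慶九宮格火鍋");
        titles.add("【平谷城區】北京嗨辣激情火鍋");
        titles.add("【朝陽區】北京烤鴨"); // hypothetical title with no matching token

        // Sort by score, best match first, then print "score  title".
        titles.sort(Comparator.comparingInt((String t) -> score(query, t)).reversed());
        for (String title : titles) {
            System.out.println(score(query, title) + "  " + title);
        }
    }
}
```

Both hot pot titles contain all three query tokens, so they tie; a real engine breaks such ties with a finer-grained score.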
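III. Installation (single node)
After downloading, extract the archive and change into the bin directory.
Run ./elasticsearch
When the startup log appears (shown as a screenshot in the original post), the node has started successfully.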
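IV. ES Terminology
ES introduces several new terms, such as node, document, index, type, and id. Understanding them is a good starting point.
node: a single node in the cluster
index: a collection of documents that share similar characteristics
type: within an index you can define one or more types; a type is a logical category of the data in your index
document: the basic unit of information that can be indexed
Compared with MySQL, the data model maps as follows:
mysql: db - table - row
es: index - type - id
A MySQL database corresponds to an ES index, a table to a type, and a row to a document, which is addressed by its id.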
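The index/type/id hierarchy is also how a single document is addressed over REST in the ES 2.x line used in this article. A minimal sketch, where the class and method names are hypothetical and the example values come from the bank/account data used later:

```java
final class DocumentPathSketch {

    // Build the REST path of one document: /{index}/{type}/{id}.
    // Assumes the 2.x layout, where every document lives under a type.
    static String documentPath(String index, String type, String id) {
        return "/" + index + "/" + type + "/" + id;
    }

    public static void main(String[] args) {
        // Roughly analogous to "row 1 of table account in database bank".
        System.out.println(documentPath("bank", "account", "1"));
    }
}
```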
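V. RESTful API
Extract the sample data file accounts.json into the current working directory, then run:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
This uploads 1,000 documents into the account type of the bank index.
_bulk is the batch-operation endpoint.
?pretty returns pretty-printed JSON.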
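A file sent to _bulk is newline-delimited JSON: each document contributes an action line followed by its source line, and the body must end with a newline. The sketch below builds such a body in plain Java; the class name, the sequential ids, and the two sample documents are assumptions for illustration and are not taken from accounts.json.

```java
import java.util.Arrays;
import java.util.List;

final class BulkBodySketch {

    // Build a _bulk request body: one "index" action line plus one source
    // line per document, every line terminated by '\n'.
    static String bulkBody(List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        int id = 1; // hypothetical sequential ids
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_id\":\"").append(id++).append("\"}}\n");
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up documents, only to show the shape of the payload.
        List<String> docs = Arrays.asList(
            "{\"account_number\":1,\"balance\":100}",
            "{\"account_number\":2,\"balance\":200}");
        System.out.print(bulkBody(docs));
    }
}
```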
The following examples show the main kinds of query syntax.
Pagination:
curl -XPOST 'localhost:9200/hotelswitch/_search?pretty' -d ' { "query": { "match_all": {} }, "from": 10, "size": 10 }'
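In the request above, from is the zero-based offset of the first hit and size is the page length, so from 10 with size 10 is the second page. A small sketch of that arithmetic, with hypothetical class and method names:

```java
final class PagingSketch {

    // Convert a 1-based page number into the zero-based "from" offset.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Assemble a match_all search body for the requested page.
    static String searchBody(int page, int size) {
        return "{ \"query\": { \"match_all\": {} }, \"from\": "
            + fromOffset(page, size) + ", \"size\": " + size + " }";
    }

    public static void main(String[] args) {
        System.out.println(searchBody(2, 10));
    }
}
```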
Sorting:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{ "query": { "match_all": {} }, "sort": { "balance": { "order": "desc" } } }'
Returning only selected fields (list them in _source):
curl -XPOST 'localhost:9200/hotelswitch/_search?pretty' -d ' { "query": { "match": {"account_number":20} }, "_source": ["account_number", "email"] }'
Match query (terms separated by a space are combined with OR):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match": { "address": "mill lane" } } }'
Compound (bool) query; every must clause has to match:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
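The difference between matching "mill lane" in one match query (either term is enough) and requiring both terms through bool/must can be sketched with plain sets. This is an in-memory illustration only: the class, the whitespace-and-lowercase tokenization, and the sample addresses are assumptions, not Elasticsearch code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MatchSemanticsSketch {

    // Simplified analysis: lowercase the text and split on whitespace.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    // Like a match query on "mill lane": any one query term is enough.
    static boolean matchAny(String query, String field) {
        Set<String> fieldTerms = terms(field);
        for (String term : terms(query)) {
            if (fieldTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Like bool/must with one match clause per term: all terms are required.
    static boolean matchAll(String query, String field) {
        return terms(field).containsAll(terms(query));
    }

    public static void main(String[] args) {
        // Hypothetical address values.
        String[] addresses = {"990 Mill Road", "198 Mill Lane", "171 Putnam Avenue"};
        for (String address : addresses) {
            System.out.println(address
                + "  any=" + matchAny("mill lane", address)
                + "  all=" + matchAll("mill lane", address));
        }
    }
}
```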
Range filter:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }'
Aggregation, similar to GROUP BY in SQL:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state" } } } }'
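Conceptually, the terms aggregation above buckets documents by the value of state and counts each bucket, much like SELECT state, COUNT(*) ... GROUP BY state. A rough in-memory equivalent in Java, where the class name and sample values are hypothetical and the bucket ordering and bucket limit of the real aggregation are not reproduced:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TermsAggSketch {

    // Group the values of one field and count the documents in each group.
    static Map<String, Long> countByState(List<String> states) {
        return states.stream().collect(
            Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up "state" values, one per document.
        List<String> states = Arrays.asList("TX", "CA", "TX", "NY", "CA", "TX");
        countByState(states).forEach(
            (state, count) -> System.out.println(state + ": " + count));
    }
}
```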
For the full RESTful API, see the official documentation.
VI. Java Client
1. Add the Maven dependency
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.4.0</version>
</dependency>
2. Index documents
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

import java.net.InetAddress;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class elasticSearch_local {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_local.class);
    private static Random r = new Random();
    static int[] typeConstant = new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    static String[] roomTypeNameConstant = new String[]{"標準大床房", "標準小床房", "豪華大房", "主題情侶房間"};

    public static void main(String[] args) throws Exception {
        // http://bj1.lc.data.sankuai.com/ test 80 online 9300
        // On startup: create the client and connect to the local ES node on transport port 9300.
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            // First argument is the index, second is the type; the source is the document body.
            IndexResponse response = client.prepareIndex("hotel", "room")
                    .setSource(getEsDataString())
                    .get();
        }
        logger.info(" run 1000 index consume time : " + (System.currentTimeMillis() - startTime));
        // On shutdown: release the client's resources.
        client.close();
    }

    public static XContentBuilder getEsDataString() throws Exception {
        SimpleDateFormat sp = new SimpleDateFormat("yyyy-MM-dd");
        Date d = new Date();
        int offset = r.nextInt(15);
        // Back-date the timestamps by 0-14 days. This assumes the intended unit is
        // one day (86,400,000 ms); the long literal avoids int overflow.
        long timestamp = System.currentTimeMillis() - (86400000L * offset);
        // The native ES API builds the JSON: jsonBuilder().startObject().field(key, value).endObject()
        XContentBuilder object = jsonBuilder()
                .startObject().field("gmtCreate", timestamp + "").field("gmtModified", timestamp + "")
                .field("sourceType", typeConstant[r.nextInt(10)] + "").field("partnerId", r.nextInt(999999999) + "").field("poiId", r.nextInt(999999999) + "")
                .field("roomType", r.nextInt(999999999) + "").field("roomName", roomTypeNameConstant[r.nextInt(4)]).field("bizDay", r.nextInt(999999999) + "")
                .field("status", typeConstant[r.nextInt(10)] + "").field("freeCount", r.nextInt(99999) + "").field("soldPrice", r.nextInt(99999) + "")
                .field("marketPrice", r.nextInt(99999) + "").field("ratePlanId", r.nextInt(99999) + "").field("accessCode", r.nextInt(999999999) + "")
                .field("basePrice", r.nextInt(999999999) + "").field("memPrice", r.nextInt(999999999) + "").field("priceCheck", typeConstant[r.nextInt(10)] + "")
                .field("shardPart", typeConstant[r.nextInt(10)] + "").field("sourceCode", typeConstant[r.nextInt(10)] + "").field("realRoomType", r.nextInt(999999999) + "")
                .field("typeLimitValue", typeConstant[r.nextInt(10)] + "").field("openInventoryByAccessCodeList", "").field("closeInventoryByAccessCodeList", "")
                .field("openOrClose", "1").field("openInventoryByAccessCodeListSize", r.nextInt(999999999) + "").field("openInventoryByAccessCodeListIterator", r.nextInt(999999999) + "")
                .field("closeInventoryByAccessCodeListSize", r.nextInt(999999999) + "").field("closeInventoryByAccessCodeListIterator", r.nextInt(999999999) + "")
                .field("datetime", sp.format(d))
                .endObject();
        return object;
    }
}
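3. Search
import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class elasticSearch_formeituanSearch {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_formeituanSearch.class);

    public static void main(String[] args) throws Exception {
        // http://bj1.lc.data.sankuai.com/ test 80 online 9300
        // On startup: connect to the cluster and initialize the client.
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        /*QueryBuilder queryBuilder = QueryBuilders
                .disMaxQuery()
                .add(QueryBuilders.termQuery("roomName", "豪華大床"))
                .add(QueryBuilders.termQuery("status", "0"));*/
        // Query condition. Always use matchQuery when matching text. termQuery is for
        // exact matches such as numbers or long values, because a term query is not analyzed.
        QueryBuilder qb = boolQuery().must(matchQuery("roomName", "豪華大房"));
        /* QueryBuilder qb = boolQuery()
                .must(matchQuery("roomName", "豪華大房"))
                .must(matchQuery("status", "0"))
                .must(matchQuery("sourceCode", "4"))
                .must(matchQuery("typeLimitValue", "5"))
                .must(matchQuery("soldPrice", "11673"));*/
        SearchResponse response = client.prepareSearch("hotel") // the hotel index
                .setTypes("room") // the room type
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH) // search type
                .setQuery(qb) // query
                .setPostFilter(QueryBuilders.rangeQuery("datetime").gte("2016-10-20").lte("2016-10-21").format("yyyy-MM-dd")) // filter the hits by date after the query has run
                .setFrom(0).setSize(10).setExplain(true) // pagination
                .execute() // run the request
                .actionGet();
        long count = response.getHits().getTotalHits(); // total number of hits
        System.out.println(count);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSource());
        }
        // On shutdown: release the client's resources.
        client.close();
    }
}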
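4. Delete data
import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class elasticSearch_fordelete {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_fordelete.class);

    public static void main(String[] args) throws Exception {
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        // Match everything and walk the results with a scroll. Each batch is sized by
        // setSize(50), and the while loop pulls the next batch. Scroll is recommended
        // for large result sets.
        QueryBuilder qb = matchAllQuery();
        SearchResponse response = client.prepareSearch("hotelindex")
                .setTypes("poidata")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
                .setScroll(new TimeValue(60000)) // keep the scroll context alive for 60 s
                .setQuery(qb)
                .setFrom(0)
                .setSize(50)
                .execute()
                .actionGet();
        long count = response.getHits().getTotalHits();
        while (true) {
            // Delete every document in the current batch.
            for (SearchHit hit : response.getHits().getHits()) {
                client.prepareDelete(hit.getIndex(), hit.getType(), hit.getId()).get();
            }
            try {
                response = client.prepareSearchScroll(response.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
                // Break condition: no hits are returned
                if (response.getHits().getHits().length == 0) {
                    break;
                }
            } catch (Exception e) {
                // Stop instead of retrying forever with the stale batch.
                logger.error("scroll request failed", e);
                break;
            }
        }
        client.close();
    }
}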
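Differences between query types
// Always use matchQuery when matching text. termQuery is for exact matches such as numbers or long values, because a term query is not analyzed.
match_query: full-text search; the query text is analyzed first
term_query: exact lookup; the query text is not analyzed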
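Why a term query on analyzed text can come back empty while a match query succeeds is easier to see in a small in-memory model. The sketch below assumes the field was indexed with the single-character tokenization shown in section II; the class and method names are hypothetical and do not belong to the Elasticsearch client.

```java
import java.util.HashSet;
import java.util.Set;

final class TermVsMatchSketch {

    // Simplified analyzer: one token per letter or digit, punctuation dropped.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // term: the value is looked up as-is, without analysis.
    static boolean termQuery(Set<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    // match: the query text is analyzed first, then any token may match.
    static boolean matchQuery(Set<String> indexedTokens, String text) {
        for (String token : analyze(text)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("豪華大房"); // tokens stored for roomName
        System.out.println("term  豪華大房: " + termQuery(indexed, "豪華大房"));
        System.out.println("match 豪華大房: " + matchQuery(indexed, "豪華大房"));
        System.out.println("term  房: " + termQuery(indexed, "房"));
    }
}
```

The whole phrase is never stored as a single token, so the term lookup on it fails, while a lookup on one stored token such as 房 succeeds.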
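Mappings:
A mapping defines the fields of an index and their data types.
Note: the mapping of a field that already exists in an index cannot be changed afterwards.
Ways to create an index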