
ElasticSearch in Practice, Part 2 (Basic ES Operations and Installing the IK Analyzer)

1 Basic Concepts

1.1 Node and Cluster
Elastic is essentially a distributed database: multiple servers can work together, and each server can run multiple Elastic instances.

A single Elastic instance is called a node. A group of nodes forms a cluster.

1.2 Index
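Elastic indexes every field and, after processing, writes the result into an inverted index. Queries look up this index directly instead of scanning the raw data.

That is why the top-level unit of data management in Elastic is called an Index. It is roughly the equivalent of a single database. Every index (that is, every database) must have a lowercase name.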
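To make the idea of an inverted index concrete, here is a toy sketch of the data structure: each term maps to the set of document ids that contain it, so a lookup never has to scan the documents. It assumes naive lowercase whitespace tokenization, which is only a stand-in for the configurable analyzers Elasticsearch (via Lucene) actually uses; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it.

    Assumption: a naive lowercase whitespace split stands in for a real analyzer.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Look the term up directly in the index instead of scanning documents."""
    return index.get(term.lower(), set())


docs = {
    "1": "lucy likes go",
    "2": "johy likes swimming",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "likes")))  # ['1', '2']
print(sorted(search(idx, "go")))     # ['1']
```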

1.3 Document

A single record inside an Index is called a Document. Many documents together make up an index.

1.4 Type

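Documents can be grouped. For example, in an index called weather, documents could be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy). Such a grouping is called a Type: a virtual, logical grouping used to filter documents.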

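Compared with a relational database, an index corresponds to a database, a type to a table, and a document to a row (its fields play the role of columns). Types are being phased out, though: indices created in Elasticsearch 6.x can hold only one type, types were deprecated in 7.0 and removed in 8.0. In the previous article we installed Kibana; all demonstrations here are run in Kibana's Dev Tools console.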

1.5 Creating an Index

Syntax: PUT ip:port/<index name>/<type name>/<document id>

# Create an index by indexing a document with id 1
PUT /es/emp/1
{
  "id":1,
  "name":"lucy",
  "hobbys":["go","eat"]
}

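Here the index name is es, the type is emp, and the trailing 1 is the document id. If the index does not exist yet, Elasticsearch creates it automatically on this first write.

Retrieve the document:

GET /es/emp/1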
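Outside Kibana, the same request is a plain HTTP call. The sketch below only builds (does not send) such a request with Python's standard urllib, and also enforces the lowercase rule for index names mentioned earlier. The node address `http://localhost:9200` (the default port) and the helper name `build_doc_request` are assumptions for illustration.

```python
import json
import urllib.request
from typing import Optional

ES_URL = "http://localhost:9200"  # assumption: default local node address


def build_doc_request(method: str, index: str, doc_type: str,
                      doc_id: Optional[str] = None,
                      body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for /<index>/<type>[/<id>]."""
    if index != index.lower():
        raise ValueError(f"index name must be lowercase: {index!r}")
    path = f"/{index}/{doc_type}"
    if doc_id is not None:
        path += f"/{doc_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Elasticsearch 6.0+ rejects request bodies without a Content-Type header.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data,
                                  headers=headers, method=method)


req = build_doc_request("PUT", "es", "emp", "1",
                        {"id": 1, "name": "lucy", "hobbys": ["go", "eat"]})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/es/emp/1
# Sending it requires a running node: urllib.request.urlopen(req)
```

Leaving `doc_id` out yields a path like `/es/emp`, which is what a POST with an auto-generated id targets.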

{
  "_index": "es",
  "_type": "emp",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 1,
    "name": "lucy",
    "hobbys": [
      "go",
      "eat"
    ]
  }
}

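As the response above shows, ES stores data as metadata plus the document itself.

Metadata:

_index: the index name
_type: the document type
_id: the document id (part of the metadata, not a field of the document)
_source: the document data itself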
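To see this split in code, the sketch below parses the GET response shown earlier with Python's json module and separates the response-level fields from `_source`. The function name `split_response` is hypothetical; note that `found` is also a response-level field rather than document data.

```python
import json

raw_response = """
{
  "_index": "es",
  "_type": "emp",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {"id": 1, "name": "lucy", "hobbys": ["go", "eat"]}
}
"""


def split_response(raw: str) -> tuple[dict, dict]:
    """Split a GET response into (metadata, _source)."""
    resp = json.loads(raw)
    source = resp.get("_source", {})
    metadata = {k: v for k, v in resp.items() if k != "_source"}
    return metadata, source


meta, source = split_response(raw_response)
print(meta["_id"], meta["_version"])  # 1 1
print(source["name"])                  # lucy
```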

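ES controls concurrent writes with optimistic locking, telling revisions apart by a version number. That is what the _version field above is for: every write to the document increments it.

The id used above is the metadata _id. GET looks a document up by its _id, not by the id field inside the document body.

Using POST without an id lets the system generate one automatically: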
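The toy class below mimics the idea behind version-based optimistic locking: each write bumps the version, and a writer that passes the version it last read is rejected if someone else wrote in between. The class and method names are hypothetical; Elasticsearch performs this check on the server side and answers a stale write with an HTTP 409 conflict rather than raising a client-side exception.

```python
from typing import Optional


class VersionConflict(Exception):
    """Raised when the expected version does not match the stored one."""


class ToyDocStore:
    def __init__(self) -> None:
        self._docs: dict[str, tuple[int, dict]] = {}  # id -> (version, source)

    def get(self, doc_id: str) -> tuple[int, dict]:
        return self._docs[doc_id]

    def index(self, doc_id: str, source: dict,
              expected_version: Optional[int] = None) -> int:
        """Write a document; if expected_version is given, check it first."""
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}")
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = ToyDocStore()
store.index("1", {"name": "lucy"})                      # version 1
v, _ = store.get("1")
store.index("1", {"name": "kumi"}, expected_version=v)  # version 2, succeeds
try:
    store.index("1", {"name": "stale"}, expected_version=v)  # still passes 1
except VersionConflict as e:
    print("rejected:", e)  # rejected: expected version 1, found 2
```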

POST /es/emp/
{
  "id":2,
  "name":"johy",
  "hobbys":["swimming","eat"]
}

Response:

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
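Elasticsearch generates these ids with its own time-based scheme; they come out as 20-character, URL-safe strings like the one above. The sketch below is not that algorithm: it is a hypothetical stand-in that produces ids of the same shape from random bytes, mainly to show why 15 bytes encode to exactly 20 URL-safe base64 characters with no padding.

```python
import base64
import os


def toy_auto_id() -> str:
    """Return a 20-char URL-safe id from 15 random bytes (hypothetical stand-in)."""
    raw = os.urandom(15)  # 15 bytes = 120 bits
    # 120 bits / 6 bits per base64 char = 20 chars; 15 is divisible by 3, so no '='
    return base64.urlsafe_b64encode(raw).decode("ascii")


new_id = toy_auto_id()
print(new_id, len(new_id))  # a random id, always of length 20
```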

Retrieve the document:

GET /es/emp/6ke8m2YB0gh-mNfcBmiv

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 2,
    "name": "johy",
    "hobbys": [
      "swimming",
      "eat"
    ]
  }
}

# Retrieve only some fields

GET /es/emp/6ke8m2YB0gh-mNfcBmiv?_source=id,name

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "johy",
    "id": 2
  }
}
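Conceptually, `?_source=id,name` keeps only the listed top-level keys of `_source` (key order in the JSON response carries no meaning). A minimal sketch of that behavior, assuming plain top-level field names only; Elasticsearch also handles nested paths and patterns, which this hypothetical `filter_source` ignores.

```python
def filter_source(source: dict, fields_param: str) -> dict:
    """Keep only the comma-separated top-level fields named in fields_param."""
    wanted = {f.strip() for f in fields_param.split(",") if f.strip()}
    return {k: v for k, v in source.items() if k in wanted}


doc = {"id": 2, "name": "johy", "hobbys": ["swimming", "eat"]}
print(filter_source(doc, "id,name"))  # {'id': 2, 'name': 'johy'}
```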

# Return only the source, without metadata

GET /es/emp/6ke8m2YB0gh-mNfcBmiv/_source

{
  "id": 2,
  "name": "johy",
  "hobbys": [
    "swimming",
    "eat"
  ]
}

Updating a document:

First, retrieve the whole document:

GET /es/emp/6ke8m2YB0gh-mNfcBmiv

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 2,
    "name": "johy",
    "hobbys": [
      "swimming",
      "eat"
    ]
  }
}

Now change name to kumi and add a sex field:

# Update (replace) the document
PUT /es/emp/6ke8m2YB0gh-mNfcBmiv/
{
  "name":"kumi",
  "sex":"man"
}

Retrieve it again:

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "kumi",
    "sex": "man"
  }
}

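A PUT to an existing id therefore replaces the whole document: the old version is marked as deleted, and a new document is indexed under the same _id with the version incremented by 1. Fields missing from the request body (id and hobbys here) are gone.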
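The key point of a full replacement is what happens to fields that are not in the request body. A toy sketch of the semantics described above; the function name `put_replace` is hypothetical and the dicts only mimic the response fields.

```python
def put_replace(stored: dict, new_source: dict) -> dict:
    """Simulate PUT on an existing id: the whole _source is replaced."""
    return {
        "_id": stored["_id"],                # the id is kept
        "_version": stored["_version"] + 1,  # the version is incremented
        "_source": dict(new_source),         # old fields not in the body are dropped
    }


before = {"_id": "6ke8m2YB0gh-mNfcBmiv", "_version": 1,
          "_source": {"id": 2, "name": "johy", "hobbys": ["swimming", "eat"]}}
after = put_replace(before, {"name": "kumi", "sex": "man"})
print(after["_version"], after["_source"])  # 2 {'name': 'kumi', 'sex': 'man'}
```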

Partially updating a document:

# Partially update the document
POST /es/emp/6ke8m2YB0gh-mNfcBmiv/_update
{
  "doc":
  {
    "name":"kumi",
    "sex":"man",
    "age":18
  }
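}

Fetching the document again returns:

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 4,
  "found": true,
  "_source": {
    "doc": {
      "name": "kumi",
      "sex": "man",
      "age": 18
    },
    "sex": "man",
    "name": "kumi",
    "age": 18
  }
}

The nested "doc" object inside _source most likely comes from an earlier attempt that sent this same body with a plain PUT, which stores "doc" as an ordinary field (and would also explain why the version is already 4). _update itself merges only the fields inside "doc" into the existing source.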
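The `_update` API with a `doc` body merges the given fields into the existing `_source`. According to the Elasticsearch documentation, objects are merged recursively while scalar values and arrays are replaced, and by default an update that changes nothing is reported as a no-op without bumping the version. The functions below are a hypothetical sketch of those rules, not Elasticsearch's own code.

```python
import copy


def merge_partial(source: dict, doc: dict) -> dict:
    """Recursively merge `doc` into a copy of `source`.

    Nested objects are merged key by key; scalars and lists are replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_update(version: int, source: dict, doc: dict) -> tuple[int, dict, str]:
    """Return (new_version, new_source, result); an unchanged doc is a 'noop'."""
    merged = merge_partial(source, doc)
    if merged == source:
        return version, source, "noop"
    return version + 1, merged, "updated"


src = {"name": "kumi", "sex": "man"}
print(apply_update(2, src, {"age": 18}))
# (3, {'name': 'kumi', 'sex': 'man', 'age': 18}, 'updated')
print(apply_update(3, {"name": "kumi", "sex": "man", "age": 18}, {"age": 18}))
# (3, {'name': 'kumi', 'sex': 'man', 'age': 18}, 'noop')
```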

# Update the data with a script

POST /es/emp/6ke8m2YB0gh-mNfcBmiv/_update
{
  "script" : "ctx._source.age += 5"
}

Fetching the document again shows age raised from 18 to 23:

{
  "_index": "es",
  "_type": "emp",
  "_id": "6ke8m2YB0gh-mNfcBmiv",
  "_version": 5,
  "found": true,
  "_source": {
    "doc": {
      "name": "kumi",
      "sex": "man",
      "age": 18
    },
    "sex": "man",
    "name": "kumi",
    "age": 23
  }
}
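A script update reads the current `_source` through `ctx._source`, modifies it, and writes the result back as a new version. In Elasticsearch 6.x the default script language is Painless; in the toy sketch below a Python function stands in for the script, and the names `run_script_update` and `add_five_to_age` are hypothetical.

```python
import copy
from typing import Callable


def run_script_update(version: int, source: dict,
                      script: Callable[[dict], None]) -> tuple[int, dict]:
    """Run `script` on a ctx-like dict and return (new_version, new_source)."""
    ctx = {"_source": copy.deepcopy(source)}  # the caller's dict is left untouched
    script(ctx)
    return version + 1, ctx["_source"]


def add_five_to_age(ctx: dict) -> None:
    # Python stand-in for the Painless script: ctx._source.age += 5
    ctx["_source"]["age"] += 5


current = {"name": "kumi", "sex": "man", "age": 18}
print(run_script_update(4, current, add_five_to_age))
# (5, {'name': 'kumi', 'sex': 'man', 'age': 23})
```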

 

# Delete a document

DELETE /es/emp/1

 

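2 Installing the IK Analyzer

Download the IK release that matches your Elasticsearch version (6.0.0 here): https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip

Create an ik folder under D:\devs\es\elasticsearch-6.0.0\plugins and unzip the archive into it.

Restart ES, reopen Kibana, and test the analyzer: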
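Every Elasticsearch plugin ships a `plugin-descriptor.properties` file at its root, and Elasticsearch refuses to load a plugin whose `elasticsearch.version` does not match the running node exactly, which is why the 6.0.0 IK release is needed here. A quick sanity check before restarting might look like the sketch below; the helper names are hypothetical, the parser handles only simple `key=value` lines, and the path is the one from this setup.

```python
from pathlib import Path


def read_properties(path: Path) -> dict[str, str]:
    """Parse a simplified Java-style .properties file (key=value lines only)."""
    props: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments and lines this toy parser can't read
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_plugin_version(plugin_dir: Path, es_version: str) -> bool:
    """True if the plugin declares the same Elasticsearch version as the node."""
    descriptor = plugin_dir / "plugin-descriptor.properties"
    if not descriptor.is_file():
        return False  # missing, or unzipped into an extra nested folder
    return read_properties(descriptor).get("elasticsearch.version") == es_version


# Path from this article's Windows setup; adjust it to your own installation.
ik_dir = Path(r"D:\devs\es\elasticsearch-6.0.0\plugins\ik")
print(check_plugin_version(ik_dir, "6.0.0"))
```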

POST _analyze
{
  "analyzer":"ik_smart",
  "text":"WWW是覆蓋全球的客戶機/伺服器網路;當用網際網路接入WWW時,使用者的計算機就等於一臺客戶機;通過WWW使用者能夠和各種不同型別的計算機之間實現有效的通訊。"
}

The resulting tokens:

{
  "tokens": [
    {
      "token": "www",
      "start_offset": 0,
      "end_offset": 3,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 3,
      "end_offset": 4,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "覆蓋",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "全球",
      "start_offset": 6,
      "end_offset": 8,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "的",
      "start_offset": 8,
      "end_offset": 9,
      "type": "CN_CHAR",
      "position": 4
    },
    {
      "token": "客戶機",
      "start_offset": 9,
      "end_offset": 12,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "伺服器",
      "start_offset": 13,
      "end_offset": 16,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "網路",
      "start_offset": 16,
      "end_offset": 18,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "當用",
      "start_offset": 19,
      "end_offset": 21,
      "type": "CN_WORD",
      "position": 8
    },
    {
      "token": "網際網路",
      "start_offset": 21,
      "end_offset": 24,
      "type": "CN_WORD",
      "position": 9
    },
    {
      "token": "接入",
      "start_offset": 24,
      "end_offset": 26,
      "type": "CN_WORD",
      "position": 10
    },
    {
      "token": "www",
      "start_offset": 26,
      "end_offset": 29,
      "type": "ENGLISH",
      "position": 11
    },
    {
      "token": "時",
      "start_offset": 29,
      "end_offset": 30,
      "type": "CN_CHAR",
      "position": 12
    },
    {
      "token": "使用者",
      "start_offset": 31,
      "end_offset": 33,
      "type": "CN_WORD",
      "position": 13
    },
    {
      "token": "的",
      "start_offset": 33,
      "end_offset": 34,
      "type": "CN_CHAR",
      "position": 14
    },
    {
      "token": "計算機",
      "start_offset": 34,
      "end_offset": 37,
      "type": "CN_WORD",
      "position": 15
    },
    {
      "token": "就",
      "start_offset": 37,
      "end_offset": 38,
      "type": "CN_CHAR",
      "position": 16
    },
    {
      "token": "等於",
      "start_offset": 38,
      "end_offset": 40,
      "type": "CN_WORD",
      "position": 17
    },
    {
      "token": "一臺",
      "start_offset": 40,
      "end_offset": 42,
      "type": "CN_WORD",
      "position": 18
    },
    {
      "token": "客戶機",
      "start_offset": 42,
      "end_offset": 45,
      "type": "CN_WORD",
      "position": 19
    },
    {
      "token": "通過",
      "start_offset": 46,
      "end_offset": 48,
      "type": "CN_WORD",
      "position": 20
    },
    {
      "token": "www",
      "start_offset": 48,
      "end_offset": 51,
      "type": "ENGLISH",
      "position": 21
    },
    {
      "token": "使用者",
      "start_offset": 51,
      "end_offset": 53,
      "type": "CN_WORD",
      "position": 22
    },
    {
      "token": "能夠",
      "start_offset": 53,
      "end_offset": 55,
      "type": "CN_WORD",
      "position": 23
    },
    {
      "token": "和",
      "start_offset": 55,
      "end_offset": 56,
      "type": "CN_CHAR",
      "position": 24
    },
    {
      "token": "各種不同型別",
      "start_offset": 56,
      "end_offset": 62,
      "type": "CN_WORD",
      "position": 25
    },
    {
      "token": "的",
      "start_offset": 62,
      "end_offset": 63,
      "type": "CN_CHAR",
      "position": 26
    },
    {
      "token": "計算機",
      "start_offset": 63,
      "end_offset": 66,
      "type": "CN_WORD",
      "position": 27
    },
    {
      "token": "之間",
      "start_offset": 66,
      "end_offset": 68,
      "type": "CN_WORD",
      "position": 28
    },
    {
      "token": "實現",
      "start_offset": 68,
      "end_offset": 70,
      "type": "CN_WORD",
      "position": 29
    },
    {
      "token": "有效",
      "start_offset": 70,
      "end_offset": 72,
      "type": "CN_WORD",
      "position": 30
    },
    {
      "token": "的",
      "start_offset": 72,
      "end_offset": 73,
      "type": "CN_CHAR",
      "position": 31
    },
    {
      "token": "通訊",
      "start_offset": 73,
      "end_offset": 75,
      "type": "CN_WORD",
      "position": 32
    }
  ]
}
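When comparing analyzers it helps to reduce this verbose response to the token strings and their types. A small sketch, using the first four tokens of the response above as sample input; the function name `summarize_tokens` is hypothetical.

```python
import json
from collections import Counter

sample_response = json.loads("""
{"tokens": [
  {"token": "www", "start_offset": 0, "end_offset": 3, "type": "ENGLISH", "position": 0},
  {"token": "是", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 1},
  {"token": "覆蓋", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 2},
  {"token": "全球", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 3}
]}
""")


def summarize_tokens(response: dict) -> tuple[list[str], Counter]:
    """Return the token strings in position order and a count per token type."""
    tokens = sorted(response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens], Counter(t["type"] for t in tokens)


words, types = summarize_tokens(sample_response)
print(words)        # ['www', '是', '覆蓋', '全球']
print(dict(types))  # {'ENGLISH': 1, 'CN_CHAR': 1, 'CN_WORD': 2}
```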

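Note: IK provides two analyzers, ik_smart and ik_max_word.

ik_smart: produces the coarsest-grained split. For example, it splits "中華人民共和國國歌" into "中華人民共和國, 國歌".

ik_max_word: produces the finest-grained split, exhausting every possible combination. For example, it splits "中華人民共和國國歌" into "中華人民共和國, 中華人民, 中華, 華人, 人民共和國, 人民, 人, 民, 共和國, 共和, 和, 國國, 國歌".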
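The difference between the two modes can be approximated with a toy dictionary segmenter: the coarse mode as forward maximum matching (always take the longest dictionary word at the current position) and the fine mode as enumerating every dictionary word that occurs in the text. This is a deliberate simplification, not IK's actual algorithm, which uses its own large dictionary and extra ambiguity handling. The dictionary below is hypothetical and contains only the words from the example above; with it, the sketch reproduces both example outputs.

```python
# Hypothetical mini-dictionary containing only the words from the example.
DICT = {"中華人民共和國", "中華人民", "中華", "華人", "人民共和國", "人民", "人", "民",
        "共和國", "共和", "和", "國國", "國歌"}
MAX_LEN = max(len(w) for w in DICT)


def coarse_segment(text: str) -> list[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    result, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in DICT or size == 1:  # fall back to a single character
                result.append(piece)
                i += size
                break
    return result


def fine_segment(text: str) -> list[str]:
    """Every dictionary word found in the text, by start position, longest first."""
    result = []
    for i in range(len(text)):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + size] in DICT:
                result.append(text[i:i + size])
    return result


text = "中華人民共和國國歌"
print(coarse_segment(text))  # ['中華人民共和國', '國歌']
print(fine_segment(text))
```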