
Installing Elasticsearch for Second-Level Queries on PB-Scale Data

What is ES? ElasticSearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and released as open source under the Apache License, and it has become a popular enterprise search engine. Used in cloud computing, Elasticsearch delivers real-time search and is stable, reliable, fast, and easy to install and use.

  • This tutorial uses version 6.4.3 of the software

    Downloads

    ElasticSearch:

    https://www.elastic.co/cn/downloads/elasticsearch

Kibana:

https://www.elastic.co/cn/downloads/kibana

Upload to the Linux server
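If the server has Internet access, the archives can also be fetched directly on it instead of uploading them; the URLs below follow Elastic's standard artifact layout for 6.4.3 (verify them against the download pages above before relying on them):

[shaofei@hadoop137 softwear]$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.3.tar.gz
[shaofei@hadoop137 softwear]$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.4.3-linux-x86_64.tar.gz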

Extract and edit the configuration file

# Note: ES must be operated as a regular (non-root) user; do not run it as root.

#As root, add a new user
#Command to add the user:
[root@hadoop137 ~]# useradd shaofei
[root@hadoop137 ~]# passwd shaofei
Changing password for user shaofei.
New password:
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.

#Switch to user shaofei:
[root@hadoop137 ~]# su shaofei
[shaofei@hadoop137 root]$ cd
[shaofei@hadoop137 ~]$ 

# Extract elasticsearch
[shaofei@hadoop137 softwear]$ tar -zxvf elasticsearch-6.4.3.tar.gz -C ../module/
[shaofei@hadoop137 elasticsearch-6.4.3]$ pwd
/opt/module/elasticsearch-6.4.3

# Edit the configuration file
[shaofei@hadoop137 elasticsearch-6.4.3]$ vim config/elasticsearch.yml 
# Note: when editing the yml file, each configuration key must be followed by a space after the colon
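A trivial illustration (not taken from the original file): omitting the space after the colon turns the line into a plain scalar instead of a YAML key/value pair, and Elasticsearch will fail to parse the file at startup.

cluster.name: myes     # correct: space after the colon
cluster.name:myes      # wrong: no space, the whole line is read as a single string and rejected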

elasticsearch.yml



# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Change the cluster name
cluster.name: myes
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Change the name of the current node; after the directory is distributed in the next step, it must be changed on each host
node.name: hadoop137
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# IP address of the current node. The hosts file is configured here, so a hostname is used; after the distribution in the next step, change it to the address of the corresponding host
network.host: hadoop137
#
# Set a custom port for HTTP:
# Port on which ES exposes its service
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

#Add the addresses of the cluster hosts
discovery.zen.ping.unicast.hosts: ["192.168.23.137", "192.168.23.138", "192.168.23.139"]
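To recap, these are the non-default settings in elasticsearch.yml after the edits above (hostnames and addresses are those of this example cluster and must be adapted to yours):

cluster.name: myes
node.name: hadoop137
network.host: hadoop137
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.23.137", "192.168.23.138", "192.168.23.139"]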

Distribute to the other hosts in the cluster

[shaofei@hadoop137 module]$ rsync -rvl elasticsearch-6.4.3 shaofei@hadoop138:`pwd`
[shaofei@hadoop137 module]$ rsync -rvl elasticsearch-6.4.3 shaofei@hadoop139:`pwd`
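The copied directory still carries hadoop137's settings, so adjust node.name and network.host on each target host. A quick sketch using sed, assuming the same file layout as above:

# on hadoop138 (and analogously on hadoop139), fix the node name and bind address
[shaofei@hadoop138 elasticsearch-6.4.3]$ sed -i 's/^node.name: hadoop137/node.name: hadoop138/' config/elasticsearch.yml
[shaofei@hadoop138 elasticsearch-6.4.3]$ sed -i 's/^network.host: hadoop137/network.host: hadoop138/' config/elasticsearch.yml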

Start ES on each node of the cluster

[shaofei@hadoop137 elasticsearch-6.4.3]$ ./bin/elasticsearch

[shaofei@hadoop138 elasticsearch-6.4.3]$ ./bin/elasticsearch

[shaofei@hadoop139 elasticsearch-6.4.3]$ ./bin/elasticsearch

Problems encountered during startup:

ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max number of threads [1024] for user [shaofei] is too low, increase to at least [4096]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

For problem [1]:
[root@hadoop139 shaofei]# vi /etc/security/limits.conf
Append at the end of the file:
# End of file

* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096

For problem [2]:
[root@hadoop139 shaofei]# vim /etc/security/limits.d/90-nproc.conf 
Edit it so that it reads:
[root@hadoop139 shaofei]# cat /etc/security/limits.d/90-nproc.conf 
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.


*          soft    nproc     10240
root       soft    nproc     unlimited


# Log back in as the user for the change to take effect

For problem [3]:
vi /etc/sysctl.conf
Add the following setting:
vm.max_map_count=655360

Then apply it with:
sysctl -p
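After logging back in as the shaofei user, the three limits can be checked before retrying the start (the expected values depend on the edits above):

[shaofei@hadoop137 ~]$ ulimit -n                  # max open file descriptors, should now report 65536
[shaofei@hadoop137 ~]$ ulimit -u                  # max user processes, should now be at least 4096
[shaofei@hadoop137 ~]$ sysctl vm.max_map_count    # should print vm.max_map_count = 655360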

One more error appeared during startup, because the kernel was built without seccomp support:

java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed

# Edit elasticsearch.yml and add the following

bootstrap.memory_lock: false
bootstrap.system_call_filter: false

Once these problems are fixed, restart ES.
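To keep ES running after the shell is closed, it can also be started as a daemon; -d (daemonize) and -p (write a PID file) are standard Elasticsearch launch options:

[shaofei@hadoop137 elasticsearch-6.4.3]$ ./bin/elasticsearch -d -p pid    # run in the background, PID written to ./pid
[shaofei@hadoop137 elasticsearch-6.4.3]$ kill `cat pid`                   # stop it later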

  • Startup successful

Visit: http://ip:9200
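To confirm that all three nodes have joined the cluster, the standard cluster APIs can be queried (a quick check using the example hostnames above):

[shaofei@hadoop137 ~]$ curl http://hadoop137:9200                          # basic node info: cluster_name "myes", version 6.4.3
[shaofei@hadoop137 ~]$ curl http://hadoop137:9200/_cluster/health?pretty   # "status" should be green, "number_of_nodes" should be 3
[shaofei@hadoop137 ~]$ curl http://hadoop137:9200/_cat/nodes?v             # lists hadoop137/138/139 and marks the elected master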

Installing Kibana

  • Download the package
  • Extract it (see the sketch below)
  • Edit the configuration file
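A minimal sketch of the extract step, assuming the Kibana 6.4.3 Linux x86_64 archive and the same directory layout used for Elasticsearch above (archive and directory names may differ on your machine):

[shaofei@hadoop137 softwear]$ tar -zxvf kibana-6.4.3-linux-x86_64.tar.gz -C ../module/
[shaofei@hadoop137 softwear]$ cd ../module/kibana-6.4.3-linux-x86_64
[shaofei@hadoop137 kibana-6.4.3-linux-x86_64]$ vim config/kibana.yml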

kibana.yml


# Kibana is served by a back end server. This setting specifies the port to use.
# Configure the port
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
# Configure the host IP or hostname
server.host: "hadoop137"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URL of the Elasticsearch instance to use for all your queries.
#Configure the Elasticsearch endpoint
elasticsearch.url: "http://hadoop137:9200"

# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true

# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
#kibana.index: ".kibana"

# The default application to load.
#kibana.defaultAppId: "home"

# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"

# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key

# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key

# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]

# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full

# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500

# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000

# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]

# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}

# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000

# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000

# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false

# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid

# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout

# Set the value of this setting to true to suppress all logging output.
#logging.silent: false

# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false

# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false

# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000

# The default locale. This locale can be used in certain circumstances to substitute any missing
# translations.
#i18n.defaultLocale: "en"
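With kibana.yml saved, start Kibana. ./bin/kibana runs in the foreground, so for a long-lived session run it in a separate terminal or with nohup (directory name taken from the sketch above):

[shaofei@hadoop137 kibana-6.4.3-linux-x86_64]$ nohup ./bin/kibana > kibana.log 2>&1 &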

Open http://hadoop137:5601/ in a browser to verify that Kibana is up.
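To exercise the whole stack end to end, index a document and search it back, either from Kibana's Dev Tools console or with curl. The test_index below is a hypothetical index created only for this check:

# index a test document (in 6.x the recommended type name is _doc)
curl -X PUT "http://hadoop137:9200/test_index/_doc/1" -H 'Content-Type: application/json' -d '{"title": "hello es"}'
# search it back
curl "http://hadoop137:9200/test_index/_search?q=title:hello&pretty"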
