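Big Data: Druid Deployment and a Push Data Ingestion Example
Druid Single-Machine Deployment
Many articles already introduce Druid for real-time big data analytics, so I won't repeat that here. This article describes how to deploy a Druid environment with Imply. Imply provides a complete distribution that bundles the dependencies, Druid itself, a graphical data visualization UI, and an SQL query component. The article also covers configuring Tranquility Server for push-based data ingestion.
I. Prerequisites: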
- Java 8 https://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz
- Node.js 4.5.x or later (this guide installs v10.13.0)
- Linux or Mac OS X (Windows is not supported)
- At least 4 GB of RAM
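II. Install Java 8:
- Create the Java directory: mkdir /usr/local/java
- Extract the JDK into it: tar -zxvf jdk-8u191-linux-x64.tar.gz -C /usr/local/java
- Configure the environment variables by appending the following to /etc/profile:
# JAVA_HOME
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
- Reload the profile so the variables take effect: source /etc/profile
- Verify the JDK: java -version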
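You can also run an optional extra check. The following minimal sketch prints the Java version and JAVA_HOME as seen by the JVM that actually runs it. The class name CheckJava8 is hypothetical. The profile above appends $JAVA_HOME/bin to the end of PATH, so a system Java found earlier on PATH would take precedence. This program makes that visible.

```java
// Hypothetical helper: reports which JVM is actually being used and whether it is Java 8.
public class CheckJava8 {

    // Java 8 reports versions such as "1.8.0_191"; Java 9+ uses "9", "11.0.2", and so on.
    static boolean isJava8(String version) {
        return version != null && version.startsWith("1.8");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("java.version = " + version);
        System.out.println("JAVA_HOME    = " + javaHome);
        if (!isJava8(version)) {
            // Another java binary earlier on PATH may be shadowing $JAVA_HOME/bin/java.
            System.err.println("Warning: this guide assumes Java 8 (1.8.x).");
            System.exit(1);
        }
        System.out.println("Java 8 detected.");
    }
}
```

Compile and run it with javac CheckJava8.java && java CheckJava8.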
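III. Install Node.js
1. Download the package that matches your system from the official site:
English site: https://nodejs.org/en/download/
Chinese site: http://nodejs.cn/download/
2. Upload the downloaded tar file to the server and extract it. Then create symbolic links so that Node.js is available globally:
1) You can upload it to any path. I use /usr/local/software: cd /usr/local/software
2) Extract the archive: tar -xvf node-v10.13.0-linux-x64.tar.xz
3) Create symbolic links to make the commands global: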
- ln -s /usr/local/software/node-v10.13.0-linux-x64/bin/npm /usr/local/bin/
- ln -s /usr/local/software/node-v10.13.0-linux-x64/bin/node /usr/local/bin/
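4) Finally, check that Node.js is available globally. If node -v prints the version number, the installation succeeded.
IV. Download and Install Imply
- Download the latest installation package from https://imply.io/get-started (this guide uses imply-2.7.12)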
- tar -zxvf imply-2.7.12.tar.gz
- cd imply-2.7.12
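- Start the services: nohup bin/supervise -c conf/supervise/quickstart.conf > quickstart.log &
- bin/supervise is a Perl script. If startup fails because Perl is missing, install Perl. On CentOS 7, run: yum install perl
- Then start the services again.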
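Verifying the Installation
Import test data: the installation package includes sample data, which you can load by submitting a predefined ingestion spec file.
# Import the data: run the following command from the imply-2.7.12 directory
[user@host imply-2.7.12]# bin/post-index-task --file quickstart/wikipedia-index.json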
Beginning indexing data for wikipedia
Task started: index_wikipedia_2018-11-22T07:39:13.068Z
Task log: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-11-22T07:39:13.068Z/log
Task status: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-11-22T07:39:13.068Z/status
Task index_wikipedia_2018-11-22T07:39:13.068Z still running...
Task finished with status: SUCCESS
Completed indexing data for wikipedia. Now loading indexed data onto the cluster...
wikipedia is 0.0% finished loading...
wikipedia loading complete! You may now query your data
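You can also poll the task status URL printed above directly. The following is a minimal Java sketch of such polling, with these assumptions:
- The Overlord runs at http://localhost:8090.
- The status JSON contains a string field such as "status":"SUCCESS". It is matched with a regular expression rather than a JSON library.
- The class name TaskStatusPoller is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical poller for an Overlord task status URL like the one printed by post-index-task.
public class TaskStatusPoller {

    // Assumption: the response contains a string field like "status":"RUNNING|SUCCESS|FAILED".
    private static final Pattern STATE =
            Pattern.compile("\"status\"\\s*:\\s*\"(RUNNING|SUCCESS|FAILED)\"");

    // Same URL layout as the "Task status: ..." line in the log above.
    static String statusUrl(String overlord, String taskId) {
        return overlord + "/druid/indexer/v1/task/" + taskId + "/status";
    }

    // Returns the first task state found in the JSON, or "UNKNOWN" if none matches.
    static String extractState(String json) {
        Matcher m = STATE.matcher(json);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    // Plain HTTP GET returning the body as UTF-8 text (Java 8 compatible, no readAllBytes).
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        String taskId = args[0]; // e.g. the ID printed after "Task started:"
        String url = statusUrl("http://localhost:8090", taskId);
        String state;
        do {
            state = extractState(httpGet(url));
            System.out.println("Task " + taskId + ": " + state);
            if ("RUNNING".equals(state)) {
                Thread.sleep(2000); // poll every 2 seconds while the task is running
            }
        } while ("RUNNING".equals(state));
    }
}
```

Pass the task ID as the first argument, for example java TaskStatusPoller index_wikipedia_2018-11-22T07:39:13.068Z.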
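V. Web Consoles
The consoles are available at the following addresses. 192.168.164.136 is the author's server, so substitute your own server's IP address:
- Overlord console: http://192.168.164.136:8090/console.html
- Druid cluster console: http://192.168.164.136:8081
- Data visualization UI: http://192.168.164.136:9095
- Data query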
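A minimal TCP port check like the one below can quickly show which consoles are reachable, for example from a machine without a browser. It only confirms that something is listening on ports 8090, 8081 and 9095. It does not verify the HTTP response. The class name ConsolePortCheck is hypothetical, and the host defaults to localhost unless you pass one as the first argument.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reachability check for the console ports listed above.
public class ConsolePortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("Overlord console", 8090);
        ports.put("Druid cluster console", 8081);
        ports.put("Data visualization UI", 9095);
        for (Map.Entry<String, Integer> e : ports.entrySet()) {
            boolean open = isPortOpen(host, e.getValue(), 2000);
            System.out.printf("%-24s %s:%d -> %s%n",
                    e.getKey(), host, e.getValue(), open ? "open" : "closed");
        }
    }
}
```

For example: java ConsolePortCheck 192.168.164.136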
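VI. Druid data ingestion falls into two main categories: real-time and batch. The wikipedia import above is a batch task. This article focuses on real-time ingestion:
1. Real-time ingestion: comes in two forms, Pull and Push
- Pull: requires running a Realtime Node, which ingests different kinds of data sources through different Firehoses.
- Push: requires running Tranquility or the Kafka indexing service. With Tranquility Server, data is pushed through HTTP calls.
2. Real-time data ingestion in detail
2.1 Pull
The Realtime Node provides no high availability or scalability. For important workloads, Tranquility Server or Tranquility Kafka is recommended instead.
2.2 Push
Ingestion through Tranquility comes in two forms:
Tranquility Server: the sender posts data to Druid through the HTTP API that Tranquility Server provides.
Tranquility Kafka: the sender first writes data to Kafka. Tranquility Kafka then reads from Kafka according to its configuration and writes the data to Druid.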
2.2.1 Tranquility Server configuration:
To enable Tranquility Server, edit conf/supervise/quickstart.conf on the data node and uncomment the Tranquility Server line:
[user@host imply-2.7.12]# cd conf/supervise/
[user@host supervise]# ls
data.conf master-no-zk.conf master-with-zk.conf query.conf quickstart.conf
[user@host supervise]# vi quickstart.conf
:verify bin/verify-java
:verify bin/verify-default-ports
:verify bin/verify-version-check
:kill-timeout 10
!p10 zk bin/run-zk conf-quickstart
coordinator bin/run-druid coordinator conf-quickstart
broker bin/run-druid broker conf-quickstart
historical bin/run-druid historical conf-quickstart
!p80 overlord bin/run-druid overlord conf-quickstart
!p90 middleManager bin/run-druid middleManager conf-quickstart
imply-ui bin/run-imply-ui-quickstart conf-quickstart
# Uncomment to use Tranquility Server (the leading # has been removed from the line below)
!p95 tranquility-server bin/tranquility server -configFile conf-quickstart/tranquility/server.json
# Uncomment to use Tranquility Kafka
#!p95 tranquility-kafka bin/tranquility kafka -configFile conf-quickstart/tranquility/kafka.json
# Uncomment to use Tranquility Clarity metrics server
#!p95 tranquility-metrics-server java -Xms2g -Xmx2g -cp "dist/tranquility/lib/*:dist/tranquility/conf" com.metamx.tranquility.distribution.DistributionMain server -configFile conf-quickstart/tranquility/server-for-metrics.yaml
:wq!
2.2.2 Review conf-quickstart/tranquility/server.json
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "tutorial-tranquility-server",
"parser" : {
"type" : "string",
"parseSpec" : {
"timestampSpec" : {
"column" : "timestamp",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions" : [],
"dimensionExclusions" : [
"timestamp",
"value"
]
},
"format" : "json"
}
},
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none"
},
"metricsSpec" : [
{
"type" : "count",
"name" : "count"
},
{
"name" : "value_sum",
"type" : "doubleSum",
"fieldName" : "value"
},
{
"fieldName" : "value",
"name" : "value_min",
"type" : "doubleMin"
},
{
"type" : "doubleMax",
"name" : "value_max",
"fieldName" : "value"
}
]
},
"ioConfig" : {
"type" : "realtime"
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : "50000",
"intermediatePersistPeriod" : "PT10M",
"windowPeriod" : "PT10M"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
],
"properties" : {
"zookeeper.connect" : "localhost",
"druid.discovery.curator.path" : "/druid/discovery",
"druid.selectors.indexing.serviceName" : "druid/overlord",
"http.port" : "8200",
"http.threads" : "40",
"serialization.format" : "smile",
"druidBeam.taskLocator": "overlord"
}
}
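- "dataSource" : "tutorial-tranquility-server" can be changed to whatever dataSource name you need.
- "http.port" : "8200" is the port on which Tranquility Server accepts HTTP requests.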
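Because "dimensions" is empty, every field other than the excluded timestamp and value becomes a dimension. metricsSpec then aggregates value into count, value_sum, value_min and value_max. The following hypothetical sketch simulates that rollup in plain Java to show what a stored row contains. It simplifies Druid's behavior by comparing timestamps as raw strings. With queryGranularity "none", rows merge only when the timestamp and all dimension values are identical. The server dimension is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the metricsSpec rollup in server.json (not Druid code).
public class RollupSketch {

    // One aggregated row holding the four metrics from metricsSpec.
    static final class Row {
        long count;
        double valueSum;
        double valueMin = Double.POSITIVE_INFINITY;
        double valueMax = Double.NEGATIVE_INFINITY;

        void add(double value) {
            count++;
            valueSum += value;
            valueMin = Math.min(valueMin, value);
            valueMax = Math.max(valueMax, value);
        }
    }

    // "timestamp" and "value" are excluded; every other field acts as a dimension.
    static Map<String, Row> rollup(List<Map<String, Object>> events) {
        Map<String, Row> rows = new LinkedHashMap<>();
        for (Map<String, Object> event : events) {
            Map<String, Object> dims = new TreeMap<>(event); // sorted for a stable key
            dims.remove("timestamp");
            dims.remove("value");
            // queryGranularity "none": the full timestamp is part of the rollup key.
            // Simplification: timestamps are compared as strings, not parsed instants.
            String key = event.get("timestamp") + "|" + dims;
            double value = ((Number) event.get("value")).doubleValue();
            rows.computeIfAbsent(key, k -> new Row()).add(value);
        }
        return rows;
    }

    static Map<String, Object> event(String ts, String server, double value) {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("timestamp", ts);
        e.put("server", server);
        e.put("value", value);
        return e;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> events = new ArrayList<>();
        events.add(event("2018-11-22T08:00:00Z", "a", 10.0));
        events.add(event("2018-11-22T08:00:00Z", "a", 30.0));
        events.add(event("2018-11-22T08:00:00Z", "b", 5.0));
        for (Map.Entry<String, Row> e : rollup(events).entrySet()) {
            Row r = e.getValue();
            System.out.println(e.getKey() + " -> count=" + r.count + ", value_sum=" + r.valueSum
                    + ", value_min=" + r.valueMin + ", value_max=" + r.valueMax);
        }
    }
}
```

The three input events collapse into two rows. The two events for server a merge into one row with count=2, value_sum=40.0, value_min=10.0 and value_max=30.0.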
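2.2.3 Restart the services, first shutting down the processes from the previous launch:
[user@host imply-2.7.12]# bin/service --down
[user@host imply-2.7.12]# nohup bin/supervise -c conf/supervise/quickstart.conf > quickstart.log &
Output like the following, including the tranquility-server line, shows that the services started successfully:
[user@host imply-2.7.12]# tail -f quickstart.log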
[Thu Nov 22 16:05:20 2018] Running command[zk], logging to[/usr/local/druid/imply-2.7.12/var/sv/zk.log]: bin/run-zk conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[coordinator], logging to[/usr/local/druid/imply-2.7.12/var/sv/coordinator.log]: bin/run-druid coordinator conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[broker], logging to[/usr/local/druid/imply-2.7.12/var/sv/broker.log]: bin/run-druid broker conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[historical], logging to[/usr/local/druid/imply-2.7.12/var/sv/historical.log]: bin/run-druid historical conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[overlord], logging to[/usr/local/druid/imply-2.7.12/var/sv/overlord.log]: bin/run-druid overlord conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[middleManager], logging to[/usr/local/druid/imply-2.7.12/var/sv/middleManager.log]: bin/run-druid middleManager conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[imply-ui], logging to[/usr/local/druid/imply-2.7.12/var/sv/imply-ui.log]: bin/run-imply-ui-quickstart conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[tranquility-server], logging to[/usr/local/druid/imply-2.7.12/var/sv/tranquility-server.log]: bin/tranquility server -configFile conf-quickstart/tranquility/server.json
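Once tranquility-server is running, you can push events to the /v1/post/<dataSource> endpoint on port 8200 (http.port in server.json). The sketch below uses the JDK's HttpURLConnection instead of the Apache HttpClient that the test class imports, so it has no external dependencies. TranquilityPush and the metricType dimension are hypothetical. Note two caveats:
- Timestamps must fall within windowPeriod (PT10M) of the current time. Otherwise Tranquility drops the events.
- The response body is expected to report how many events were received and sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Hypothetical minimal client that pushes JSON events to Tranquility Server.
public class TranquilityPush {

    // Tranquility Server endpoint layout: /v1/post/<dataSource>
    static String endpoint(String host, int port, String dataSource) {
        return "http://" + host + ":" + port + "/v1/post/" + dataSource;
    }

    // One event: "timestamp" and "value" match server.json; "metricType" is an illustrative dimension.
    static String eventJson(Instant ts, String metricType, double value) {
        return "{\"timestamp\":\"" + ts + "\",\"metricType\":\"" + metricType
                + "\",\"value\":" + value + "}";
    }

    // POSTs the body as application/json and returns "HTTP <code>: <response body>".
    static String post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(10000);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int code = conn.getResponseCode();
        InputStream in = code < 400 ? conn.getInputStream() : conn.getErrorStream();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        if (in != null) {
            try (InputStream is = in) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = is.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
            }
        }
        conn.disconnect();
        return "HTTP " + code + ": " + new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Instant now = Instant.now(); // must be within windowPeriod (PT10M) of the server clock
        // Newline-delimited JSON objects, one event per line
        String body = eventJson(now, "cpu", 42.5) + "\n" + eventJson(now, "memory", 73.0);
        String url = endpoint("localhost", 8200, "tutorial-tranquility-server");
        System.out.println(post(url, body));
    }
}
```

If you changed "dataSource" in server.json, pass the same name to endpoint().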
2.2.4 Write a test class
// HTTP utility class
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.security.GeneralSecurityException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.config.RequestConfig.Builder;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.SSLContextBuilder;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.conn.ssl.X509HostnameVerifier;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient