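Big Data: Deploying Druid and a Push Data Ingestion Example

Single-Node Druid Deployment

Plenty of articles already introduce Druid and real-time big data analytics, so I will not repeat them here. This article describes how to set up a Druid environment. Imply provides a complete distribution that bundles the dependencies, Druid itself, a graphical data exploration UI, and a SQL query component. The article also covers configuring Tranquility Server for Push-based data ingestion.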

I. Prerequisites

  1. Java 8 https://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz
  2. Node.js 4.5.x or later (this article installs v10.13.0)
  3. Linux or Mac OS X (Windows is not supported)
  4. At least 4 GB of RAM

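II. Install Java 8

  1. Create the Java directory: mkdir /usr/local/java
  2. Extract the JDK into it: tar -zxvf jdk-8u191-linux-x64.tar.gz -C /usr/local/java
  3. Configure the environment variables by appending the following to /etc/profile:
# JAVA_HOME
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
  4. Reload the profile so the variables take effect: source /etc/profile
  5. Verify the JDK: java -version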
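Besides java -version, the same check can be made from inside the JVM. The following is a minimal sketch, not part of the Imply distribution; the class name JavaVersionCheck and the helper majorVersion are made up for illustration. It reads the standard java.version and java.home system properties and normalizes the legacy 1.x version scheme.

```java
// Sketch: report which JVM is actually running and its major version.
// JavaVersionCheck and majorVersion are hypothetical names for this example.
class JavaVersionCheck {

    // Extracts the major version from a "java.version" string.
    // Legacy scheme: "1.8.0_191" -> 8. Scheme used since Java 9: "11.0.2" -> 11.
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("major        = " + majorVersion(version));
    }
}
```

If java.home does not point into /usr/local/java/jdk1.8.0_191, another JDK is probably earlier on the PATH.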

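III. Install Node.js

1. Download the package that matches your system from the official site:

English site: https://nodejs.org/en/download/

Chinese site: http://nodejs.cn/download/

2. Upload the downloaded tar archive to the server, extract it, and create symbolic links so the commands are available globally.

1) Upload the archive to any directory on the server. I use /usr/local/software: cd /usr/local/software

2) Extract the uploaded archive: tar -xvf node-v10.13.0-linux-x64.tar.xz

3) Create the symbolic links:

  • ln -s /usr/local/software/node-v10.13.0-linux-x64/bin/npm /usr/local/bin/
  • ln -s /usr/local/software/node-v10.13.0-linux-x64/bin/node /usr/local/bin/

4) Finally, check that Node.js is available globally: if node -v prints the version number, the installation succeeded.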

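IV. Download and install Imply

  1. Download the latest package from https://imply.io/get-started
  2. tar -zxvf imply-2.7.12.tar.gz
  3. cd imply-2.7.12
  4. Start the services: nohup bin/supervise -c conf/supervise/quickstart.conf > quickstart.log &
    [Screenshot: startup error]
  5. If startup fails as in the screenshot above, reinstall Perl. On CentOS 7, run: yum install perl
  6. Start the services again; they should now come up normally.
    [Screenshot: successful startup]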

Verify the installation

The package ships with sample data, which can be loaded by running a predefined ingestion spec file.

# Load the data: from the imply-2.7.12 directory, run the following command
[imply-2.7.12]# bin/post-index-task --file quickstart/wikipedia-index.json
Beginning indexing data for wikipedia
Task started: index_wikipedia_2018-11-22T07:39:13.068Z
Task log:     http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-11-22T07:39:13.068Z/log
Task status:  http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-11-22T07:39:13.068Z/status
Task index_wikipedia_2018-11-22T07:39:13.068Z still running...
Task finished with status: SUCCESS
Completed indexing data for wikipedia. Now loading indexed data onto the cluster...
wikipedia is 0.0% finished loading...
wikipedia loading complete! You may now query your data
[imply-2.7.12]#
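The script prints a status URL on the Overlord (port 8090 in the output above), so the task can also be checked programmatically. The following is a rough sketch rather than an official client. The class TaskStatusCheck and its helpers are hypothetical names, and the regular expression assumes the response body contains a string-valued "status" field such as "SUCCESS"; adjust it if your Druid version returns a different shape.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: fetch the status of an indexing task from the Overlord.
class TaskStatusCheck {

    // Assumption: the body contains a string-valued field like "status":"SUCCESS".
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    // Builds the same status URL that bin/post-index-task prints.
    static String statusUrl(String overlordBase, String taskId) {
        return overlordBase + "/druid/indexer/v1/task/" + taskId + "/status";
    }

    // Returns the first string-valued "status" field, or null if none is found.
    static String extractStatus(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Performs a plain HTTP GET and returns the response body.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Pass your own task ID; the default is the one from the log above.
        String taskId = args.length > 0 ? args[0] : "index_wikipedia_2018-11-22T07:39:13.068Z";
        String url = statusUrl("http://localhost:8090", taskId);
        try {
            System.out.println(taskId + " -> " + extractStatus(fetch(url)));
        } catch (IOException e) {
            System.out.println("Could not reach " + url + ": " + e.getMessage());
        }
    }
}
```

Parsing with a regular expression keeps the sketch dependency-free; real code would normally use a JSON library.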

V. Visual console

[Screenshot: Imply visual console]

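VI. Druid data ingestion

Druid data ingestion falls into two broad categories: real-time ingestion and batch ingestion (the wikipedia sample above was loaded with a batch index task). This article covers real-time ingestion.

1. Real-time ingestion comes in two forms, Pull and Push

  • Pull: requires starting a Realtime Node, which ingests different kinds of data sources through different Firehoses.
  • Push: requires starting Tranquility or the Kafka indexing service. Data is ingested through HTTP calls.

2. Real-time ingestion in detail

2.1 Pull

Realtime Nodes do not provide high availability or scalability, so for important workloads Tranquility Server or Tranquility Kafka is recommended instead.

2.2 Push

Ingestion through Tranquility can be done in two ways:
Tranquility Server: the sender posts data to Druid through the HTTP interface that Tranquility Server provides.
Tranquility Kafka: the sender first writes data to Kafka; Tranquility Kafka then reads from Kafka according to its configuration and writes the data to Druid.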
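To make the Tranquility Server path concrete, here is a minimal sketch of a sender that posts one JSON event over HTTP using only the JDK. It makes several assumptions: the port 8200 and the data source name tutorial-tranquility-server come from the quickstart server.json shown later in this article, the /v1/post/<dataSource> path follows Tranquility Server's documented HTTP API, and the page field is a hypothetical dimension invented for the example. The class and method names are likewise illustrative.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Sketch: push a single event to Tranquility Server over HTTP.
class TranquilityPushSketch {

    // Assumption: Tranquility Server accepts events at /v1/post/<dataSource>.
    static String postUrl(String host, int port, String dataSource) {
        return "http://" + host + ":" + port + "/v1/post/" + dataSource;
    }

    // Builds one JSON event. "page" is a hypothetical dimension.
    // No JSON escaping is done here; use a JSON library for real data.
    static String event(Instant timestamp, String page, double value) {
        return "{\"timestamp\":\"" + timestamp + "\",\"page\":\"" + page
                + "\",\"value\":" + value + "}";
    }

    // Sends the JSON body with an HTTP POST and returns the response code.
    static int post(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = postUrl("localhost", 8200, "tutorial-tranquility-server");
        String json = event(Instant.now(), "example-page", 42.0);
        try {
            System.out.println("POST " + url + " -> HTTP " + post(url, json));
        } catch (IOException e) {
            System.out.println("Could not reach " + url + ": " + e.getMessage());
        }
    }
}
```

The sketch stamps the event with the current time on purpose: the quickstart configuration sets windowPeriod to PT10M, and Tranquility drops events whose timestamps fall outside that window.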

2.2.1 Configure Tranquility Server

To enable Tranquility Server, edit conf/supervise/quickstart.conf on the data node and uncomment the Tranquility Server line.

[imply-2.7.12]# cd conf/supervise/
[supervise]# ls
data.conf  master-no-zk.conf  master-with-zk.conf  query.conf  quickstart.conf
[supervise]# vi quickstart.conf

:verify bin/verify-java
:verify bin/verify-default-ports
:verify bin/verify-version-check
:kill-timeout 10

!p10 zk bin/run-zk conf-quickstart
coordinator bin/run-druid coordinator conf-quickstart
broker bin/run-druid broker conf-quickstart
historical bin/run-druid historical conf-quickstart
!p80 overlord bin/run-druid overlord conf-quickstart
!p90 middleManager bin/run-druid middleManager conf-quickstart
imply-ui bin/run-imply-ui-quickstart conf-quickstart

# Uncomment to use Tranquility Server (the comment marker on the next line has been removed)
!p95 tranquility-server bin/tranquility server -configFile conf-quickstart/tranquility/server.json

# Uncomment to use Tranquility Kafka
#!p95 tranquility-kafka bin/tranquility kafka -configFile conf-quickstart/tranquility/kafka.json

# Uncomment to use Tranquility Clarity metrics server
#!p95 tranquility-metrics-server java -Xms2g -Xmx2g -cp "dist/tranquility/lib/*:dist/tranquility/conf" com.metamx.tranquility.distribution.DistributionMain server -configFile conf-quickstart/tranquility/server-for-metrics.yaml
:wq!

2.2.2 Inspect conf-quickstart/tranquility/server.json

{
  "dataSources" : [
    {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "tutorial-tranquility-server",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "dimensionsSpec" : {
                "dimensions" : [],
                "dimensionExclusions" : [
                  "timestamp",
                  "value"
                ]
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "hour",
            "queryGranularity" : "none"
          },
          "metricsSpec" : [
            {
              "type" : "count",
              "name" : "count"
            },
            {
              "name" : "value_sum",
              "type" : "doubleSum",
              "fieldName" : "value"
            },
            {
              "fieldName" : "value",
              "name" : "value_min",
              "type" : "doubleMin"
            },
            {
              "type" : "doubleMax",
              "name" : "value_max",
              "fieldName" : "value"
            }
          ]
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "50000",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  ],
  "properties" : {
    "zookeeper.connect" : "localhost",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "http.port" : "8200",
    "http.threads" : "40",
    "serialization.format" : "smile",
    "druidBeam.taskLocator": "overlord"
  }
}
  • "dataSource" : "tutorial-tranquility-server" can be changed to whatever dataSource name you need.
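The schema above leaves dimensions empty, excludes timestamp and value from them, and defines four metrics (count, value_sum, value_min, value_max). The following sketch illustrates, conceptually only, what that means for the stored rows: every other field of an event is treated as a dimension, and events that share the same timestamp and dimension values are combined into one row. It is a simplified model and not Druid's implementation; the class RollupSketch, its helpers, and the page field are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model of the rollup implied by the quickstart metricsSpec.
class RollupSketch {

    // Mirrors "dimensionExclusions" in server.json.
    static final List<String> EXCLUDED = Arrays.asList("timestamp", "value");

    // Running aggregates for one rolled-up row.
    static class Row {
        long count = 0;
        double valueSum = 0.0;
        double valueMin = Double.POSITIVE_INFINITY;
        double valueMax = Double.NEGATIVE_INFINITY;

        void add(double value) {
            count++;
            valueSum += value;
            valueMin = Math.min(valueMin, value);
            valueMax = Math.max(valueMax, value);
        }
    }

    // Every field that is not excluded becomes a dimension (sorted by name).
    static Map<String, String> dimensions(Map<String, Object> event) {
        Map<String, String> dims = new TreeMap<>();
        for (Map.Entry<String, Object> e : event.entrySet()) {
            if (!EXCLUDED.contains(e.getKey())) {
                dims.put(e.getKey(), String.valueOf(e.getValue()));
            }
        }
        return dims;
    }

    // Groups events by timestamp plus dimension values and aggregates "value".
    // Assumes every event carries a numeric "value" field.
    static Map<String, Row> rollup(List<Map<String, Object>> events) {
        Map<String, Row> rows = new LinkedHashMap<>();
        for (Map<String, Object> event : events) {
            String key = event.get("timestamp") + " " + dimensions(event);
            double value = ((Number) event.get("value")).doubleValue();
            rows.computeIfAbsent(key, k -> new Row()).add(value);
        }
        return rows;
    }

    // Convenience builder for a sample event; "page" is a hypothetical dimension.
    static Map<String, Object> event(String timestamp, String page, double value) {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("timestamp", timestamp);
        e.put("page", page);
        e.put("value", value);
        return e;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> events = Arrays.asList(
                event("2018-11-22T08:00:00Z", "a", 1.0),
                event("2018-11-22T08:00:00Z", "a", 3.0),
                event("2018-11-22T08:00:00Z", "b", 5.0));
        for (Map.Entry<String, Row> e : rollup(events).entrySet()) {
            Row r = e.getValue();
            System.out.println(e.getKey() + " count=" + r.count + " value_sum=" + r.valueSum
                    + " value_min=" + r.valueMin + " value_max=" + r.valueMax);
        }
    }
}
```

In the sample data, the two events for page a collapse into a single row, while the event for page b stays separate because its dimension value differs.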

2.2.3 Restart the services, shutting down the previously started processes first

[imply-2.7.12]# bin/service --down
[imply-2.7.12]# nohup bin/supervise -c conf/supervise/quickstart.conf > quickstart.log &

Output like the following indicates that startup succeeded:

[imply-2.7.12]# tail -f quickstart.log
[Thu Nov 22 16:05:20 2018] Running command[zk], logging to[/usr/local/druid/imply-2.7.12/var/sv/zk.log]: bin/run-zk conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[coordinator], logging to[/usr/local/druid/imply-2.7.12/var/sv/coordinator.log]: bin/run-druid coordinator conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[broker], logging to[/usr/local/druid/imply-2.7.12/var/sv/broker.log]: bin/run-druid broker conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[historical], logging to[/usr/local/druid/imply-2.7.12/var/sv/historical.log]: bin/run-druid historical conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[overlord], logging to[/usr/local/druid/imply-2.7.12/var/sv/overlord.log]: bin/run-druid overlord conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[middleManager], logging to[/usr/local/druid/imply-2.7.12/var/sv/middleManager.log]: bin/run-druid middleManager conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[imply-ui], logging to[/usr/local/druid/imply-2.7.12/var/sv/imply-ui.log]: bin/run-imply-ui-quickstart conf-quickstart
[Thu Nov 22 16:05:20 2018] Running command[tranquility-server], logging to[/usr/local/druid/imply-2.7.12/var/sv/tranquility-server.log]: bin/tranquility server -configFile conf-quickstart/tranquility/server.json

2.2.4 Write a test class

// HTTP utility class
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.security.GeneralSecurityException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.config.RequestConfig.Builder;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.SSLContextBuilder;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.conn.ssl.X509HostnameVerifier;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient