
Real-Time Log Collection, Query, and Analysis System (Flume + ElasticSearch + Kibana)

Design: Flume (log collection) + ElasticSearch (log query) + Kibana (log analysis and visualization)

Experimental scenario: after deploying a cluster with Ambari, add your own logging system that records the logs produced by each component and supports real-time query and analysis.

1. Flume Overview

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

The use of Apache Flume is not only restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data including but not limited to network traffic data, social-media-generated data, email messages and pretty much any data source possible.

二、Flume架構

Every Flume agent contains three types of components: sources (fetch data from the data source and generate event data), channels (hold the event data that sources put into them), and sinks (take event data out of the channel).

Note that the sentence above says three types of components, not exactly three components.

[figure: flume-arch — Flume agent architecture]

So what exactly is event data?

Official definition: A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes.

Put simply, Flume event data = headers + body, where the body is a byte[] and the headers are a Map<String,String>. An event is the smallest complete unit of the data flow: if the source reads from a text file, the body is usually the content of one line, and you can add headers yourself.
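As a minimal sketch of what an event looks like in code (the header keys and the log line below are made up for illustration), Flume's own EventBuilder can assemble a body plus headers:

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class EventDemo {
    public static void main(String[] args) {
        // headers: optional string attributes (Map<String, String>)
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("component", "hdfs");            // hypothetical header key
        headers.put("host", "10.151.139.111");

        // body: the byte payload -- for a file-based source, usually one log line
        Event event = EventBuilder.withBody(
                "2016-03-01 12:00:00 INFO DataNode: heartbeat sent",  // hypothetical log line
                StandardCharsets.UTF_8,
                headers);

        System.out.println(new String(event.getBody(), StandardCharsets.UTF_8));
        System.out.println(event.getHeaders());
    }
}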

3. What You Need to Understand About Flume

  1. How to write a minimal flume.conf that gets a Flume agent working correctly (a sketch follows this list);

  2. The kinds of Flume flows and the scenarios each one fits;

  3. The sources, channels, and sinks that Flume provides out of the box; if none of them meets your needs, you can implement a custom source, channel, or sink for your own scenario.
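For item 1, a minimal working flume.conf is essentially the netcat-to-logger example from the Flume user guide; a sketch (the agent name a1 and the port are arbitrary):

# one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# turn every line received on a TCP port into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# in-memory buffer between source and sink
a1.channels.c1.type = memory

# print each event to the Flume log
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1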

4. ElasticSearch Overview

Elasticsearch is an open-source, real-time, distributed full-text store, search, and analytics engine built on Apache Lucene(TM).

Lucene is quite complex to use directly; ES (ElasticSearch) can be seen as a wrapper around it that exposes a rich REST API and is very easy to get started with.
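For example, indexing a document and searching for it are just two HTTP calls; a sketch (the index/type names and field layout here are only samples, and ES 1.x creates the index and mapping on the fly):

# index one document
curl -XPUT 'http://localhost:9200/basis_log_info/logs/1' -d '{
  "@message": "2016-03-01 12:00:00 INFO DataNode: heartbeat sent",
  "@fields": { "component": "hdfs" }
}'

# full-text search across the index
curl -XGET 'http://localhost:9200/basis_log_info/_search?q=heartbeat&pretty'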

5. A Simple View of the ElasticSearch Data Model

Elasticsearch uses several concepts (keywords) that differ from those of the relational databases we are used to; note the analogy:

Relational DB   -> Databases        -> Tables -> Rows       -> Columns
Elasticsearch   -> Indices(Index)   -> Types  -> Documents  -> Fields

An Elasticsearch cluster can contain multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns).

How do you locate a single document in ES?

Through Index (where the document is stored) + Type (the class of object the document represents) + Document ID (the document's unique identifier). Inside ES this metadata is represented as _index + _type + _id.
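Over the REST API this is simply a GET on /index/type/id; continuing the sample document indexed above, the response metadata carries exactly those three coordinates:

curl -XGET 'http://localhost:9200/basis_log_info/logs/1?pretty'

# response (abridged):
# {
#   "_index": "basis_log_info",
#   "_type":  "logs",
#   "_id":    "1",
#   "_source": { ... }
# }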

6. Kibana Overview

Kibana can be seen as a plugin for ES; the features it provides:

  1. Flexible analytics and visualization platform

  2. Real-time summary and charting of streaming data

  3. Intuitive interface for a variety of users

  4. Instant sharing and embedding of dashboards

7. System Implementation

Environment:

(1) JDK version (java -version):
java version "1.7.0_75"
OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
(2) Flume 1.6.0
(3) ElasticSearch 1.7.5

Note: in my tests, Flume 1.6.0 could not deliver data into ES 2.2.0.

[figure: log-arch — overall architecture of the log collection system]

The Flume conf can be set up as simply as the following:

agent.sources = hdfsSrc yarnSrc hbaseSrc
agent.channels = memoryChannel
agent.sinks = elasticsearch

# source1:hdfsSrc
agent.sources.hdfsSrc.type = exec
agent.sources.hdfsSrc.command = tail -F /var/log/tbds/hdfs/hdfs/hadoop-hdfs-datanode-10.151.139.111.log
agent.sources.hdfsSrc.channels = memoryChannel

# source2:yarnSrc
agent.sources.yarnSrc.type = exec
agent.sources.yarnSrc.command = tail -F /var/log/tbds/yarn/yarn/yarn-yarn-nodemanager-10.151.139.111.log
agent.sources.yarnSrc.channels = memoryChannel

# source3:hbaseSrc
agent.sources.hbaseSrc.type = exec
agent.sources.hbaseSrc.command = tail -F /var/log/tbds/hbase/hbase-hbase-regionserver-10.151.139.111.log
agent.sources.hbaseSrc.channels = memoryChannel

# sink1:localSink (defined for local debugging; add it to agent.sinks above to activate it)
agent.sinks.localSink.type = file_roll
agent.sinks.localSink.sink.directory = /var/log/flume
agent.sinks.localSink.sink.rollInterval = 0
agent.sinks.localSink.channel = memoryChannel

# sink2:esSink
agent.sinks.elasticsearch.channel = memoryChannel
agent.sinks.elasticsearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticsearch.hostNames = 10.151.139.111:9300

agent.sinks.elasticsearch.indexName = basis_log_info
agent.sinks.elasticsearch.batchSize = 100
agent.sinks.elasticsearch.indexType = logs
agent.sinks.elasticsearch.clusterName = my-test-es-cluster
agent.sinks.elasticsearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer

# channel1
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

Note that two jars must be added under flume/lib:
lucene-core-4.10.4.jar
elasticsearch-1.7.5.jar

The elasticsearch and lucene-core jars required for your environment must be placed in the lib directory of the Apache Flume installation.

Then start elasticsearch and flume respectively.
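The exact commands depend on your install paths; assuming the conf above is saved as conf/flume.conf, and given that the agent in it is named agent, roughly:

# start ElasticSearch 1.7.x as a daemon (run from the ES install directory)
bin/elasticsearch -d

# start the Flume agent (run from the Flume install directory)
bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent -Dflume.root.logger=INFO,console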

8. System Improvements

  1. Configure Flume interceptors to add various headers, and rewrite ElasticSearchLogStashEventSerializer so that the header part of each event becomes fields of the ES document (an interceptor sketch follows this list).

  2. Combine the memory channel with a file channel; see the improvements Meituan made to their log collection system.

  3. For error logs, one ES document is not necessarily one log line; several consecutive lines (e.g. a stack trace) may together form the message of a single ES document.
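For improvement 1, the interceptor half can be done with Flume's built-in timestamp, host, and static interceptors; a sketch against the hdfsSrc source above (the header keys are only examples, and surfacing them as ES document fields still needs the serializer rewrite mentioned in item 1):

# attach headers to every event read from hdfsSrc
agent.sources.hdfsSrc.interceptors = ts host comp

agent.sources.hdfsSrc.interceptors.ts.type = timestamp

agent.sources.hdfsSrc.interceptors.host.type = host
agent.sources.hdfsSrc.interceptors.host.hostHeader = host

agent.sources.hdfsSrc.interceptors.comp.type = static
agent.sources.hdfsSrc.interceptors.comp.key = component
agent.sources.hdfsSrc.interceptors.comp.value = hdfs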

9. Results

Structure of the documents imported into ES:

[figure: es-data-structure — structure of a document indexed into ES]

Kibana view:

[figure: kibana-result — Kibana dashboard over the collected logs]
