
[imooc Hands-On] Spark Streaming Real-Time Stream Processing Project Notes, Part 3 (Inscription, Upgraded Edition)


Inscription Level 1:

Flume overview
Flume is a distributed, reliable,
and available service for efficiently collecting,
aggregating, and moving large amounts of log data


webserver (source side) ===> flume ===> hdfs (destination)


Design goals:
Reliability
Scalability
Manageability


Comparison with similar products in the industry
(***)Flume: Cloudera/Apache, Java
Scribe: Facebook, C/C++, no longer maintained
Chukwa: Yahoo/Apache, Java, no longer maintained
Kafka:
Fluentd: Ruby
(***)Logstash: part of the ELK stack (Elasticsearch, Logstash, Kibana)


Flume history
Cloudera 0.9.2: Flume-OG
FLUME-728 refactoring: Flume-NG ==> Apache
2012.7: 1.0
2015.5: 1.6 (*** + )
~ 1.7


Flume architecture and core components
1) Source: collects data

2) Channel: aggregates/buffers data

3) Sink: writes data to the destination


Flume installation prerequisites
Java Runtime Environment - Java 1.7 or later
Memory - Sufficient memory for configurations used by sources, channels or sinks
Disk Space - Sufficient disk space for configurations used by channels or sinks
Directory Permissions - Read/Write permissions for directories used by agent


Install the JDK
Download it
Extract it to ~/app
Add Java to the system environment variables in ~/.bash_profile:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
Run source to make the configuration take effect
Verify: java -version
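
A minimal shell sketch of the steps above, assuming the JDK tarball was downloaded to ~/software (the tarball name and location are assumptions):

# Extract the JDK into ~/app
tar -zxvf ~/software/jdk-8u144-linux-x64.tar.gz -C ~/app/

# Append the environment variables to ~/.bash_profile
echo 'export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144' >> ~/.bash_profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bash_profile

# Reload the profile and verify
source ~/.bash_profile
java -version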


Install Flume
Download it
Extract it to ~/app
Add Flume to the system environment variables in ~/.bash_profile:
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
export PATH=$FLUME_HOME/bin:$PATH
Run source to make the configuration take effect
Configure flume-env.sh: export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
Verify: flume-ng version
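
A matching sketch for the Flume installation, again assuming the tarball sits in ~/software:

# Extract Flume into ~/app
tar -zxvf ~/software/apache-flume-1.6.0-cdh5.7.0-bin.tar.gz -C ~/app/

# Append the environment variables to ~/.bash_profile and reload
echo 'export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin' >> ~/.bash_profile
echo 'export PATH=$FLUME_HOME/bin:$PATH' >> ~/.bash_profile
source ~/.bash_profile

# Point Flume at the JDK; the template ships in $FLUME_HOME/conf
cd $FLUME_HOME/conf
cp flume-env.sh.template flume-env.sh
echo 'export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144' >> flume-env.sh

# Verify
flume-ng version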


example.conf: A single-node Flume configuration

The key to using Flume is writing the configuration file

A) Configure the Source
B) Configure the Channel
C) Configure the Sink
D) Wire the three components together

a1: agent name
r1: source name
k1: sink name
c1: channel name

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


Start the agent
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console

Test with telnet: telnet hadoop000 44444


Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is the basic unit of data transfer in Flume
Event = optional headers + byte array

Inscription Level 2:

Flume design goals: reliability, scalability, manageability

Official docs: flume.apache.org -> Documentation (left sidebar) -> Flume User Guide

The left column is the table of contents; the most frequently used entries are:

Flume Sources: avro, exec, kafka, netcat

Flume Channels: memory, file, kafka

Flume Sinks: HDFS, Hive, logger, avro, ElasticSearch, HBase, kafka

Note: every source, channel, and sink also offers a custom (user-defined) type
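
For illustration, a custom component is wired in by giving its fully qualified class name as the type; the class name below is hypothetical:

# Hypothetical custom source class, assumed to be on the Flume classpath
a1.sources.r1.type = org.example.flume.MyCustomSource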

Setting multi-agent flow

[Figure: multi-agent flow — the avro sink of one agent feeds the avro source of the next]
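
A sketch of that hop, following the user guide's avro pattern (host name and port are assumptions):

# Sender agent: avro sink pointing at the next hop
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hadoop000
agent1.sinks.k1.port = 41414

# Receiver agent: avro source listening on the same host/port
agent2.sources.r1.type = avro
agent2.sources.r1.bind = hadoop000
agent2.sources.r1.port = 41414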

Consolidation

[Figure: consolidation — several agents fan in to one aggregating agent]
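
Consolidation repeats the same avro hop: each web-server agent gets an avro sink pointing at the single collector agent's avro source, so adding another sender does not change the collector's configuration.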

Multiplexing the flow

[Figure: multiplexing the flow — one source fans events out to multiple channels]
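
A sketch of the multiplexing channel selector from the user guide; events are routed by the value of an event header (the header name and mapping values here are illustrative):

# Route events to c1 or c2 based on the "state" header
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c1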

Hands-on preparation =>

1. The prerequisites are the four points from Inscription Level 1 above; the Flume tarball can be pulled with wget from the CDH5 repository

2. Install the JDK: tar -zxvf * -C ~/app/ ; afterwards do not forget: source ~/.bash_profile

Configure: cp flume-env.sh.template flume-env.sh, then set export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144

3. Verify the installation: flume-ng version

Hands-on steps =>

Requirement: collect data from a specified network port and print it to the console

Configuration file (create example.conf in the conf directory; when in doubt, follow the official docs!):

1. The component lists a1.sources, a1.sinks, and a1.channels all end in "s"

2. When binding, sources...channels keeps the "s" (a source can feed several channels), while sinks...channel does not (a sink drains exactly one channel), as the snippet below shows
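
The two binding lines from example.conf illustrate the rule:

# Plural: a source may list several channels (c1 c2 ...)
a1.sources.r1.channels = c1
# Singular: a sink is bound to exactly one channel
a1.sinks.k1.channel = c1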

Start the agent =>
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console

In another terminal, ssh in and test with telnet: telnet hadoop000 44444
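
A sample session, assuming the agent is up and hadoop000 resolves; the netcat source answers OK for each line, and the agent console should log an Event like the one shown in Inscription Level 1:

$ telnet hadoop000 44444
hello
OK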
