
Collecting Data Across Servers in Real Time with Flume

The overall architecture is shown in the figure below. There are two servers; for transport between servers, Avro or Thrift is most common, and here we use an Avro source and sink:

一. Flume Configuration

1. Create aserver.conf on server A

# Server A (192.168.116.10)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source: tail the monitored file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/tmp/flume/1.log
a1.sources.r1.shell = /bin/sh -c
# Sink: avro, pointing at server B
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.116.11
a1.sinks.k1.port = 44444
# Channel: in-memory
a1.channels.c1.type = memory
# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
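The memory channel above runs with Flume's defaults, which are small (capacity 100 events in 1.6.0). If the source bursts faster than the sink drains, it can help to size the channel explicitly — a sketch:

```properties
# Optional: bound the memory channel explicitly instead of relying on defaults
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
```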

2. Create bserver.conf on server B

# Server B (192.168.116.11)
b1.sources = r2
b1.sinks = k2
b1.channels = c2
# Source: avro, listening for events sent by server A
b1.sources.r2.type = avro
b1.sources.r2.bind = 192.168.116.11
b1.sources.r2.port = 44444
#b1.sources.r2.interceptors = i1
#b1.sources.r2.interceptors.i1.type = timestamp
# Sink: log events to the console
b1.sinks.k2.type = logger
# Channel: in-memory
b1.channels.c2.type = memory
# Wire the source and sink to the channel
b1.sources.r2.channels = c2
b1.sinks.k2.channel = c2

二. Testing

1. Start bserver.conf first

flume-ng agent -n b1 -c /usr/local/src/apache-flume-1.6.0-bin/conf -f /usr/local/src/apache-flume-1.6.0-bin/conf/bserver.conf -Dflume.root.logger=INFO,console

2. Then start aserver.conf

flume-ng agent -n a1 -c /usr/local/src/apache-flume-1.6.0-bin/conf -f /usr/local/src/apache-flume-1.6.0-bin/conf/aserver.conf -Dflume.root.logger=INFO,console

Now append something to the monitored file.
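For example (a minimal sketch — it writes to a scratch path here; on server A you would append to /usr/tmp/flume/1.log, the file tailed by the exec source):

```shell
# Append a test event; the exec source's "tail -F" picks up new lines as they arrive.
LOG=/tmp/flume-demo.log   # stand-in for /usr/tmp/flume/1.log
echo "hello flume" >> "$LOG"
tail -n 1 "$LOG"
```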

You should see the content show up in server B's console.

Change the sink to hdfs and the data will be collected into HDFS instead.
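A sketch of what that could look like on server B — the NameNode address, port, and path here are assumptions, so adjust them for your cluster:

```properties
# Replace the logger sink on B with an HDFS sink
b1.sinks.k2.type = hdfs
b1.sinks.k2.hdfs.path = hdfs://192.168.116.11:9000/flume/events/%Y-%m-%d
# Write plain text rather than the default SequenceFile
b1.sinks.k2.hdfs.fileType = DataStream
b1.sinks.k2.hdfs.writeFormat = Text
# Needed so %Y-%m-%d resolves without a timestamp interceptor on the source
b1.sinks.k2.hdfs.useLocalTimeStamp = true
b1.sinks.k2.channel = c2
```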

三. Pitfalls

1. Startup order: always start the agent on server B before the one on server A, since the avro sink on A needs a listening avro source to connect to.

2. If you hit an error like [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:Avro source r1: { bindAddress: master, port: 44444 } } - Exception follows.

org.jboss.netty.channel.ChannelException: Failed to bind to: master/192.168.116.10:44444

Caused by: java.net.BindException: Cannot assign requested address

it means the bind IP is misconfigured: the avro source must bind to server B's address, not server A's.

3. If both agents start successfully but nothing is output, the Flume configuration is probably wrong. In particular, the IP property is named differently on the two ends: the avro sink uses hostname, while the avro source uses bind. This one took me a long time to notice.
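To make the asymmetry concrete, here are the two properties from the configs above side by side — note that both point at server B's address:

```properties
# avro SINK (the client side, on A): the property is "hostname"
a1.sinks.k1.hostname = 192.168.116.11
# avro SOURCE (the server side, on B): the property is "bind"
b1.sources.r2.bind = 192.168.116.11
```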