Installing, starting, and testing Flume on CDH 5.14.2

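Note: this post is a reference for readers who are using Flume on CDH for the first time and already have some familiarity with Flume. For details, see the official site:
Apache Flume™
Environment: CentOS 7.3 1708, CDH 5.14.2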

1. Add the Flume service in CDH

The original post illustrates this step with eight screenshots (Figures 1 to 8), which are not reproduced here.
At Figure 6, start Flume.

2. Verify that Flume runs correctly with the default configuration

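The default configuration file defines a simple example: a netcat source (lines of text received over a network socket), a memory channel, and a logger sink that writes events to the log file.
The configuration is as follows (for an installation test, it does not need to be changed):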

# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory
tier1.sinks.sink1.type         = logger
tier1.sinks.sink1.channel      = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100

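The agent is named tier1.
The source is source1.
The channel is channel1.
The sink is sink1.

The source type is netcat (lines of text received over the network).
It listens on the local address 127.0.0.1.
It listens on port 9999.

source1 writes to channel1.
channel1 is a memory channel.
sink1 reads from channel1.
The type of sink1 is logger (it writes events to the log).
The last line sets the capacity of channel1 to 100, the maximum number of events the memory channel can hold.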
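
The wiring described above can be read back from the configuration file as a quick sanity check. The following is only a minimal sketch: flume_prop is a hypothetical helper name, not part of Flume or CDH, and the file passed to it is whatever copy of the configuration you have saved locally.

```shell
# flume_prop FILE KEY
# Print the value assigned to KEY in a Flume properties file.
# Hypothetical helper for illustration; not part of Flume or CDH.
flume_prop() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                      # skip comment lines
    index($0, "=") == 0 { next }             # skip lines without an assignment
    {
      k = substr($0, 1, index($0, "=") - 1)  # text before the first "="
      v = substr($0, index($0, "=") + 1)     # text after the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }
  ' "$1"
}
```

For example, with the configuration saved as flume.conf (an assumed file name), flume_prop flume.conf tier1.sources.source1.port should print 9999 and flume_prop flume.conf tier1.sinks.sink1.channel should print channel1.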

At this point, everything is ready.

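3. Run the test

On the cdh04 machine (the one where Flume was installed and configured above), use telnet to connect to port 9999 on 127.0.0.1 (or localhost). This is the port the source listens on in the configuration above. [If telnet is not installed, see the telnet installation notes in section 5.]
telnet localhost 9999
This connects to the local host with telnet.
Once Escape character is '^]'. appears, the connection is ready.
Send some arbitrary text:
HELLO------------------
Press Enter.
The session looks like this: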

telnet localhost 9999

Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HELLO------------------
OK
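
When telnet is not available, the same check can be done without it. The sketch below uses bash's /dev/tcp redirection, so it assumes the shell is bash and that bash was built with network redirection support. send_line is a hypothetical helper name. It sends one line to the netcat source and prints the first line of the reply, which in the session above was OK.

```shell
# send_line HOST PORT TEXT
# Send one line of text to a TCP listener (here, Flume's netcat source)
# and print the first line of the reply. Requires bash (/dev/tcp).
# Hypothetical helper for illustration.
send_line() {
  local reply status
  # Open a read/write TCP connection on file descriptor 3.
  exec 3<>"/dev/tcp/$1/$2" || return 1
  printf '%s\n' "$3" >&3          # the source reads newline-terminated lines
  IFS= read -r -t 5 reply <&3     # wait up to 5 seconds for the acknowledgement
  status=$?
  exec 3>&-                       # close the connection
  [ "$status" -eq 0 ] && printf '%s\n' "$reply"
}
```

Usage, against the configuration above: send_line localhost 9999 'HELLO------------------'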

4. Check what Flume collected in the log

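Log location: shown in a screenshot in the original post, which is not reproduced here.
In that directory, run tail -100 flume-cmf-flume-AGENT-cdh04.log
and look for lines like the following (the original post highlights them in a second screenshot):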

2018-08-16 14:21:05,100 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 started
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Sink sink1
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Source source1
2018-08-16 14:21:05,601 INFO org.apache.flume.source.NetcatSource: Source starting
2018-08-16 14:21:05,602 INFO org.apache.flume.source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:9999]
2018-08-16 14:21:05,603 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2018-08-16 14:21:05,604 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:41414
2018-08-16 16:03:25,948 INFO org.apache.flume.sink.LoggerSink: Event: { headers:{} body: 48 45 4C 4C 4F 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D HELLO----------- }

This shows that Flume is installed correctly and ready to use.
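
The body field in the LoggerSink line is a hex dump of the event bytes followed by a printable rendering. To confirm that the bytes match what was sent, they can be decoded back to text. A minimal sketch, with decode_body as a hypothetical helper name:

```shell
# decode_body "48 45 4C 4C 4F"
# Convert space-separated hex bytes, as printed by the logger sink,
# back to text. Hypothetical helper for illustration.
decode_body() {
  for byte in $1; do
    # Turn the hex byte into a 3-digit octal escape, which printf
    # understands portably, and print the corresponding character.
    printf "\\$(printf '%03o' "0x$byte")"
  done
  printf '\n'
}
```

Passing the sixteen bytes from the log line above to decode_body gives HELLO followed by eleven hyphens, the same as the rendering at the end of that line. The log shows fewer hyphens than were typed because the logger sink, by default, prints only the first 16 bytes of the event body. To list only the event lines, grep 'LoggerSink: Event' flume-cmf-flume-AGENT-cdh04.log can be used in place of tail.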

5. Install telnet

sudo yum -y install telnet-0.17-64.el7.x86_64

6. Collect netcat data into HDFS with Flume

Modify the Flume configuration file as follows:

# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
#tier1.sources  = avro-source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory



# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#tier1.sources.avro-source1.channels = ch1
#tier1.sources.avro-source1.type = avro
#tier1.sources.avro-source1.bind = 0.0.0.0
#tier1.sources.avro-source1.port = 41414
#tier1.sources.avro-source1.threads = 5

#define source monitor a file
#tier1.sources.avro-source1.type = exec
#tier1.sources.avro-source1.shell = /bin/bash -c
#tier1.sources.avro-source1.command = tail -n +0 -F cdh03:/home/d2
#tier1.sources.avro-source1.channels = channel1
#tier1.sources.avro-source1.threads = 5




# tier1.sinks.sink1.type         = hdfs
tier1.sinks.sink1.channel      = channel1

# Describe the sink
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /flume/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.filePrefix=test_flume
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.rollInterval=0


# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
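  • Tip: the line tier1.sinks.sink1.hdfs.path = /flume/ sets where the data is stored in HDFS. It does not include the hdfs:// scheme because Flume configured through CDH resolves the path against the cluster's HDFS automatically. Adding the scheme explicitly also works.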
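
After the agent has been restarted with this configuration and a few lines have been sent to port 9999, the output can be inspected with the standard hdfs dfs commands. A sketch, assuming the hdfs client is on the PATH and the current user may read /flume/. show_flume_output is a hypothetical helper name.

```shell
# show_flume_output [DIR]
# List the files Flume wrote under DIR (default /flume/) and print the
# contents of those whose names start with the configured filePrefix.
# Hypothetical helper for illustration.
show_flume_output() {
  dir="${1:-/flume/}"
  hdfs dfs -ls "$dir" || return 1
  # The glob is quoted so that the HDFS shell expands it, not the local shell.
  hdfs dfs -cat "${dir%/}/test_flume*"
}
```

Note that a file still being written normally carries a .tmp suffix until it is rolled. With rollCount and rollInterval both set to 0, rolling by event count and by time is disabled, so, as far as the HDFS sink defaults go, files are rolled by size only (hdfs.rollSize, which is not set in this configuration).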