Flume-ng 1.4.0: Installation and Troubleshooting Notes
2. Extract the archive:
tar -zxvf apache-flume-1.4.0-bin.tar.gz
3. Configure the environment variables (in /etc/profile):
export FLUME_HOME=/root/install/apache-flume-1.4.0-bin
export PATH=$PATH:$FLUME_HOME/bin
4. Reload the profile so the changes take effect:
source /etc/profile
5. A simple test case
(1) Create a file named example-conf.properties under $FLUME_HOME/conf/ with the following contents:

# Describe the source
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Create a source file file-test.txt under the conf directory and write some data into it:

echo "hello world" > file-test.txt
(3) Start the agent:

flume-ng agent -n a1 -f example-conf.properties
(4) In another window, start the avro-client to send data to the agent (with this configuration, localhost cannot be replaced by an IP address):

flume-ng avro-client -H localhost -p 44444 -F file-test.txt
The agent's console output should show that the data sent by the avro-client was received. Since the sink type in this configuration is logger, the events are written to the log.
Problems and solutions
1. The following error appears. Fix: replace guava-10.0.1.jar with guava-11.0.2.jar.

Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:517)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
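The swap happens in Flume's lib directory. A minimal sketch of the procedure, using a temp directory and empty stand-in files so it is self-contained (in practice, point FLUME_LIB at your real $FLUME_HOME/lib and copy the newer Guava jar from your Hadoop installation):

```shell
# Illustrative only: replace the conflicting Guava jar under $FLUME_HOME/lib.
FLUME_LIB=/tmp/flume-demo-lib
mkdir -p "$FLUME_LIB"
touch "$FLUME_LIB/guava-10.0.1.jar"                                  # stand-in for the old jar
mv "$FLUME_LIB/guava-10.0.1.jar" "$FLUME_LIB/guava-10.0.1.jar.bak"   # keep a backup out of the classpath
touch "$FLUME_LIB/guava-11.0.2.jar"                                  # in practice: cp from your Hadoop lib dir
ls "$FLUME_LIB"
```

Renaming the old jar (rather than deleting it) makes it easy to roll back if anything else breaks.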
2. The following error appears. Fix: replace protobuf-java-2.4.1-shaded.jar with protobuf-java-2.5.0.jar.

Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:517)
3. If the Hadoop cluster uses an HA configuration, copy core-site.xml and hdfs-site.xml into $FLUME_HOME/conf/; otherwise the dfs.nameservices name cannot be resolved.
4. Error when configuring dynamic directories:
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null

Fix: add agent1.sinks.hdfssink1.hdfs.useLocalTimeStamp = true:

agent1.sinks.hdfssink1.hdfs.path = hdfs://mycluster/flume-data/%y-%m-%d/%H%M
agent1.sinks.hdfssink1.hdfs.filePrefix = accesslog
agent1.sinks.hdfssink1.hdfs.useLocalTimeStamp = true
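To see why this fixes the NPE: escape sequences like %y-%m-%d/%H%M in hdfs.path are expanded from the event's "timestamp" header (epoch milliseconds), and useLocalTimeStamp = true makes Flume fall back to the local clock when that header is missing. A rough sketch of the expansion logic (not Flume's actual code; Flume's escapes happen to line up with strftime directives for the ones used here):

```python
# Sketch: expand a Flume-style path pattern from an event's headers.
# Falls back to the local clock when no "timestamp" header is present,
# mimicking the effect of hdfs.useLocalTimeStamp = true.
from datetime import datetime

def expand_path(pattern: str, headers: dict) -> str:
    ts_millis = int(headers.get("timestamp", datetime.now().timestamp() * 1000))
    dt = datetime.fromtimestamp(ts_millis / 1000)
    return dt.strftime(pattern)

# With a timestamp header, the directory is derived from the event time:
print(expand_path("flume-data/%y-%m-%d/%H%M", {"timestamp": "1413510600000"}))
# Without one, the local clock is used instead of raising an error:
print(expand_path("flume-data/%y-%m-%d/%H%M", {}))
```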
5. Notes on related parameters
(1) The following three parameters work together to generate directories dynamically:

agent1.sinks.sink2hdfs.hdfs.round = true
agent1.sinks.sink2hdfs.hdfs.roundValue = 60
agent1.sinks.sink2hdfs.hdfs.roundUnit = minute
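What these do, roughly: before the path escapes are expanded, the event timestamp is rounded down to the nearest roundValue units, so with roundValue = 60 and roundUnit = minute every event in the same hour lands in the same directory. A small sketch of that rounding (illustrative, not Flume's source):

```python
# Sketch of hdfs.round / roundValue / roundUnit: round an epoch-millis
# timestamp DOWN to the start of its bucket before path expansion.
def round_down(ts_millis: int, round_value: int, round_unit_seconds: int) -> int:
    bucket = round_value * round_unit_seconds * 1000  # bucket size in millis
    return ts_millis - ts_millis % bucket

ts = 1413510600000                 # some epoch-millis timestamp (mid-hour)
print(round_down(ts, 60, 60))      # rounded to the start of its hour
```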
(2) The following parameters together determine the format of the data written to HDFS (for example, whether it is compressed):

agent1.sinks.sink2hdfs.hdfs.writeFormat = Text
agent1.sinks.sink2hdfs.hdfs.fileType = DataStream
#agent1.sinks.sink2hdfs.hdfs.codeC = gzip
(3) The following three parameters determine the output directory structure and file names:

agent1.sinks.sink2hdfs.hdfs.path = hdfs://mycluster/flume/%Y-%m-%d/%H
agent1.sinks.sink2hdfs.hdfs.filePrefix = consolidation-accesslog-%H-%M-%S
agent1.sinks.sink2hdfs.hdfs.useLocalTimeStamp = true
(4) The following parameter sets the output file system type:

agent1.sinks.sink2hdfs.type = hdfs
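Putting items (1) through (4) together, a complete HDFS sink definition might look like this. The agent/sink names and the hdfs://mycluster path follow the examples above; the channel name ch1 is an illustrative assumption, and values should be adjusted for your cluster:

```properties
agent1.sinks.sink2hdfs.type = hdfs
agent1.sinks.sink2hdfs.channel = ch1
agent1.sinks.sink2hdfs.hdfs.path = hdfs://mycluster/flume/%Y-%m-%d/%H
agent1.sinks.sink2hdfs.hdfs.filePrefix = accesslog
agent1.sinks.sink2hdfs.hdfs.useLocalTimeStamp = true
agent1.sinks.sink2hdfs.hdfs.round = true
agent1.sinks.sink2hdfs.hdfs.roundValue = 60
agent1.sinks.sink2hdfs.hdfs.roundUnit = minute
agent1.sinks.sink2hdfs.hdfs.writeFormat = Text
agent1.sinks.sink2hdfs.hdfs.fileType = DataStream
```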
(5) Flume out-of-memory problems can surface as all kinds of seemingly unrelated exceptions, for example:

org.apache.avro.AvroRuntimeException: Unknown datum type: java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
14/10/17 09:56:06 WARN ipc.NettyServer: Unexpected exception from downstream.
java.lang.OutOfMemoryError: Java heap space
14/10/17 09:50:31 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
java.lang.IllegalStateException: close() called when transaction is OPEN - you must either commit or rollback first
Exception in thread "pool-17-thread-6" java.lang.NoClassDefFoundError: Could not initialize class java.text.MessageFormat
EXCEPTION: java.lang.OutOfMemoryError: GC overhead limit exceeded)
java.lang.OutOfMemoryError: GC overhead limit exceeded
Error while writing to required channel: org.apache.flume.channel.MemoryChannel{name:
AvroRuntimeException: Excessively large list allocation request detected: 1398022191 items! Connection closed.
This problem kept a colleague up all night without a fix. Most advice online just says to set JVM parameters, so together we tried editing this line in flume-env.sh:
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#JAVA_OPTS="-Xms1024m -Xmx2048m -Dcom.sun.management.jmxremote"
But no matter how we changed it, it made no difference. Other posts vaguely suggested editing limits.conf; none clearly explained the actual cause.
The next day I continued troubleshooting.
Solution:
<1> ps -aux | grep flume showed the following for the running Flume process:

/usr/jdk/bin/java -Xmx20m -Dflume.root.logger=INFO -cp conf

<2> Looking into the flume-ng launcher script, I found:

JAVA_OPTS="-Xmx20m"

That was the problem: the launcher defaults the heap to only 20 MB. After raising this value in the flume-ng script, everything ran normally:

JAVA_OPTS="-Xmx2048m"
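The edit itself can be scripted. A self-contained sketch using a stand-in file (point FLUME_BIN at your real $FLUME_HOME/bin/flume-ng in practice, and note that sed -i as used here is the GNU form):

```shell
# Illustrative: bump the hard-coded default heap in the flume-ng launcher.
FLUME_BIN=/tmp/flume-ng-demo
echo 'JAVA_OPTS="-Xmx20m"' > "$FLUME_BIN"      # stand-in for the real script line
sed -i 's/-Xmx20m/-Xmx2048m/' "$FLUME_BIN"     # raise the heap to 2 GB
cat "$FLUME_BIN"
```

After restarting the agent, ps aux | grep flume should show -Xmx2048m on the Java command line, confirming the new heap is in effect.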
6. A custom plugin cannot be dropped directly into the plugins.d directory; it must go into its own subdirectory. I initially put my jar straight into plugins.d and it was not found, so I created a subdirectory for it:

[root@... apache-flume-1.4.0-bin]# ls
bin CHANGELOG conf DEVNOTES docs lib LICENSE NOTICE plugins.d README RELEASE-NOTES tools
[root@... apache-flume-1.4.0-bin]# cd plugins.d/panguoyuan/lib/
[root@... lib]# ls
flume-ng-ext.jar
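The layout Flume expects is plugins.d/<plugin-name>/lib/ under the Flume home (with optional libext/ and native/ siblings for dependency jars and native libraries). A sketch of creating it, using a temp directory and the plugin name "myplugin" as illustrative assumptions:

```shell
# Illustrative sketch using a temp dir in place of the real $FLUME_HOME.
FLUME_HOME=/tmp/flume-home-demo
mkdir -p "$FLUME_HOME/plugins.d/myplugin/lib"
touch "$FLUME_HOME/plugins.d/myplugin/lib/flume-ng-ext.jar"  # your plugin jar goes here
ls "$FLUME_HOME/plugins.d/myplugin/lib"
```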
7. If the collecting agent's sink has type = avro but the aggregating agent's source has type = netcat, the following error occurs.
Fix: change the source's type to avro.
2015-03-11 11:40:44,144 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:382)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: fk01, port: 44444 }: Failed to send batch
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:294)
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:366)
... 3 more
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: fk01, port: 44444 }: Handshake timed out after 20000ms
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:338)
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:282)
... 4 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:201)
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:336)
... 5 more
2015-03-11 11:40:49,147 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:205)] Rpc sink avro_sink: Building RpcClient with hostname: 10.58.22.219, port: 44444
8. Before Flume can write to HDFS, the target directory must be made writable, which I did with: hadoop dfs -chmod -R 777 / (opening up the entire filesystem like this is heavy-handed; granting permissions on just the Flume output directory is safer where possible).