Flume "java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.setWriteToWAL"
阿新 • Posted: 2018-12-27
Our previous architecture used Spark + HBase + Oozie for parsing, storage, and invoking our algorithms. Recently a new requirement came up: many small files will be uploaded, and they must be processed in near real time, i.e. at second-level latency. Spark is clearly unsuitable for that kind of parsing; even for a file of a few dozen lines, Spark takes on the order of minutes.
I considered two approaches: one was to parse the files in pure Java, the other was to use Flume to parse them and write directly into HBase.
I downloaded the latest Flume, 1.8, and used the spoolDir source. The configuration file is as follows:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /data/flume/r1/data
a1.sources.r1.batchSize = 100
a1.sources.r1.channels = c1

a1.channels.c1.type = file
a1.channels.c1.write-timeout = 10
a1.channels.c1.keep-alive = 10
a1.channels.c1.checkpointDir = /data/flume/c1/checkpoint
a1.channels.c1.dataDirs = /data/flume/c1/data
a1.channels.c1.maxFileSize = 268435456

#a1.sinks.k1.type = logger
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume
a1.sinks.k1.columnFamily = cf
#a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.batchSize = 100
a1.sinks.k1.serializer.regex = (.*?)\\|\\|(.*?)\\|\\|(.*?)\\|\\|(.*?)\\|\\|(.*)
a1.sinks.k1.serializer.colNames = ROW_KEY,cnc_rdspmeter[0],cnc_rdsvmeter,cnc_statinfo[3],ext_toolno
a1.sinks.k1.serializer.regexIgnoreCase = true
a1.sinks.k1.serializer.depositHeaders = true
a1.sinks.k1.zookeeperQuorum = datanode01-ucloud.isesol.com:2181
a1.sinks.k1.channel = c1

Then start Flume:
bin/flume-ng agent -n a1 -c conf -f conf/flume-conf.properties
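To sanity-check the serializer regex in the configuration above, here is a minimal, self-contained sketch of how RegexHbaseEventSerializer's five capture groups split one event body. The sample body is reconstructed from the scan output later in this post; the exact on-disk file format is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSplitDemo {
    // Same pattern as a1.sinks.k1.serializer.regex above, with the
    // properties-file double-escaping reduced to normal Java escaping.
    static final Pattern EVENT =
            Pattern.compile("(.*?)\\|\\|(.*?)\\|\\|(.*?)\\|\\|(.*?)\\|\\|(.*)");

    // Splits one event body into the five column values,
    // or returns null if the body does not match.
    static String[] split(String body) {
        Matcher m = EVENT.matcher(body);
        if (!m.matches()) {
            return null;
        }
        String[] values = new String[5];
        for (int i = 0; i < 5; i++) {
            values[i] = m.group(i + 1);
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample "||"-delimited event body (an assumption, reconstructed
        // from the scan results shown further down).
        String body = "cnc_exeprgname:418||cnc_rdspmeter[0]:0||cnc_rdsvmeter:6,7,92,0"
                + "||cnc_statinfo[3]:3||ext_toolno:30";
        String[] cols = {"ROW_KEY", "cnc_rdspmeter[0]", "cnc_rdsvmeter",
                         "cnc_statinfo[3]", "ext_toolno"};
        String[] values = split(body);
        for (int i = 0; i < cols.length; i++) {
            System.out.println("cf:" + cols[i] + " = " + values[i]);
        }
    }
}
```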
While consuming files, the following error occurred:
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)Lorg/apache/hadoop/hbase/client/Put;
        at org.apache.flume.sink.hbase.HBaseSink$3.run(HBaseSink.java:380)
        at org.apache.flume.sink.hbase.HBaseSink$3.run(HBaseSink.java:375)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:375)
        at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:345)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
^C Attempting to shutdown background worker.
setWriteToWAL existed in earlier HBase versions, but as far as I can tell it was removed after 1.0. I have no idea why the Flume developers still call this method in the latest 1.8. Searching online turned up essentially no solutions, so I opened the source code to see what was going on.
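When a NoSuchMethodError like this appears, a quick way to confirm which API the HBase client jar on your classpath actually exposes is a small reflection probe. A minimal sketch (java.lang.String stands in for org.apache.hadoop.hbase.client.Put so it runs without HBase on the classpath; against a real cluster you would probe Put for setWriteToWAL and for setDurability, which replaced it in the newer client API):

```java
import java.lang.reflect.Method;

public class WalMethodProbe {
    // Returns true if the named class is on the classpath and exposes a
    // public method with the given name.
    static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException ignored) {
            // Class is not on the classpath at all.
        }
        return false;
    }

    public static void main(String[] args) {
        // With the HBase client jar present, probe
        // "org.apache.hadoop.hbase.client.Put" instead.
        System.out.println(hasMethod("java.lang.String", "setWriteToWAL")); // false
        System.out.println(hasMethod("java.lang.String", "equals"));        // true
    }
}
```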
Since my sink type is hbase, I opened HBaseSink.java and searched it for setWriteToWAL; it occurs in three places:
public Void run() throws Exception {
  for (Row r : actions) {
    if (r instanceof Put) {
      // ((Put) r).setWriteToWAL(enableWal);
    }
    // Newer versions of HBase - Increment implements Row.
    if (r instanceof Increment) {
      // ((Increment) r).setWriteToWAL(enableWal);
    }
  }
  table.batch(actions);
  return null;
}
public Void run() throws Exception {
  List<Increment> processedIncrements;
  if (batchIncrements) {
    processedIncrements = coalesceIncrements(incs);
  } else {
    processedIncrements = incs;
  }
  // Only used for unit testing.
  if (debugIncrCallback != null) {
    debugIncrCallback.onAfterCoalesce(processedIncrements);
  }
  for (final Increment i : processedIncrements) {
    // i.setWriteToWAL(enableWal);
    table.increment(i);
  }
  return null;
}
});
The three calls I commented out above are the setWriteToWAL usages. This flag doesn't matter for my use case, so I simply commented them out, rebuilt the jar (its official name is flume-ng-hbase-sink-1.8.0.jar), and swapped it in. Restart Flume and check the result:
hbase(main):001:0> scan 'flume'
ROW COLUMN+CELL
1529992556110-SzjikLv1LH-0 column=cf:ROW_KEY, timestamp=1529992556407, value=cnc_exeprgname:418
1529992556110-SzjikLv1LH-0 column=cf:cnc_rdspmeter[0], timestamp=1529992556407, value=cnc_rdspmeter[0]:0
1529992556110-SzjikLv1LH-0 column=cf:cnc_rdsvmeter, timestamp=1529992556407, value=cnc_rdsvmeter:6,7,92,0
1529992556110-SzjikLv1LH-0 column=cf:cnc_statinfo[3], timestamp=1529992556407, value=cnc_statinfo[3]:3
1529992556110-SzjikLv1LH-0 column=cf:ext_toolno, timestamp=1529992556407, value=ext_toolno:30
1529992556125-SzjikLv1LH-1 column=cf:ROW_KEY, timestamp=1529992556407, value=cnc_exeprgname:418
1529992556125-SzjikLv1LH-1 column=cf:cnc_rdspmeter[0], timestamp=1529992556407, value=cnc_rdspmeter[0]:0
1529992556125-SzjikLv1LH-1 column=cf:cnc_rdsvmeter, timestamp=1529992556407, value=cnc_rdsvmeter:6,7,93,0
1529992556125-SzjikLv1LH-1 column=cf:cnc_statinfo[3], timestamp=1529992556407, value=cnc_statinfo[3]:3
1529992556125-SzjikLv1LH-1 column=cf:ext_toolno, timestamp=1529992556407, value=ext_toolno:30
1529992556126-SzjikLv1LH-2 column=cf:ROW_KEY, timestamp=1529992556407, value=cnc_exeprgname:418
1529992556126-SzjikLv1LH-2 column=cf:cnc_rdspmeter[0], timestamp=1529992556407, value=cnc_rdspmeter[0]:0
1529992556126-SzjikLv1LH-2 column=cf:cnc_rdsvmeter, timestamp=1529992556407, value=cnc_rdsvmeter:5,10,93,0
1529992556126-SzjikLv1LH-2 column=cf:cnc_statinfo[3], timestamp=1529992556407, value=cnc_statinfo[3]:3
1529992556126-SzjikLv1LH-2 column=cf:ext_toolno, timestamp=1529992556407, value=ext_toolno:30
1529992556127-SzjikLv1LH-3 column=cf:ROW_KEY, timestamp=1529992556407, value=cnc_exeprgname:418
1529992556127-SzjikLv1LH-3 column=cf:cnc_rdspmeter[0], timestamp=1529992556407, value=cnc_rdspmeter[0]:0
1529992556127-SzjikLv1LH-3 column=cf:cnc_rdsvmeter, timestamp=1529992556407, value=cnc_rdsvmeter:7,8,93,0
1529992556127-SzjikLv1LH-3 column=cf:cnc_statinfo[3], timestamp=1529992556407, value=cnc_statinfo[3]:3
1529992556127-SzjikLv1LH-3 column=cf:ext_toolno, timestamp=1529992556407, value=ext_toolno:30
1529992556128-SzjikLv1LH-4 column=cf:ROW_KEY, timestamp=1529992556407, value=cnc_exeprgname:418
1529992556128-SzjikLv1LH-4 column=cf:cnc_rdspmeter[0], timestamp=1529992556407, value=cnc_rdspmeter[0]:0
1529992556128-SzjikLv1LH-4 column=cf:cnc_rdsvmeter, timestamp=1529992556407, value=cnc_r
The world is finally quiet. However, this default ROWKEY format does not fit my needs, so the source code will need further modification.
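For reference, the keys in the scan above follow the pattern millis-timestamp, a random suffix, and a per-batch counter. A hedged sketch of that format, plus a hypothetical field-based alternative of the kind you would get by overriding the serializer's getRowKey() (the method name and the field-based scheme are assumptions, not the actual patch):

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

public class RowKeyDemo {
    private static final AtomicInteger nonce = new AtomicInteger();
    private static final Random random = new Random();
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Mirrors the default key shape seen in the scan output:
    // <millis>-<random 10-char suffix>-<counter>, e.g. 1529992556110-SzjikLv1LH-0.
    static String defaultStyleKey(long millis) {
        StringBuilder suffix = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            suffix.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return millis + "-" + suffix + "-" + nonce.getAndIncrement();
    }

    // Hypothetical alternative: derive the key from a payload field plus the
    // timestamp, so rows for one machine/tool sort together.
    static String fieldBasedKey(String field, long millis) {
        return field + "-" + millis;
    }

    public static void main(String[] args) {
        System.out.println(defaultStyleKey(1529992556110L));
        System.out.println(fieldBasedKey("ext_toolno:30", 1529992556110L));
    }
}
```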