
Flume: A Big Data Collaboration Framework

I. Overview

Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and moving large volumes of log data. It supports pluggable, customizable data senders on the collection side, can apply simple processing to events in flight, and writes to a variety of (customizable) destinations. Internally, a Flume agent moves events from a Source, through a Channel, to a Sink; the example in section IV wires these three pieces together.

II. Installation

1. Extract the tarball:

tar -zxvf flume-ng-1.6.0-cdh5.14.2.tar.gz -C /opt/cdh5.14.2/

2. Inspect the conf directory:

# ll
total 16
-rw-r--r-- 1 1106 4001 1661 Mar 28 04:47 flume-conf.properties.template
-rw-r--r-- 1 1106 4001 1455 Mar 28 04:47 flume-env.ps1.template
-rw-r--r-- 1 1106 4001 1565 Mar 28 04:47 flume-env.sh.template
-rw-r--r-- 1 1106 4001 3107 Mar 28 04:47 log4j.properties

3. Rename flume-env.sh.template to flume-env.sh.
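For example, from the Flume home directory (a plain rename; using cp instead would keep the template as a backup):

# mv conf/flume-env.sh.template conf/flume-env.sh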

4. Configure flume-env.sh

Check the JAVA_HOME path:

# echo $JAVA_HOME
/opt/java/jdk1.7.0_80

Edit flume-env.sh:

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Environment variables can be set here.

export JAVA_HOME=/opt/java/jdk1.7.0_80

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

# Let Flume write raw event data and configuration information to its log files for debugging
# purposes. Enabling these flags is not recommended in production,
# as it may result in logging sensitive user information or encryption secrets.
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "

# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

5. The HDFS sink needs Hadoop client classes on Flume's classpath, so copy these four jars from the Hadoop installation into Flume's lib directory:

commons-configuration-1.6.jar, hadoop-auth-2.6.0-cdh5.14.2.jar, hadoop-common-2.6.0-cdh5.14.2.jar, hadoop-hdfs-2.6.0-cdh5.14.2.jar

From the Hadoop home directory (hadoop-2.6.0):

# cp share/hadoop/tools/lib/commons-configuration-1.6.jar /opt/cdh5.14.2/flume-1.6.0/lib/
# cp share/hadoop/common/lib/hadoop-auth-2.6.0-cdh5.14.2.jar /opt/cdh5.14.2/flume-1.6.0/lib/
# cp share/hadoop/common/hadoop-common-2.6.0-cdh5.14.2.jar /opt/cdh5.14.2/flume-1.6.0/lib/
# cp share/hadoop/hdfs/hadoop-hdfs-2.6.0-cdh5.14.2.jar /opt/cdh5.14.2/flume-1.6.0/lib/
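A quick, optional sanity check that the jars landed in Flume's lib directory:

# ls /opt/cdh5.14.2/flume-1.6.0/lib/ | grep -E 'hadoop|commons-configuration'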

III. Basic Usage

1. Check the current Flume version:

# bin/flume-ng version
Flume 1.6.0-cdh5.14.2
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 50436774fa1c7eaf0bd9c89ac6ee845695fbb687
Compiled by jenkins on Tue Mar 27 13:55:10 PDT 2018
From source with checksum 30217fe2b34097676ff5eabb51f4a11d

2. View the help text:

# bin/flume-ng help
Usage: bin/flume-ng <command> [options]...

commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  version                   show Flume version info

global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)
  --zkConnString,-z <str>   specify the ZooKeeper connection to use (required if -f missing)
  --zkBasePath,-p <path>    specify the base path in ZooKeeper for agent configs
  --no-reload-conf          do not reload config file if changed
  --help,-h                 display help text

avro-client options:
  --rpcProps,-P <file>   RPC client properties file with server connection params
  --host,-H <host>       hostname to which events will be sent
  --port,-p <port>       port of the avro source
  --dirname <dir>        directory to stream to avro source
  --filename,-F <file>   text file to stream to avro source (default: std input)
  --headerFile,-R <file> File containing event headers as key/value pairs on each new line
  --help,-h              display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

 

IV. Flume Agent Example: Saving the Hive Log to HDFS

1. In Flume's conf directory, create a file named catchhivelogs.conf. Its contents can follow the format of flume-conf.properties.template in the same directory.

2. Write catchhivelogs.conf. The relevant user-guide references:

http://flume.apache.org/FlumeUserGuide.html#exec-source

http://flume.apache.org/FlumeUserGuide.html#memory-channel

http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'a1'

#define agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1

#define sources
# exec source: run a command and turn each line of its output into an event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /opt/cdh5.14.2/hive-1.1.0/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

#define channels
# memory channel: fast, but buffered events are lost if the agent dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#define sinks
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master.cdh.com:8020/user/flume/hive-logs/
# DataStream writes plain text rather than the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# number of events flushed to HDFS per batch
a1.sinks.k1.hdfs.batchSize = 10

#bind the sources and sinks to the channels
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

3. Run the agent (-n must match the agent name a1 defined in the config; -c points at the directory holding flume-env.sh and log4j.properties):

bin/flume-ng agent -c conf -n a1 -f conf/catchhivelogs.conf -Dflume.root.logger=DEBUG,console

This fails with an error:

2018-07-31 17:43:21,039 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder
	at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2803)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:186)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:260)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:252)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.Tracer$Builder
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 18 more

Fix:

Copy htrace-core4-4.0.1-incubating.jar from hadoop-2.6.0/share/hadoop/common/lib into Flume's lib directory.
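For example, again from the Hadoop home directory:

# cp share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar /opt/cdh5.14.2/flume-1.6.0/lib/

Restart the agent afterwards so the new jar is picked up.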

4. Results:

Check HDFS to confirm that log events are arriving:
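For example, list the sink directory from the Hadoop home directory (assuming the path configured above):

# bin/hdfs dfs -ls /user/flume/hive-logs/

New FlumeData.* files should appear there as Hive writes to its log.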

5. Improvement:

The sink path in the config from step 2:

a1.sinks.k1.hdfs.path = hdfs://master.cdh.com:8020/user/flume/hive-logs/

hard-codes a single NameNode address. When Hadoop runs in HA mode this is clearly wrong, because the active NameNode can move to another host.

Fix: 1. Copy hdfs-site.xml and core-site.xml into flume-1.6.0/conf so Flume's HDFS client can resolve the nameservice:

From the Hadoop configuration directory:

# cp hdfs-site.xml /opt/cdh5.14.2/flume-1.6.0/conf/
# cp core-site.xml /opt/cdh5.14.2/flume-1.6.0/conf/

2. Change the sink path to address the HA nameservice instead of a single host: a1.sinks.k1.hdfs.path = hdfs://ns1/user/flume/hive-logs/
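The nameservice ID (ns1 here) must match the dfs.nameservices value in the copied hdfs-site.xml, along the lines of:

<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>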

 

V. Flume in an Enterprise Big Data Architecture
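In production, Flume is usually deployed in tiers: many edge agents run alongside the data sources (web servers, Hive, application logs) and forward events over Avro to a small number of collector agents, which consolidate the streams and write to HDFS. A minimal sketch of the two sides (the agent names, collector host, and port here are hypothetical, not from this article):

# edge agent: forward events to a collector over Avro
edge.sinks.k1.type = avro
edge.sinks.k1.hostname = collector.cdh.com
edge.sinks.k1.port = 4141

# collector agent: receive Avro events from edge agents, sink to HDFS
col.sources.r1.type = avro
col.sources.r1.bind = 0.0.0.0
col.sources.r1.port = 4141

This fan-in pattern scales horizontally: new edge agents can be added without touching the HDFS-facing configuration.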