Big Data Collaboration Frameworks: Flume in Detail

阿新 • Published: 2019-01-26

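Installing and configuring Flume
   1. Download
   2. Extract
       $tar zxf /sourcepath/ -C /copypath
   3. Configure the flume-env.sh file
       export JAVA_HOME=/jdkpath
   4. Start (run help or version to check the installation)
       $bin/flume-ng help
       $bin/flume-ng version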

Using Flume
   Common commands
       A typical command
       $bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/flume-telnet.conf -Dflume.root.logger=INFO,console

       **-c or --conf: the configuration directory
       **-f or --conf-file: the specific configuration file
       **-n or --name: the name of the agent

       Case 1
       Use Flume to monitor a port and output the data written to that port through the logger sink
       Configuration file name: flume-telnet.conf
       ==================agent a1=======================
       # Name the components on this agent
       # define a source
       a1.sources = r1
       # define a sink
       a1.sinks = k1
       # define a channel
       a1.channels = c1

       # Describe/configure the source
       # source type
       a1.sources.r1.type = netcat
       # IP address of the host to listen on
       a1.sources.r1.bind = 192.168.242.128
       # port to listen on
       a1.sources.r1.port = 44444

       # Describe the sink
       # the sink writes events to the logger
       a1.sinks.k1.type = logger

       # Use a channel which buffers events in memory
       # channel type: memory
       a1.channels.c1.type = memory
       # maximum number of events stored in the channel
       a1.channels.c1.capacity = 1000
       # maximum number of events per transaction (per put by the source or take by the sink)
       a1.channels.c1.transactionCapacity = 100

       # Bind the source and sink to the channel
       # connect the source to the channel
       a1.sources.r1.channels = c1
       # connect the sink to the channel
       a1.sinks.k1.channel = c1

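   Preparation:
       telnet: a TCP-based service for logging in to and accessing remote machines
       yum -y install telnet

       Check whether the port is in use
       netstat -an|grep 44444

       Connect to the port with telnet (once the agent is running)
       telnet 192.168.242.128 44444
   Start the Flume agent
       $bin/flume-ng
       **run a Flume agent
       agent
       **Flume configuration directory
       --conf conf/
       **name of the agent in the configuration file
       --name a1
       **the agent's configuration file
       --conf-file conf/flume-telnet.conf
       **print INFO-level logs to the console
       -Dflume.root.logger=INFO,console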
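
The telnet session above can also be driven from a script. The following is a minimal sketch, not part of the original setup: it assumes agent a1 is already running and that the netcat source answers each line with one acknowledgement line (its default behaviour). The function name `send_lines` is hypothetical.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each line to a TCP listener and collect one reply line per sent line.

    Assumption: the listener acknowledges every line with a single reply line,
    as Flume's netcat source does by default. If acknowledgements are disabled,
    readline() blocks until the timeout expires.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # The netcat source treats each newline-terminated line as one event.
                conn.sendall((line + "\n").encode("utf-8"))
                replies.append(reader.readline().strip())
    return replies
```

With the configuration above, a call such as `send_lines("192.168.242.128", 44444, ["hello flume"])` should make the event appear in the agent's console log.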
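   Case 2
   ** Commonly used in production
   ** Log files --> newly appended content
   Use Flume to monitor a file and extract the content newly appended to it to another destination [HDFS]
   =======================agent (flume-apache.conf)=========================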
   # Name the components on this agent
   a2.sources = r2
   a2.channels = c2
   a2.sinks = k2

   # define sources
   # source type: exec (runs a command)
   a2.sources.r2.type = exec
   # command to run
   a2.sources.r2.command = tail -F /var/log/httpd/access_log
   # shell used to run the command
   a2.sources.r2.shell = /bin/bash -c

   # define channels
   # buffer events in memory
   a2.channels.c2.type = memory
   a2.channels.c2.capacity = 1000
   a2.channels.c2.transactionCapacity = 100

   # define sinks
   # write to HDFS
   a2.sinks.k2.type = hdfs
   # HDFS address to write to, with a two-level date-based directory structure
   a2.sinks.k2.hdfs.path=hdfs://192.168.242.128:8020/flume/%Y%m%d/%H%M
   # prefix of the file names created on HDFS
   a2.sinks.k2.hdfs.filePrefix = accesslog
   # round the timestamp down before resolving the time escapes in the path
   a2.sinks.k2.hdfs.round=true
   # rounding interval and unit: a new directory every 5 minutes
   a2.sinks.k2.hdfs.roundValue=5
   a2.sinks.k2.hdfs.roundUnit=minute
   # use the local time instead of a timestamp from the event header
   a2.sinks.k2.hdfs.useLocalTimeStamp=true

   # number of events written to the file before it is flushed to HDFS
   a2.sinks.k2.hdfs.batchSize=1000
   # file type
   a2.sinks.k2.hdfs.fileType=DataStream
   # write format
   a2.sinks.k2.hdfs.writeFormat=Text

   # bind the sources and sinks to the channels
   a2.sources.r2.channels = c2
   a2.sinks.k2.channel = c2
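
To see what directory an event lands in with `hdfs.path` set to `/flume/%Y%m%d/%H%M`, `round=true`, `roundValue=5` and `roundUnit=minute`, the rounding can be mimicked outside Flume. This is only an illustrative sketch of the rounding-down behaviour, not Flume's own code; the function name `bucket_path` and the sample timestamp are made up for the example.

```python
from datetime import datetime


def bucket_path(ts, round_value=5, base="/flume"):
    """Round ts down to a multiple of round_value minutes, then format it
    the way the %Y%m%d/%H%M escapes in hdfs.path would."""
    rounded = ts.replace(
        minute=ts.minute - ts.minute % round_value,
        second=0,
        microsecond=0,
    )
    return f"{base}/{rounded:%Y%m%d}/{rounded:%H%M}"


# An event stamped 10:37:42 falls into the 10:35 directory.
print(bucket_path(datetime(2019, 1, 26, 10, 37, 42)))  # /flume/20190126/1035
```

All events between 10:35:00 and 10:39:59 therefore share one directory, which keeps the number of directories on HDFS bounded.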

   Preparation
       Install Apache and start the service
       $su - root
       $yum -y install httpd
       $service httpd start
       Create an HTML page that Apache can serve
       $vi /var/www/html/index.html

       View the log
       $tail -f /var/log/httpd/access_log


   As a regular user, the command fails:
   $tail -f /var/log/httpd/access_log
   tail: cannot open "/var/log/httpd/access_log" for reading: Permission denied

   Allow regular users to read the files under /var/log/httpd
       $su - root
       $chmod 755 /var/log/httpd
   Start
       bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-apache.conf -Dflume.root.logger=INFO,console

   java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
       at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:251)
       at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
       at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
       at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
       at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
       ... 12 more

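   When writing files to HDFS, Flume acts as an HDFS client, so the Hadoop jars must be placed in Flume's lib directory; without them the agent fails with the NoClassDefFoundError above. The following four are needed: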
       hadoop-hdfs-2.5.0-cdh5.3.6.jar
       hadoop-common-2.5.0-cdh5.3.6.jar
       hadoop-auth-2.5.0-cdh5.3.6.jar
       commons-configuration-1.6.jar

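   Solving the problem of Flume writing too many small files
   # settings that avoid many small files
   # roll to a new file every 600 seconds
   a2.sinks.k2.hdfs.rollInterval=600
   # roll to a new file when the current one reaches 128000000 bytes (about 122 MiB)
   # in practice, when files are rolled per 128 MB block, this is usually set just under one block, e.g. 127 MB (127*1024*1024 bytes)
   a2.sinks.k2.hdfs.rollSize=128000000
   # do not roll based on the number of events
   a2.sinks.k2.hdfs.rollCount=0
   # set to 1; otherwise block replication triggers a roll and the three settings above have no effect
   a2.sinks.k2.hdfs.minBlockReplicas=1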
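
The `rollSize` comment mentions both 128000000 bytes and 127*1024*1024, which are not the same number. A quick calculation makes the difference explicit; the helper name `mib_to_bytes` is only for illustration.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def mib_to_bytes(mib):
    """Convert a size in MiB to bytes."""
    return mib * MIB


ROLL_SIZE = 128000000  # value used in the configuration above

print(mib_to_bytes(127))   # 133169152
print(mib_to_bytes(128))   # 134217728
print(ROLL_SIZE / MIB)     # 122.0703125
```

So the configured value rolls files at roughly 122 MiB; to roll at exactly 127 MiB, `rollSize` would be 133169152.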

   Case 3
   Use Flume to monitor a directory [/var/log/httpd] and extract the files that have
   finished rolling to HDFS in real time.

   # Name the components on this agent
   a3.sources = r3
   a3.channels = c3
   a3.sinks = k3

   # define sources
   # source type: spooling directory
   a3.sources.r3.type = spooldir
   # directory to watch
   a3.sources.r3.spoolDir = /home/beifeng/logs
   # ignore files whose names match this pattern (names ending in _log)
   a3.sources.r3.ignorePattern = ^.*\_log$

   # define channels
   # channel type: file (events are buffered on disk)
   a3.channels.c3.type = file
   # checkpoint directory
   a3.channels.c3.checkpointDir = /opt/modules/apache-flume-1.5.0-cdh5.3.6-bin/checkpoint
   # directory where the buffered event data is stored
   a3.channels.c3.dataDirs = /opt/modules/apache-flume-1.5.0-cdh5.3.6-bin/checkdata

   # define sinks
   a3.sinks.k3.type = hdfs
   a3.sinks.k3.hdfs.path=hdfs://192.168.17.129:8020/flume2/%Y%m%d/%H
   a3.sinks.k3.hdfs.filePrefix = accesslog
   a3.sinks.k3.hdfs.round=true
   a3.sinks.k3.hdfs.roundValue=1
   a3.sinks.k3.hdfs.roundUnit=hour
   a3.sinks.k3.hdfs.useLocalTimeStamp=true

   a3.sinks.k3.hdfs.batchSize=1000
   a3.sinks.k3.hdfs.fileType=DataStream
   a3.sinks.k3.hdfs.writeFormat=Text

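   # settings that avoid many small files
   # roll to a new file every 600 seconds
   a3.sinks.k3.hdfs.rollInterval=600
   # roll to a new file when the current one reaches 128000000 bytes (about 122 MiB)
   # in practice, when files are rolled per 128 MB block, this is usually set just under one block, e.g. 127 MB (127*1024*1024 bytes)
   a3.sinks.k3.hdfs.rollSize=128000000
   # do not roll based on the number of events
   a3.sinks.k3.hdfs.rollCount=0
   # set to 1; otherwise block replication triggers a roll and the three settings above have no effect
   a3.sinks.k3.hdfs.minBlockReplicas=1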

   # bind the sources and sinks to the channels
   a3.sources.r3.channels = c3
   a3.sinks.k3.channel = c3
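
Before dropping files into the spool directory it can help to check which names the `ignorePattern` above would skip. The sketch below tests the same pattern with Python's `re` module; Flume uses Java regular expressions, which behave the same for a pattern this simple, but treat the result as an approximation. The sample file names are hypothetical.

```python
import re

# Pattern copied from a3.sources.r3.ignorePattern
IGNORE_PATTERN = re.compile(r"^.*\_log$")


def is_ignored(file_name):
    """Return True if the whole file name matches the ignore pattern."""
    return IGNORE_PATTERN.fullmatch(file_name) is not None


# Hypothetical file names: live logs end in _log, rolled logs carry a date suffix.
for name in ["access_log", "error_log", "access_log.2019-01-25", "access_log-20190125"]:
    print(name, "ignored" if is_ignored(name) else "picked up")
```

Only the names ending in `_log` are ignored, so the file Apache is still writing to stays untouched while files renamed with a suffix are collected.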
