Flume Data Collection Configuration

阿新 • Published: 2019-01-15

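Some core Flume concepts:
Agent: runs Flume inside a JVM. Each machine runs one agent, but a single agent can contain multiple sources and sinks.
Client: produces the data and runs in its own thread.
Source: collects data from the Client and passes it to a Channel.
Sink: takes data from the Channel and runs in its own thread.
Channel: connects sources and sinks; it behaves somewhat like a queue.
Events: the unit of data, for example log records or avro objects.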


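Installing Flume:
1) Extract the downloaded Flume package into the target directory, and you are already 50% done;
2) Edit the flume-env.sh configuration file, mainly to set the JAVA_HOME variable: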

        JAVA_HOME=/usr/local/jdk1.8.0_91/
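
With Flume unpacked and JAVA_HOME set, a small single-agent configuration is a quick way to check the installation and to see how the concepts above map to configuration keys. The sketch below follows the common netcat-to-logger pattern. The file name example.conf, the agent name a1 and port 44444 are illustrative assumptions, not values from this article, and the console logger flag in the comment assumes a 1.x release that uses log4j 1.

```properties
# example.conf (hypothetical file name): one source, one channel, one sink.
# Started from the Flume install directory with something like:
#   bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: listens on a TCP port and turns each received line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Sink: writes each event to the Flume log at INFO level
a1.sinks.k1.type = logger

# Channel: in-memory queue between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Wiring: a source can feed several channels (plural key), a sink reads from exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Sending a line of text to port 44444 (for example with telnet or nc) should make the event show up in the agent's log output.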

Flume examples:
1) Case 1: Tail a specified log file and send it to a specified address
Configuration file:

a1.sources = r1
a1.sinks = k1 
a1.channels = c1

# Describe/configure the source  
a1.sources.r1.type = exec  
a1.sources.r1.command = tail -F /data/logs/test/test.log

# Use a channel which buffers events in memory  
a1.channels.c1.type = memory
a1.channels.c1.keep-alive = 10  
a1.channels.c1.capacity = 100000  
a1.channels.c1.transactionCapacity = 10000

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.11
a1.sinks.k1.port = 41420

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
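
An exec source running tail -F gives Flume no way to know where reading stopped, so lines written while the agent is down, or while the channel is full, can be lost. As a hedged alternative (not the configuration used in this article), Flume 1.7 and later ship a TAILDIR source that records read offsets in a position file. The position file path below is an assumption for illustration.

```properties
# Possible replacement for the exec source r1 in case 1 (sketch, assumes Flume 1.7+)
a1.sources.r1.type = TAILDIR
# Hypothetical location; Flume stores the last read offset of each tailed file here
a1.sources.r1.positionFile = /data/flume/taildir_position.json
# One file group named f1 that matches the same log file as the exec example
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /data/logs/test/test.log
a1.sources.r1.channels = c1
```

The channel and avro sink from case 1 stay unchanged; only the source definition differs.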

2) Case 2: Watch specified log directories and transfer newly added files
Configuration file:

# Name the components on this agent
a1.sources = r1 r2 
a1.sinks = k1 k2
a1.channels = c1 c2 

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /data/logs/backup_plat/adcallback
a1.sources.r1.fileHeader = true

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = test

# Describe/configure the source
a1.sources.r2.type = spooldir
a1.sources.r2.channels = c2
a1.sources.r2.spoolDir = /data/logs/backup_plat/burypoint
a1.sources.r2.fileHeader = true

a1.sources.r2.interceptors = i2
a1.sources.r2.interceptors.i2.type = static
a1.sources.r2.interceptors.i2.key = type
a1.sources.r2.interceptors.i2.value = test


# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.11
a1.sinks.k1.port = 41410

# Describe the sink
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.1.11
a1.sinks.k2.port = 41411

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Use a channel which buffers events in memory
a1.channels.c2.type = memory
a1.channels.c2.capacity = 100000
a1.channels.c2.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# Bind the source and sink to the channel
a1.sources.r2.channels = c2
a1.sinks.k2.channel = c2
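
Both channels in case 2 are memory channels, so any events still buffered are lost if the agent process dies. Where that matters, a file channel trades some throughput for durability. A minimal sketch for c1 follows; the directory paths are assumptions rather than values from this article.

```properties
# Durable alternative to the memory channel c1 (sketch)
a1.channels.c1.type = file
# Hypothetical directories; both must be writable by the Flume process
a1.channels.c1.checkpointDir = /data/flume/c1/checkpoint
a1.channels.c1.dataDirs = /data/flume/c1/data
# Maximum number of events held in the channel and per transaction
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000
```

Each file channel in the same agent needs its own checkpoint and data directories, so c2 would use a separate pair of paths.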

3) Case 3: Receive logs and forward them to Kafka
Configuration file:

a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source 
#hongyou/guessyoulike
a3.sources.r1.type = avro
a3.sources.r1.bind = 0.0.0.0
a3.sources.r1.port = 41420

## Source interceptor
#hongyou/guessyoulike
a3.sources.r1.interceptors = i1
a3.sources.r1.interceptors.i1.type = static
a3.sources.r1.interceptors.i1.key = topic
a3.sources.r1.interceptors.i1.preserveExisting = false
a3.sources.r1.interceptors.i1.value = hongyou_guessyoulike_topic

# Define the sink
#hongyou/guessyoulike
#a3.sinks.k1.type = org.apache.flume.plugins.KafkaSink
a3.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
#a3.sinks.k1.metadata.broker.list=node1:9092,node2:9092,node3:9092
a3.sinks.k1.kafka.bootstrap.servers=node1:9092,node2:9092,node3:9092
# The settings below appear to be left over from the older org.apache.flume.plugins.KafkaSink plugin.
# They are not among the properties documented for the built-in Kafka sink, which most likely ignores them.
a3.sinks.k1.sink.directory = /home/hadoop/app/apache-flume-1.7.0-bin/logs/
a3.sinks.k1.partitioner.class=org.apache.flume.plugins.SinglePartition
a3.sinks.k1.serializer.class=kafka.serializer.StringEncoder
a3.sinks.k1.request.required.acks=0
a3.sinks.k1.max.message.size=1000000
a3.sinks.k1.producer.type=async
a3.sinks.k1.encoding=UTF-8
# No topic is set on the sink itself; the "topic" header added by interceptor i1 selects the Kafka topic.
#a3.sinks.k1.topic.name=hongyou_guessyoulike_topic


# Use a channel which buffers events in memory

a3.channels.c1.type = memory
a3.channels.c1.capacity = 10000
a3.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel

a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
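
The built-in org.apache.flume.sink.kafka.KafkaSink in Flume 1.7 reads its settings from kafka.* keys, and Kafka producer options are passed through with a kafka.producer. prefix. The sketch below expresses the sink from case 3 using only those keys. The batch size, acks and linger values are illustrative assumptions. Because the static interceptor puts a topic header on every event, that header takes precedence over kafka.topic, which then acts only as a fallback.

```properties
a3.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a3.sinks.k1.kafka.bootstrap.servers = node1:9092,node2:9092,node3:9092
# Fallback topic; events carrying a "topic" header are published to that topic instead
a3.sinks.k1.kafka.topic = hongyou_guessyoulike_topic
# Number of events handed to Kafka per batch (assumed value)
a3.sinks.k1.flumeBatchSize = 100
# Producer settings are passed through with the kafka.producer. prefix (assumed values)
a3.sinks.k1.kafka.producer.acks = 1
a3.sinks.k1.kafka.producer.linger.ms = 5
a3.sinks.k1.channel = c1
```

Setting acks to 1 waits for the partition leader to confirm each write, which is a different trade-off from the request.required.acks=0 line in the original configuration.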

4) Case 4: Receive logs and write them to HDFS
Configuration file:

# Only r1/r2, k1/k2 and c1/c2 are configured below, so only those are declared
a1.sources = r1 r2
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41410

a1.sources.r2.type = avro
a1.sources.r2.bind = 0.0.0.0
a1.sources.r2.port = 41411

# Interceptors on sources r1 and r2 add a timestamp header to each event
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.sources.r2.interceptors = i2
a1.sources.r2.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Define sink k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://ns1/logs/plat/adcallback
a1.sinks.k1.hdfs.filePrefix = adcallback_%Y%m%d
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollInterval = 0

# Define sink k2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://ns1/logs/plat/burypoint
a1.sinks.k2.hdfs.filePrefix = burypoint_%Y%m%d
a1.sinks.k2.hdfs.fileSuffix = .log
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollSize = 67108864
a1.sinks.k2.hdfs.rollInterval = 0

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 1000000

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000000
a1.channels.c2.transactionCapacity = 100000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# Bind the source and sink to the channel
a1.sources.r2.channels = c2
a1.sinks.k2.channel = c2
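
The %Y%m%d escapes in hdfs.filePrefix are resolved from each event's timestamp header, which is why both sources attach a TimestampInterceptor. The same escapes can also be used in hdfs.path to get one directory per day. A sketch for k1 follows; the dated sub-directory and the idle timeout are assumptions added for illustration, not settings from this article.

```properties
a1.sinks.k1.type = hdfs
# One directory per day, resolved from the event's timestamp header
a1.sinks.k1.hdfs.path = hdfs://ns1/logs/plat/adcallback/%Y%m%d
a1.sinks.k1.hdfs.filePrefix = adcallback
a1.sinks.k1.hdfs.fileSuffix = .log
# Write plain event bodies instead of the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
# Roll only by size (64 MB); count-based and time-based rolling are disabled with 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollInterval = 0
# Close a file after 600 seconds without writes so a day's last file does not stay open (assumed value)
a1.sinks.k1.hdfs.idleTimeout = 600
a1.sinks.k1.channel = c1
```

If the events cannot carry a timestamp header, setting hdfs.useLocalTimeStamp = true on the sink makes it use the agent's local time when resolving the escapes.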
