
Big-Data Processing Stack: Flume + Redis 4.0.11 Cluster

The previous article covered the Storm + Kafka + ZooKeeper cluster; this one adds Flume and a Redis cluster on top of it.

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.
Flume is not limited to log aggregation. Because its data sources are customizable, Flume can transport massive quantities of event data, including but not limited to network traffic data, social-media-generated data, email messages, and almost any other data source.

I. Installation and configuration:

(1) Prerequisites: the Kafka + ZooKeeper + Storm cluster environment from the previous article is already installed.
(2) Flume: apache-flume-1.8.0-bin.tar.gz, downloadable from a mirror:
wget http://mirror.bit.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
(Make sure the versions match: Flume 1.8.0 requires JDK 1.8 or newer.) Other versions are listed at http://mirror.bit.edu.cn/apache/flume/.
(3) Redis: redis-4.0.11.tar.gz — wget http://download.redis.io/releases/redis-4.0.11.tar.gz (other versions are listed at http://download.redis.io/releases/).
(4) Extract the archives and configure the environment variables: vi /etc/profile

# JAVA_HOME
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/usr/local/java/zookeeper-3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin
# KAFKA_HOME
export KAFKA_HOME=/usr/local/java/kafka_2.11-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin
# STORM_HOME
export STORM_HOME=/usr/local/java/apache-storm-1.2.2
export PATH=$PATH:$STORM_HOME/bin
# FLUME_HOME
export FLUME_HOME=/usr/local/java/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

The changes take effect after reloading the profile: source /etc/profile
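The Flume lines are the easiest to mistype (a lowercase `path`, or a missing `$` before `FLUME_HOME`). A minimal sketch to confirm the variables are exported correctly after sourcing the profile, using the paths from the listing above:

```shell
#!/bin/sh
# Re-export the two Flume lines; note that PATH is uppercase and the
# value references $FLUME_HOME with a dollar sign.
export FLUME_HOME=/usr/local/java/flume/apache-flume-1.8.0-bin
export PATH="$PATH:$FLUME_HOME/bin"

# Check that Flume's bin directory really landed on PATH; if it did,
# running `flume-ng version` should then report Apache Flume 1.8.0.
case ":$PATH:" in
  *":$FLUME_HOME/bin:"*) echo "FLUME_HOME on PATH" ;;
  *)                     echo "FLUME_HOME missing from PATH" ;;
esac
```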

1. Configure flume-env.sh:
[root@192.168.164.133 conf]# pwd
/usr/local/java/flume/apache-flume-1.8.0-bin/conf
[root@192.168.164.133 conf]# cp -r flume-env.sh.template flume-env.sh
[root@192.168.164.133 conf]# vi flume-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.

# export JAVA_HOME=/usr/lib/jvm/java-8-oracle   # set JAVA_HOME to your JDK directory
 export JAVA_HOME=/usr/local/java/jdk1.8.0_191
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

# Let Flume write raw event data and configuration information to its log files for debugging
# purposes. Enabling these flags is not recommended in production,
# as it may result in logging sensitive user information or encryption secrets.
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""
2. Copy Flume to the other two VMs:
[root@192.168.164.133 flume]# scp -r apache-flume-1.8.0-bin root@192.168.164.134:/usr/local/java/flume/
root@192.168.164.134's password: 

[root@192.168.164.133 flume]# scp -r apache-flume-1.8.0-bin root@192.168.164.135:/usr/local/java/flume/
root@192.168.164.135's password: 
3. Configure the environment variables on each VM, then reload them with source /etc/profile:
[root@192.168.164.133 flume]# vi /etc/profile
# java JDK
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export ZOOKEEPER_HOME=/usr/local/java/zookeeper-3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/bin/:$JAVA_HOME/bin
#KAFKA_HOME
export KAFKA_HOME=/usr/local/java/kafka_2.11-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin
# STORM_HOME
export STORM_HOME=/usr/local/java/apache-storm-1.2.2
export PATH=.:${JAVA_HOME}/bin:${ZOOKEEPER_HOME}/bin:${STORM_HOME}/bin:$PATH

#FLUME_HOME
export FLUME_HOME=/usr/local/java/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

II. A Redis cluster needs at least 3 master nodes to work (here: 3 masters and 3 slaves). Note that cluster support was introduced in Redis 3.0.

(1) Download redis-4.0.11 from the official site https://redis.io/ and extract it: tar -zxvf redis-4.0.11.tar.gz
(2) Enter redis-4.0.11:

# On VM 192.168.164.133
[root@192.168.164.133 redis]# cd redis-4.0.11
# Enter the extracted redis-4.0.11/src directory and run make install
[root@192.168.164.133 src]# pwd
/usr/local/java/redis/redis-4.0.11/src
[root@192.168.164.133 src]# make install
    CC Makefile.dep
Hint: It's a good idea to run 'make test' ;)

    INSTALL install
    INSTALL install
    INSTALL install
    INSTALL install
    INSTALL install

# Create a directory per Redis node and copy the stock redis.conf into each
[root@192.168.164.133 redis-cluster]# mkdir -p /usr/local/java/redis/redis-4.0.11/redis-cluster/7000
[root@192.168.164.133 redis-cluster]# mkdir -p /usr/local/java/redis/redis-4.0.11/redis-cluster/7001

[root@192.168.164.133 redis-cluster]# cp /usr/local/java/redis/redis-4.0.11/redis.conf /usr/local/java/redis/redis-4.0.11/redis-cluster/7000
[root@192.168.164.133 redis-cluster]# cp /usr/local/java/redis/redis-4.0.11/redis.conf /usr/local/java/redis/redis-4.0.11/redis-cluster/7001

(3) Enter redis-cluster/7000 and redis-cluster/7001 in turn and edit the redis.conf settings.

Do not set a password, or the nodes will fail to connect to each other when the cluster starts.

[root@192.168.164.133 redis-4.0.11]# cd redis-cluster/7000/
[root@192.168.164.133 7000]# vi redis.conf 
################################## NETWORK #####################################

# By default, if no "bind" configuration directive is specified, Redis listens
# for connections from all the network interfaces available on the server.
# It is possible to listen to just one or multiple selected interfaces using
# the "bind" configuration directive, followed by one or more IP addresses.
#
# Examples:
#
# bind 192.168.1.100 10.0.0.1
# bind 127.0.0.1 ::1
#
# ~~~ WARNING ~~~ If the computer running Redis is directly exposed to the
# internet, binding to all the interfaces is dangerous and will expose the
# instance to everybody on the internet. So by default we uncomment the
# following bind directive, that will force Redis to listen only into
# the IPv4 lookback interface address (this means Redis will be able to
# accept connections only from clients running into the same computer it
# is running).
#
# IF YOU ARE SURE YOU WANT YOUR INSTANCE TO LISTEN TO ALL THE INTERFACES
# JUST COMMENT THE FOLLOWING LINE.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bind 127.0.0.1    // change this to the machine's own IP or hostname
# Protected mode is a layer of security protection, in order to avoid that
# Redis instances left open on the internet are accessed and exploited.
#
# When protected mode is on and if:
#
# 1) The server is not binding explicitly to a set of addresses using the
#    "bind" directive.
# 2) No password is configured.
#
# The server only accepts connections from clients connecting from the
# IPv4 and IPv6 loopback addresses 127.0.0.1 and ::1, and from Unix domain
# sockets.
#
# By default protected mode is enabled. You should disable it only if
# you are sure you want clients from other hosts to connect to Redis
# even if no authentication is configured, nor a specific set of interfaces
# are explicitly listed using the "bind" directive.
protected-mode yes

# Accept connections on the specified port, default is 6379 (IANA #815344).
# If port 0 is specified Redis will not listen on a TCP socket.
port 7000   // set the port to match the directory: 7000, 7001, 7002, 7003, 7004, 7005


################################# GENERAL #####################################

# By default Redis does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis.pid when daemonized.
daemonize yes

# If you run Redis from upstart or systemd, Redis can interact with your
# supervision tree. Options:
#   supervised no      - no supervision interaction
#   supervised upstart - signal upstart by putting Redis into SIGSTOP mode
#   supervised systemd - signal systemd by writing READY=1 to $NOTIFY_SOCKET
#   supervised auto    - detect upstart or systemd method based on
#                        UPSTART_JOB or NOTIFY_SOCKET environment variables
# Note: these supervision methods only signal "process is ready."
#       They do not enable continuous liveness pings back to your supervisor.
supervised no

# If a pid file is specified, Redis writes it where specified at startup
# and removes it at exit.
#
# When the server runs non daemonized, no pid file is created if none is
# specified in the configuration. When the server is daemonized, the pid file
# is used even if not specified, defaulting to "/var/run/redis.pid".
#
# Creating a pid file is best effort: if Redis is not able to create it
# nothing bad happens, the server will start and run normally.
pidfile /var/run/redis_7000.pid   // pidfile name matches the port: 7000, 7001, 7002, 7003, 7004, 7005

# Specify the server verbosity level.
# This can be one of:
# debug (a lot of information, useful for development/testing)
# verbose (many rarely useful info, but not a mess like the debug level)
# notice (moderately verbose, what you want in production probably)
# warning (only very important / critical messages are logged)
loglevel notice

# Specify the log file name. Also the empty string can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile ""

# To enable logging to the system logger, just set 'syslog-enabled' to yes,
# and optionally update the other syslog parameters to suit your needs.
# syslog-enabled no

# Specify the syslog identity.



################################ REDIS CLUSTER  ###############################
#
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# WARNING EXPERIMENTAL: Redis Cluster is considered to be stable code, however
# in order to mark it as "mature" we need to wait for a non trivial percentage
# of users to deploy it in production.
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#
# Normal Redis instances can't be part of a Redis Cluster; only nodes that are
# started as cluster nodes can. In order to start a Redis instance as a
# cluster node enable the cluster support uncommenting the following:
#
 cluster-enabled yes   // enable cluster mode (remove the leading #)

# Every cluster node has a cluster configuration file. This file is not
# intended to be edited by hand. It is created and updated by Redis nodes.
# Every Redis Cluster node requires a different cluster configuration file.
# Make sure that instances running in the same system do not have
# overlapping cluster configuration file names.
#
 cluster-config-file nodes-7000.conf  // cluster config file, auto-generated on first start; one per port 7000-7005

# Cluster node timeout is the amount of milliseconds a node must be unreachable
# for it to be considered in failure state.
# Most other internal time limits are multiple of the node timeout.
#
 cluster-node-timeout 15000      // node timeout, 15 seconds by default; adjust as needed

# A slave of a failing master will avoid to start a failover if its data
# looks too old.
#
# There is no simple way for a slave to actually have an exact measure of
# its "data age", so the following two checks are performed:
#
# 1) If there are multiple slaves able to failover, they exchange messages
#    in order to try to give an advantage to the slave with the best
#    replication offset (more data from the master processed).
#    Slaves will try to get their rank by offset, and apply to the start
#    of the failover a delay proportional to their rank.
#
# 2) Every single slave computes the time of the last interaction with
#    its master. This can be the last ping or command received (if the master
#    is still in the "connected" state), or the time that elapsed since the
#    disconnection with the master (if the replication link is currently down).
#    If the last interaction is too old, the slave will not try to failover
#    at all.
#
# The point "2" can be tuned by user. Specifically a slave will not perform
# the failover if, since the last interaction with the master, the time
# elapsed is greater than:

############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.

appendonly yes    // enable the AOF log if you need it; it records every write operation

# The name of the append only file (default: "appendonly.aof")

appendfilename "appendonly.aof"

# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log. Slow, Safest.
# everysec: fsync only one time every second. Compromise.
#
# The default is "everysec", as that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# More details please check the following article:
# http://antirez.com/post/redis-persistence-demystified.html
#
# If unsure, use "everysec".

# appendfsync always

(4) Summary of the settings to change:

port  7000                                // port matches the directory: 7000, 7001, 7002, 7003, 7004, 7005
bind <local-ip>                           // this machine's IP or hostname
daemonize    yes                          // run Redis in the background
pidfile  /var/run/redis_7000.pid          // pidfile name matches the port: 7000-7005
cluster-enabled  yes                      // enable cluster mode
cluster-config-file  nodes-7000.conf      // auto-generated on first start; one per port 7000-7005
cluster-node-timeout  15000               // node timeout, 15 seconds by default
appendonly  yes                           // enable the AOF log; records every write operation
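Editing six copies of redis.conf by hand is error-prone, so the per-port values in the table above can be stamped in with sed. A minimal sketch: the abridged stub below stands in for the stock redis.conf defaults (an assumption for illustration; in a real run you would start from the full redis.conf copied in step (2)):

```shell
#!/bin/sh
# Stub with the stock default lines this guide changes (abridged).
cat > redis.conf.stub <<'EOF'
port 6379
daemonize no
pidfile /var/run/redis_6379.pid
# cluster-enabled yes
# cluster-config-file nodes-6379.conf
# cluster-node-timeout 15000
appendonly no
EOF

# Generate one configured copy per instance directory.
for port in 7000 7001; do
  mkdir -p "redis-cluster/$port"
  sed -e "s|^port 6379|port $port|" \
      -e "s|^daemonize no|daemonize yes|" \
      -e "s|^pidfile /var/run/redis_6379.pid|pidfile /var/run/redis_$port.pid|" \
      -e "s|^# cluster-enabled yes|cluster-enabled yes|" \
      -e "s|^# cluster-config-file nodes-6379.conf|cluster-config-file nodes-$port.conf|" \
      -e "s|^# cluster-node-timeout 15000|cluster-node-timeout 15000|" \
      -e "s|^appendonly no|appendonly yes|" \
      redis.conf.stub > "redis-cluster/$port/redis.conf"
done
grep '^port' redis-cluster/7001/redis.conf   # prints "port 7001"
```

The same loop with ports 7002/7003 and 7004/7005 covers the other two servers.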

(5) Copy the current redis-4.0.11 directory to the other two servers, where the instances will become 7002/7003 and 7004/7005:

# Copy to the other two servers; enter the password when prompted
[root@192.168.164.133 redis]# scp -r redis-4.0.11 root@192.168.164.134:/usr/local/java/redis/
root@192.168.164.134's password: 
#-------------------------------
[root@192.168.164.133 redis]# scp -r redis-4.0.11 root@192.168.164.135:/usr/local/java/redis/
root@192.168.164.135's password: 

(6) On 192.168.164.134, cd into redis-cluster/ and rename the instance directories to 7002 and 7003:

[root@192.168.164.134 redis-cluster]# mv 7000 7002
[root@192.168.164.134 redis-cluster]# mv 7001 7003
[root@192.168.164.134 redis-cluster]# ls
7002  7003

(7) Enter 7002 and 7003 and update the corresponding values in each redis.conf (port, pidfile, cluster-config-file, bind).
(8) Repeat the same changes on the other VM (192.168.164.135) for 7004 and 7005.
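Once all six instances are configured, the cluster still has to be started and created; the guide stops before this step, so the following is a hedged sketch. It assumes the two-instances-per-VM layout used above, and that the cluster is created with redis-trib.rb (which ships in the Redis 4.x src/ directory and requires ruby plus the redis gem):

```shell
#!/bin/sh
# Assumed node layout from this guide: two instances per VM.
NODES=""
add_node() { NODES="$NODES $1:$2 $1:$3"; }
add_node 192.168.164.133 7000 7001
add_node 192.168.164.134 7002 7003
add_node 192.168.164.135 7004 7005

# On each VM, start its two local instances first, e.g. on 192.168.164.133:
#   redis-server redis-cluster/7000/redis.conf
#   redis-server redis-cluster/7001/redis.conf
# Then, from any one VM, run the cluster-create command printed below;
# --replicas 1 gives every master one slave, i.e. 3 masters + 3 slaves.
echo "./redis-trib.rb create --replicas 1$NODES"
```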

# Redis is installed from source, so gcc must be installed before running make. Install it with:
yum -y install gcc