
Big-Data Processing Stack: Flume + Redis 4.0.11 Cluster

The previous article covered the Storm + Kafka + ZooKeeper cluster; this one adds Flume and a Redis cluster on top of it.

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.
Flume is not limited to log aggregation. Because its data sources are customizable, Flume can transport massive quantities of event data, including but not limited to network traffic data, social-media-generated data, email messages, and almost any other data source.

I. Installation and configuration:

(1) Prerequisites: the Kafka + ZooKeeper + Storm cluster environment from the previous article is already installed.
(2) Flume: apache-flume-1.8.0-bin.tar.gz, downloadable from a mirror:
wget http://mirror.bit.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
(Make sure the versions match: Flume 1.8.0 requires JDK 1.8 or newer.) Other versions are listed at http://mirror.bit.edu.cn/apache/flume/.
(3) Redis: redis-4.0.11.tar.gz — wget http://download.redis.io/releases/redis-4.0.11.tar.gz (other versions are listed at http://download.redis.io/releases/).
(4) Extract the archives and configure the environment variables: vi /etc/profile

# JAVA_HOME
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/usr/local/java/zookeeper-3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin
# KAFKA_HOME
export KAFKA_HOME=/usr/local/java/kafka_2.11-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin
# STORM_HOME
export STORM_HOME=/usr/local/java/apache-storm-1.2.2
export PATH=$PATH:$STORM_HOME/bin
# FLUME_HOME
export FLUME_HOME=/usr/local/java/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

The changes take effect after reloading the profile: source /etc/profile
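The Flume lines are the easiest to mistype (a lowercase `path`, or a missing `$` before `FLUME_HOME`). A minimal sketch to confirm the variables are exported correctly after sourcing the profile, using the paths from the listing above:

```shell
#!/bin/sh
# Re-export the two Flume lines; note that PATH is uppercase and the
# value references $FLUME_HOME with a dollar sign.
export FLUME_HOME=/usr/local/java/flume/apache-flume-1.8.0-bin
export PATH="$PATH:$FLUME_HOME/bin"

# Check that Flume's bin directory really landed on PATH; if it did,
# running `flume-ng version` should then report Apache Flume 1.8.0.
case ":$PATH:" in
  *":$FLUME_HOME/bin:"*) echo "FLUME_HOME on PATH" ;;
  *)                     echo "FLUME_HOME missing from PATH" ;;
esac
```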

1. Configure flume-env.sh:
[root@192.168.164.133 conf]# pwd
/usr/local/java/flume/apache-flume-1.8.0-bin/conf
[root@192.168.164.133 conf]# cp -r flume-env.sh.template flume-env.sh
[root@192.168.164.133 conf]# vi flume-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.

# export JAVA_HOME=/usr/lib/jvm/java-8-oracle   # set JAVA_HOME to your JDK directory
 export JAVA_HOME=/usr/local/java/jdk1.8.0_191
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

# Let Flume write raw event data and configuration information to its log files for debugging
# purposes. Enabling these flags is not recommended in production,
# as it may result in logging sensitive user information or encryption secrets.
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""
2. Copy Flume to the other two VMs:
[root@192.168.164.133 flume]# scp -r apache-flume-1.8.0-bin root@192.168.164.134:/usr/local/java/flume/
root@192.168.164.134's password: 

[root@192.168.164.133 flume]# scp -r apache-flume-1.8.0-bin root@192.168.164.135:/usr/local/java/flume/
root@192.168.164.135's password: 
3. Configure the environment variables on each VM, then reload them with source /etc/profile:
[root@192.168.164.133 flume]# vi /etc/profile
# java JDK
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export ZOOKEEPER_HOME=/usr/local/java/zookeeper-3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/bin/:$JAVA_HOME/bin
#KAFKA_HOME
export KAFKA_HOME=/usr/local/java/kafka_2.11-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin
# STORM_HOME
export STORM_HOME=/usr/local/java/apache-storm-1.2.2
export PATH=.:${JAVA_HOME}/bin:${ZOOKEEPER_HOME}/bin:${STORM_HOME}/bin:$PATH

#FLUME_HOME
export FLUME_HOME=/usr/local/java/flume/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

II. A Redis cluster needs at least 3 master nodes to work (here: 3 masters and 3 slaves). Note that cluster support was introduced in Redis 3.0.

(1) Download redis-4.0.11 from the official site https://redis.io/ and extract it: tar -zxvf redis-4.0.11.tar.gz
(2) Enter redis-4.0.11:

# On VM 192.168.164.133
[root@192.168.164.133 redis]# cd redis-4.0.11
# Enter the extracted redis-4.0.11/src directory and run make install
[root@192.168.164.133 src]# pwd
/usr/local/java/redis/redis-4.0.11/src
[root@192.168.164.133 src]# make install
    CC Makefile.dep
Hint: It's a good idea to run 'make test' ;)

    INSTALL install
    INSTALL install
    INSTALL install
    INSTALL install
    INSTALL install

# Create a directory per Redis node and copy the stock redis.conf into each
[root@192.168.164.133 redis-cluster]# mkdir -p /usr/local/java/redis/redis-4.0.11/redis-cluster/7000
[root@192.168.164.133 redis-cluster]# mkdir -p /usr/local/java/redis/redis-4.0.11/redis-cluster/7001

[root@192.168.164.133 redis-cluster]# cp /usr/local/java/redis/redis-4.0.11/redis.conf /usr/local/java/redis/redis-4.0.11/redis-cluster/7000
[root@192.168.164.133 redis-cluster]# cp /usr/local/java/redis/redis-4.0.11/redis.conf /usr/local/java/redis/redis-4.0.11/redis-cluster/7001

(3) Enter redis-cluster/7000 and redis-cluster/7001 in turn and edit the redis.conf settings.

Do not set a password, or the nodes will fail to connect to each other when the cluster starts.

[root@192.168.164.133 redis-4.0.11]# cd redis-cluster/7000/
[root@192.168.164.133 7000]# vi redis.conf 
################################## NETWORK #####################################

# By default, if no "bind" configuration directive is specified, Redis listens
# for connections from all the network interfaces available on the server.
# It is possible to listen to just one or multiple selected interfaces using
# the "bind" configuration directive, followed by one or more IP addresses.
#
# Examples:
#
# bind 192.168.1.100 10.0.0.1
# bind 127.0.0.1 ::1
#
# ~~~ WARNING ~~~ If the computer running Redis is directly exposed to the
# internet, binding to all the interfaces is dangerous and will expose the
# instance to everybody on the internet. So by default we uncomment the
# following bind directive, that will force Redis to listen only into
# the IPv4 lookback interface address (this means Redis will be able to
# accept connections only from clients running into the same computer it
# is running).
#
# IF YOU ARE SURE YOU WANT YOUR INSTANCE TO LISTEN TO ALL THE INTERFACES
# JUST COMMENT THE FOLLOWING LINE.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bind 127.0.0.1    // change this to the machine's own IP or hostname
# Protected mode is a layer of security protection, in order to avoid that
# Redis instances left open on the internet are accessed and exploited.
#
# When protected mode is on and if:
#
# 1) The server is not binding explicitly to a set of addresses using the
#    "bind" directive.
# 2) No password is configured.
#
# The server only accepts connections from clients connecting from the
# IPv4 and IPv6 loopback addresses 127.0.0.1 and ::1, and from Unix domain
# sockets.
#
# By default protected mode is enabled. You should disable it only if
# you are sure you want clients from other hosts to connect to Redis
# even if no authentication is configured, nor a specific set of interfaces
# are explicitly listed using the "bind" directive.
protected-mode yes

# Accept connections on the specified port, default is 6379 (IANA #815344).
# If port 0 is specified Redis will not listen on a TCP socket.
port 7000   // set the port to match the directory: 7000, 7001, 7002, 7003, 7004, 7005


################################# GENERAL #####################################

# By default Redis does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis.pid when daemonized.
daemonize yes

# If you run Redis from upstart or systemd, Redis can interact with your
# supervision tree. Options:
#   supervised no      - no supervision interaction
#   supervised upstart - signal upstart by putting Redis into SIGSTOP mode
#   supervised systemd - signal systemd by writing READY=1 to $NOTIFY_SOCKET
#   supervised auto    - detect upstart or systemd method based on
#                        UPSTART_JOB or NOTIFY_SOCKET environment variables
# Note: these supervision methods only signal "process is ready."
#       They do not enable continuous liveness pings back to your supervisor.
supervised no

# If a pid file is specified, Redis writes it where specified at startup
# and removes it at exit.
#
# When the server runs non daemonized, no pid file is created if none is
# specified in the configuration. When the server is daemonized, the pid file
# is used even if not specified, defaulting to "/var/run/redis.pid".
#
# Creating a pid file is best effort: if Redis is not able to create it
# nothing bad happens, the server will start and run normally.
pidfile /var/run/redis_7000.pid   // pidfile name matches the port: 7000, 7001, 7002, 7003, 7004, 7005

# Specify the server verbosity level.
# This can be one of:
# debug (a lot of information, useful for development/testing)
# verbose (many rarely useful info, but not a mess like the debug level)
# notice (moderately verbose, what you want in production probably)
# warning (only very important / critical messages are logged)
loglevel notice

# Specify the log file name. Also the empty string can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile ""

# To enable logging to the system logger, just set 'syslog-enabled' to yes,
# and optionally update the other syslog parameters to suit your needs.
# syslog-enabled no

# Specify the syslog identity.



################################ REDIS CLUSTER  ###############################
#
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# WARNING EXPERIMENTAL: Redis Cluster is considered to be stable code, however
# in order to mark it as "mature" we need to wait for a non trivial percentage
# of users to deploy it in production.
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#
# Normal Redis instances can't be part of a Redis Cluster; only nodes that are
# started as cluster nodes can. In order to start a Redis instance as a
# cluster node enable the cluster support uncommenting the following:
#
 cluster-enabled yes   // enable cluster mode (remove the leading #)

# Every cluster node has a cluster configuration file. This file is not
# intended to be edited by hand. It is created and updated by Redis nodes.
# Every Redis Cluster node requires a different cluster configuration file.
# Make sure that instances running in the same system do not have
# overlapping cluster configuration file names.
#
 cluster-config-file nodes-7000.conf  // cluster config file, auto-generated on first start; one per port 7000-7005

# Cluster node timeout is the amount of milliseconds a node must be unreachable
# for it to be considered in failure state.
# Most other internal time limits are multiple of the node timeout.
#
 cluster-node-timeout 15000      // node timeout, 15 seconds by default; adjust as needed

# A slave of a failing master will avoid to start a failover if its data
# looks too old.
#
# There is no simple way for a slave to actually have an exact measure of
# its "data age", so the following two checks are performed:
#
# 1) If there are multiple slaves able to failover, they exchange messages
#    in order to try to give an advantage to the slave with the best
#    replication offset (more data from the master processed).
#    Slaves will try to get their rank by offset, and apply to the start
#    of the failover a delay proportional to their rank.
#
# 2) Every single slave computes the time of the last interaction with
#    its master. This can be the last ping or command received (if the master
#    is still in the "connected" state), or the time that elapsed since the
#    disconnection with the master (if the replication link is currently down).
#    If the last interaction is too old, the slave will not try to failover
#    at all.
#
# The point "2" can be tuned by user. Specifically a slave will not perform
# the failover if, since the last interaction with the master, the time
# elapsed is greater than:

############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.

appendonly yes    // enable the AOF log if you need it; it records every write operation

# The name of the append only file (default: "appendonly.aof")

appendfilename "appendonly.aof"

# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log. Slow, Safest.
# everysec: fsync only one time every second. Compromise.
#
# The default is "everysec", as that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# More details please check the following article:
# http://antirez.com/post/redis-persistence-demystified.html
#
# If unsure, use "everysec".

# appendfsync always

(4) Summary of the settings to change:

port  7000                                // port matches the directory: 7000, 7001, 7002, 7003, 7004, 7005
bind <local-ip>                           // this machine's IP or hostname
daemonize    yes                          // run Redis in the background
pidfile  /var/run/redis_7000.pid          // pidfile name matches the port: 7000-7005
cluster-enabled  yes                      // enable cluster mode
cluster-config-file  nodes-7000.conf      // auto-generated on first start; one per port 7000-7005
cluster-node-timeout  15000               // node timeout, 15 seconds by default
appendonly  yes                           // enable the AOF log; records every write operation
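Editing six copies of redis.conf by hand is error-prone, so the per-port values in the table above can be stamped in with sed. A minimal sketch: the abridged stub below stands in for the stock redis.conf defaults (an assumption for illustration; in a real run you would start from the full redis.conf copied in step (2)):

```shell
#!/bin/sh
# Stub with the stock default lines this guide changes (abridged).
cat > redis.conf.stub <<'EOF'
port 6379
daemonize no
pidfile /var/run/redis_6379.pid
# cluster-enabled yes
# cluster-config-file nodes-6379.conf
# cluster-node-timeout 15000
appendonly no
EOF

# Generate one configured copy per instance directory.
for port in 7000 7001; do
  mkdir -p "redis-cluster/$port"
  sed -e "s|^port 6379|port $port|" \
      -e "s|^daemonize no|daemonize yes|" \
      -e "s|^pidfile /var/run/redis_6379.pid|pidfile /var/run/redis_$port.pid|" \
      -e "s|^# cluster-enabled yes|cluster-enabled yes|" \
      -e "s|^# cluster-config-file nodes-6379.conf|cluster-config-file nodes-$port.conf|" \
      -e "s|^# cluster-node-timeout 15000|cluster-node-timeout 15000|" \
      -e "s|^appendonly no|appendonly yes|" \
      redis.conf.stub > "redis-cluster/$port/redis.conf"
done
grep '^port' redis-cluster/7001/redis.conf   # prints "port 7001"
```

The same loop with ports 7002/7003 and 7004/7005 covers the other two servers.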

(5) Copy the current redis-4.0.11 directory to the other two servers, where the instances will become 7002/7003 and 7004/7005:

# Copy to the other two servers; enter the password when prompted
[root@192.168.164.133 redis]# scp -r redis-4.0.11 root@192.168.164.134:/usr/local/java/redis/
root@192.168.164.134's password: 
#-------------------------------
[root@192.168.164.133 redis]# scp -r redis-4.0.11 root@192.168.164.135:/usr/local/java/redis/
root@192.168.164.135's password: 

(6) On 192.168.164.134, cd into redis-cluster/ and rename the instance directories to 7002 and 7003:

[root@192.168.164.134 redis-cluster]# mv 7000 7002
[root@192.168.164.134 redis-cluster]# mv 7001 7003
[root@192.168.164.134 redis-cluster]# ls
7002  7003

(7) Enter 7002 and 7003 and update the corresponding values in each redis.conf (port, pidfile, cluster-config-file, bind).
(8) Repeat the same changes on the other VM (192.168.164.135) for 7004 and 7005.
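Once all six instances are configured, the cluster still has to be started and created; the guide stops before this step, so the following is a hedged sketch. It assumes the two-instances-per-VM layout used above, and that the cluster is created with redis-trib.rb (which ships in the Redis 4.x src/ directory and requires ruby plus the redis gem):

```shell
#!/bin/sh
# Assumed node layout from this guide: two instances per VM.
NODES=""
add_node() { NODES="$NODES $1:$2 $1:$3"; }
add_node 192.168.164.133 7000 7001
add_node 192.168.164.134 7002 7003
add_node 192.168.164.135 7004 7005

# On each VM, start its two local instances first, e.g. on 192.168.164.133:
#   redis-server redis-cluster/7000/redis.conf
#   redis-server redis-cluster/7001/redis.conf
# Then, from any one VM, run the cluster-create command printed below;
# --replicas 1 gives every master one slave, i.e. 3 masters + 3 slaves.
echo "./redis-trib.rb create --replicas 1$NODES"
```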

# Redis is installed from source, so gcc must be installed before running make. Install it with:
yum -y install gcc