Cluster layout (node1 = 192.168.1.11, node2 = 192.168.1.12, node3 = 192.168.1.13):

Process                         Component   node1  node2  node3  Notes
NameNode                        Hadoop      Y      Y             HA (high availability)
DataNode                        Hadoop      Y      Y      Y
ResourceManager                 Hadoop      Y      Y             HA
NodeManager                     Hadoop      Y      Y      Y
JournalNode                     Hadoop      Y      Y      Y      odd number of nodes, at least 3
ZKFC (DFSZKFailoverController)  Hadoop      Y      Y             runs on every NameNode host
QuorumPeerMain                  ZooKeeper   Y      Y      Y
MySQL                           Hive        Y                    Hive metastore database
RunJar (Metastore)              Hive        Y
RunJar (HIVE)                   Hive        Y
HMaster                         HBase       Y      Y             HA
HRegionServer                   HBase       Y      Y      Y
Spark (Master)                  Spark       Y      Y             HA
Spark (Worker)                  Spark       Y      Y      Y
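The "odd number, at least 3" note for the JournalNodes (and the ZooKeeper ensemble) follows from majority-quorum arithmetic; a quick sketch:

```shell
# Majority quorum: a write must reach floor(n/2)+1 nodes, so an ensemble of
# n nodes tolerates n - (n/2 + 1) failures. Even counts add machines without
# adding fault tolerance: 3 and 4 nodes both survive only 1 failure.
for n in 3 4 5; do
  majority=$(( n / 2 + 1 ))
  tolerated=$(( n - majority ))
  echo "n=$n majority=$majority tolerates=$tolerated"
done
```

Five nodes are the next real step up in fault tolerance; a fourth node costs more but still tolerates only one failure.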

I once built a cluster with Federation enabled; it needs at least four machines, is overly complex, and was more than my laptop could handle. To learn Spark 2.0 I decided to drop Federation and simplify the learning environment, while still keeping it fully distributed.
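For the hostnames node1–node3 used throughout to resolve, every machine needs matching entries in /etc/hosts (assuming no DNS), taken from the IPs in the table above:

```
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3
```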
All software packages:
apache-ant-1.9.9-bin.tar.gz
apache-hive-1.2.1-bin.tar.gz
apache-maven-3.3.9-bin.tar.gz
apache-tomcat-6.0.44.tar.gz
CentOS-6.9-x86_64-minimal.iso
findbugs-3.0.1.tar.gz
hadoop-2.7.3-src.tar.gz
hadoop-2.7.3.tar.gz
hadoop-2.7.3.tar.gz (self-compiled for CentOS 6.9)
hbase-1.3.1-bin.tar.gz (self-compiled)
hbase-1.3.1-src.tar.gz
jdk-8u121-linux-x64.tar.gz
mysql-connector-java-5.6-bin.jar
protobuf-2.5.0.tar.gz
scala-2.11.11.tgz
snappy-1.1.3.tar.gz
spark-2.1.1-bin-hadoop2.7.tgz

Disable the firewall

[root@node1 ~]# service iptables stop
[root@node1 ~]# chkconfig iptables off

ZooKeeper

[root@node1 ~]# tar -zxvf /root/zookeeper-3.4.9.tar.gz -C /root
[root@node1 ~]# cp /root/zookeeper-3.4.9/conf/zoo_sample.cfg /root/zookeeper-3.4.9/conf/zoo.cfg
[root@node1 ~]# vi /root/zookeeper-3.4.9/conf/zoo.cfg
[root@node1 ~]# vi /root/zookeeper-3.4.9/bin/zkEnv.sh
[root@node1 ~]# mkdir /root/zookeeper-3.4.9/logs
[root@node1 ~]# vi /root/zookeeper-3.4.9/conf/log4j.properties
[root@node1 ~]# mkdir /root/zookeeper-3.4.9/zkData
[root@node1 ~]# scp -r /root/zookeeper-3.4.9 node2:/root
[root@node1 ~]# scp -r /root/zookeeper-3.4.9 node3:/root
[root@node1 ~]# touch /root/zookeeper-3.4.9/zkData/myid
[root@node1 ~]# echo 1 > /root/zookeeper-3.4.9/zkData/myid
[root@node2 ~]# touch /root/zookeeper-3.4.9/zkData/myid
[root@node2 ~]# echo 2 > /root/zookeeper-3.4.9/zkData/myid
[root@node3 ~]# touch /root/zookeeper-3.4.9/zkData/myid
[root@node3 ~]# echo 3 > /root/zookeeper-3.4.9/zkData/myid
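The zoo.cfg edits above are not shown. For this three-node ensemble the file would typically look as follows; dataDir and the server list follow from the myid files and the quorum used later (node1:2181,...), while tickTime, initLimit, syncLimit, clientPort, and the 2888:3888 ports are the stock values from zoo_sample.cfg:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/zookeeper-3.4.9/zkData
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
```

The server.N numbers must match each node's myid file, which is why node1 gets 1, node2 gets 2, and node3 gets 3 above.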

Environment variables

[root@node1 ~]# vi /etc/profile
export JAVA_HOME=/root/jdk1.8.0_121
export SCALA_HOME=/root/scala-2.11.11
export HADOOP_HOME=/root/hadoop-2.7.3
export HIVE_HOME=/root/apache-hive-1.2.1-bin
export HBASE_HOME=/root/hbase-1.3.1
export SPARK_HOME=/root/spark-2.1.1-bin-hadoop2.7
export PATH=.:$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/root:$HIVE_HOME/bin:$HBASE_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
[root@node1 ~]# source /etc/profile
[root@node1 ~]# scp /etc/profile node2:/etc
[root@node2 ~]# source /etc/profile
[root@node1 ~]# scp /etc/profile node3:/etc
[root@node3 ~]# source /etc/profile

Hadoop

[root@node1 ~]# tar -zxvf /root/hadoop-2.7.3.tar.gz -C /root
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
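The hadoop-env.sh (and, later, yarn-env.sh) edits are not shown; the customary change is to hard-code JAVA_HOME so Hadoop's scripts do not depend on the login shell's environment. Using the JDK path from /etc/profile above:

```shell
# In hadoop-env.sh (and yarn-env.sh): point Hadoop at the JDK explicitly.
export JAVA_HOME=/root/jdk1.8.0_121
```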
 
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
   <name>dfs.blocksize</name>
   <value>64m</value>
</property>
<property>
   <name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/hadoop-2.7.3/tmp/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.mycluster</name>
<value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop-2.7.3/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/slaves
node1
node2
node3
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/yarn-env.sh
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/mapred-site.xml
<configuration>
<property> 
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.max-age-ms</name>
<!-- 6,048,000,000 ms = 70 days of job history retention -->
<value>6048000000</value>
</property>
</configuration>
[root@node1 ~]# vi /root/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>                                                                
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node2</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>   
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>
[root@node1 ~]# mkdir -p /root/hadoop-2.7.3/tmp/journal
[root@node2 ~]# mkdir -p /root/hadoop-2.7.3/tmp/journal
[root@node3 ~]# mkdir -p /root/hadoop-2.7.3/tmp/journal
Replace /root/hadoop-2.7.3/lib/native with the native libraries from the self-compiled package.
[root@node1 ~]# scp -r /root/hadoop-2.7.3/ node2:/root
[root@node1 ~]# scp -r /root/hadoop-2.7.3/ node3:/root
Check whether your Hadoop native library is 32-bit or 64-bit:
[root@node1 native]# file libhadoop.so.1.0.0
libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
[root@node1 native]# pwd
/root/hadoop-2.7.3/lib/native

Start ZooKeeper

[root@node1 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh start
[root@node2 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh start
[root@node3 ~]# /root/zookeeper-3.4.9/bin/zkServer.sh start

Format ZKFC (initialize the HA state znode in ZooKeeper)

[root@node1 ~]# /root/hadoop-2.7.3/bin/hdfs zkfc -formatZK