
Hadoop 2.6 Cluster Setup

Notes on installing JDK 8 on Ubuntu 14.04; hopefully they are useful to others.
Download the latest stable release from the Oracle website; I used JDK 8 (8u40).


tar zvxf jdk-8u40-linux-x64.tar.gz
sudo mkdir -p /usr/local/jdk/
sudo mv jdk1.8.0_40 /usr/local/jdk/




Edit /etc/profile and add:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_40/
export JRE_HOME=/usr/local/jdk/jdk1.8.0_40/jre


export HADOOP_HOME=/home/work/hadoop/default
export SCALA_HOME=/home/work/scala/default
export SPARK_HOME=/home/work/spark/default


export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin


source /etc/profile




Verify that the JDK is installed correctly:
$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)


=====================
Hadoop installation
1. Make sure ssh and rsync are installed on every machine; if they are not, install them first, since the following steps depend on them.
  $ sudo apt-get install ssh 
  $ sudo apt-get install rsync
2. Set the hosts and hostname of every machine in the cluster according to the layout below (an example /etc/hosts is shown after the table):
  sudo vim /etc/hosts
  sudo vim /etc/hostname
  |--------------------|----------------|---------------|
  | VM OS              | Hostname       | IP address    |
  |--------------------|----------------|---------------|
  | Ubuntu 12.04.3 LTS | hadoop2-master | 172.16.101.24 |
  | Ubuntu 12.04.3 LTS | hadoop2-slave1 | 172.16.101.22 |
  | Ubuntu 12.04.3 LTS | hadoop2-slave2 | 172.16.101.15 |
  The hostnames must match the hadoop2-master / hadoop2-slave1 names referenced in the configuration files below.
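  For reference, a minimal /etc/hosts built from the table above would contain:

  172.16.101.24  hadoop2-master
  172.16.101.22  hadoop2-slave1
  172.16.101.15  hadoop2-slave2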
3. Generate keys so the three machines can SSH into each other without a password.
  On each machine run ssh-keygen -t rsa and accept the defaults, then append the contents of the generated id_rsa.pub
  to ~/.ssh/authorized_keys on the other two machines (a sketch of one way to do this follows).
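  A minimal sketch of the key exchange, assuming the login user is work (as used elsewhere in this post); ssh-copy-id appends the local public key to the remote ~/.ssh/authorized_keys:

  ssh-keygen -t rsa                    # on every node, accept the defaults
  ssh-copy-id work@hadoop2-master      # on every node, copy the key to the other two machines
  ssh-copy-id work@hadoop2-slave1
  ssh-copy-id work@hadoop2-slave2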
  
4. Download Hadoop 2.6:
  wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz


5. Extract the archive with tar zxvf hadoop-2.6.0.tar.gz and, for convenience, create a symlink pointing at the extracted directory:
  ln -s hadoop-2.6.0 hadoop
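  Note that /etc/profile above sets HADOOP_HOME=/home/work/hadoop/default; if you keep that variable as-is, the symlink needs to be named default rather than hadoop. A sketch, assuming the archive was extracted under /home/work/hadoop:

  cd /home/work/hadoop
  ln -s hadoop-2.6.0 default    # matches HADOOP_HOME=/home/work/hadoop/default in /etc/profile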




6. Modify the Hadoop configuration so that the daemons run in cluster mode.
  To configure the cluster you need to set both the environment the Hadoop daemons run in and the parameters they run with.
  The Hadoop daemons are the NameNode/DataNode and, under YARN, the ResourceManager/NodeManager (which replace the old JobTracker/TaskTracker).
  
  hadoop-env.sh :
    cd hadoop/ ; vim etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/usr/local/jdk/jdk1.8.0_40/
    export HADOOP_PREFIX=/home/work/hadoop/
    export HADOOP_PID_DIR=/home/work/hadoop/pid_file/

    Create the PID directory first:
    sudo mkdir -p /home/work/hadoop/pid_file/
  core-site.xml :
    Create the data directory and hand it over to the work user:
    sudo mkdir -p /data/hadoop
    sudo chown -R work:work /data/hadoop
    vim etc/hadoop/core-site.xml
===============================================================================
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/</value>
        <description>A base for other temporary directories.</description>
    </property>

    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop2-master:9000</value>
    </property>

    <property>
        <name>hadoop.security.authorization</name>
        <value>false</value>
        <description>
            Enable authorization for different protocols.
        </description>
    </property>

    <!-- add like old config by ricky -->
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx2048m</value>
    </property>

    <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>3000</value>
        <description>Number of minutes after which the checkpoint gets deleted.
            If zero, the trash feature is disabled.
        </description>
    </property>
</configuration>
    ==================================================================================
 hdfs-site.xml :
==================================================================================
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/data/hadoop/dfs/name</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/data/hadoop/dfs/data</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <!--
    <property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>1</value>
    </property>
    -->

    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>65536</value>
    </property>

    <property>
        <name>dfs.datanode.handler.count</name>
        <value>10</value>
    </property>

    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>0</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.http.address</name>
        <value>hadoop2-master:50070</value>
    </property>

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop2-slave1:50090</value>
    </property>
</configuration>
    ==================================================================================
 mapred-site.xml :
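    Hadoop 2.6 ships only etc/hadoop/mapred-site.xml.template, so (assuming the default layout) copy it before editing:
    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    vim etc/hadoop/mapred-site.xml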
==================================================================================
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>768</value>
    </property>

    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>512</value>
    </property>

    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>640</value>
    </property>

    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx384m</value>
    </property>

    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx512m</value>
    </property>

    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx200m</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop2-master:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop2-master:19888</value>
    </property>

    <!-- add from old config -->
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>8</value>
    </property>

    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>8</value>
    </property>

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>16</value>
    </property>

    <property>
        <name>tasktracker.http.threads</name>
        <value>64</value>
    </property>

    <property>
        <name>mapred.job.shuffle.input.buffer.percent</name>
        <value>0.7</value>
    </property>

    <property>
        <name>mapred.job.shuffle.merge.percent</name>
        <value>0.7</value>
    </property>

    <property>
        <name>io.sort.mb</name>
        <value>64</value>
    </property>

    <property>
        <name>io.sort.factor</name>
        <value>16</value>
    </property>

    <property>
        <name>mapred.task.timeout</name>
        <value>1200000</value>
    </property>
</configuration>
==================================================================================
 yarn-site.xml :
==================================================================================
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop2-master:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop2-master:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop2-master:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop2-master:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop2-master:50030</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
==================================================================================
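So far the configuration exists only on the master. A minimal sketch of the remaining wiring, assuming the layout from step 2 and the work user used above: list the worker hosts in etc/hadoop/slaves so that start-dfs.sh / start-yarn.sh know where to start DataNodes and NodeManagers, then push the installation (with the edited configs) to the slaves using the rsync installed in step 1. The JDK and the /etc/profile changes must of course exist on the slaves as well.

  # etc/hadoop/slaves on the master: one worker hostname per line
  hadoop2-slave1
  hadoop2-slave2

  # copy the Hadoop installation to both slaves
  rsync -av /home/work/hadoop/ work@hadoop2-slave1:/home/work/hadoop/
  rsync -av /home/work/hadoop/ work@hadoop2-slave2:/home/work/hadoop/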

7. Trying out the system
  1) Format the NameNode: bin/hdfs namenode -format
  2) Start HDFS: sbin/start-dfs.sh
  3) Stop HDFS: sbin/stop-dfs.sh
  4) Start YARN: sbin/start-yarn.sh
  5) Stop YARN: sbin/stop-yarn.sh
  6) Check the HDFS cluster status: bin/hdfs dfsadmin -report
     HDFS web UI: http://172.16.101.24:50070
     ResourceManager (RM) web UI: http://172.16.101.24:50030
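  With HDFS and YARN up, a quick end-to-end check is to submit one of the bundled example jobs; the jar path below assumes the stock Hadoop 2.6.0 layout:

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 4 100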




=====================
Spark installation
wget http://mirrors.cnnic.cn/apache/spark/spark-1.3.0/spark-1.3.0.tgz
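The rest of the Spark setup is not covered in this post; as a starting point, and mirroring the Hadoop layout above (/etc/profile sets SPARK_HOME=/home/work/spark/default), extracting and linking would look roughly like:

  tar zxvf spark-1.3.0.tgz -C /home/work/spark/
  cd /home/work/spark && ln -s spark-1.3.0 default    # matches SPARK_HOME=/home/work/spark/default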