
Deploying Hadoop on Docker

I. Building the Docker Image

1. mkdir hadoop
2. Copy hadoop-2.6.2.tar.gz into the hadoop directory
3. vim Dockerfile
FROM ubuntu
MAINTAINER Docker tianlei <[email protected]>
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Run the following command to build the image:
docker build -t "ubuntu:base" .
Run the image to create a container:
docker run -d -it --name hadoop ubuntu:base
Enter the container to work inside it:
docker exec -i -t hadoop /bin/bash
1. Install Java in the image
sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Edit the environment variables:
vim ~/.bashrc
Add this line:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
source ~/.bashrc
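A quick sanity check that the JDK is installed and JAVA_HOME points to it:
java -version
echo $JAVA_HOME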
2. Install Hadoop in the image

The ADD instruction in the Dockerfile has already extracted Hadoop into /usr/local/ (if the extracted directory is still named hadoop-2.6.2, rename it, e.g. mv /usr/local/hadoop-2.6.2 /usr/local/hadoop, so it matches the paths used below).

vim ~/.bashrc
Add:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload to make it take effect:
source ~/.bashrc
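If the variables took effect, the hadoop command is now on the PATH; a quick check:
hadoop version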
Modify the environment variable in hadoop-env.sh:
cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
Change it to:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Create tmp, namenode, and datanode directories under the hadoop directory.

These three directories will be used in the later configuration (a mkdir sketch follows the list below):

  1. tmp: Hadoop's temporary directory
  2. namenode: storage directory for the NameNode
  3. datanode: storage directory for the DataNode
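A minimal sketch of the commands, assuming the three directories sit directly under /usr/local/hadoop (the paths used in hdfs-site.xml below):
cd /usr/local/hadoop
mkdir tmp namenode datanode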
Go into the /usr/local/hadoop/etc/hadoop/ directory and edit three XML files.

1). core-site.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
    </property>

    <property>
            <name>fs.default.name</name>
            <value>hdfs://master:9000</value>
            <final>true</final>
            <description>The name of the default file system.  A URI whose
            scheme and authority determine the FileSystem implementation.  The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class.  The uri's authority is used to
            determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

Note:

  • The hadoop.tmp.dir value is the temporary directory path created earlier.
  • fs.default.name is set to hdfs://master:9000, which points to the master node's host (that node is configured later when we build the cluster; it is written here in advance).
2). hdfs-site.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <final>true</final>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/namenode</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/datanode</value>
        <final>true</final>
    </property>
</configuration>

Note:

  • When we build the cluster later, there will be one master node and two slave nodes, so dfs.replication is set to 2.
  • dfs.namenode.name.dir and dfs.datanode.data.dir are set to the NameNode and DataNode directories created earlier.
3). mapred-site.xml configuration
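Hadoop 2.x ships only a template for this file, so it may need to be created from the template first:
cp mapred-site.xml.template mapred-site.xml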
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>
There is only one property here, mapred.job.tracker, and it points to the master node.
Format the NameNode:
hadoop namenode -format
3. Install SSH
sudo apt-get install ssh
Add to ~/.bashrc (so that sshd starts automatically):
#autorun
/usr/sbin/sshd
Generate the keys:
cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cd .ssh
cat id_rsa.pub >> authorized_keys
Note: sshd sometimes complains that /var/run/sshd cannot be found; just create an sshd directory under /var/run.

In /etc/ssh/ssh_config, add:

StrictHostKeyChecking no
UserKnownHostsFile /dev/null
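A quick check that sshd is running and passwordless login works (start sshd manually first if the current shell has not re-read ~/.bashrc):
/usr/sbin/sshd
ssh localhost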
4. Commit the image with Hadoop installed
docker commit -m "hadoop install" hadoop ubuntu:hadoop
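The new image should now appear in the local image list:
docker images ubuntu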


II. Deploying the Hadoop Distributed Cluster
Start the master container:
docker run -d -ti -h master ubuntu:hadoop
Start the slave1 container:
docker run -d -ti -h slave1 ubuntu:hadoop
Start the slave2 container:
docker run -d -ti -h slave2 ubuntu:hadoop
Add the following to /etc/hosts on each container:
10.0.0.5        master
10.0.0.6        slave1
10.0.0.7        slave2
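The addresses above are examples; the actual IP of each container can be read with docker inspect, assuming the containers were also started with matching --name values (otherwise use their container IDs):
docker inspect --format '{{ .NetworkSettings.IPAddress }}' master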
Add the following to the /usr/local/hadoop/etc/hadoop/slaves file (on the master node):
slave1
slave2
Note: because the virtual machine is short on memory, add the following to mapred-site.xml:

<property>
       <name>mapreduce.map.memory.mb</name>
       <value>500</value>
</property>
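A minimal sketch of bringing the cluster up from the master container, assuming the configuration above and that passwordless SSH between the containers works:
cd /usr/local/hadoop
sbin/start-dfs.sh    # starts the NameNode on master and DataNodes on slave1/slave2
jps                  # list the running Java daemons on each node to verify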