Big Data Assignment (1): Building a Hadoop Cluster Environment on Docker



I. Install Docker (Docker CE)

(1) Set up the repository

1. Update the apt package index:

$ sudo apt update


2. Install the packages that let apt use a repository over HTTPS:

$ sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common


3. Add Docker's official GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Use the following command to confirm that the key fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88:

$ sudo apt-key fingerprint 0EBFCD88

pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
uid                  Docker Release (CE deb) <docker@docker.com>
sub   4096R/F273FCD8 2017-02-22
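Instead of comparing the fingerprint by eye, the check can be scripted. The sketch below uses a hypothetical helper, `fingerprint_matches` (not part of apt or Docker): it strips all whitespace from the text piped into it, so the grouping that apt-key prints does not matter, and succeeds only if the full 40-character fingerprint quoted above appears.

```shell
# Hypothetical helper: succeed (exit 0) only if the text on stdin contains the
# Docker release key fingerprint quoted in this guide, ignoring the spaces that
# apt-key/gpg insert between the 4-character groups.
EXPECTED_FPR="9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

fingerprint_matches() {
  tr -d '[:space:]' | grep -qF "$EXPECTED_FPR"
}

# Usage (on the host):
#   sudo apt-key fingerprint 0EBFCD88 | fingerprint_matches && echo "Docker key OK"
```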


4. Set up the stable repository:

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

(2) Install Docker CE

1. Update the apt package index again:

$ sudo apt update

2. Install the latest version of Docker CE:

$ sudo apt-get install docker-ce


3. Verify the installation by running the hello-world image:

$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete 
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:

For more examples and ideas, visit:


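To use docker without sudo, add your user (zhangsl here) to the docker group, then log out and back in so the new group membership takes effect:

$ sudo usermod -aG docker zhangsl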


After logging back in, docker runs without sudo:

$ docker run hello-world

First, pull an Ubuntu image from Docker Hub:

$ docker pull ubuntu


$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
hello-world         latest              4ab4c602aa5e        2 weeks ago         1.84kB
ubuntu              latest              cd6d8154f1e1        2 weeks ago         84.1MB


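Create a directory on the host for exchanging files (such as the Hadoop tarball) with the container:

$ mkdir ~/docker-ubuntu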


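Start a container named ubuntu from this image, mounting ~/docker-ubuntu on the host at /root/docker-ubuntu in the container:

$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu ubuntu
root@b59d716dbb4d:/#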




Ubuntu running in Docker logs in as root by default, so commands inside the container don't need sudo. Update the package index:

root@b59d716dbb4d:/# apt update



Install vim for editing configuration files:

root@b59d716dbb4d:/# apt install vim


Install SSH, which Hadoop's start scripts use to launch daemons on the other nodes:

root@b59d716dbb4d:/# apt install ssh

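Then add the line /etc/init.d/ssh start to ~/.bashrc so that the SSH service starts every time the container is started with an interactive bash shell. Starting it by hand with service ssh start also works, while systemctl is usually unavailable because a plain Docker container does not run systemd.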
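A minimal sketch of that step, assuming bash and root's home directory in the container. `append_once` is a hypothetical helper introduced here: it adds the line only if the file does not already contain it, so repeating the setup does not leave duplicate entries in ~/.bashrc.

```shell
# Hypothetical helper: append LINE to FILE unless FILE already contains
# exactly that line, so repeated runs do not duplicate it.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage inside the container:
#   append_once '/etc/init.d/ssh start' ~/.bashrc
```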

Set up passwordless SSH login:

root@b59d716dbb4d:~# ssh-keygen -t rsa  # just press Enter at every prompt
root@b59d716dbb4d:~# cd .ssh
root@b59d716dbb4d:~/.ssh# cat id_rsa.pub >> authorized_keys
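The same key setup can be done without prompts. The sketch below is a hypothetical helper, `setup_passwordless_ssh`, assuming OpenSSH's ssh-keygen (installed by `apt install ssh`); it is safe to run twice. Because master, slave01 and slave02 are later started from images committed from this container, they all share this key pair and authorized_keys, which is what lets master log in to the slaves without a password.

```shell
# Non-interactive sketch of the key setup above. DIR defaults to ~/.ssh.
setup_passwordless_ssh() {
  local dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir" && chmod 700 "$dir"
  # -N '' = empty passphrase, -f = key file path, -q = quiet; skip if a key exists
  [ -f "$dir/id_rsa" ] || ssh-keygen -q -t rsa -N '' -f "$dir/id_rsa"
  touch "$dir/authorized_keys"
  # Authorize our own public key only once, even if the function runs again
  grep -qxF -- "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys" \
    || cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  # sshd (StrictModes) refuses key files that are group- or world-writable
  chmod 600 "$dir/authorized_keys"
}

# Usage inside the container:
#   setup_passwordless_ssh
```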


Install the JDK:

root@b59d716dbb4d:~# apt install openjdk-8-jdk


Then add the following lines to ~/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin


Reload the file so the variables take effect:

root@b59d716dbb4d:~# source ~/.bashrc



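Next, save the configured container as an image. On the host, log in to Docker Hub with your Docker ID (login is only required for pushing images to Docker Hub; docker commit works without it):

$ docker login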
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: zhangshuoliang007
WARNING! Your password will be stored unencrypted in /home/zhangsl/.docker/config.json.
Configure a credential helper to remove this warning. See

Login Succeeded

Then use docker ps to find the container ID, and docker commit to save the container as an image:

$ docker ps  # show running containers
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
b59d716dbb4d        ubuntu              "/bin/bash"         About an hour ago   Up About an hour                        ubuntu
$ docker commit b59d716dbb4d ubuntu/jdkinstalled  # save container b59d716dbb4d as a new image named ubuntu/jdkinstalled
$ docker images  # list all images
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
ubuntu/jdkinstalled   latest              07a39087f9bc        3 minutes ago       601MB
hello-world           latest              4ab4c602aa5e        2 weeks ago         1.84kB
ubuntu                latest              cd6d8154f1e1        2 weeks ago         84.1MB



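Start a container from the new image; SSH now starts automatically. With hadoop-2.9.1.tar.gz already copied into ~/docker-ubuntu on the host, extract it to /usr/local inside the container:

$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu-jdkinstalled ubuntu/jdkinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
root@2ecf3c0dba0e:/#
root@2ecf3c0dba0e:/# cd /root/docker-ubuntu
root@2ecf3c0dba0e:~/docker-ubuntu# tar -zxvf hadoop-2.9.1.tar.gz -C /usr/local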


Check that Hadoop runs:

root@2ecf3c0dba0e:~/docker-ubuntu# cd /usr/local/hadoop-2.9.1/
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# ls
LICENSE.txt  README.txt  etc      lib      sbin
NOTICE.txt   bin         include  libexec  share
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# ./bin/hadoop version
Hadoop 2.9.1
Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /usr/local/hadoop-2.9.1/share/hadoop/common/hadoop-common-2.9.1.jar



Configure Hadoop. First set JAVA_HOME in hadoop-env.sh:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/


Edit core-site.xml:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/core-site.xml
          <description>Abase for other temporary directories.</description>
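The description line above belongs to the `hadoop.tmp.dir` property. Since the later `hdfs dfs` commands on master operate on HDFS, `fs.defaultFS` must also point at the NameNode on master. A complete core-site.xml might look like the sketch below; the port 9000 and the tmp path are assumptions, not values from this setup. It writes the file with a here-document and should be run from /usr/local/hadoop-2.9.1.

```shell
# Sketch of etc/hadoop/core-site.xml. ASSUMPTIONS: NameNode RPC on master:9000,
# Hadoop's temporary data under /usr/local/hadoop-2.9.1/tmp.
# Run from /usr/local/hadoop-2.9.1 (or set CONF_DIR to the config directory).
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.9.1/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF
```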


Edit hdfs-site.xml:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/hdfs-site.xml
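The grep job at the end of this walkthrough finds `dfs.replication`, `dfs.namenode.name.dir` and `dfs.datanode.data.dir` in the configuration files, which suggests hdfs-site.xml sets these three properties. A sketch follows; the directory paths are assumptions. The replication factor of 3 matches the replication column of the later `hdfs dfs -ls` listing, but blocks are only fully replicated if three nodes run a DataNode, so lower it if only slave01 and slave02 store data.

```shell
# Sketch of etc/hadoop/hdfs-site.xml. ASSUMPTIONS: NameNode metadata and
# DataNode blocks are stored under /usr/local/hadoop-2.9.1/tmp/dfs.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- number of copies HDFS keeps of each block -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop-2.9.1/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop-2.9.1/tmp/dfs/data</value>
  </property>
</configuration>
EOF
```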


Create mapred-site.xml from the bundled template, then edit it:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
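The job log later shows jobs being submitted to the YARN ResourceManager, so MapReduce must be configured to run on YARN rather than locally. A minimal sketch of mapred-site.xml that sets only `mapreduce.framework.name`; any further properties the original file contained are not known.

```shell
# Sketch of etc/hadoop/mapred-site.xml: run MapReduce jobs on YARN.
# Overwrites the copy made from mapred-site.xml.template.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```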


Edit yarn-site.xml:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/yarn-site.xml
  <!-- Site specific YARN configuration properties -->
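A sketch of the properties that would go below that comment. The job log connects to the ResourceManager at `master`, so `yarn.resourcemanager.hostname` points there; setting `yarn.nodemanager.aux-services` to `mapreduce_shuffle` lets NodeManagers serve map output to reducers. Any other settings the original file had are unknown.

```shell
# Sketch of etc/hadoop/yarn-site.xml for the master/slave01/slave02 cluster.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```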


On the host, save this container as the image ubuntu/hadoopinstalled:

$ docker commit 2ecf3c0dba0e ubuntu/hadoopinstalled


Open three terminals and start one container from this image in each, with the hostnames master, slave01 and slave02:

# first terminal
$ docker run -it -h master --name master ubuntu/hadoopinstalled
# second terminal
$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
# third terminal
$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled


Each container's /etc/hosts ends with the IP address Docker assigned to it, followed by its hostname:

$ docker run -it -h master --name master ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
root@master:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
	master

$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
root@slave01:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
	slave01

$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
root@slave02:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
	slave02
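For `ssh slave01` to resolve from master, every container's /etc/hosts needs an entry for all three hostnames. The sketch below is a hypothetical helper, `add_hosts`, that appends "IP NAME" pairs while skipping names that already map to an address. The IPs in the usage line are placeholders within Docker's default 172.17.0.0/16 bridge network, not the addresses of this cluster. Docker manages /etc/hosts itself, so such manual entries may need to be re-added after a container restart.

```shell
# Hypothetical helper: append "IP NAME" entries to a hosts file, skipping any
# NAME that already maps to an address there (comment lines are ignored).
add_hosts() {
  local hosts_file="$1"; shift
  local entry ip name
  for entry in "$@"; do
    ip="${entry%% *}"
    name="${entry#* }"
    if ! awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"; then
      printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done
}

# Usage in each container; the addresses are PLACEHOLDERS, use the ones shown
# by `cat /etc/hosts` in each container:
#   add_hosts /etc/hosts "172.17.0.2 master" "172.17.0.3 slave01" "172.17.0.4 slave02"
```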


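Add entries for master, slave01 and slave02 to /etc/hosts in every container so the hostnames resolve, then check from master that the slaves accept SSH logins without a password:

root@master:/# ssh slave01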
The authenticity of host 'slave01 (' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave01,' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
root@slave01:~# exit
Connection to slave01 closed.
root@master:/# ssh slave02
The authenticity of host 'slave02 (' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave02,' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 * Starting OpenBSD Secure Shell server sshd                             [ OK ]
root@slave02:~# exit
Connection to slave02 closed.


On master, list the worker hostnames in etc/hadoop/slaves:

root@master:/usr/local/hadoop-2.9.1# vim etc/hadoop/slaves
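The start scripts read this file (one hostname per line) and start the worker daemons on each listed host over SSH. A sketch that writes it non-interactively, assuming slave01 and slave02 are the workers; add master as a third line if it should store blocks too, which is what a replication factor of 3 would need.

```shell
# Sketch of etc/hadoop/slaves: one worker hostname per line.
# ASSUMPTION: slave01 and slave02 run the DataNode and NodeManager.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"
printf '%s\n' slave01 slave02 > "$CONF_DIR/slaves"
```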



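Format the NameNode (only before the first start, since formatting again erases the existing HDFS namespace), then start HDFS and YARN. start-all.sh is deprecated in Hadoop 2 in favor of running start-dfs.sh and then start-yarn.sh, but it still works:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs namenode -format
root@master:/usr/local/hadoop-2.9.1# sbin/start-all.sh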
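To confirm that the daemons came up, `jps` (shipped with the JDK) lists the running Java processes on each node, and `bin/hdfs dfsadmin -report` on master shows how many DataNodes are live. The sketch below is a hypothetical helper, `require_daemons`, that fails if any expected daemon is missing from the `jps` output; the daemon placement in the usage lines is an assumption based on the master/worker split used here.

```shell
# Hypothetical helper: read `jps` output on stdin and fail (return 1) if any of
# the daemon names given as arguments is not among the running processes.
require_daemons() {
  local running d missing=0
  running=$(awk '{print $2}')   # jps prints "<pid> <MainClass>" per line
  for d in "$@"; do
    if ! printf '%s\n' "$running" | grep -qxF -- "$d"; then
      echo "missing daemon: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumed daemon placement):
#   on master:  jps | require_daemons NameNode SecondaryNameNode ResourceManager
#   on slaves:  jps | require_daemons DataNode NodeManager
```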



Create the input directory in HDFS:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -mkdir -p /user/hadoop/input


Upload Hadoop's XML configuration files as input:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input


List the uploaded files:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -ls /user/hadoop/input
Found 9 items
-rw-r--r--   3 root supergroup       7861 2018-09-24 11:54 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r--   3 root supergroup       1036 2018-09-24 11:54 /user/hadoop/input/core-site.xml
-rw-r--r--   3 root supergroup      10206 2018-09-24 11:54 /user/hadoop/input/hadoop-policy.xml
-rw-r--r--   3 root supergroup       1091 2018-09-24 11:54 /user/hadoop/input/hdfs-site.xml
-rw-r--r--   3 root supergroup        620 2018-09-24 11:54 /user/hadoop/input/httpfs-site.xml
-rw-r--r--   3 root supergroup       3518 2018-09-24 11:54 /user/hadoop/input/kms-acls.xml
-rw-r--r--   3 root supergroup       5939 2018-09-24 11:54 /user/hadoop/input/kms-site.xml
-rw-r--r--   3 root supergroup        844 2018-09-24 11:54 /user/hadoop/input/mapred-site.xml
-rw-r--r--   3 root supergroup        942 2018-09-24 11:54 /user/hadoop/input/yarn-site.xml


Run the grep example from the bundled MapReduce examples jar, which finds every match of the regular expression dfs[a-z.]+ in the input and counts them:

root@master:/usr/local/hadoop-2.9.1# bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/input output 'dfs[a-z.]+'
18/09/24 11:57:19 INFO client.RMProxy: Connecting to ResourceManager at master/
18/09/24 11:57:20 INFO input.FileInputFormat: Total input files to process : 9
18/09/24 11:57:20 INFO mapreduce.JobSubmitter: number of splits:9
18/09/24 11:57:21 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/09/24 11:57:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0001
18/09/24 11:57:21 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0001
18/09/24 11:57:21 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0001/
18/09/24 11:57:21 INFO mapreduce.Job: Running job: job_1537789095052_0001
18/09/24 11:57:26 INFO mapreduce.Job: Job job_1537789095052_0001 running in uber mode : false
18/09/24 11:57:26 INFO mapreduce.Job:  map 0% reduce 0%
18/09/24 11:57:34 INFO mapreduce.Job:  map 89% reduce 0%
18/09/24 11:57:35 INFO mapreduce.Job:  map 100% reduce 0%
18/09/24 11:57:39 INFO mapreduce.Job:  map 100% reduce 100%
18/09/24 11:57:41 INFO mapreduce.Job: Job job_1537789095052_0001 completed successfully
18/09/24 11:57:41 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=1979213
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=33107
		HDFS: Number of bytes written=219
		HDFS: Number of read operations=30
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=1
		Launched map tasks=9
		Launched reduce tasks=1
		Data-local map tasks=9
		Total time spent by all maps in occupied slots (ms)=51634
		Total time spent by all reduces in occupied slots (ms)=2287
		Total time spent by all map tasks (ms)=51634
		Total time spent by all reduce tasks (ms)=2287
		Total vcore-milliseconds taken by all map tasks=51634
		Total vcore-milliseconds taken by all reduce tasks=2287
		Total megabyte-milliseconds taken by all map tasks=52873216
		Total megabyte-milliseconds taken by all reduce tasks=2341888
	Map-Reduce Framework
		Map input records=891
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=163
		Input split bytes=1050
		Combine input records=4
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=163
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =9
		Failed Shuffles=0
		Merged Map outputs=9
		GC time elapsed (ms)=1378
		CPU time spent (ms)=2880
		Physical memory (bytes) snapshot=2824376320
		Virtual memory (bytes) snapshot=19761373184
		Total committed heap usage (bytes)=1956642816
	Shuffle Errors
	File Input Format Counters 
		Bytes Read=32057
	File Output Format Counters 
		Bytes Written=219
18/09/24 11:57:41 INFO client.RMProxy: Connecting to ResourceManager at master/
18/09/24 11:57:41 INFO input.FileInputFormat: Total input files to process : 1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: number of splits:1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0002
18/09/24 11:57:41 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0002
18/09/24 11:57:41 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0002/
18/09/24 11:57:41 INFO mapreduce.Job: Running job: job_1537789095052_0002
18/09/24 11:57:50 INFO mapreduce.Job: Job job_1537789095052_0002 running in uber mode : false
18/09/24 11:57:50 INFO mapreduce.Job:  map 0% reduce 0%
18/09/24 11:57:54 INFO mapreduce.Job:  map 100% reduce 0%
18/09/24 11:57:58 INFO mapreduce.Job:  map 100% reduce 100%
18/09/24 11:57:59 INFO mapreduce.Job: Job job_1537789095052_0002 completed successfully
18/09/24 11:58:00 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=394779
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=346
		HDFS: Number of bytes written=77
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1826
		Total time spent by all reduces in occupied slots (ms)=1917
		Total time spent by all map tasks (ms)=1826
		Total time spent by all reduce tasks (ms)=1917
		Total vcore-milliseconds taken by all map tasks=1826
		Total vcore-milliseconds taken by all reduce tasks=1917
		Total megabyte-milliseconds taken by all map tasks=1869824
		Total megabyte-milliseconds taken by all reduce tasks=1963008
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=115
		Input split bytes=127
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=115
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=54
		CPU time spent (ms)=590
		Physical memory (bytes) snapshot=488009728
		Virtual memory (bytes) snapshot=3967393792
		Total committed heap usage (bytes)=344981504
	Shuffle Errors
	File Input Format Counters 
		Bytes Read=219
	File Output Format Counters 
		Bytes Written=77


View the result:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -cat output/*
1	dfsadmin
1	dfs.replication
1	dfs.namenode.name.dir
1	dfs.datanode.data.dir
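The grep example runs two MapReduce jobs, which is why the log shows two job IDs: the first counts every match of the regular expression across the input files, the second sorts those counts in descending order. A rough local cross-check with standard Unix tools, run on master against the same XML files that were uploaded, should report the same terms and counts. `local_grep_count` is a hypothetical name, and GNU grep's extended regular expressions stand in for Java's regex engine (they agree on a simple pattern like `dfs[a-z.]+`). Equal counts are ordered alphabetically here; Hadoop's order for ties may differ.

```shell
# Hypothetical local cross-check for the MapReduce grep example: count each
# regex match across the given files, then sort by count (highest first).
# LC_ALL=C keeps character ranges and sort order locale-independent.
local_grep_count() {
  local pattern="$1"; shift
  LC_ALL=C grep -ohE -- "$pattern" "$@" \
    | LC_ALL=C sort | uniq -c \
    | awk '{ printf "%s\t%s\n", $1, $2 }' \
    | LC_ALL=C sort -k1,1nr -k2,2
}

# Usage on master, from /usr/local/hadoop-2.9.1:
#   local_grep_count 'dfs[a-z.]+' etc/hadoop/*.xml
```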
