
Big Data Assignment (1): Building a Hadoop Cluster Environment with Docker


The setup follows the tutorial from the Database Lab at Xiamen University (http://dblab.xmu.edu.cn/blog/1233/) and is carried out on Ubuntu 16.04.

I. Install Docker (Docker CE)

Install Docker according to the official guide (https://docs.docker.com/install/linux/docker-ce/ubuntu/).
The official docs offer three installation methods: from Docker's apt repository, from a downloaded package, or via the convenience script. I chose the repository method, which makes it easier to upgrade Docker later.

(1) Set up the package repository

1. First, update the existing package index:

$ sudo apt update

2. Then install the required packages:

$ sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

3. Add Docker's official GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Verify with the following command that the key fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88:

$ sudo apt-key fingerprint 0EBFCD88

pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38  854A E2D8 8D81 803C  0EBF CD88
uid                  Docker Release (CE deb) <docker@docker.com>
sub   4096R/F273FCD8 2017-02-22

4. Docker offers three release channels: stable, edge, and test. We use the stable channel:

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

(2) Install Docker CE

1. Refresh the package index again:

$ sudo apt update

2. Install the latest version of Docker CE:

$ sudo apt-get install docker-ce

3. Verify the installation:

$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete 
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

4. Add user permissions
By default only root can run Docker, so add the current user to the docker group:

$ sudo usermod -aG docker zhangsl

Then log out and log back in, and verify that the permission change took effect:

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

II. Install Ubuntu in Docker

First, pull an Ubuntu image from Docker Hub:

$ docker pull ubuntu

Then verify that the image was pulled successfully:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
hello-world         latest              4ab4c602aa5e        2 weeks ago         1.84kB
ubuntu              latest              cd6d8154f1e1        2 weeks ago         84.1MB

When starting the container, we need a directory for transferring files into it, so create a directory under the home directory for this purpose:

$ mkdir docker-ubuntu  

Then run Ubuntu in Docker:

$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu ubuntu
root@b59d716dbb4d:/#

III. Initialize the Ubuntu system

The freshly created container is a bare-bones system with very little software installed, so we first refresh the package sources and install some essentials.

(1) Refresh the package sources

The Ubuntu container logs in as root by default, so the commands do not need sudo:

root@b59d716dbb4d:/# apt update

(2) Install essential software

1. Install Vim
Common terminal text editors include vim, emacs, and nano; I am most used to vim, so that is what I install:

root@b59d716dbb4d:/# apt install vim

2. Install sshd
The distributed setup requires SSH connections into the containers:

root@b59d716dbb4d:/# apt install ssh

Then add /etc/init.d/ssh start to ~/.bashrc so that the SSH service starts automatically every time the container is started (you could also use service or systemctl to enable it).
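For example, a minimal one-liner (my own sketch, run inside the container) that appends the startup command to ~/.bashrc:

root@b59d716dbb4d:/# echo '/etc/init.d/ssh start' >> ~/.bashrc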
Next, configure passwordless SSH login:

root@b59d716dbb4d:~# ssh-keygen -t rsa    # just press Enter at every prompt
root@b59d716dbb4d:~# cd .ssh
root@b59d716dbb4d:~/.ssh# cat id_rsa.pub >> authorized_keys
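As a quick sanity check (my addition, not part of the original steps), make sure sshd is running and confirm that an SSH login to localhost succeeds without asking for a password:

root@b59d716dbb4d:~/.ssh# /etc/init.d/ssh start
root@b59d716dbb4d:~/.ssh# ssh -o StrictHostKeyChecking=no localhost exit && echo "passwordless login OK"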

3. Install the JDK
Hadoop requires Java. The default JDK in the repositories is Java 10, so install Java 8 instead:

root@b59d716dbb4d:~# apt install openjdk-8-jdk

Next, set the JAVA_HOME and PATH variables by appending the following to the end of ~/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin

Then reload ~/.bashrc to apply the changes:

root@b59d716dbb4d:~# source ~/.bashrc
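As a quick check (my addition), confirm that the variables took effect; the exact version string depends on the installed build:

root@b59d716dbb4d:~# echo $JAVA_HOME       # should print /usr/lib/jvm/java-8-openjdk-amd64/
root@b59d716dbb4d:~# java -version         # should report an OpenJDK 1.8 build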

(3) Save the image

Changes made inside a container are not saved automatically, so we commit the container to a new image. First, log in to Docker on the host:

$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: zhangshuoliang007
Password: 
WARNING! Your password will be stored unencrypted in /home/zhangsl/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

Then use docker ps to find the container ID and docker commit to save it as a new image:

$ docker ps    # list the running containers
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
b59d716dbb4d        ubuntu              "/bin/bash"         About an hour ago   Up About an hour                        ubuntu
$ docker commit b59d716dbb4d ubuntu/jdkinstalled    # save the container with ID b59d716dbb4d as a new image named ubuntu/jdkinstalled
sha256:07a39087f9bcb985151ade3e225448556dd7df089477b69e5b71b600ad9634c6
$ docker images    # list all images
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
ubuntu/jdkinstalled   latest              07a39087f9bc        3 minutes ago       601MB
hello-world           latest              4ab4c602aa5e        2 weeks ago         1.84kB
ubuntu                latest              cd6d8154f1e1        2 weeks ago         84.1MB

IV. Install Hadoop

Hadoop can be installed in two ways: building it from source, or downloading the pre-built binary from the official site. For convenience, we use the binary release.
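The tarball used below is assumed to have already been downloaded on the host into the shared ~/docker-ubuntu directory, for example from the Apache archive (the exact URL is my assumption; any official mirror works):

$ cd ~/docker-ubuntu
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz

First, start the image we saved earlier and unpack Hadoop inside the container: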

$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu-jdkinstalled ubuntu/jdkinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@2ecf3c0dba0e:/# 
root@2ecf3c0dba0e:/# cd /root/docker-ubuntu
root@2ecf3c0dba0e:~/docker-ubuntu# tar -zxvf hadoop-2.9.1.tar.gz -C /usr/local

Test whether Hadoop was installed successfully:

root@2ecf3c0dba0e:~/docker-ubuntu# cd /usr/local/hadoop-2.9.1/
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# ls
LICENSE.txt  README.txt  etc      lib      sbin
NOTICE.txt   bin         include  libexec  share
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# ./bin/hadoop version
Hadoop 2.9.1
Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /usr/local/hadoop-2.9.1/share/hadoop/common/hadoop-common-2.9.1.jar

V. Configure the Hadoop cluster

First, set JAVA_HOME in hadoop-env.sh:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

Next, edit core-site.xml:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/core-site.xml
<configuration>
      <property>
          <name>hadoop.tmp.dir</name>
          <value>file:/usr/local/hadoop-2.9.1/tmp</value>
          <description>Abase for other temporary directories.</description>
      </property>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://master:9000</value>
      </property>
</configuration>

Then edit hdfs-site.xml:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/namenode_dir</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/datanode_dir</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

Then copy mapred-site.xml.template to mapred-site.xml and edit it:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Finally, edit yarn-site.xml:

root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
      <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>master</value>
      </property>
</configuration>

The cluster configuration is now essentially done, so save the current container as a new image:

$ docker commit 2ecf3c0dba0e ubuntu/hadoopinstalled
sha256:957de951c1d3093fa8e731bd63a6672706de2ca86d0dafb626dcb830536e774f

Then open three containers from this image in three terminals, representing master, slave01, and slave02 in the cluster:

# first terminal
$ docker run -it -h master --name master ubuntu/hadoopinstalled
# second terminal
$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
# third terminal
$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled

Then check the /etc/hosts file in each of them:

$ docker run -it -h master --name master ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@master:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.2	master

$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@slave01:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.3	slave01

$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@slave02:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.4	slave02

Finally, copy the three address entries above into /etc/hosts on master, slave01, and slave02; a sketch is shown below.
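For example, something like the following can be run inside each container (the IP addresses are taken from the outputs above and may differ on your machine):

cat >> /etc/hosts <<'EOF'
172.17.0.2      master
172.17.0.3      slave01
172.17.0.4      slave02
EOF

Then check whether master can reach slave01 and slave02 over SSH: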

root@master:/# ssh slave01
The authenticity of host 'slave01 (172.17.0.3)' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave01,172.17.0.3' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                         [ OK ] 
root@slave01:~# exit
logout
Connection to slave01 closed.
root@master:/# ssh slave02
The authenticity of host 'slave02 (172.17.0.4)' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave02,172.17.0.4' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                         [ OK ] 
root@slave02:~# exit
logout
Connection to slave02 closed.

The last configuration step is to open the slaves file on master and list slave01 and slave02:

root@master:/usr/local/hadoop-2.9.1# vim etc/hadoop/slaves

slave01
slave02 

The cluster is now fully configured; next, start it up.
On master, change into /usr/local/hadoop-2.9.1 and run the following commands:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs namenode -format
root@master:/usr/local/hadoop-2.9.1# sbin/start-all.sh

The cluster is now running. Run the jps command on master, slave01, and slave02 to check the running daemons; a rough sketch of what to expect is shown below.
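Roughly the following daemons should appear (this is an expectation based on the configuration above, not captured output; process IDs will differ):

root@master:/usr/local/hadoop-2.9.1# jps     # expect NameNode, SecondaryNameNode, ResourceManager, Jps
root@slave01:/# jps                          # expect DataNode, NodeManager, Jps
root@slave02:/# jps                          # expect DataNode, NodeManager, Jps
root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfsadmin -report   # should list slave01 and slave02 as live datanodes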

VI. Run the Hadoop example program grep
Since the example uses HDFS, first create a directory on HDFS:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -mkdir -p /user/hadoop/input

Then copy the XML configuration files under /usr/local/hadoop-2.9.1/etc/hadoop/ into that HDFS directory:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input

Then use ls to check that the files were uploaded to HDFS correctly:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -ls /user/hadoop/input
Found 9 items
-rw-r--r--   3 root supergroup       7861 2018-09-24 11:54 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r--   3 root supergroup       1036 2018-09-24 11:54 /user/hadoop/input/core-site.xml
-rw-r--r--   3 root supergroup      10206 2018-09-24 11:54 /user/hadoop/input/hadoop-policy.xml
-rw-r--r--   3 root supergroup       1091 2018-09-24 11:54 /user/hadoop/input/hdfs-site.xml
-rw-r--r--   3 root supergroup        620 2018-09-24 11:54 /user/hadoop/input/httpfs-site.xml
-rw-r--r--   3 root supergroup       3518 2018-09-24 11:54 /user/hadoop/input/kms-acls.xml
-rw-r--r--   3 root supergroup       5939 2018-09-24 11:54 /user/hadoop/input/kms-site.xml
-rw-r--r--   3 root supergroup        844 2018-09-24 11:54 /user/hadoop/input/mapred-site.xml
-rw-r--r--   3 root supergroup        942 2018-09-24 11:54 /user/hadoop/input/yarn-site.xml

Next, run the example program with the following command:

root@master:/usr/local/hadoop-2.9.1# bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/input output 'dfs[a-z.]+'
18/09/24 11:57:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
18/09/24 11:57:20 INFO input.FileInputFormat: Total input files to process : 9
18/09/24 11:57:20 INFO mapreduce.JobSubmitter: number of splits:9
18/09/24 11:57:21 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/09/24 11:57:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0001
18/09/24 11:57:21 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0001
18/09/24 11:57:21 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0001/
18/09/24 11:57:21 INFO mapreduce.Job: Running job: job_1537789095052_0001
18/09/24 11:57:26 INFO mapreduce.Job: Job job_1537789095052_0001 running in uber mode : false
18/09/24 11:57:26 INFO mapreduce.Job:  map 0% reduce 0%
18/09/24 11:57:34 INFO mapreduce.Job:  map 89% reduce 0%
18/09/24 11:57:35 INFO mapreduce.Job:  map 100% reduce 0%
18/09/24 11:57:39 INFO mapreduce.Job:  map 100% reduce 100%
18/09/24 11:57:41 INFO mapreduce.Job: Job job_1537789095052_0001 completed successfully
18/09/24 11:57:41 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=1979213
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=33107
		HDFS: Number of bytes written=219
		HDFS: Number of read operations=30
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=1
		Launched map tasks=9
		Launched reduce tasks=1
		Data-local map tasks=9
		Total time spent by all maps in occupied slots (ms)=51634
		Total time spent by all reduces in occupied slots (ms)=2287
		Total time spent by all map tasks (ms)=51634
		Total time spent by all reduce tasks (ms)=2287
		Total vcore-milliseconds taken by all map tasks=51634
		Total vcore-milliseconds taken by all reduce tasks=2287
		Total megabyte-milliseconds taken by all map tasks=52873216
		Total megabyte-milliseconds taken by all reduce tasks=2341888
	Map-Reduce Framework
		Map input records=891
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=163
		Input split bytes=1050
		Combine input records=4
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=163
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =9
		Failed Shuffles=0
		Merged Map outputs=9
		GC time elapsed (ms)=1378
		CPU time spent (ms)=2880
		Physical memory (bytes) snapshot=2824376320
		Virtual memory (bytes) snapshot=19761373184
		Total committed heap usage (bytes)=1956642816
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=32057
	File Output Format Counters 
		Bytes Written=219
18/09/24 11:57:41 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
18/09/24 11:57:41 INFO input.FileInputFormat: Total input files to process : 1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: number of splits:1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0002
18/09/24 11:57:41 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0002
18/09/24 11:57:41 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0002/
18/09/24 11:57:41 INFO mapreduce.Job: Running job: job_1537789095052_0002
18/09/24 11:57:50 INFO mapreduce.Job: Job job_1537789095052_0002 running in uber mode : false
18/09/24 11:57:50 INFO mapreduce.Job:  map 0% reduce 0%
18/09/24 11:57:54 INFO mapreduce.Job:  map 100% reduce 0%
18/09/24 11:57:58 INFO mapreduce.Job:  map 100% reduce 100%
18/09/24 11:57:59 INFO mapreduce.Job: Job job_1537789095052_0002 completed successfully
18/09/24 11:58:00 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=394779
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=346
		HDFS: Number of bytes written=77
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1826
		Total time spent by all reduces in occupied slots (ms)=1917
		Total time spent by all map tasks (ms)=1826
		Total time spent by all reduce tasks (ms)=1917
		Total vcore-milliseconds taken by all map tasks=1826
		Total vcore-milliseconds taken by all reduce tasks=1917
		Total megabyte-milliseconds taken by all map tasks=1869824
		Total megabyte-milliseconds taken by all reduce tasks=1963008
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=115
		Input split bytes=127
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=115
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=54
		CPU time spent (ms)=590
		Physical memory (bytes) snapshot=488009728
		Virtual memory (bytes) snapshot=3967393792
		Total committed heap usage (bytes)=344981504
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=219
	File Output Format Counters 
		Bytes Written=77

After the job finishes, the results can be viewed in the output directory on HDFS:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -cat output/*
1	dfsadmin
1	dfs.replication
1	dfs.namenode.name.dir
1	dfs.datanode.data.dir

The output directory on HDFS contains the correct results, so the Hadoop distributed cluster ran the grep example successfully.
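One practical note (my addition, not from the original write-up): Hadoop refuses to overwrite an existing output directory, so remove it before re-running the example:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -rm -r output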