Big Data Assignment (1): Building a Hadoop Cluster Environment on Docker
This setup mainly follows the Xiamen University Database Lab tutorial (http://dblab.xmu.edu.cn/blog/1233/), carried out on Ubuntu 16.04.
1. Installing Docker (Docker CE)
Installation follows the official Docker documentation (https://docs.docker.com/install/linux/docker-ce/ubuntu/).
The docs offer three installation methods: from Docker's apt repository, from a downloaded package, or via a convenience script. I chose the repository method, which makes it easy to update Docker later.
(1) Setting up the repository
1. First update the existing package index:
$ sudo apt update
2. Then install the prerequisite packages:
$ sudo apt install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
3. Add Docker's official GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Confirm with the following command that the key fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88:
$ sudo apt-key fingerprint 0EBFCD88
pub 4096R/0EBFCD88 2017-02-22
Key fingerprint = 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
uid Docker Release (CE deb) <docker@docker.com>
sub 4096R/F273FCD8 2017-02-22
4. Docker provides three release channels: stable, edge, and test. Since stable is the stable channel, install from it:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$( lsb_release -cs) \
stable"
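For reference, the repository line that add-apt-repository assembles can be sketched for a given release codename; here "xenial" is assumed for Ubuntu 16.04, while on a live system the codename would come from `lsb_release -cs`:

```shell
# Build the apt source entry for Docker's stable channel (sketch).
# On a real system: codename=$(lsb_release -cs)
codename="xenial"   # assumed codename for Ubuntu 16.04
repo_line="deb [arch=amd64] https://download.docker.com/linux/ubuntu ${codename} stable"
echo "$repo_line"
```

This is exactly the entry that ends up in the apt sources list after the command above succeeds.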
(2) Installing Docker CE
1. Refresh the package sources:
$ sudo apt update
2. Install the latest version of Docker CE:
$ sudo apt-get install docker-ce
3. Verify that the installation succeeded:
$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
4. Granting user permissions
By default only root can run Docker, so add the current user to the docker group:
$ sudo usermod -aG docker zhangsl
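After logging back in, group membership can be checked before trying docker itself. A minimal sketch; the group list below is a stand-in value, since on a real system it would come from `id -nG`:

```shell
# Sketch: verify that "docker" appears in the user's group list.
# On a real system: groups=$(id -nG)
groups="zhangsl adm sudo docker"   # stand-in value for illustration
case " $groups " in
  *" docker "*) in_group=yes ;;
  *)            in_group=no  ;;
esac
echo "$in_group"
```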
Then log out and log back in, and verify that the permission change took effect:
$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
2. Installing an Ubuntu System in Docker
First pull an Ubuntu image from Docker Hub:
$ docker pull ubuntu
Then verify that the pull succeeded:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest 4ab4c602aa5e 2 weeks ago 1.84kB
ubuntu latest cd6d8154f1e1 2 weeks ago 84.1MB
When starting the image, a directory is needed for transferring files into it, so create a directory under the home directory for file transfer:
$ mkdir docker-ubuntu
Then run Ubuntu on Docker; -it opens an interactive terminal, -v bind-mounts ~/docker-ubuntu to /root/docker-ubuntu inside the container, and --name names the container:
$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu ubuntu
root@b59d716dbb4d:/#
3. Initializing the Ubuntu System
The freshly installed system is bare, with very little software present, so refresh the package sources and install some necessary software.
(1) Refreshing the sources
Ubuntu running on Docker logs in as root by default, so commands need no sudo:
root@b59d716dbb4d:/# apt update
(2) Installing necessary software
1. Install Vim
Terminal text editors include vim, emacs, nano, and others; I am personally used to vim, hence the choice:
root@b59d716dbb4d:/# apt install vim
2. Install sshd
The distributed setup needs SSH connections into the Docker containers:
root@b59d716dbb4d:/# apt install ssh
Then add /etc/init.d/ssh start to ~/.bashrc to ensure the SSH service starts automatically every time the container starts; service or systemctl can also be used to enable it.
Next, configure passwordless SSH login:
root@b59d716dbb4d:~# ssh-keygen -t rsa # just keep pressing Enter
root@b59d716dbb4d:~# cd .ssh
root@b59d716dbb4d:~/.ssh# cat id_rsa.pub >> authorized_keys
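The same key setup can also be done non-interactively, which is handy when scripting the image build. A sketch using a temporary directory; inside the container the paths would be under ~/.ssh instead:

```shell
# Generate an RSA key pair with an empty passphrase and authorize it
# for passwordless login (sketch; temp dir stands in for ~/.ssh).
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"
cat "$keydir/id_rsa.pub" >> "$keydir/authorized_keys"
chmod 600 "$keydir/authorized_keys"
```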
3. Install the JDK
Hadoop requires Java. The default JDK here is Java 10, so install Java 8 instead:
root@b59d716dbb4d:~# apt install openjdk-8-jdk
Next, set the JAVA_HOME and PATH variables by adding the following at the end of ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
Then reload ~/.bashrc so the changes take effect:
root@b59d716dbb4d:~# source ~/.bashrc
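If setup scripts like this may run more than once, the exports are better appended idempotently. A sketch against a temporary file; in the container the target would be ~/.bashrc:

```shell
# Append the JAVA_HOME export only if it is not already present,
# so repeated runs do not duplicate the line.
rcfile=$(mktemp)   # stand-in for ~/.bashrc
line='export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/'
grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"
grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"  # re-run: no duplicate
count=$(grep -cxF "$line" "$rcfile")
echo "$count"
```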
(3) Saving the image
Changes made inside a container are not saved automatically, so the container needs to be committed. First log in to Docker:
$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: zhangshuoliang007
Password:
WARNING! Your password will be stored unencrypted in /home/zhangsl/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
Then use docker ps to look up the container and docker commit to save it as an image:
$ docker ps # show information on currently running containers
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b59d716dbb4d ubuntu "/bin/bash" About an hour ago Up About an hour ubuntu
$ docker commit b59d716dbb4d ubuntu/jdkinstalled # save the container with ID b59d716dbb4d as a new image named ubuntu/jdkinstalled
sha256:07a39087f9bcb985151ade3e225448556dd7df089477b69e5b71b600ad9634c6
$ docker images # list all current images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu/jdkinstalled latest 07a39087f9bc 3 minutes ago 601MB
hello-world latest 4ab4c602aa5e 2 weeks ago 1.84kB
ubuntu latest cd6d8154f1e1 2 weeks ago 84.1MB
4. Installing Hadoop
Hadoop can be installed in two ways: building from source, or downloading the binary release from the official site. For convenience, I downloaded the binary release (hadoop-2.9.1.tar.gz, placed in ~/docker-ubuntu).
First start a container from the image saved above:
$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu-jdkinstalled ubuntu/jdkinstalled
 * Starting OpenBSD Secure Shell server sshd [ OK ]
root@2ecf3c0dba0e:/#
root@2ecf3c0dba0e:/# cd /root/docker-ubuntu
root@2ecf3c0dba0e:~/docker-ubuntu# tar -zxvf hadoop-2.9.1.tar.gz -C /usr/local
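For reference, the -C flag tells tar to change into the target directory before extracting, so the archive's top-level hadoop-2.9.1 folder lands directly under /usr/local. A tiny self-contained illustration with hypothetical paths:

```shell
# Build a miniature archive and extract it into a separate directory,
# mirroring `tar -zxvf hadoop-2.9.1.tar.gz -C /usr/local`.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/hadoop-2.9.1/bin"
echo demo > "$src/hadoop-2.9.1/bin/hadoop"
tar -czf "$src/hadoop-demo.tar.gz" -C "$src" hadoop-2.9.1
tar -xzf "$src/hadoop-demo.tar.gz" -C "$dst"
ls "$dst/hadoop-2.9.1/bin"
```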
Test whether Hadoop was installed successfully:
root@2ecf3c0dba0e:~/docker-ubuntu# cd /usr/local/hadoop-2.9.1/
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# ls
LICENSE.txt README.txt etc lib sbin
NOTICE.txt bin include libexec share
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# ./bin/hadoop version
Hadoop 2.9.1
Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /usr/local/hadoop-2.9.1/share/hadoop/common/hadoop-common-2.9.1.jar
5. Configuring the Hadoop Cluster
First change JAVA_HOME in hadoop-env.sh:
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Next edit core-site.xml:
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop-2.9.1/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
Then edit hdfs-site.xml:
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-2.9.1/namenode_dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-2.9.1/datanode_dir</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Then copy mapred-site.xml.template to mapred-site.xml and edit it as follows:
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Finally, edit yarn-site.xml:
root@2ecf3c0dba0e:/usr/local/hadoop-2.9.1# vim etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
The cluster configuration is now mostly done, so save the current container as an image:
$ docker commit 2ecf3c0dba0e ubuntu/hadoopinstalled
sha256:957de951c1d3093fa8e731bd63a6672706de2ca86d0dafb626dcb830536e774f
Then open three containers from this image in three terminals, representing the cluster nodes master, slave01, and slave02:
# first terminal
$ docker run -it -h master --name master ubuntu/hadoopinstalled
# second terminal
$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
# third terminal
$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled
Then check each container's /etc/hosts file:
$ docker run -it -h master --name master ubuntu/hadoopinstalled
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@master:/# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2 master
$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave01:/# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 slave01
$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave02:/# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.4 slave02
Finally, copy the three address lines above into /etc/hosts on master, slave01, and slave02. Then the following commands check whether master can connect to slave01 and slave02:
root@master:/# ssh slave01
The authenticity of host 'slave01 (172.17.0.3)' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave01,172.17.0.3' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave01:~# exit
logout
Connection to slave01 closed.
root@master:/# ssh slave02
The authenticity of host 'slave02 (172.17.0.4)' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave02,172.17.0.4' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave02:~# exit
logout
Connection to slave02 closed.
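The /etc/hosts edits above can also be scripted so that each entry is appended only once. A sketch using a temporary file in place of /etc/hosts, with the three addresses from this run:

```shell
# Append each node's address line if it is not already present
# (temp file stands in for /etc/hosts).
hosts=$(mktemp)
printf '127.0.0.1\tlocalhost\n' > "$hosts"
for entry in "172.17.0.2 master" "172.17.0.3 slave01" "172.17.0.4 slave02"; do
  grep -qF "$entry" "$hosts" || echo "$entry" >> "$hosts"
done
cat "$hosts"
```

Note that the 172.17.0.x addresses are whatever Docker assigned in this session; they can change when containers are recreated, so re-check /etc/hosts after restarting the containers.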
The last configuration step is to open the slaves file on master and enter slave01 and slave02:
root@master:/usr/local/hadoop-2.9.1# vim etc/hadoop/slaves
slave01
slave02
That completes the cluster configuration; next, start it up.
On master, enter /usr/local/hadoop-2.9.1 and run the following commands:
root@master:/usr/local/hadoop-2.9.1# bin/hdfs namenode -format
root@master:/usr/local/hadoop-2.9.1# sbin/start-all.sh
The cluster is now running. Run the jps command on master, slave01, and slave02 to check which daemons came up.
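As a rough guide (for Hadoop 2.x started via start-all.sh; exact process names can vary by version), jps on master should list NameNode, SecondaryNameNode, and ResourceManager, while each slave should list DataNode and NodeManager. A sketch of such a check against sample jps output; on a real node the output would come from running jps:

```shell
# Check that every expected master daemon appears in (sample) jps output.
# On a real node: jps_out=$(jps) -- the value below is a stand-in.
jps_out="1201 NameNode
1385 SecondaryNameNode
1543 ResourceManager
1890 Jps"
missing=""
for d in NameNode SecondaryNameNode ResourceManager; do
  echo "$jps_out" | grep -qw "$d" || missing="$missing $d"
done
echo "missing:${missing:-none}"
```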
6. Running the Hadoop Example Program grep
Since HDFS is needed, first create a directory on HDFS:
root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -mkdir -p /user/hadoop/input
Then copy the XML files from the /usr/local/hadoop-2.9.1/etc/hadoop/ directory into that HDFS directory:
root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input
Then use ls to check that the files were uploaded to HDFS correctly:
root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -ls /user/hadoop/input
Found 9 items
-rw-r--r-- 3 root supergroup 7861 2018-09-24 11:54 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r-- 3 root supergroup 1036 2018-09-24 11:54 /user/hadoop/input/core-site.xml
-rw-r--r-- 3 root supergroup 10206 2018-09-24 11:54 /user/hadoop/input/hadoop-policy.xml
-rw-r--r-- 3 root supergroup 1091 2018-09-24 11:54 /user/hadoop/input/hdfs-site.xml
-rw-r--r-- 3 root supergroup 620 2018-09-24 11:54 /user/hadoop/input/httpfs-site.xml
-rw-r--r-- 3 root supergroup 3518 2018-09-24 11:54 /user/hadoop/input/kms-acls.xml
-rw-r--r-- 3 root supergroup 5939 2018-09-24 11:54 /user/hadoop/input/kms-site.xml
-rw-r--r-- 3 root supergroup 844 2018-09-24 11:54 /user/hadoop/input/mapred-site.xml
-rw-r--r-- 3 root supergroup 942 2018-09-24 11:54 /user/hadoop/input/yarn-site.xml
Next, run the example program with the following command:
root@master:/usr/local/hadoop-2.9.1# bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/input output 'dfs[a-z.]+'
18/09/24 11:57:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
18/09/24 11:57:20 INFO input.FileInputFormat: Total input files to process : 9
18/09/24 11:57:20 INFO mapreduce.JobSubmitter: number of splits:9
18/09/24 11:57:21 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/09/24 11:57:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0001
18/09/24 11:57:21 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0001
18/09/24 11:57:21 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0001/
18/09/24 11:57:21 INFO mapreduce.Job: Running job: job_1537789095052_0001
18/09/24 11:57:26 INFO mapreduce.Job: Job job_1537789095052_0001 running in uber mode : false
18/09/24 11:57:26 INFO mapreduce.Job: map 0% reduce 0%
18/09/24 11:57:34 INFO mapreduce.Job: map 89% reduce 0%
18/09/24 11:57:35 INFO mapreduce.Job: map 100% reduce 0%
18/09/24 11:57:39 INFO mapreduce.Job: map 100% reduce 100%
18/09/24 11:57:41 INFO mapreduce.Job: Job job_1537789095052_0001 completed successfully
18/09/24 11:57:41 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=115
FILE: Number of bytes written=1979213
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=33107
HDFS: Number of bytes written=219
HDFS: Number of read operations=30
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=9
Launched reduce tasks=1
Data-local map tasks=9
Total time spent by all maps in occupied slots (ms)=51634
Total time spent by all reduces in occupied slots (ms)=2287
Total time spent by all map tasks (ms)=51634
Total time spent by all reduce tasks (ms)=2287
Total vcore-milliseconds taken by all map tasks=51634
Total vcore-milliseconds taken by all reduce tasks=2287
Total megabyte-milliseconds taken by all map tasks=52873216
Total megabyte-milliseconds taken by all reduce tasks=2341888
Map-Reduce Framework
Map input records=891
Map output records=4
Map output bytes=101
Map output materialized bytes=163
Input split bytes=1050
Combine input records=4
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=163
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =9
Failed Shuffles=0
Merged Map outputs=9
GC time elapsed (ms)=1378
CPU time spent (ms)=2880
Physical memory (bytes) snapshot=2824376320
Virtual memory (bytes) snapshot=19761373184
Total committed heap usage (bytes)=1956642816
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=32057
File Output Format Counters
Bytes Written=219
18/09/24 11:57:41 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
18/09/24 11:57:41 INFO input.FileInputFormat: Total input files to process : 1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: number of splits:1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0002
18/09/24 11:57:41 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0002
18/09/24 11:57:41 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0002/
18/09/24 11:57:41 INFO mapreduce.Job: Running job: job_1537789095052_0002
18/09/24 11:57:50 INFO mapreduce.Job: Job job_1537789095052_0002 running in uber mode : false
18/09/24 11:57:50 INFO mapreduce.Job: map 0% reduce 0%
18/09/24 11:57:54 INFO mapreduce.Job: map 100% reduce 0%
18/09/24 11:57:58 INFO mapreduce.Job: map 100% reduce 100%
18/09/24 11:57:59 INFO mapreduce.Job: Job job_1537789095052_0002 completed successfully
18/09/24 11:58:00 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=115
FILE: Number of bytes written=394779
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=346
HDFS: Number of bytes written=77
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1826
Total time spent by all reduces in occupied slots (ms)=1917
Total time spent by all map tasks (ms)=1826
Total time spent by all reduce tasks (ms)=1917
Total vcore-milliseconds taken by all map tasks=1826
Total vcore-milliseconds taken by all reduce tasks=1917
Total megabyte-milliseconds taken by all map tasks=1869824
Total megabyte-milliseconds taken by all reduce tasks=1963008
Map-Reduce Framework
Map input records=4
Map output records=4
Map output bytes=101
Map output materialized bytes=115
Input split bytes=127
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=115
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=54
CPU time spent (ms)=590
Physical memory (bytes) snapshot=488009728
Virtual memory (bytes) snapshot=3967393792
Total committed heap usage (bytes)=344981504
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=219
File Output Format Counters
Bytes Written=77
Once the job finishes, the results can be viewed in the output directory on HDFS:
root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
The output directory on HDFS holds the correct results, so the Hadoop distributed cluster ran the grep program successfully. To re-run the job, delete the output directory first (bin/hdfs dfs -rm -r output), since MapReduce refuses to write to an existing output path.