Running MapReduce in Pseudo-Distributed Mode (Cluster Configuration, Logs and NameNode Formatting, Cluster Operations)

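Contents

Starting and configuring the cluster

Logs, and why can't the NameNode be formatted repeatedly?

Operating the cluster (upload, download, run MapReduce, browse)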


Starting and configuring the cluster

#1, go to /opt/module/hadoop-2.7.2/etc/hadoop and configure hadoop-env.sh
[isea@hadoop104 hadoop]$ vim hadoop-env.sh
*
*
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/opt/module/jdk1.8.0_144
*
*

#2, configure core-site.xml
[isea@hadoop104 hadoop]$ vim core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Address of the NameNode in HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop104:9000</value>
    </property>
    <!-- Directory where Hadoop stores the files it generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>

#3, configure hdfs-site.xml
[isea@hadoop104 hadoop]$ vim hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Number of HDFS block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

#4, format the NameNode (only before the first start; do not repeat it afterwards)
[isea@hadoop104 hadoop]$ hdfs namenode -format
18/11/14 20:07:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop104/192.168.1.104
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.2
*
*
18/11/14 20:07:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop104/192.168.1.104
************************************************************/

#5, start the NameNode and the DataNode separately, then check that both are running
[isea@hadoop104 hadoop]$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-namenode-hadoop104.out
[isea@hadoop104 hadoop]$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-datanode-hadoop104.out
[isea@hadoop104 hadoop]$ jps
3427 NameNode
3517 DataNode
3598 Jps

This completes the configuration and startup of the cluster.
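
The jps check in step 5 can also be scripted. The following is only a minimal sketch, assuming that `jps` prints one `<pid> <name>` pair per line as in the transcript above. The function name `daemons_running` is hypothetical and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): reads `jps` output on stdin and
# succeeds only if every daemon name passed as an argument appears in it.
daemons_running() {
  jps_output=$(cat)
  for name in "$@"; do
    # Compare the second column exactly, so that "NameNode" does not
    # accidentally match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name" >&2
      return 1
    fi
  done
  return 0
}

# Usage on the cluster host:
#   jps | daemons_running NameNode DataNode && echo "HDFS daemons are up"
```

Matching on the exact process name rather than using a plain `grep` avoids false positives once more daemons are added to the host.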

Next, open this URL in a browser:

http://hadoop104:50070/dfshealth.html#tab-overview

The NameNode overview page should appear.

Logs, and why can't the NameNode be formatted repeatedly?

#1, logs:
[isea@hadoop104 logs]$ pwd
/opt/module/hadoop-2.7.2/logs
[isea@hadoop104 logs]$ ll
total 60
-rw-rw-r--. 1 isea isea 23848 Nov 14 20:10 hadoop-isea-datanode-hadoop104.log
-rw-rw-r--. 1 isea isea   715 Nov 14 20:10 hadoop-isea-datanode-hadoop104.out
-rw-rw-r--. 1 isea isea 27519 Nov 14 20:10 hadoop-isea-namenode-hadoop104.log
-rw-rw-r--. 1 isea isea   715 Nov 14 20:10 hadoop-isea-namenode-hadoop104.out
-rw-rw-r--. 1 isea isea     0 Nov 14 20:10 SecurityAuth-isea.audit

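Starting the NameNode and the DataNode creates a logs directory under the Hadoop installation directory. It contains one .log file and one .out file per daemon, plus a security audit file.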

#2, why can't the NameNode be formatted repeatedly?
[isea@hadoop104 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/data/current
[isea@hadoop104 current]$ ll
total 8
drwx------. 4 isea isea 4096 Nov 14 20:10 BP-847571129-192.168.1.104-1542197248436
-rw-rw-r--. 1 isea isea  229 Nov 14 20:10 VERSION
[isea@hadoop104 current]$ cat VERSION
#Wed Nov 14 20:10:52 CST 2018
storageID=DS-305b15b0-96c1-407c-b58e-1beb65922151
clusterID=CID-8eeb5d53-e49f-4de6-9e05-387a7eb1472f
cTime=0
datanodeUuid=ea5794eb-6929-40b7-b8c3-aad970d72c29
storageType=DATA_NODE
layoutVersion=-56

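Formatting the NameNode generates a new cluster ID. The DataNode still holds the old clusterID in its VERSION file, so the two IDs no longer match and the cluster cannot find its earlier data.
Therefore, if the NameNode really has to be reformatted, always delete the data directory and the logs first, and only then format the NameNode.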
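
The mismatch described above can be checked by comparing the `clusterID` lines of the two VERSION files. This is a minimal sketch, not an official Hadoop tool. The helper names `cluster_id` and `same_cluster` are hypothetical, and the NameNode path in the usage comment assumes the default layout in which the NameNode stores its metadata under `dfs/name` inside `hadoop.tmp.dir`.

```shell
# Hypothetical helper: print the clusterID value stored in a VERSION file.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Hypothetical helper: succeed only if both VERSION files carry the same,
# non-empty clusterID.
same_cluster() {
  nn_id=$(cluster_id "$1")
  dn_id=$(cluster_id "$2")
  [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Usage, assuming the default directories under hadoop.tmp.dir:
#   base=/opt/module/hadoop-2.7.2/data/tmp/dfs
#   same_cluster "$base/name/current/VERSION" "$base/data/current/VERSION" \
#     || echo "clusterID mismatch between NameNode and DataNode"
```

Running such a check before starting the DataNode gives an early hint that the NameNode was reformatted without clearing the old data directory.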

Operating the cluster (upload, download, run MapReduce, browse)

#1, create an input directory on HDFS and prepare the data to upload
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/isea/input
[isea@hadoop104 hadoop-2.7.2]$ vim wcinput/wc.input

you know that i sea you
sea you
isea you
isea
i sea you

#2, upload the test data to HDFS and check that the upload succeeded
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -put wcinput/wc.input /user/isea/input/
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -ls /user/isea/input/
Found 1 items
-rw-r--r--   1 isea supergroup         57 2018-11-14 20:45 /user/isea/input/wc.input

#3, run the MapReduce program and check the result
[isea@hadoop104 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/isea/input/ /user/isea/output
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -cat /user/isea/output/*
i	2
isea	2
know	1
sea	3
that	1
you	5
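
As a sanity check, the same counts can be computed locally without the cluster. The sketch below is not how the wordcount example is implemented. It is a plain shell pipeline that produces the same `word<TAB>count` layout for whitespace-separated input, and the function name `local_wordcount` is hypothetical.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin and
# print "word<TAB>count" lines sorted by word (byte order).
local_wordcount() {
  tr -s ' \t' '\n' |      # one word per line
    grep -v '^$' |        # drop empty lines
    LC_ALL=C sort |       # group identical words together
    uniq -c |             # prefix each distinct word with its count
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage, from the hadoop-2.7.2 directory:
#   local_wordcount < wcinput/wc.input
```

For small inputs such as wc.input, comparing this output with `hdfs dfs -cat /user/isea/output/*` is a quick way to confirm that the job ran on the intended file.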

Continue exercising the cluster: download a file from the cluster, then delete the output files on HDFS
[isea@hadoop104 hadoop-2.7.2]$ mkdir wcoutput
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -get /user/isea/output/part-r-00000 ./wcoutput/
[isea@hadoop104 hadoop-2.7.2]$ cd wcoutput/
[isea@hadoop104 wcoutput]$ ll
total 4
-rw-r--r--. 1 isea isea 37 Nov 14 21:21 part-r-00000
[isea@hadoop104 wcoutput]$ cat part-r-00000
i	2
isea	2
know	1
sea	3
that	1
you	5
[isea@hadoop104 wcoutput]$ hdfs dfs -rm -r /user/isea/output
18/11/14 21:26:27 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/isea/output
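
Deleting the output directory matters when the job is run again, because a MapReduce job refuses to start if its output directory already exists. The following is a minimal sketch of a guard built on `hdfs dfs -test -e`, which returns success when the path exists. The function name `remove_if_exists` and the `HDFS_BIN` variable are hypothetical. `HDFS_BIN` is not a Hadoop setting and exists only so that the command can be replaced.

```shell
# Hypothetical helper: delete an HDFS path only if it exists.
# HDFS_BIN is an illustrative override (not a Hadoop setting); it defaults
# to the real `hdfs` launcher found on the PATH.
remove_if_exists() {
  hdfs_bin=${HDFS_BIN:-hdfs}
  if "$hdfs_bin" dfs -test -e "$1"; then
    "$hdfs_bin" dfs -rm -r "$1"
  fi
}

# Usage before re-running the wordcount example:
#   remove_if_exists /user/isea/output
```

Testing for the path first means the helper stays quiet on a fresh cluster, where a bare `hdfs dfs -rm -r` would report that the path does not exist.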

In addition, the results can be checked in the browser:

http://hadoop104:50070/explorer.html#/