[Hadoop] Installing Hadoop 2.7.6: pseudo-distributed cluster

This post demonstrates installing Hadoop on Linux in standalone (single-node) mode and in pseudo-distributed mode.

I Installation environment
  • OS: Oracle Linux Server release 6.5;
  • Java version: java-1.7.0-openjdk-1.7.0.45;
  • Hadoop version: hadoop-2.7.6;

II Pre-installation preparation

1 Create the hadoop user
[root@strong ~]# useradd hadoop
[root@strong ~]# usermod -a -G root hadoop
[root@strong ~]# passwd hadoop
Changing password for user hadoop.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
2 Install SSH and configure passwordless SSH login

1) Check whether SSH is installed; if not, install it:
[hadoop@strong ~]$ rpm -qa|grep ssh
openssh-server-5.3p1-94.el6.x86_64
openssh-5.3p1-94.el6.x86_64
libssh2-1.4.2-1.el6.x86_64
ksshaskpass-0.5.1-4.1.el6.x86_64
openssh-askpass-5.3p1-94.el6.x86_64
openssh-clients-5.3p1-94.el6.x86_64
2) Configure passwordless SSH login
[hadoop@strong ~]$ cd .ssh/
[hadoop@strong .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
13:df:06:f2:ea:21:31:b2:c1:f8:13:24:c6:bf:45:05 hadoop@strong.hadoop.com
The key's randomart image is:
+--[ RSA 2048]----+
|             E.. |
|            . .  |
|       + . . o . |
|      . * . = o  |
|     . * + S o o |
|      . B o o .  |
|       = . o     |
|      . o .      |
|       .         |
+-----------------+
[hadoop@strong .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@strong .ssh]$ chmod 600 authorized_keys
[hadoop@strong .ssh]$ ssh localhost
Last login: Fri Jun 8 19:55:11 2018 from localhost
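The chmod 600 step matters because sshd (with the default StrictModes) rejects an authorized_keys file that is group- or world-accessible. A small Python sketch of the permission check, using a throwaway file as a stand-in for authorized_keys:

```python
import os
import stat
import tempfile

# create a stand-in authorized_keys file and restrict it to the owner
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.chmod(path, 0o600)

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))          # 0o600: readable/writable by the owner only
assert mode & 0o077 == 0  # no group/other permission bits set
os.remove(path)
```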
3 Install Java
[root@strong ~]# yum install java-1.7.0-openjdk*
Add the following to .bash_profile:
[hadoop@strong ~]$ vim .bash_profile
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64
[hadoop@strong ~]$ . .bash_profile 
Verify that the JDK is configured correctly:
[hadoop@strong ~]$ java -version
java version "1.7.0_45"
OpenJDK Runtime Environment (rhel-2.4.3.3.0.1.el6-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
[hadoop@strong ~]$ /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -version
java version "1.7.0_45"
OpenJDK Runtime Environment (rhel-2.4.3.3.0.1.el6-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
Note: Oracle Linux 6.5 installs the Java JRE by default, not the JDK; for development convenience, install the JDK.

III Install and configure Hadoop

1 Download the Hadoop software
[root@strong local]# ll hadoop-2.7.6.tar.gz 
-rw-r--r--. 1 root root 216745683 Jun  8 20:20 hadoop-2.7.6.tar.gz
2 Extract the archive
[root@strong local]# tar zxvf hadoop-2.7.6.tar.gz
[root@strong local]# chown -R hadoop:hadoop hadoop-2.7.6
3 Check that Hadoop runs; on success it prints version information
[root@strong local]# su - hadoop
[hadoop@strong ~]$ cd /usr/local/hadoop-2.7.6
[hadoop@strong hadoop-2.7.6]$ ./bin/hadoop version
Hadoop 2.7.6
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 085099c66cf28be31604560c376fa282e69282b8
Compiled by kshvachk on 2018-04-18T01:33Z
Compiled with protoc 2.5.0
From source with checksum 71e2695531cb3360ab74598755d036
This command was run using /usr/local/hadoop-2.7.6/share/hadoop/common/hadoop-common-2.7.6.jar
4 Set the JAVA_HOME variable

Edit JAVA_HOME in the following file:
[hadoop@strong hadoop-2.7.6]$ vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64
5 Standalone configuration (non-distributed)

Hadoop's default mode is non-distributed: it runs as a single Java process with no further configuration required, which makes debugging convenient.

1) Run the wordcount example

The four steps above complete the standalone setup; the following example demonstrates Hadoop in this mode:
[hadoop@strong hadoop-2.7.6]$ mkdir input
[hadoop@strong hadoop-2.7.6]$ vim ./input/test.txt
[hadoop@strong hadoop-2.7.6]$ cat ./input/test.txt 
Hello this is my first time to learn hadoop 
love hadoop 
Hello hadoop
[hadoop@strong hadoop-2.7.6]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount input output
18/06/08 20:38:44 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/06/08 20:38:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/06/08 20:38:44 INFO input.FileInputFormat: Total input paths to process : 1
18/06/08 20:38:44 INFO mapreduce.JobSubmitter: number of splits:1
18/06/08 20:38:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local241325947_0001
18/06/08 20:38:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/06/08 20:38:47 INFO mapreduce.Job: Running job: job_local241325947_0001
18/06/08 20:38:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/06/08 20:38:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/08 20:38:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/06/08 20:38:48 INFO mapred.LocalJobRunner: Waiting for map tasks
18/06/08 20:38:48 INFO mapred.LocalJobRunner: Starting task: attempt_local241325947_0001_m_000000_0
18/06/08 20:38:48 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
---------------- intermediate output omitted ------------------
Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=71
	File Output Format Counters 
		Bytes Written=81
2) View the results
[hadoop@strong hadoop-2.7.6]$ cat output/*
Hello	2
first	1
hadoop	3
is	1
learn	1
love	1
my	1
this	1
time	1
to	1
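The wordcount job tokenizes the input on whitespace, maps each word to a count of 1, and sums the counts per word. A minimal Python sketch of the same computation, using the three-line test file from above (trailing spaces included, which is why the File Input Format counter reads Bytes Read=71):

```python
from collections import Counter

# the same three-line input file used in the example above
text = (
    "Hello this is my first time to learn hadoop \n"
    "love hadoop \n"
    "Hello hadoop\n"
)
assert len(text.encode("utf-8")) == 71  # matches Bytes Read=71

# split on whitespace, then count occurrences of each word
counts = Counter(text.split())
for word in sorted(counts):             # same order as the cat output above
    print(word, counts[word], sep="\t")
```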
Note: Hadoop does not overwrite result files by default; running the job again fails with an error that the output directory already exists, so delete output first.

IV Hadoop pseudo-distributed configuration

Hadoop can run on a single node in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process: the node acts as both NameNode and DataNode, and input is read from HDFS. The configuration files live in /usr/local/hadoop-2.7.6/etc/hadoop/. Pseudo-distributed mode requires editing two of them, core-site.xml and hdfs-site.xml. The files are XML; each setting is a property given by a name and a value.

1 Edit core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>
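Every Hadoop configuration file follows this same property name/value layout. As an illustration (plain XML parsing with the standard library, not a Hadoop API), the settings can be read like this:

```python
import xml.etree.ElementTree as ET

# the core-site.xml content from above
core_site = """\
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>"""

# each <property> element carries one name/value pair
props = {p.findtext("name"): p.findtext("value")
         for p in ET.fromstring(core_site).iter("property")}
print(props)  # {'fs.defaultFS': 'hdfs://localhost:9000'}
```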
2 Edit hdfs-site.xml
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>
3 Format the NameNode
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs namenode -format
18/06/08 20:57:45 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = strong.hadoop.com/192.168.56.102
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.6
-------------------- intermediate output omitted ------------------------
18/06/08 20:57:52 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/06/08 20:57:52 INFO util.GSet: VM type       = 64-bit
18/06/08 20:57:52 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
18/06/08 20:57:52 INFO util.GSet: capacity      = 2^15 = 32768 entries
18/06/08 20:57:52 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1934541033-192.168.56.102-1528462672607
18/06/08 20:57:52 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted. ------------ indicates formatting succeeded
18/06/08 20:57:53 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/06/08 20:57:53 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
18/06/08 20:57:53 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/06/08 20:57:53 INFO util.ExitUtil: Exiting with status 0  ------------ indicates formatting succeeded
18/06/08 20:57:53 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at strong.hadoop.com/192.168.56.102
************************************************************/
4 Start the NameNode and DataNode daemons
[hadoop@strong hadoop-2.7.6]$ sbin/start-dfs.sh 
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-hadoop-namenode-strong.hadoop.com.out
localhost: starting datanode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-hadoop-datanode-strong.hadoop.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 68:da:7d:9f:e5:46:14:fc:30:15:9e:24:3d:6e:a9:1d.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-hadoop-secondarynamenode-strong.hadoop.com.out
After a successful start, check the running processes:
[hadoop@strong hadoop-2.7.6]$ jps
4034 DataNode
4346 Jps
3939 NameNode
4230 SecondaryNameNode
5 Access the NameNode via the web UI

(Screenshot omitted. In Hadoop 2.x the NameNode web UI listens on http://localhost:50070 by default.)
6 Run a pseudo-distributed example

In the local mode above, wordcount read local files; in pseudo-distributed mode it reads data from HDFS. To use HDFS, first create a user directory there.

1) Create the user directory
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -mkdir -p /user/hadoop
2) Copy the input file to the distributed file system

Copy the file input/test.txt created earlier into /user/hadoop/input on HDFS. Because we are running as the hadoop user and its user directory /user/hadoop already exists, the command can use a relative path, which resolves to the absolute path /user/hadoop/input.
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -put input/test.txt input
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -ls
Found 1 items
-rw-r--r--   1 hadoop supergroup         71 2018-06-08 21:21 input
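The relative-path rule can be sketched with a hypothetical helper (resolve_hdfs_path is illustrative only, not part of Hadoop): paths not starting with "/" are taken relative to the user's HDFS home directory /user/&lt;username&gt;.

```python
# Hypothetical helper mirroring how the HDFS shell resolves relative paths:
# anything not starting with "/" is taken relative to /user/<username>.
def resolve_hdfs_path(path: str, user: str = "hadoop") -> str:
    if path.startswith("/"):
        return path  # already absolute
    return f"/user/{user}/{path}"

print(resolve_hdfs_path("input"))               # /user/hadoop/input
print(resolve_hdfs_path("/user/hadoop/input"))  # /user/hadoop/input
```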
3) Run the wordcount example
[hadoop@strong hadoop-2.7.6]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount input output
4) View the results
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -ls output
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-06-08 21:24 output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         69 2018-06-08 21:24 output/part-r-00000
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -cat output/*
Hello	2
first	1
hadoop	3
is	1
learn	1
love	1
my	1
this	1
time	1
to	1
5) The output directory must not exist when the job runs, or it fails; delete output
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -ls output
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-06-08 21:24 output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         69 2018-06-08 21:24 output/part-r-00000
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -rm -r output
18/06/08 21:43:01 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted output
V Start YARN

YARN was split out of MapReduce and is responsible for resource management and job scheduling; MapReduce jobs now run on top of YARN, which provides high availability and high scalability.

1 Configure mapred-site.xml (in Hadoop 2.7 this file is created by copying etc/hadoop/mapred-site.xml.template)
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>
2 Configure yarn-site.xml
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
3 Start the ResourceManager and NodeManager daemons

1) Start YARN
[hadoop@strong hadoop-2.7.6]$ sbin/start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.6/logs/yarn-hadoop-resourcemanager-strong.hadoop.com.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.7.6/logs/yarn-hadoop-nodemanager-strong.hadoop.com.out
2) Start the history server, so that job history can be viewed in the web UI
[hadoop@strong hadoop-2.7.6]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.6/logs/mapred-hadoop-historyserver-strong.hadoop.com.out
3) Verify that YARN started
[hadoop@strong hadoop-2.7.6]$ jps
4034 DataNode
5007 ResourceManager
5422 JobHistoryServer
5494 Jps
3939 NameNode
5108 NodeManager
4230 SecondaryNameNode
4) Run the wordcount example again
[hadoop@strong hadoop-2.7.6]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount input output
[hadoop@strong hadoop-2.7.6]$ ./bin/hdfs dfs -cat output/*
Hello	2
first	1
hadoop	3
is	1
learn	1
love	1
my	1
this	1
time	1
to	1
5) View YARN in the browser

(Screenshots omitted. The ResourceManager web UI listens on http://localhost:8088 by default; clicking History on a finished job opens the job history page served by the history server.)

At this point the Hadoop pseudo-distributed cluster is installed and configured. This installation contains only the basic components: HDFS, YARN, and MapReduce.