《深入理解大資料—大資料處理與編程實踐》: installing Hadoop 1.2.1
【Part 1】Source code accompanying the book 《深入理解大資料》
【Part 2】Installing Hadoop 1.2.1
【1】Install Java
jdk-6u45-linux-i586-rpm.rar unpacks to jdk-6u45-linux-i586-rpm.bin
Run the installer: ./jdk-6u45-linux-i586-rpm.bin
After a successful install, the JDK lives in /usr/java/jdk1.6.0_45
A22811459:/usr/java/jdk1.6.0_45 # pwd
/usr/java/jdk1.6.0_45
A22811459:/usr/java/jdk1.6.0_45 # ls
COPYRIGHT LICENSE README.html THIRDPARTYLICENSEREADME.txt bin include jre lib man src.zip
【1.2】Add the Java paths to /etc/profile so they are available everywhere
#set java
export JAVA_HOME=/usr/java/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
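A quick sanity check that these variables compose as intended can be sketched as follows; JAVA_HOME is set inline here only so the snippet is self-contained, while on the real machine it comes from /etc/profile:

```shell
# Minimal check that the profile variables resolve as expected.
export JAVA_HOME=/usr/java/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
# This is the directory `which java` should resolve into after sourcing the profile.
echo "$JAVA_HOME/bin"
```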
【1.3】Apply the configuration
# source /etc/profile
【1.4】Check the Java version; output like the following confirms the install
A22811459:/usr/java/jdk1.6.0_45 # java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)
【1.5】Optionally write, compile, and run a small Java program to further verify the installation
HelloWel.java
public class HelloWel {
    public static void main(String[] args) {
        System.out.println("JAVA OK");
    }
}
Compile and run:
# javac HelloWel.java
# java HelloWel
JAVA OK
At this point the Java installation is verified; note the JDK path /usr/java/jdk1.6.0_45, which is needed later.
【2】Install Hadoop 1.2.1 (following 《深入理解大資料》)
【2.1】Create a hadoop user
#groupadd hadoop-user
#useradd -g hadoop-user hadoop
#passwd hadoop
【2.2】Configure passwordless SSH
#ssh-keygen -t rsa
# cd /root/.ssh/
#cp id_rsa.pub authorized_keys
#ssh localhost
Check the result:
# ls
authorized_keys id_rsa id_rsa.pub known_hosts
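If ssh localhost still prompts for a password after this, the usual cause is file permissions: sshd expects ~/.ssh to be mode 700 and authorized_keys to be 600. A sketch of the required modes, demonstrated on a throwaway directory so it is safe to run anywhere:

```shell
# sshd refuses key-based login when ~/.ssh or authorized_keys is too open.
# A temp directory stands in for ~/.ssh so the commands are safe to demo.
dir=$(mktemp -d)
touch "$dir/authorized_keys"
chmod 700 "$dir"                   # required mode for ~/.ssh
chmod 600 "$dir/authorized_keys"   # required mode for authorized_keys
stat -c '%a' "$dir" "$dir/authorized_keys"
rm -rf "$dir"
```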
【2.3】Configure the Hadoop environment
Hadoop release: hadoop-1.2.1.tar.gz
After extraction, the directory is /home/longhui/hadoop/hadoop-1.2.1/
【2.3.1】In conf/hadoop-env.sh, set JAVA_HOME to the JDK path
export JAVA_HOME=/usr/java/jdk1.6.0_45
【2.3.2】Configure the three XML files
【1】core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://A22811459:9000</value>
</property>
</configuration>
【Note】
The temporary directory is /tmp/hadoop. Once Hadoop is running, it contains two subdirectories (dfs and mapred), and several .pid files appear under /tmp:
A22811459:/tmp # ls
hadoop/ hadoop-root-jobtracker.pid hadoop-root-secondarynamenode.pid
hadoop-root-datanode.pid hadoop-root-namenode.pid hadoop-root-tasktracker.pid
【2】hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/longhui/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/longhui/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
【Note】
After a successful start, /home/longhui/hadoop/dfs/name contains current, image, in_use.lock, previous.checkpoint,
and /home/longhui/hadoop/dfs/data contains blocksBeingWritten, current, detach, in_use.lock, storage, tmp.
【3】mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>A22811459:9001</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/home/longhui/hadoop/mapred/local</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>/home/longhui/hadoop/mapred/system</value>
</property>
</configuration>
【4】Because the hostname is A22811459 rather than localhost, the values above use that name, and /etc/hosts must map it:
127.0.0.1 A22811459
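A quick way to confirm the mapping took effect is to grep /etc/hosts for the hostname; a temp file stands in for /etc/hosts below so the check itself is demonstrable:

```shell
# Sketch: check that the hostname (A22811459 here) appears in the hosts file.
hosts=$(mktemp)
echo "127.0.0.1 A22811459" > "$hosts"   # stand-in for the real /etc/hosts entry
if grep -qw "A22811459" "$hosts"; then
  echo "A22811459 is mapped"
fi
rm -f "$hosts"
```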
【2.3.3】Add the Hadoop paths to /etc/profile, then apply them with: source /etc/profile
#set hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=/home/longhui/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
【2.3.4】Format the HDFS filesystem
Run bin/hadoop namenode -format (or simply hadoop namenode -format, since bin/ is on the PATH) and answer Y at the prompt
# hadoop namenode -format
16/12/15 12:59:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = A22811459/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.6.0_45
************************************************************/
Re-format filesystem in /home/longhui/hadoop/dfs/name ? (Y or N) Y
16/12/15 12:59:52 INFO util.GSet: Computing capacity for map BlocksMap
16/12/15 12:59:52 INFO util.GSet: VM type = 32-bit
16/12/15 12:59:52 INFO util.GSet: 2.0% max memory = 932118528
16/12/15 12:59:52 INFO util.GSet: capacity = 2^22 = 4194304 entries
16/12/15 12:59:52 INFO util.GSet: recommended=4194304, actual=4194304
16/12/15 12:59:53 INFO namenode.FSNamesystem: fsOwner=root
16/12/15 12:59:53 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/15 12:59:53 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/15 12:59:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/12/15 12:59:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/12/15 12:59:53 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/12/15 12:59:53 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/12/15 12:59:53 INFO common.Storage: Image file /home/longhui/hadoop/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/12/15 12:59:53 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO common.Storage: Storage directory /home/longhui/hadoop/dfs/name has been successfully formatted.
16/12/15 12:59:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at A22811459/127.0.0.1
************************************************************/
【Note】If you see the warning: Warning: $HADOOP_HOME is deprecated.
Fix: add the line below to /etc/profile, apply it with source /etc/profile, and re-run bin/hadoop namenode -format; the warning will be gone.
export HADOOP_HOME_WARN_SUPPRESS=1
【2.3.5】Start the Hadoop daemons (stop them later with stop-all.sh)
# start-all.sh
starting namenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-A22811459.out
localhost: starting datanode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-A22811459.out
localhost: starting secondarynamenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-A22811459.out
starting jobtracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-A22811459.out
localhost: starting tasktracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-A22811459.out
【2.3.6】Check the cluster with jps. Besides the Jps process itself, all five daemon processes must be present; output like the following indicates a normal start:
# jps
2352 TaskTracker
1940 DataNode
1802 NameNode
2465 Jps
2211 JobTracker
2106 SecondaryNameNode
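The "all five present" check can be scripted rather than eyeballed. A sketch follows; the jps output is faked inline with the listing above so the check is self-contained, while on the real machine you would use jps_out=$(jps):

```shell
# Sketch: confirm all five Hadoop 1.x daemons show up in jps output.
jps_out='2352 TaskTracker
1940 DataNode
1802 NameNode
2465 Jps
2211 JobTracker
2106 SecondaryNameNode'
ok=1
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  echo "$jps_out" | grep -qw "$d" || { echo "missing: $d"; ok=0; }
done
[ "$ok" -eq 1 ] && echo "all five daemons running"
```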
【3】Run the first bundled example: estimating the value of Pi
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop jar hadoop-examples-1.2.1.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/12/15 14:06:04 INFO mapred.FileInputFormat: Total input paths to process : 2
16/12/15 14:06:04 INFO mapred.JobClient: Running job: job_201612151254_0001
16/12/15 14:06:05 INFO mapred.JobClient: map 0% reduce 0%
16/12/15 14:06:10 INFO mapred.JobClient: map 100% reduce 0%
16/12/15 14:06:18 INFO mapred.JobClient: map 100% reduce 33%
16/12/15 14:06:19 INFO mapred.JobClient: map 100% reduce 100%
16/12/15 14:06:19 INFO mapred.JobClient: Job complete: job_201612151254_0001
16/12/15 14:06:19 INFO mapred.JobClient: Counters: 30
16/12/15 14:06:19 INFO mapred.JobClient: Job Counters
16/12/15 14:06:19 INFO mapred.JobClient: Launched reduce tasks=1
16/12/15 14:06:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6864
16/12/15 14:06:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient: Launched map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient: Data-local map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8661
16/12/15 14:06:19 INFO mapred.JobClient: File Input Format Counters
16/12/15 14:06:19 INFO mapred.JobClient: Bytes Read=236
16/12/15 14:06:19 INFO mapred.JobClient: File Output Format Counters
16/12/15 14:06:19 INFO mapred.JobClient: Bytes Written=97
16/12/15 14:06:19 INFO mapred.JobClient: FileSystemCounters
16/12/15 14:06:19 INFO mapred.JobClient: FILE_BYTES_READ=50
16/12/15 14:06:19 INFO mapred.JobClient: HDFS_BYTES_READ=478
16/12/15 14:06:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=160889
16/12/15 14:06:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
16/12/15 14:06:19 INFO mapred.JobClient: Map-Reduce Framework
16/12/15 14:06:19 INFO mapred.JobClient: Map output materialized bytes=56
16/12/15 14:06:19 INFO mapred.JobClient: Map input records=2
16/12/15 14:06:19 INFO mapred.JobClient: Reduce shuffle bytes=56
16/12/15 14:06:19 INFO mapred.JobClient: Spilled Records=8
16/12/15 14:06:19 INFO mapred.JobClient: Map output bytes=36
16/12/15 14:06:19 INFO mapred.JobClient: Total committed heap usage (bytes)=377028608
16/12/15 14:06:19 INFO mapred.JobClient: CPU time spent (ms)=3100
16/12/15 14:06:19 INFO mapred.JobClient: Map input bytes=48
16/12/15 14:06:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=242
16/12/15 14:06:19 INFO mapred.JobClient: Combine input records=0
16/12/15 14:06:19 INFO mapred.JobClient: Reduce input records=4
16/12/15 14:06:19 INFO mapred.JobClient: Reduce input groups=4
16/12/15 14:06:19 INFO mapred.JobClient: Combine output records=0
16/12/15 14:06:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=376963072
16/12/15 14:06:19 INFO mapred.JobClient: Reduce output records=0
16/12/15 14:06:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1132392448
16/12/15 14:06:19 INFO mapred.JobClient: Map output records=4
Job Finished in 15.585 seconds
Estimated value of Pi is 3.60000000000000000000
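The coarse result is expected: with only 2 maps x 5 samples, the estimate lands on 3.6. PiEstimator is a sampling method (it uses a Halton quasi-random sequence rather than plain rand()), and accuracy improves with the sample count. The same dartboard idea can be sketched in awk with far more samples, giving a value close to 3.14:

```shell
# Monte Carlo estimate of Pi: the fraction of random points inside the
# unit quarter-circle, times 4. Many samples give a much tighter estimate
# than the 10 total samples used in the job above.
awk 'BEGIN {
  srand(1); n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1.0) inside++
  }
  printf "%.2f\n", 4 * inside / n
}'
```

Re-running the Hadoop job with more maps and samples (e.g. hadoop jar hadoop-examples-1.2.1.jar pi 10 1000) tightens the estimate the same way.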
【4】Web management interfaces
【4.1】Browse to the server IP on port 50070 to see the HDFS management page:
http://10.17.35.xxx:50070/dfshealth.jsp
NameNode 'A22811459:9000'
Started: | Thu Dec 15 13:00:10 GMT+08:00 2016 |
Version: | 1.2.1, r1503152 |
Compiled: | Mon Jul 22 15:23:09 PDT 2013 by mattf |
Upgrades: | There are no upgrades in progress. |
Cluster Summary
11 files and directories, 13 blocks = 24 total. Heap Size is 57.69 MB / 888.94 MB (6%)
Configured Capacity | : | 273 GB |
DFS Used | : | 40 KB |
Non DFS Used | : | 260.77 GB |
DFS Remaining | : | 12.23 GB |
DFS Used% | : | 0 % |
DFS Remaining% | : | 4.48 % |
Number of Under-Replicated Blocks | : | 0 |
NameNode Storage:
Storage Directory | Type | State |
/home/longhui/hadoop/dfs/name | IMAGE_AND_EDITS | Active |
This is Apache Hadoop release 1.2.1
【4.2】Port 50030 shows the Map/Reduce administration page
A22811459 Hadoop Map/Reduce Administration
State: RUNNING
Started: Thu Dec 15 12:54:23 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Identifier: 201612151254
SafeMode: OFF
Cluster Summary (Heap Size is 51.56 MB/888.94 MB)
Running Map Tasks | Running Reduce Tasks | Total Submissions | Nodes | Occupied Map Slots | Occupied Reduce Slots | Reserved Map Slots | Reserved Reduce Slots | Map Task Capacity | Reduce Task Capacity | Avg. Tasks/Node | Blacklisted Nodes | Graylisted Nodes | Excluded Nodes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 2 | 4.00 | 0 | 0 | 0 |
Scheduling Information
Queue Name | State | Scheduling Information |
Running Jobs
Completed Jobs
Jobid | Started | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info |
Thu Dec 15 14:06:04 GMT+08:00 2016 | NORMAL | root | PiEstimator | 100.00% | 2 | 2 | 100.00% | 1 | 1 | NA | NA |
Retired Jobs
none |