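Installing Hadoop 1.2.1, following the book "Understanding Big Data in Depth: Big Data Processing and Programming Practice" (《深入理解大資料》)

【Part 1】Source code for the book "Understanding Big Data in Depth"

【Part 2】Installing Hadoop 1.2.1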

【1】Install Java
Extracting jdk-6u45-linux-i586-rpm.rar yields jdk-6u45-linux-i586-rpm.bin.
Run the installer: ./jdk-6u45-linux-i586-rpm.bin
After a successful install the JDK is located in /usr/java/jdk1.6.0_45:
A22811459:/usr/java/jdk1.6.0_45 # pwd
/usr/java/jdk1.6.0_45
A22811459:/usr/java/jdk1.6.0_45 # ls
COPYRIGHT  LICENSE  README.html  THIRDPARTYLICENSEREADME.txt  bin  include  jre  lib  man  src.zip

【1.2】Add the Java paths to /etc/profile so the JDK tools can be run from any directory
#set java
export JAVA_HOME=/usr/java/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

【1.3】Apply the configuration
# source /etc/profile

【1.4】Check the Java version to confirm the installation
A22811459:/usr/java/jdk1.6.0_45 # java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)

【1.5】Optionally, compile and run a small Java program as a further check that Java works
HelloWel.java

// Prints a fixed message to confirm that javac and java both work.
public class HelloWel {
    public static void main(String[] args) {
        System.out.println("JAVA OK");
    }
}

Compile and run:
# javac HelloWel.java
# java HelloWel
JAVA OK
This confirms that Java is installed and working. The Java path, which is needed again below, is /usr/java/jdk1.6.0_45.


【2】Install Hadoop 1.2.1, following the book
【2.1】Create the hadoop user
# groupadd hadoop-user
# useradd -g hadoop-user hadoop
# passwd hadoop

【2.2】Configure SSH (key-based login to localhost)
# ssh-keygen -t rsa
# cd /root/.ssh/
# cp id_rsa.pub authorized_keys
# ssh localhost
Check the result:
# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts

【2.3】Configure the Hadoop environment
Hadoop release: hadoop-1.2.1.tar.gz
After extraction the Hadoop directory is /home/longhui/hadoop/hadoop-1.2.1/

【2.3.1】In conf/hadoop-env.sh, set JAVA_HOME to the JDK path
export JAVA_HOME=/usr/java/jdk1.6.0_45

【2.3.2】Configure the three XML files
【1】core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://A22811459:9000</value>
</property>
</configuration>
【Note】
The temporary directory is /tmp/hadoop. After a successful start, two subdirectories, dfs and mapred, appear under it, and several pid files are created in /tmp:
A22811459:/tmp # ls hadoop
hadoop/                            hadoop-root-jobtracker.pid         hadoop-root-secondarynamenode.pid
hadoop-root-datanode.pid           hadoop-root-namenode.pid           hadoop-root-tasktracker.pid
【2】hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/longhui/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/longhui/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
【Note】
After a successful setup, /home/longhui/hadoop/dfs/name contains: current  image  in_use.lock  previous.checkpoint
and /home/longhui/hadoop/dfs/data contains: blocksBeingWritten  current  detach  in_use.lock  storage  tmp
【3】mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>A22811459:9001</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/home/longhui/hadoop/mapred/local</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>/home/longhui/hadoop/mapred/system</value>
</property>
</configuration>
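【Note】
mapreduce.cluster.local.dir and mapreduce.jobtracker.system.dir are the property names used by later Hadoop releases. In Hadoop 1.x the corresponding names are mapred.local.dir and mapred.system.dir, so the two settings above are most likely ignored and the defaults under hadoop.tmp.dir apply. That would explain the mapred directory that appears under /tmp/hadoop.
【4】The hostname used here is A22811459 rather than localhost, so /etc/hosts must also map it: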
127.0.0.1       A22811459
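
Both fs.default.name and mapred.job.tracker name the host A22811459, so that name has to resolve on the machine. As a quick sanity check, here is a minimal sketch (not from the book and not a Hadoop API) that reads a core-site.xml-style document with the JDK's built-in XML parser and pulls the host and port out of fs.default.name. The class name ConfProbe and its getProperty helper are hypothetical names chosen for illustration, and the XML is inlined so that the example is self-contained.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Hadoop API: reads a Hadoop-style
// <configuration> document and looks up one property by name.
class ConfProbe {

    // Returns the <value> of the <property> whose <name> matches, or null.
    static String getProperty(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
            if (n.equals(name)) {
                return p.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Same content as the core-site.xml shown earlier.
        String coreSite =
              "<configuration>"
            + "<property><name>hadoop.tmp.dir</name><value>/tmp/hadoop</value></property>"
            + "<property><name>fs.default.name</name><value>hdfs://A22811459:9000</value></property>"
            + "</configuration>";

        // The value is a URI, so java.net.URI can split it into host and port.
        URI fs = new URI(getProperty(coreSite, "fs.default.name"));
        System.out.println("NameNode host: " + fs.getHost());
        System.out.println("NameNode port: " + fs.getPort());
    }
}
```

The host printed here is the name that needs an entry in /etc/hosts (or in DNS).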


【2.3.3】Add the Hadoop paths to /etc/profile, then run # source /etc/profile to apply them
#set hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=/home/longhui/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

【2.3.4】Format the HDFS file system
Run bin/hadoop namenode -format (or simply hadoop namenode -format), then enter Y at the prompt.
# hadoop namenode -format
16/12/15 12:59:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = A22811459/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
Re-format filesystem in /home/longhui/hadoop/dfs/name ? (Y or N) Y
16/12/15 12:59:52 INFO util.GSet: Computing capacity for map BlocksMap
16/12/15 12:59:52 INFO util.GSet: VM type       = 32-bit
16/12/15 12:59:52 INFO util.GSet: 2.0% max memory = 932118528
16/12/15 12:59:52 INFO util.GSet: capacity      = 2^22 = 4194304 entries
16/12/15 12:59:52 INFO util.GSet: recommended=4194304, actual=4194304
16/12/15 12:59:53 INFO namenode.FSNamesystem: fsOwner=root
16/12/15 12:59:53 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/15 12:59:53 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/15 12:59:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/12/15 12:59:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/12/15 12:59:53 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/12/15 12:59:53 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/12/15 12:59:53 INFO common.Storage: Image file /home/longhui/hadoop/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/12/15 12:59:53 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO common.Storage: Storage directory /home/longhui/hadoop/dfs/name has been successfully formatted.
16/12/15 12:59:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at A22811459/127.0.0.1
************************************************************/
【Note】If the warning "Warning: $HADOOP_HOME is deprecated." appears:
Fix: add the line below to /etc/profile, apply it with # source /etc/profile, and run bin/hadoop namenode -format again. The warning no longer appears.
export HADOOP_HOME_WARN_SUPPRESS=1
【2.3.5】Start Hadoop (to stop it, run stop-all.sh)
# start-all.sh
starting namenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-A22811459.out
localhost: starting datanode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-A22811459.out
localhost: starting secondarynamenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-A22811459.out
starting jobtracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-A22811459.out
localhost: starting tasktracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-A22811459.out

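【2.3.6】Use jps to check the cluster state. Apart from Jps itself, all five of the other processes must be present. Output like the following means Hadoop started correctly.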
# jps
2352 TaskTracker
1940 DataNode
1802 NameNode
2465 Jps
2211 JobTracker
2106 SecondaryNameNode
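
The "all five processes" check can also be done mechanically. The following is a small sketch, assuming the jps output has been captured as text. DaemonCheck and missing are hypothetical names, not part of Hadoop or the JDK. It compares whole process names, because a plain substring search for "NameNode" would also match "SecondaryNameNode".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: given the text printed by `jps`, report which of the
// five Hadoop 1.x daemons are missing.
class DaemonCheck {

    static final List<String> REQUIRED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker");

    static List<String> missing(String jpsOutput) {
        // Collect the process names (second column of each line).
        Set<String> running = new HashSet<String>();
        for (String line : jpsOutput.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2) {
                running.add(cols[1]);
            }
        }
        // Keep the required names that were not seen, in a fixed order.
        List<String> absent = new ArrayList<String>();
        for (String name : REQUIRED) {
            if (!running.contains(name)) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // The jps output shown above, inlined for the example.
        String sample = "2352 TaskTracker\n1940 DataNode\n1802 NameNode\n"
                      + "2465 Jps\n2211 JobTracker\n2106 SecondaryNameNode\n";
        List<String> absent = missing(sample);
        System.out.println(absent.isEmpty() ? "all five daemons running"
                                            : "missing: " + absent);
    }
}
```

If a daemon is reported missing, its log file under the Hadoop logs directory is the first place to look.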

【3】Run the first bundled example: estimating the value of Pi
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop jar hadoop-examples-1.2.1.jar pi 2 5
Number of Maps  = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/12/15 14:06:04 INFO mapred.FileInputFormat: Total input paths to process : 2
16/12/15 14:06:04 INFO mapred.JobClient: Running job: job_201612151254_0001
16/12/15 14:06:05 INFO mapred.JobClient:  map 0% reduce 0%
16/12/15 14:06:10 INFO mapred.JobClient:  map 100% reduce 0%
16/12/15 14:06:18 INFO mapred.JobClient:  map 100% reduce 33%
16/12/15 14:06:19 INFO mapred.JobClient:  map 100% reduce 100%
16/12/15 14:06:19 INFO mapred.JobClient: Job complete: job_201612151254_0001
16/12/15 14:06:19 INFO mapred.JobClient: Counters: 30
16/12/15 14:06:19 INFO mapred.JobClient:   Job Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Launched reduce tasks=1
16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6864
16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient:     Launched map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient:     Data-local map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8661
16/12/15 14:06:19 INFO mapred.JobClient:   File Input Format Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Read=236
16/12/15 14:06:19 INFO mapred.JobClient:   File Output Format Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Written=97
16/12/15 14:06:19 INFO mapred.JobClient:   FileSystemCounters
16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_READ=50
16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_READ=478
16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=160889
16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
16/12/15 14:06:19 INFO mapred.JobClient:   Map-Reduce Framework
16/12/15 14:06:19 INFO mapred.JobClient:     Map output materialized bytes=56
16/12/15 14:06:19 INFO mapred.JobClient:     Map input records=2
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce shuffle bytes=56
16/12/15 14:06:19 INFO mapred.JobClient:     Spilled Records=8
16/12/15 14:06:19 INFO mapred.JobClient:     Map output bytes=36
16/12/15 14:06:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=377028608
16/12/15 14:06:19 INFO mapred.JobClient:     CPU time spent (ms)=3100
16/12/15 14:06:19 INFO mapred.JobClient:     Map input bytes=48
16/12/15 14:06:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=242
16/12/15 14:06:19 INFO mapred.JobClient:     Combine input records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input records=4
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input groups=4
16/12/15 14:06:19 INFO mapred.JobClient:     Combine output records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=376963072
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce output records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1132392448
16/12/15 14:06:19 INFO mapred.JobClient:     Map output records=4
Job Finished in 15.585 seconds
Estimated value of Pi is 3.60000000000000000000
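
An estimate of 3.6 looks far from Pi, but it is all that 10 sample points can give: 2 maps x 5 samples = 10 points, and 3.6 = 4 x 9 / 10, meaning 9 of the 10 points fell inside the circle. To my understanding the PiEstimator example is a quasi-Monte Carlo method that takes its points from a Halton sequence (bases 2 and 3) instead of a random generator. Treat that detail, and the indexing used below, as an assumption. Under that assumption, the following standalone sketch reproduces the counting step. It does not use Hadoop, and the class and method names are hypothetical.

```java
// Standalone sketch of quasi-Monte Carlo Pi estimation; not Hadoop code.
class HaltonPiSketch {

    // Radical inverse of index in the given base: the digits of the index
    // mirrored around the radix point, giving a value in [0, 1).
    static double radicalInverse(long index, int base) {
        double result = 0.0;
        double digitWeight = 1.0 / base;
        while (index > 0) {
            result += digitWeight * (index % base);
            index /= base;
            digitWeight /= base;
        }
        return result;
    }

    // Counts how many of the points 1..total fall inside the circle of
    // radius 0.5 centred in the unit square.
    static long countInside(long total) {
        long inside = 0;
        for (long i = 1; i <= total; i++) {
            double x = radicalInverse(i, 2) - 0.5;
            double y = radicalInverse(i, 3) - 0.5;
            if (x * x + y * y <= 0.25) {
                inside++;
            }
        }
        return inside;
    }

    // The circle covers pi/4 of the square, so pi is about 4 * inside / total.
    static double estimate(int maps, int samplesPerMap) {
        long total = (long) maps * samplesPerMap;
        return 4.0 * countInside(total) / total;
    }

    public static void main(String[] args) {
        System.out.println("inside (10 points) = " + countInside(10));
        System.out.println("estimate, 2 x 5    = " + estimate(2, 5));
        System.out.println("estimate, 1000 x 1000 = " + estimate(1000, 1000));
    }
}
```

With 10 points the sketch counts 9 inside and prints 3.6, matching the job output. Larger arguments to the real job, for example hadoop jar hadoop-examples-1.2.1.jar pi 10 1000, should bring the estimate much closer to Pi.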

【4】Web management pages

【4.1】Browse to port 50070 on the server's IP address to see the state of HDFS. The page looks like this:
http://10.17.35.xxx:50070/dfshealth.jsp

NameNode 'A22811459:9000'

Started: Thu Dec 15 13:00:10 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Upgrades: There are no upgrades in progress.

Cluster Summary

11 files and directories, 13 blocks = 24 total. Heap Size is 57.69 MB / 888.94 MB (6%)
Configured Capacity : 273 GB
DFS Used : 40 KB
Non DFS Used : 260.77 GB
DFS Remaining : 12.23 GB
DFS Used% : 0 %
DFS Remaining% : 4.48 %
Number of Under-Replicated Blocks : 0

NameNode Storage:

Storage Directory Type State
/home/longhui/hadoop/dfs/name IMAGE_AND_EDITS Active

This is Apache Hadoop release 1.2.1
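
The percentages in the Cluster Summary follow from the capacity figures: DFS Remaining% is DFS Remaining divided by Configured Capacity, and the three usage figures add up to the configured capacity. Here is a small sketch that checks the arithmetic with the numbers from the page. The class name is hypothetical, and converting 40 KB with binary units is an assumption about how the page reports sizes.

```java
import java.util.Locale;

// Hypothetical arithmetic check for the Cluster Summary figures above.
class CapacityCheck {

    // Percentage of `part` relative to `whole`.
    static double percent(double part, double whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        double configuredGb = 273.0;
        double nonDfsUsedGb = 260.77;
        double remainingGb  = 12.23;
        // 40 KB expressed in GB, assuming 1 GB = 1024 * 1024 KB.
        double dfsUsedGb = 40.0 / (1024.0 * 1024.0);

        System.out.println(String.format(Locale.ROOT, "DFS Remaining%% = %.2f %%",
                percent(remainingGb, configuredGb)));
        System.out.println(String.format(Locale.ROOT, "DFS Used%%      = %.2f %%",
                percent(dfsUsedGb, configuredGb)));
        System.out.println(String.format(Locale.ROOT, "Sum of parts   = %.2f GB",
                dfsUsedGb + nonDfsUsedGb + remainingGb));
    }
}
```

Only about 4.5% of the configured capacity is left for HDFS because non-DFS data takes up most of the disk. That is worth keeping in mind before loading larger datasets.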

【4.2】Port 50030 shows the state of Map/Reduce

A22811459 Hadoop Map/Reduce Administration

State: RUNNING
Started: Thu Dec 15 12:54:23 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Identifier: 201612151254
SafeMode: OFF

Cluster Summary (Heap Size is 51.56 MB/888.94 MB)

Running Map Tasks : 0
Running Reduce Tasks : 0
Total Submissions : 1
Nodes : 1
Occupied Map Slots : 0
Occupied Reduce Slots : 0
Reserved Map Slots : 0
Reserved Reduce Slots : 0
Map Task Capacity : 2
Reduce Task Capacity : 2
Avg. Tasks/Node : 4.00
Blacklisted Nodes : 0
Graylisted Nodes : 0
Excluded Nodes : 0

Scheduling Information

Queue Name State Scheduling Information

Running Jobs

Completed Jobs

Jobid : (not captured in the copied page)
Started : Thu Dec 15 14:06:04 GMT+08:00 2016
Priority : NORMAL
User : root
Name : PiEstimator
Map % Complete : 100.00%
Map Total : 2
Maps Completed : 2
Reduce % Complete : 100.00%
Reduce Total : 1
Reduces Completed : 1
Job Scheduling Information : NA
Diagnostic Info : NA

Retired Jobs

none

Local Logs

Log directory, Job Tracker History

This is Apache Hadoop release 1.2.1