The download currently offered on the official site is a 32-bit build. If you install it on 64-bit Linux you keep getting the warning "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", and since the official site does not provide a 64-bit package, the only option is to compile and package a 64-bit build yourself.

        How do you tell whether your Hadoop build is 32-bit or 64-bit? My Hadoop is installed under /opt/hadoop-2.7.1/, so the file to inspect is libhadoop.so.1.0.0 under /opt/hadoop-2.7.1/lib/native; it reveals the word size of the build. Mine is one I compiled myself, so it is 64-bit.
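
        A quick way to check (a minimal sketch; the path assumes the install location above) is to run the file command against the native library:

file /opt/hadoop-2.7.1/lib/native/libhadoop.so.1.0.0
#A self-compiled 64-bit build reports something like "ELF 64-bit LSB shared object, x86-64 ...",
#while the stock 32-bit download reports "ELF 32-bit ..." instead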

2. How to build hadoop 2.7.1 correctly (per the official docs)

        There are plenty of articles online about building Hadoop, each with its own grab bag of preparation steps, but very few explain why those steps are needed, so beginners can only follow the whole process passively.

        When your operating system is 64-bit Linux but the hadoop 2.7.1 download from the official site is 32-bit, you have to consider building and packaging it yourself to get a hadoop 2.7.1 distribution for a 64-bit OS. The problem with the many build guides floating around online is that they rarely tell you why each step is done.

        When you run into the problem above, the most reliable reference is the official build documentation, the BUILDING.txt file in the root of the hadoop 2.7.1 source tree. Here I unpacked the hadoop 2.7.1 source under /opt/hadoop-2.7.1-src/.


        The important parts of BUILDING.txt are analyzed below:

        1) Build prerequisites

From BUILDING.txt:
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
* Jansson C XML parsing library ( if compiling libwebhdfs )
* Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

        2) The Hadoop Maven modules

  - hadoop-project           (Parent POM for all Hadoop Maven modules. All plugins & dependencies versions are defined here.)

  - hadoop-project-dist      (Parent POM for modules that generate distributions.)

  - hadoop-annotations       (Generates the Hadoop doclet used to generated the Javadocs)

  - hadoop-assemblies        (Maven assemblies used by the different modules)

  - hadoop-common-project    (Hadoop Common)

  - hadoop-hdfs-project      (Hadoop HDFS)

  - hadoop-mapreduce-project (Hadoop MapReduce)

  - hadoop-tools             (Hadoop tools like Streaming, Distcp, etc.)

  - hadoop-dist              (Hadoop distribution assembler)

        3) Where to run Maven from

From BUILDING.txt:
Where to run Maven from?
It can be run from any module. The only catch is that if not run from utrunk
all modules that are not part of the build run must be installed in the local
Maven cache or available in a Maven repository.

        You can build a single module, or build everything from the top-level module. The only difference is that building a single module merely installs that module's compiled jar into the local Maven repository, whereas building from the top level installs every module into the local Maven repository and additionally packages the Hadoop tar.gz distribution for this machine.
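
        For example (a hedged sketch; the module path is only illustrative and assumes the source tree at /opt/hadoop-2.7.1-src):

#Build and install a single module (e.g. hadoop-common) into the local Maven repository only
cd /opt/hadoop-2.7.1-src/hadoop-common-project/hadoop-common
mvn install -DskipTests

#Build from the top level: installs every module and also assembles the tar.gz distribution
cd /opt/hadoop-2.7.1-src
mvn package -Pdist,native -DskipTests -Dtar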

        4) About Snappy

From BUILDING.txt:
Snappy build options:
Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.

* Use -Drequire.snappy to fail the build if libsnappy.so is not found.
If this option is not specified and the snappy library is missing,
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.

* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
header files and library files. You do not need this option if you have
installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
files. Similarly to snappy.prefix, you do not need this option if you have
installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
the final tar file. This option requires that -Dsnappy.lib is also given,
and it ignores the -Dsnappy.prefix option.

        Hadoop can compress the files it stores with a chosen compression codec and transparently decompress them back to the original format when a client reads them. The formats Hadoop currently supports include LZO, Snappy and others. Snappy is not enabled by default: to make Hadoop support it, you first have to install the Snappy native library on Linux and then build Hadoop to obtain the installation package.

        Snappy is the compression codec most widely used in practice, and it is also the preferred codec for the distributed column-oriented database HBase, which in turn depends on a Hadoop environment. So if you plan to run HBase with Snappy compression later, it is worth compiling Snappy support in now.
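
        For example, assuming the Snappy native library and headers are already installed (say via yum install snappy snappy-devel), a hedged sketch of a build with Snappy compiled in and bundled would be:

#-Drequire.snappy makes the build fail fast if libsnappy.so cannot be found;
#-Dbundle.snappy copies the library from -Dsnappy.lib into the final tar
#(/usr/lib64 is an assumption; point it at wherever libsnappy.so actually lives on your system)
mvn package -Pdist,native -DskipTests -Dtar \
    -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy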

        5) Choosing what to build

From BUILDING.txt:
----------------------------------------------------------------------------------
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------

        Roughly, this means: -Pdist alone builds a binary package without native code or documentation; adding native and docs to the profile list also compiles the native libraries and generates the documentation; -Psrc produces a source distribution, and combining dist,native,docs,src yields both source and binary distributions; the last command stages a local copy of the project website under /tmp/hadoop-site.

        6) Single-node and cluster installation guides

From BUILDING.txt:
----------------------------------------------------------------------------------
Installing Hadoop
Look for these HTML files after you build the document by the above commands.
* Single Node Setup:
hadoop-project-dist/hadoop-common/SingleCluster.html
* Cluster Setup:
hadoop-project-dist/hadoop-common/ClusterSetup.html
----------------------------------------------------------------------------------

        7) Maven memory settings when building Hadoop

From BUILDING.txt:
----------------------------------------------------------------------------------
If the build process fails with an out of memory error, you should be able to fix
it by increasing the memory used by maven -which can be done via the environment
variable MAVEN_OPTS.
Here is an example setting to allocate between 256 and 512 MB of heap space to
Maven
export MAVEN_OPTS="-Xms256m -Xmx512m"
----------------------------------------------------------------------------------

        Roughly: if Maven fails with an out-of-memory error, first raise the memory given to Maven, e.g. on Linux set export MAVEN_OPTS="-Xms256m -Xmx512m". The same trick also works later when building the Spark source.
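
        A minimal way to make this setting persistent (my own sketch, not from BUILDING.txt; the heap sizes are just the example values and can be raised if the build still runs out of memory):

#Append the Maven memory settings to /etc/profile so every build picks them up
echo 'export MAVEN_OPTS="-Xms256m -Xmx512m"' >> /etc/profile
source /etc/profile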

3. Preparing the build prerequisites

        1) Unix System (the operating system must be Linux; install it yourself, or set up a virtual machine if you have no spare hardware)

        2)JDK 1.7+

#1.It is not recommended to use OpenJDK; use the JDK from the Oracle site instead

#2.First uninstall the old version or the bundled OpenJDK that ships with the system
#Check the Java version already on the system with: java -version
#Then list the matching packages with: rpm -qa | grep gcj
#Finally uninstall with: rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64

#3.Install jdk-7u65-linux-x64.gz
#Download jdk-7u65-linux-x64.gz to /opt/java/jdk-7u65-linux-x64.gz and unpack it
cd /opt/java/
tar -zxvf jdk-7u65-linux-x64.gz
#Configure the Linux environment variables
vi /etc/profile
#Append the following at the end of the file
export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#Make the configuration take effect
source /etc/profile

#4.Check whether the JDK environment is configured correctly
java -version
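
        If the configuration took effect, the output should report the Oracle JDK rather than OpenJDK or GCJ, roughly like this (the exact build numbers may differ):

java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)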

        3)Maven 3.0 or later

#1.Download apache-maven-3.3.3.tar.gz to /opt/ and unpack it
cd /opt
tar zxvf apache-maven-3.3.3.tar.gz

#2.Configure the environment variables
vi /etc/profile
#Add the following
MAVEN_HOME=/opt/apache-maven-3.3.3
export MAVEN_HOME
export PATH=${PATH}:${MAVEN_HOME}/bin

#3.Make the configuration take effect
source /etc/profile

#4.Check whether Maven is installed correctly
mvn -version

#5.Configure the Maven repository. By default Maven downloads dependency jars and plugins from the central
#repository, which is hosted abroad; for users in China there are domestic mirrors, and here I point Maven at
#the oschina mirror.
#If you are new to Maven, read up on it first. It is best not to modify /opt/apache-maven-3.3.3/conf/settings.xml,
#because that file affects every user; instead edit the settings file in the current user's home directory, e.g.
#for the hadoop user the file to edit is /home/hadoop/.m2/settings.xml.
#Add the following content to that file:
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <!-- By default downloaded dependencies go to /home/hadoop/.m2/repository; here I choose to store them under /opt/maven-localRepository instead -->
  <localRepository>/opt/maven-localRepository</localRepository>
  <pluginGroups></pluginGroups>
  <proxies></proxies>
  <servers></servers>
  <mirrors>
  <!--add by aperise start-->
    <mirror>
        <id>nexus-osc</id>
        <mirrorOf>*</mirrorOf>
        <name>Nexus osc</name>
        <url>http://maven.oschina.net/content/groups/public/</url>
    </mirror>
  <!--add by aperise end-->
  </mirrors>
  <profiles>
  <!--add by aperise start-->
        <profile>
            <id>jdk-1.7</id>
            <activation>
                <jdk>1.7</jdk>
            </activation>
            <repositories>
                <repository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.oschina.net/content/groups/public/</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
                </repository>
            </repositories>
            <pluginRepositories>
                <pluginRepository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.oschina.net/content/groups/public/</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
                </pluginRepository>
            </pluginRepositories>
        </profile>
  <!--add by aperise end-->
  </profiles>
</settings>
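
        To double-check that Maven actually picks up the mirror and the custom local repository, you can (optionally) dump the effective settings:

#Prints the merged settings Maven will use; the oschina mirror URL and /opt/maven-localRepository should appear in the output
mvn help:effective-settings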

        4)Findbugs 3.0.1 (if running findbugs)

#1.Download findbugs-3.0.1.tar.gz to /opt and unpack it
tar zxvf findbugs-3.0.1.tar.gz
#2.Configure the environment variables
vi /etc/profile
#Add the following:
export FINDBUGS_HOME=/opt/findbugs-3.0.1
export PATH=$PATH:$FINDBUGS_HOME/bin
#3.Make the configuration take effect
source /etc/profile
#4.Run findbugs to check whether the installation succeeded
findbugs

        5)ProtocolBuffer 2.5.0

#1.Install (cmake needs to be installed first, so complete prerequisite 6 before this step)
tar zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr/local/protobuf
make
make check
make install
#2.Configure the environment variables
vi /etc/profile
#Add the following:
export PATH=$PATH:/usr/local/protobuf/bin
export PKG_CONFIG_PATH=/usr/local/protobuf/lib/pkgconfig/
#3.Make the configuration take effect
source /etc/profile
#4.Run protoc --version to check whether the installation succeeded
protoc --version
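
        If everything went well, the version check reports exactly the release Hadoop 2.7.1 expects:

libprotoc 2.5.0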

         6)CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac

#1.Prerequisites
yum install gcc-c++
yum install ncurses-devel
#2.Install
#Option 1: simply yum install cmake
#Option 2: download the tar.gz and build from source
#Download cmake-3.3.2.tar.gz, then build and install it
tar -zxv -f cmake-3.3.2.tar.gz
cd cmake-3.3.2
./bootstrap
make
make install
#3.Run cmake to check whether the installation succeeded
cmake

        7)Zlib devel (if compiling native code)

#Note: build-essential, zlib1g-dev and libssl-dev are Debian/Ubuntu package names and are not available via yum;
#on CentOS/RHEL the package that matters here is zlib-devel (plus openssl-devel, installed in step 8 below)
yum -y install autoconf automake libtool cmake pkgconfig zlib-devel openssl-devel

        8)openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )

yum install openssl-devel

        9)Jansson C XML parsing library ( if compiling libwebhdfs )

        10)Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )

        11) Internet connection for first build (to fetch all Maven and Hadoop dependencies)

        The last three depend on your own situation and are not mandatory.

        While setting up the environment I installed quite a few packages; at the time I also ran the following command to pull in some missing libraries (note that several of the package names in it are Debian-style and are simply skipped by yum):

yum -y install build-essential autoconf automake libtool zlib1g-dev pkg-config libssl-dev

4. Actually building the Hadoop source


        Go to the source root directory /opt/hadoop-2.7.1-src and run:

cd /opt/hadoop-2.7.1-src
export MAVEN_OPTS="-Xms256m -Xmx512m"
mvn package -Pdist,native,docs -DskipTests -Dtar

        The build takes half an hour or more, since it downloads all the jars the source depends on from the public network, so wait patiently for it to finish. When it completes, the output looks like this:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Common ............................... SUCCESS [02:27 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  4.841 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 15.176 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.055 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [03:36 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 21.601 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [  4.182 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  3.577 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.036 s]
[INFO] hadoop-yarn ........................................ SUCCESS [  0.033 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [01:53 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 23.525 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [  0.042 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [  8.896 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 11.562 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  3.324 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [  6.115 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 14.149 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [  3.887 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [  5.333 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [  2.249 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.032 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  1.915 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  1.450 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.049 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [  4.165 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [  4.168 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.077 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 15.869 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 15.401 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  2.696 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [  5.780 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  4.528 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [  3.592 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  1.262 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  3.969 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  3.829 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  2.999 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  7.995 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.425 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  4.508 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  3.023 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  1.896 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.633 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  2.256 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  1.738 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  3.198 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [  8.421 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  2.808 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 10.124 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.097 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  3.395 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 10.150 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.035 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:48 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:12 min
[INFO] Finished at: 2015-11-04T16:08:14+08:00
[INFO] Final Memory: 139M/1077M
[INFO] ------------------------------------------------------------------------

        The Hadoop installation package produced by the build is located as follows:

        cd /opt/hadoop-2.7.1-src/hadoop-dist/target

        ls

        antrun                    hadoop-2.7.1.tar.gz            maven-archiver

        dist-layout-stitching.sh  hadoop-dist-2.7.1.jar          test-dir

        dist-tar-stitching.sh     hadoop-dist-2.7.1-javadoc.jar

        hadoop-2.7.1              javadoc-bundle-options
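
        Before deploying the new package you can also verify that the native libraries really load, using the freshly built distribution (a hedged check; the paths follow the build output above, and snappy only shows true if Snappy support was compiled in):

cd /opt/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1
bin/hadoop checknative -a
#A successful 64-bit native build reports "hadoop: true" together with the path to libhadoop.so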

5. Sharing the dependency packages for the Hadoop build environment