
Configuring a Big Data Development Environment on Windows 10 (1)

I. Preparation

If you are installing Hadoop 3.1.1, use JDK 1.8 or later.

Search for the additional tools (winutils) that Hadoop needs in a Windows environment; see section IV below.

II. Installing the JDK

After installing the JDK, configure its environment variables (set JAVA_HOME and add the JDK's bin directory to PATH); detailed walkthroughs are easy to find online.
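
After setting the variables, a quick check from a new cmd window might look like the sketch below; the JDK path is the one used later in hadoop-env.cmd and may well differ on your machine.

REM session-only settings, shown for illustration; permanent values belong in the system environment variables
set JAVA_HOME=C:\Java\jdk1.8.0_172
set PATH=%JAVA_HOME%\bin;%PATH%
REM should print the installed Java version, e.g. 1.8.0_172
java -version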

III. Installing and configuring the Hadoop environment

Extract Hadoop to whatever directory you like; the path should contain no spaces, Chinese characters, or other special characters. Here it is installed to D:\hadoop-3.1.1.

Configure the system environment variables and add Hadoop to PATH:
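
For the current cmd session this is equivalent to the sketch below. HADOOP_HOME is not mentioned in the original steps but is the usual convention on Windows, so treat it as an assumption and adjust the path to your own installation.

set HADOOP_HOME=D:\hadoop-3.1.1
set PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin;%PATH%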

First, back up the D:\hadoop-3.1.1\etc directory so that you can compare against the original files if something goes wrong later.

Create the following directories:

D:\hadoop-3.1.1\workplace\data

D:\hadoop-3.1.1\workplace\tmp
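
For example, from a cmd window:

mkdir D:\hadoop-3.1.1\workplace\data
mkdir D:\hadoop-3.1.1\workplace\tmp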

Edit D:\hadoop-3.1.1\etc\hadoop\core-site.xml:

Because Hadoop runs from the D drive here, the /D: prefix can be omitted in the values below; /hadoop-3.1.1 will resolve to the Hadoop directory at the root of the D drive.

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop-3.1.1/workplace/tmp</value>
    </property>    
    <property>
        <name>dfs.name.dir</name>
        <value>/hadoop-3.1.1/workplace/name</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9500</value>
    </property>
</configuration>

Edit D:\hadoop-3.1.1\etc\hadoop\mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>    
    <property>       
        <name>mapred.job.tracker</name>
        <value>hdfs://localhost:9501</value>
    </property>
</configuration>

Edit D:\hadoop-3.1.1\etc\hadoop\hdfs-site.xml:

<configuration>
    <!-- set to 1 because this is a single-node Hadoop installation -->
    <property>        
        <name>dfs.replication</name>        
        <value>1</value>    
    </property>    
    <property>        
        <name>dfs.data.dir</name>        
        <value>/hadoop-3.1.1/workplace/data</value>    
    </property>
    <property> 
        <name>dfs.webhdfs.enabled</name> 
        <value>true</value> 
    </property> 
    <property>
        <name>dfs.http.address</name>
        <value>localhost:50070</value>
    </property>
</configuration>

Edit D:\hadoop-3.1.1\etc\hadoop\yarn-site.xml:

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>       
        <name>yarn.nodemanager.aux-services</name>       
        <value>mapreduce_shuffle</value>    
    </property>    
    <property>       
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>       
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>    
    </property>
    <property>
        <!-- total number of virtual CPU cores available to the NodeManager -->
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
    <property>
        <!-- maximum memory (MB) available on each node -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <!-- where intermediate results are stored -->
        <name>yarn.nodemanager.local-dirs</name>
        <value>/hadoop-3.1.1/workplace/tmp/nm-local-dir</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/hadoop-3.1.1/logs/yarn</value>
    </property>
</configuration>

Edit D:\hadoop-3.1.1\etc\hadoop\hadoop-env.cmd and replace the default JAVA_HOME line

@set JAVA_HOME=%JAVA_HOME%

with an explicit JDK path (preferably one without spaces), for example:

set JAVA_HOME=C:\Java\jdk1.8.0_172
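
As a quick sanity check after this step (a sketch that assumes JAVA_HOME and PATH are set as described earlier), the following should print the Hadoop version and build information without any Java errors:

D:\hadoop-3.1.1\bin\hadoop.cmd version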

IV. Copying winutils

First, back up D:\hadoop-3.1.1\bin.

Then extract apache-hadoop-3.1.1-winutils-master.zip and copy its contents into the corresponding Hadoop folder (the bin directory).
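
A sketch of the copy step, assuming the archive was extracted to D:\apache-hadoop-3.1.1-winutils-master and contains a bin folder with winutils.exe and hadoop.dll; both the extraction path and the internal layout are assumptions, so adjust them to what you actually see after unzipping.

REM back up the original bin directory first
xcopy /E /I D:\hadoop-3.1.1\bin D:\hadoop-3.1.1\bin_bak
REM overwrite bin with the Windows-native binaries
xcopy /E /Y D:\apache-hadoop-3.1.1-winutils-master\bin D:\hadoop-3.1.1\bin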

V. Formatting the HDFS file system

Run D:\hadoop-3.1.1\bin\hdfs.cmd namenode -format

The hdfs command has many other subcommands; run hdfs.cmd without arguments to list them, or consult the official documentation.

VI. Starting Hadoop

Run D:\hadoop-3.1.1\sbin\start-all.cmd. This opens four windows (NameNode, DataNode, ResourceManager, and NodeManager); watch the startup logs in all four carefully for any exceptions or errors.

If everything started normally, you can access the following services: the HDFS NameNode web UI at http://localhost:50070 (the dfs.http.address configured above), and the YARN ResourceManager web UI at its default address, http://localhost:8088.
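
Once the daemons are up, a short HDFS smoke test from cmd might look like this (a sketch; /test is just an example path):

REM create a directory in HDFS and list the root to confirm the NameNode and DataNode respond
D:\hadoop-3.1.1\bin\hdfs.cmd dfs -mkdir /test
D:\hadoop-3.1.1\bin\hdfs.cmd dfs -ls /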

Common problems:

1. Hadoop fails to start and reports a DataNode volume error:

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
2018-09-30 10:45:10,306 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = DESKTOP-ORMM49N/10.200.130.178
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.1.1
STARTUP_MSG:   classpath = C:\hadoop-3.1.1\etc\hadoop;C:\hadoop-3.1.1\share\hadoop\common;
... ...
C:\hadoop-3.1.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.1.1.jar
STARTUP_MSG:   build = https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c; compiled by 'leftnoteasy' on 2018-08-02T04:26Z
STARTUP_MSG:   java = 1.8.0_172
************************************************************/
2018-09-30 10:45:11,277 INFO checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/C:/hadoop-3.1.1/workplace/data
2018-09-30 10:45:11,312 WARN checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/C:/hadoop-3.1.1/workplace/data
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:455)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:796)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:710)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:678)
        at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233)
        at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116)
        at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:239)
        at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:52)
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:142)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-09-30 10:45:11,314 ERROR datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
        at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2018-09-30 10:45:11,318 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2018-09-30 10:45:11,321 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at DESKTOP-ORMM49N/10.200.130.178
************************************************************/

C:\hadoop-3.1.1\sbin>

Solution: delete the manually created data folder and let Hadoop create it by itself.
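
A sketch of that cleanup, using the data directory created earlier in this post; Hadoop recreates the folder with the permissions it expects on the next start:

rmdir /S /Q D:\hadoop-3.1.1\workplace\data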

2. The YARN NodeManager fails to start. The cause is that the winutils binaries must be compiled from source matching the Hadoop version, 3.1.1 here.

Solution: find winutils files on GitHub built for the same version as your Hadoop.

3. Cannot access localhost:50070

Modifying hdfs-site.xml and adding the following configuration resolved the problem:

    <property>
        <name>dfs.http.address</name>
        <value>localhost:50070</value>
    </property>

But the root cause had not been found at that point. So I commented out the configuration above, started Hadoop again, and found something interesting in the NameNode log.

I then remembered that I had added an IP mapping for a certain domain to the hosts file: ??????.??????.com 0.0.0.0. It turned out that this hosts entry was affecting the default address the site started on.