Configuring a Big Data Development Environment on Windows 10 (1)
I. Preparation
If you are installing Hadoop 3.1.1, use JDK 1.8 or later.
Find the tools Hadoop needs to run in a Windows environment (the winutils binaries).
II. Installing the JDK
After installing the JDK, configure the JDK environment variables (JAVA_HOME, PATH); instructions are easy to find online.
III. Installing and Configuring Hadoop
Install Hadoop to a directory of your choice; the path should contain no spaces, Chinese characters, or special characters. Here it is installed to D:\hadoop-3.1.1.
Configure the system environment variables, adding Hadoop to PATH:
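For the current command-prompt session, this can be done roughly as follows (a permanent setting goes through the System Properties dialog or `setx`; the install path is the one chosen above):

```bat
rem Set HADOOP_HOME for the current session and add the bin and sbin
rem directories to PATH (assumes the D:\hadoop-3.1.1 install path above).
set HADOOP_HOME=D:\hadoop-3.1.1
set PATH=%PATH%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin
```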
Before making changes, back up the D:\hadoop-3.1.1\etc directory so you can compare against the original files if errors appear later.
Create the following directories:
D:\hadoop-3.1.1\workplace\data
D:\hadoop-3.1.1\workplace\tmp
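From a command prompt, the two directories can be created like this:

```bat
rem Create the working directories Hadoop will use for DataNode
rem storage and temporary files.
mkdir D:\hadoop-3.1.1\workplace\data
mkdir D:\hadoop-3.1.1\workplace\tmp
```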
Edit D:\hadoop-3.1.1\etc\hadoop\core-site.xml.
Because Hadoop here runs from drive D:, the drive letter can be omitted in the paths below; /hadoop-3.1.1 resolves automatically to the Hadoop directory at the root of drive D:.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop-3.1.1/workplace/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop-3.1.1/workplace/name</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9500</value>
  </property>
</configuration>
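Once the file is saved, the effective value can be checked with `hdfs getconf`, a standard HDFS utility; for example:

```bat
rem Print the configured default file system. If core-site.xml was
rem picked up, this should show hdfs://localhost:9500.
D:\hadoop-3.1.1\bin\hdfs.cmd getconf -confKey fs.default.name
```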
Edit the D:\hadoop-3.1.1\etc\hadoop\mapred-site.xml file.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9501</value>
  </property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\hdfs-site.xml.
<configuration>
  <!-- Set replication to 1 because this is a single-node Hadoop setup -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop-3.1.1/workplace/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>localhost:50070</value>
  </property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\yarn-site.xml.
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <!-- Total number of virtual CPU cores available to the NodeManager -->
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
  <property>
    <!-- Maximum memory (MB) available on each node -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <!-- Where intermediate results are stored -->
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop-3.1.1/workplace/tmp/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/hadoop-3.1.1/logs/yarn</value>
  </property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\hadoop-env.cmd, replacing the default JAVA_HOME line with the actual JDK path:
@rem original line: set JAVA_HOME=%JAVA_HOME%
set JAVA_HOME=C:\Java\jdk1.8.0_172
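A quick way to confirm that hadoop-env.cmd and JAVA_HOME are set correctly is to run `hadoop version` from a new command prompt:

```bat
rem If JAVA_HOME is wrong, this fails immediately with a
rem "JAVA_HOME is incorrectly set" message; otherwise it prints
rem the Hadoop version banner.
D:\hadoop-3.1.1\bin\hadoop.cmd version
```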
IV. Copying winutils
First, back up D:\hadoop-3.1.1\bin.
Then extract apache-hadoop-3.1.1-winutils-master.zip and copy its contents into the corresponding Hadoop directory (bin).
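Assuming the archive was extracted to D:\apache-hadoop-3.1.1-winutils-master and contains a bin directory (adjust the source path to match the actual archive layout), the copy can be done with xcopy:

```bat
rem Copy winutils.exe, hadoop.dll and the other native binaries into
rem Hadoop's bin directory. The source path below is an assumption
rem about the extracted archive's layout; adjust it as needed.
xcopy /Y D:\apache-hadoop-3.1.1-winutils-master\hadoop-3.1.1\bin\* D:\hadoop-3.1.1\bin\
```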
V. Formatting the HDFS File System
Run D:\hadoop-3.1.1\bin\hdfs.cmd namenode -format
hdfs supports many other subcommands; run hdfs.cmd with no arguments to list them, or consult the official documentation.
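Once the cluster is running (next section), basic HDFS file operations look like this:

```bat
rem Create a directory in HDFS
hdfs dfs -mkdir /user
rem List the root of HDFS
hdfs dfs -ls /
rem Upload a local file into HDFS
hdfs dfs -put local.txt /user
```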
VI. Starting Hadoop
Run D:\hadoop-3.1.1\sbin\start-all.cmd. This opens four windows (NameNode, DataNode, ResourceManager, and NodeManager); watch each window's startup log carefully for any exceptions or errors.
If everything is normal, the following services are reachable: the NameNode web UI at http://localhost:50070 (as configured above) and the YARN ResourceManager UI at http://localhost:8088.
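One way to confirm all four daemons are up is the JDK's `jps` tool (found in %JAVA_HOME%\bin), which lists running Java processes:

```bat
rem Should list NameNode, DataNode, ResourceManager and NodeManager
rem (plus Jps itself); a missing entry means that daemon failed.
jps
```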
Common problems:
1. Hadoop fails to start with a DataNode volume error:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
2018-09-30 10:45:10,306 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = DESKTOP-ORMM49N/10.200.130.178
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.1.1
STARTUP_MSG: classpath = C:\hadoop-3.1.1\etc\hadoop;C:\hadoop-3.1.1\share\hadoop\common;
... ...
C:\hadoop-3.1.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.1.1.jar
STARTUP_MSG: build = https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c; compiled by 'leftnoteasy' on 2018-08-02T04:26Z
STARTUP_MSG: java = 1.8.0_172
************************************************************/
2018-09-30 10:45:11,277 INFO checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/C:/hadoop-3.1.1/workplace/data
2018-09-30 10:45:11,312 WARN checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/C:/hadoop-3.1.1/workplace/data
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:455)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:796)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:710)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:678)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233)
at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:239)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:52)
at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:142)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-30 10:45:11,314 ERROR datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2018-09-30 10:45:11,318 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2018-09-30 10:45:11,321 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at DESKTOP-ORMM49N/10.200.130.178
************************************************************/
C:\hadoop-3.1.1\sbin>
Solution: delete the manually pre-created data folder and let Hadoop create it itself.
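The full recovery sequence, assuming the data directory is the one configured in hdfs-site.xml above, would be roughly:

```bat
rem Stop the daemons, remove the stale DataNode directory,
rem re-format the NameNode and start again.
rem WARNING: this destroys any data already stored in HDFS.
D:\hadoop-3.1.1\sbin\stop-all.cmd
rmdir /S /Q D:\hadoop-3.1.1\workplace\data
D:\hadoop-3.1.1\bin\hdfs.cmd namenode -format
D:\hadoop-3.1.1\sbin\start-all.cmd
```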
2. The YARN NodeManager fails to start. The cause is that the winutils binaries must be built from source matching Hadoop version 3.1.1.
Solution: find winutils binaries on GitHub built for the same Hadoop version.
3. localhost:50070 cannot be reached.
The problem went away after adding the following to hdfs-site.xml:
<property>
  <name>dfs.http.address</name>
  <value>localhost:50070</value>
</property>
But that did not address the root cause. After commenting out the configuration above and starting again, the NameNode log revealed something interesting:
I then remembered that my hosts file contained an IP mapping for a certain domain: ??????.??????.com 0.0.0.0. That hosts entry was overriding the default bind address of the site.
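To check for such entries, inspect the hosts file for mappings to 0.0.0.0 (or other unexpected addresses):

```bat
rem Show hosts-file lines that map names to 0.0.0.0; remove or
rem comment out any entry that collides with the Hadoop bind address.
findstr "0.0.0.0" C:\Windows\System32\drivers\etc\hosts
```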