
Atittit HDFS Hadoop big-data file system: Java usage summary

 

Contents

1. Obtaining the file system and performing operations
2. HDFS: an NFS/FTP-like remote distributed file service
3. Starting the HDFS service with start-dfs.cmd
3.1. Configure core-site.xml
3.2. Start
3.3. Code
4. Problem summary
4.1. start-dfs.cmd reports that Windows cannot find hadoop
4.2. D:\haddop\hadoop-3.1.1\bin\hdfs namenode -format
4.3. Permission denied on file://
4.4. java.io.IOException: NameNode is not formatted
4.5. org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory C:\tmp\hadoop-Administrator\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.
4.6. hadoop.dll and winutils.exe
4.7. Unknown host / java.net.ConnectException: Connection refused: no further information
5. Theory
5.1. Creating a folder
5.2. Writing a file
6. ref

 

 

Several kinds of file-system operations

 

  1. Obtaining the file system and performing operations
    1. Folder operations: create, delete, rename, list
    2. I/O on remote files
    3. File upload and download (copying files between the local and the remote file system; see the sketch below)
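
For the upload/download item, here is a minimal sketch using FileSystem.copyFromLocalFile / copyToLocalFile. The local paths are placeholders, and the hdfs://0.0.0.0:19000 URI is the one used in the Code section below; adjust both to your setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://0.0.0.0:19000"); // same URI as the Code section below
        FileSystem fs = FileSystem.get(conf);
        // Upload: copy a local file into HDFS (source path is a placeholder).
        fs.copyFromLocalFile(new Path("D:/local.txt"), new Path("/uploadedS09.txt"));
        // Download: copy it back out of HDFS onto the local disk.
        fs.copyToLocalFile(new Path("/uploadedS09.txt"), new Path("D:/downloadedS09.txt"));
        fs.close();
    }
}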

 

Specific operations:

  1. Obtain the HDFS file system object from the configuration (there are two ways)
    1. Method one: read the configuration files directly.
      Normally used when Hadoop is installed locally and can be accessed directly; you only need to point the configuration at the hdfs file system to operate on. All HDFS parameters can also be set on this conf object, and values set there take precedence over the configuration files.
    2. Method two: specify a URI and build the file system from it.
      Normally used when there is no local Hadoop installation but the cluster is reachable remotely. Supply the access URI of the Hadoop NameNode, the Hadoop user name, and configuration information (the remote Hadoop configuration is then picked up automatically); see the sketch below.
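
A minimal sketch of method two. The huabingood01:9000 NameNode is the one from the core-site.xml shown below, and the "Administrator" user name is an assumption taken from the fsOwner line in the format log; adjust both to your cluster.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsRemoteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI plus the Hadoop user name to act as.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://huabingood01:9000"), conf, "Administrator");
        System.out.println(fs.getUri()); // hdfs://huabingood01:9000
        fs.close();
    }
}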
  2. HDFS: an NFS/FTP-like remote distributed file service
  3. Starting the HDFS service with start-dfs.cmd
    3.1. Configure core-site.xml

 

Edit D:\haddop\hadoop-3.1.1\etc\hadoop\core-site.xml:

 

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://huabingood01:9000</value>
 </property>
</configuration>

(fs.default.name is the old, deprecated key; on Hadoop 2.x/3.x the preferred key is fs.defaultFS, though the old one still works.)
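
As noted above, values set on the Configuration object in code override whatever the configuration files say. A quick sketch to confirm which value wins (the class name is illustrative):

import org.apache.hadoop.conf.Configuration;

public class ConfPrecedenceDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();        // loads core-site.xml from the classpath
        conf.set("fs.defaultFS", "hdfs://0.0.0.0:19000"); // overrides the file's value
        System.out.println(conf.get("fs.defaultFS"));     // prints hdfs://0.0.0.0:19000
    }
}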

 

    3.2. Start

Run %HADOOP_PREFIX%\sbin\start-dfs.cmd (see the problem notes below) to bring up the NameNode and DataNode.

 

    3.3. Code

package hdfsHadoopUse;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class hdfsHadoopClass {

    public static void main(String[] args) throws IOException {
        String pathToCreate = "/firstDirS09/secdirS09";
        hdfsHadoopClass hdfsHadoopClass = new hdfsHadoopClass();
        FileSystem fs = hdfsHadoopClass.getHadoopFileSystem();
        hdfsHadoopClass.myCreatePath(fs, pathToCreate);
        System.out.println("--f");
    }

    /**
     * Obtain the HDFS FileSystem object from configuration.
     * There are two ways:
     *  1. Use conf to read the local configuration files and build the FileSystem from them.
     *  2. Mostly for machines without a local Hadoop installation that can still reach the
     *     cluster: use a given URI and user name to pick up the remote configuration,
     *     then build the FileSystem.
     * @return FileSystem
     * @throws IOException
     */
    public FileSystem getHadoopFileSystem() throws IOException {
        FileSystem fs = null;
        Configuration conf = null;

        // Method one: read the local configuration files (core-site.xml, hdfs-site.xml)
        // and build the FileSystem from them. The HDFS URI must be set here.
        conf = new Configuration();
        // The file system URI is the one mandatory setting; any other parameter may be
        // set as well, and values set in code take the highest precedence.
        conf.set("fs.defaultFS", "hdfs://0.0.0.0:19000");
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());

        // Build the FileSystem from the configuration.
        fs = FileSystem.get(conf);

        return fs;
    }

    /**
     * Creates a directory the way `mkdir -p` does in the shell: parent folders that do
     * not exist yet are created along the way.
     * As with Java IO, the operation is expressed on a Path object, but here the Path
     * refers to a location in HDFS.
     * @param fs
     * @return true if the directory was created
     * @throws IOException
     */
    public boolean myCreatePath(FileSystem fs, String pathToCreate) throws IOException {
        boolean b = false;

        // String pathToCreate = "/hyw/test/huabingood/hyw";
        Path path = new Path(pathToCreate);
        try {
            // Even if the path already exists, mkdirs still returns normally.
            b = fs.mkdirs(path);
        } finally {
            fs.close();
        }
        return b;
    }
}
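
Section 1 lists create, delete, rename, and list as the folder operations; mkdirs covers create, and the remaining three could look like the following sketch. The paths are the ones created above; the class name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCrudDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://0.0.0.0:19000");
        FileSystem fs = FileSystem.get(conf);
        // List the children of a directory.
        for (FileStatus st : fs.listStatus(new Path("/firstDirS09"))) {
            System.out.println(st.getPath() + (st.isDirectory() ? " [dir]" : ""));
        }
        // Rename, which doubles as "move".
        fs.rename(new Path("/firstDirS09/secdirS09"), new Path("/firstDirS09/renamedS09"));
        // Delete recursively (the boolean enables recursion).
        fs.delete(new Path("/firstDirS09"), true);
        fs.close();
    }
}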

 

  4. Problem summary

 

Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: huabingood01

The HDFS service needs to be started:

%HADOOP_PREFIX%\sbin\start-dfs.cmd

    4.1. start-dfs.cmd reports that Windows cannot find hadoop

start-dfs.cmd runs:

start "Apache Hadoop Distribution" hadoop namenode
start "Apache Hadoop Distribution" hadoop datanode

The NameNode must be formatted first:

D:\haddop\hadoop-3.1.1\bin\hdfs namenode -format

The "hadoop" in those lines refers to bin\hadoop.cmd, so add the bin directory to the PATH environment variable.

    4.2. D:\haddop\hadoop-3.1.1\bin\hdfs namenode -format

D:\haddop\hadoop-3.1.1\sbin> D:\haddop\hadoop-3.1.1\bin\hdfs namenode -format

2018-10-28 07:02:54,801 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = hmNotePC/192.168.1.101

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 3.1.1

STARTUP_MSG:   classpath = D:\haddop\hadoop-3.1.1\etc\hadoop;D:\haddop\hadoop-3.1.1\share\had

STARTUP_MSG:   build = https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c; compiled by 'leftnoteasy' on 2018-08-02T04:26Z

STARTUP_MSG:   java = 1.8.0_31

************************************************************/

2018-10-28 07:02:54,854 INFO namenode.NameNode: createNameNode [-format]

Formatting using clusterid: CID-ecf4351a-e57c-411b-8ef3-2198981bc44b

2018-10-28 07:02:56,060 INFO namenode.FSEditLog: Edit logging is async:true

2018-10-28 07:02:56,090 INFO namenode.FSNamesystem: KeyProvider: null

2018-10-28 07:02:56,092 INFO namenode.FSNamesystem: fsLock is fair: true

2018-10-28 07:02:56,100 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false

2018-10-28 07:02:56,119 INFO namenode.FSNamesystem: fsOwner             = Administrator (auth:SIMPLE)

2018-10-28 07:02:56,120 INFO namenode.FSNamesystem: supergroup          = supergroup

2018-10-28 07:02:56,120 INFO namenode.FSNamesystem: isPermissionEnabled = true

2018-10-28 07:02:56,121 INFO namenode.FSNamesystem: HA Enabled: false

2018-10-28 07:02:56,203 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling

2018-10-28 07:02:56,229 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000

2018-10-28 07:02:56,229 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true

2018-10-28 07:02:56,238 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000

2018-10-28 07:02:56,239 INFO blockmanagement.BlockManager: The block deletion will start around 2018 十月 28 07:02:56

2018-10-28 07:02:56,243 INFO util.GSet: Computing capacity for map BlocksMap

2018-10-28 07:02:56,243 INFO util.GSet: VM type       = 64-bit

2018-10-28 07:02:56,248 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB

2018-10-28 07:02:56,251 INFO util.GSet: capacity      = 2^21 = 2097152 entries

2018-10-28 07:02:56,269 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false

2018-10-28 07:02:56,362 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS

2018-10-28 07:02:56,362 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

2018-10-28 07:02:56,363 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0

2018-10-28 07:02:56,364 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000

2018-10-28 07:02:56,364 INFO blockmanagement.BlockManager: defaultReplication         = 3

2018-10-28 07:02:56,365 INFO blockmanagement.BlockManager: maxReplication             = 512

2018-10-28 07:02:56,366 INFO blockmanagement.BlockManager: minReplication             = 1

2018-10-28 07:02:56,366 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2

2018-10-28 07:02:56,367 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms

2018-10-28 07:02:56,368 INFO blockmanagement.BlockManager: encryptDataTransfer        = false

2018-10-28 07:02:56,368 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000

2018-10-28 07:02:56,425 INFO util.GSet: Computing capacity for map INodeMap

2018-10-28 07:02:56,426 INFO util.GSet: VM type       = 64-bit

2018-10-28 07:02:56,426 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB

2018-10-28 07:02:56,427 INFO util.GSet: capacity      = 2^20 = 1048576 entries

2018-10-28 07:02:56,428 INFO namenode.FSDirectory: ACLs enabled? false

2018-10-28 07:02:56,429 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true

2018-10-28 07:02:56,429 INFO namenode.FSDirectory: XAttrs enabled? true

2018-10-28 07:02:56,430 INFO namenode.NameNode: Caching file names occurring more than 10 times

2018-10-28 07:02:56,440 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRo

 

2018-10-28 07:02:56,444 INFO snapshot.SnapshotManager: SkipList is disabled

2018-10-28 07:02:56,452 INFO util.GSet: Computing capacity for map cachedBlocks

2018-10-28 07:02:56,452 INFO util.GSet: VM type       = 64-bit

2018-10-28 07:02:56,453 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB

2018-10-28 07:02:56,453 INFO util.GSet: capacity      = 2^18 = 262144 entries

2018-10-28 07:02:56,467 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10

2018-10-28 07:02:56,468 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10

2018-10-28 07:02:56,469 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25

2018-10-28 07:02:56,475 INFO namenode.FSNamesystem: Retry cache on namenode is enabled

2018-10-28 07:02:56,476 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis

2018-10-28 07:02:56,481 INFO util.GSet: Computing capacity for map NameNodeRetryCache

2018-10-28 07:02:56,482 INFO util.GSet: VM type       = 64-bit

2018-10-28 07:02:56,482 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB

2018-10-28 07:02:56,483 INFO util.GSet: capacity      = 2^15 = 32768 entries

2018-10-28 07:02:56,527 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1079199093-192.168.1.101-1540681376517

2018-10-28 07:02:56,547 INFO common.Storage: Storage directory \tmp\hadoop-Administrator\dfs\name has been successfully formatted.

2018-10-28 07:02:56,580 INFO namenode.FSImageFormatProtobuf: Saving image file \tmp\hadoop-Administrator\dfs\name\current\fsimage.ckpt_0000000000000000000 us

2018-10-28 07:02:56,717 INFO namenode.FSImageFormatProtobuf: Image file \tmp\hadoop-Administrator\dfs\name\current\fsimage.ckpt_0000000000000000000 of size 3

2018-10-28 07:02:56,741 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

2018-10-28 07:02:56,756 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hmNotePC/192.168.1.101

************************************************************/

    4.3. Permission denied on file://

The error indicates the client fell back to the local file:// file system. Pointing fs.default.name at an hdfs:// URI in D:\haddop\hadoop-3.1.1\etc\hadoop\core-site.xml resolves it:

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://huabingood01:9000</value>
 </property>
</configuration>

 

    4.4. java.io.IOException: NameNode is not formatted

Accessing localhost:50070 fails, which means the NameNode failed to start; check the NameNode startup log. The fix is to format the NameNode first, as in 4.2 (hdfs namenode -format).

 

    4.5. org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory C:\tmp\hadoop-Administrator\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.

        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:376)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(

 

    4.6. hadoop.dll and winutils.exe

Copy hadoop.dll and winutils.exe into the Windows system directory (or another directory on PATH); a reboot may be needed afterwards.

    4.7. Unknown host / java.net.ConnectException: Connection refused: no further information

 

The config file D:\haddop\hadoop-3.1.1\etc\hadoop\core-site.xml said:

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://huabingood01:9000</value>
 </property>
</configuration>

while the code said:

conf = new Configuration();
// The file system URI is mandatory; values set in code take the highest precedence.
conf.set("fs.defaultFS", "hdfs://0.0.0.0:19000");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());

The URI in the config file and the one used in the code have to agree (host and port).
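
A small diagnostic sketch for this kind of mismatch: print the URI coming from the configuration files next to the one the client actually binds to. The class name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsUriCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Value from core-site.xml (or the file:/// default if nothing is configured).
        System.out.println("configured: " + conf.get("fs.defaultFS"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println("bound to:   " + fs.getUri());
        fs.close();
    }
}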

 

 

  5. Theory

 

C:\tmp\hadoop-Administrator>tree
Folder PATH listing for volume p1sys
Volume serial number is A87E-7AB4
C:.
├─dfs
│  ├─data
│  └─name
└─nm-local-dir

Here dfs\name holds the NameNode metadata (fsimage and edit logs), dfs\data holds the DataNode block storage, and nm-local-dir is the YARN NodeManager's local working directory.

 

    5.1. Creating a folder

 

String pathToCreate = "/firstDirS09/secdirS09";
hdfsHadoopClass.myCreatePath(fs, pathToCreate);

The folder actually ends up on the local drive:

D:\firstDirS09\secdirS09

which indicates the FileSystem bound to the local disk rather than to HDFS.
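
A sketch to check which scheme the client really bound to, reusing getHadoopFileSystem from the Code section above: file:/// means the local fallback, hdfs://... means real HDFS.

package hdfsHadoopUse;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemeCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = new hdfsHadoopClass().getHadoopFileSystem();
        System.out.println(fs.getUri()); // file:/// = local fallback; hdfs://... = HDFS
        System.out.println(fs.exists(new Path("/firstDirS09")));
        fs.close();
    }
}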

 

    5.2. Writing a file

 

// write file
FSDataOutputStream out = fs.create(new Path("/file1S09.txt"));
out.writeUTF("attilax bazai");
out.close();

As with the folder above, the file and its checksum sidecar (written automatically by the checksum file system) land on the local drive:

D:\file1S09.txt
D:\.file1S09.txt.crc
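
A minimal read-back sketch mirroring the write above (writeUTF pairs with readUTF); fs is obtained the same way as in the Code section:

package hdfsHadoopUse;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = new hdfsHadoopClass().getHadoopFileSystem();
        FSDataInputStream in = fs.open(new Path("/file1S09.txt"));
        System.out.println(in.readUTF()); // prints "attilax bazai"
        in.close();
        fs.close();
    }
}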

 

  6. ref

使用javaAPI操作hdfs - huabingood - 部落格園.html (Using the Java API to operate HDFS, huabingood, cnblogs; saved page)