Big Data with Hadoop [HDFS]

阿新 • Published: 2019-01-20

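HDFS characteristics
    Distributed
        Data volumes keep growing until they no longer fit in the storage managed by a single operating system, so the data is spread across disks managed by more machines. Files scattered over many machines are hard to manage and maintain, which creates a pressing need for a system that manages files across multiple machines: a distributed file system.
    High availability
        A file system that lets files be shared over the network among multiple hosts, so that many users on many machines can share files and storage space.
    Transparency
        Files are actually accessed over the network, but to programs and users it looks just like accessing a local disk.
    Fault tolerance
        Even if some nodes in the system go offline, the system as a whole keeps running without losing data.
        There are many distributed file systems and HDFS is only one of them. It is not suited to large numbers of small files, because the NameNode keeps the metadata for every file and block in memory.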
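To see why small files hurt, count the namespace objects the NameNode has to hold in memory: roughly one per file plus one per block. The sketch below is a back-of-the-envelope model under that assumption (directories and replicas are ignored, and `SmallFiles` is a hypothetical helper). A commonly quoted rule of thumb puts each object at around 150 bytes of NameNode heap, but treat that figure as approximate.

```java
/** Rough count of NameNode namespace objects: one per file plus one per block (directories ignored). */
public class SmallFiles {
    public static final long KB = 1024L, MB = 1024 * KB, GB = 1024 * MB;

    /** Objects for `files` files of `fileSize` bytes each, split into blocks of `blockSize` bytes. */
    public static long namespaceObjects(long files, long fileSize, long blockSize) {
        long blocksPerFile = (fileSize + blockSize - 1) / blockSize; // ceil(fileSize / blockSize)
        return files * (1 + blocksPerFile);
    }

    public static void main(String[] args) {
        long block = 128 * MB; // default HDFS block size
        // The same 1 GB of data stored two different ways
        System.out.println("1 file of 1 GB:       " + namespaceObjects(1, GB, block));
        System.out.println("8192 files of 128 KB: " + namespaceObjects(8192, 128 * KB, block));
    }
}
```

The same gigabyte costs 9 objects as one file (1 file + 8 blocks) but 16,384 objects as 8,192 small files.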
A Java package for working with large files: java.nio.
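As a small illustration of java.nio on large files, the sketch below counts lines by memory-mapping a file through `FileChannel` one window at a time, so files larger than 2 GB (the limit of a single `MappedByteBuffer`) can still be scanned. The 64 MB window size is an arbitrary choice, and `NioLineCount` is a hypothetical example class.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Counts newline bytes in a file by memory-mapping it window by window with java.nio. */
public class NioLineCount {
    // A single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes, so map in 64 MB windows.
    private static final long WINDOW = 64L * 1024 * 1024;

    public static long countLines(Path file) throws IOException {
        long lines = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += WINDOW) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(WINDOW, size - pos));
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```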

====================================================

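NameNode
The management node of the whole file system. It maintains the file system's directory tree, the metadata of every file and directory, and the list of data blocks that make up each file, and it receives users' operation requests.
Its files include:
1) fsimage: the metadata image file, a snapshot of the NameNode's in-memory metadata at a point in time.
2) edits: the operation log file, recording namespace changes made after that snapshot.
3) fstime: stores the time of the most recent checkpoint. (This file belongs to Hadoop 1.x; in Hadoop 2.x the name directory keeps seen_txid instead, and the fsimage_* and edits_* file names carry transaction IDs.)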
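How fsimage and edits fit together can be sketched with a toy model: the image is a snapshot of the namespace, edits are the operations logged since then, and a checkpoint replays the edits onto the image to produce a new image. The operation kinds MKDIR, CREATE and DELETE are hypothetical simplifications; real edit-log records such as OP_MKDIR, OP_ADD and OP_DELETE carry far more information.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy checkpoint: fsimage = set of paths, edits = operations logged after that snapshot. */
public class ToyCheckpoint {
    /** One logged namespace change (hypothetical, far simpler than a real edit-log record). */
    public static final class Op {
        final String kind;
        final String path;
        public Op(String kind, String path) { this.kind = kind; this.path = path; }
    }

    /** Replays edits on a copy of the image, the way a checkpoint merges edits into a new fsimage. */
    public static SortedSet<String> checkpoint(SortedSet<String> fsimage, List<Op> edits) {
        SortedSet<String> merged = new TreeSet<>(fsimage); // the old image is left untouched
        for (Op op : edits) {
            switch (op.kind) {
                case "MKDIR":
                case "CREATE":
                    merged.add(op.path);
                    break;
                case "DELETE": // recursive: drop the path and everything beneath it
                    merged.removeIf(p -> p.equals(op.path) || p.startsWith(op.path + "/"));
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + op.kind);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SortedSet<String> image = new TreeSet<>(Arrays.asList("/hello", "/mutil-dir"));
        List<Op> edits = Arrays.asList(
                new Op("MKDIR", "/mutil-dir1"),
                new Op("CREATE", "/mutil-dir1/readme.txt"),
                new Op("DELETE", "/hello"));
        System.out.println(checkpoint(image, edits)); // [/mutil-dir, /mutil-dir1, /mutil-dir1/readme.txt]
    }
}
```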
To inspect the fsimage metadata, convert it into a plain XML file with:
bin/hdfs oiv -p XML -i <path to the fsimage file> -o <output file>
e.g.
# pwd
/opt/hadoop-repo/name/current
# hdfs oiv -p XML -i fsimage_0000000000000000116 -o fsimage.xml
To view the edits operation log:
bin/hdfs oev -p XML -i <path to the edits file> -o <output file>
e.g.
# pwd
/opt/hadoop-repo/name/current
# hdfs oev -p XML -i edits_inprogress_0000000000000000119 -o edits.xml
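Once the edits log has been converted, the XML can be summarised programmatically. The sketch below assumes the layout Hadoop 2.x oev typically produces: an `<EDITS>` root with one `<RECORD>` per transaction, each holding an `<OPCODE>` such as OP_MKDIR. Check your own edits.xml before relying on it. `EditsSummary` is a hypothetical helper that counts how often each opcode occurs.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

/** Counts OPCODE values in the XML written by `hdfs oev -p XML` (layout assumed, see above). */
public class EditsSummary {
    public static Map<String, Integer> countOpcodes(InputStream xml) throws Exception {
        NodeList opcodes = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml)
                .getElementsByTagName("OPCODE"); // one <OPCODE> per <RECORD>
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < opcodes.getLength(); i++) {
            counts.merge(opcodes.item(i).getTextContent().trim(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "edits.xml"; // output of the oev command above
        try (InputStream in = Files.newInputStream(Paths.get(file))) {
            countOpcodes(in).forEach((op, n) -> System.out.println(op + "\t" + n));
        }
    }
}
```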
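DataNode
Provides storage for the actual file data.
Note: the clusterID in the DataNode's VERSION file must match the clusterID in the NameNode's VERSION file.
A mismatch is usually caused by formatting the NameNode more than once without clearing the name|data|secondary|tmp directories: each format generates a new clusterID, while the DataNode's storage keeps the old one.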
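VERSION files are written in java.util.Properties format, so a mismatch can be caught before starting the DataNode. A sketch: the default NameNode path reuses /opt/hadoop-repo/name/current from the example above, while the DataNode path /opt/hadoop-repo/data/current/VERSION is a hypothetical placeholder for whatever dfs.datanode.data.dir points to on your machine.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Compares the clusterID entries of a NameNode and a DataNode VERSION file. */
public class ClusterIdCheck {
    /** Reads the clusterID entry from a VERSION file (java.util.Properties format). */
    public static String clusterId(Reader versionFile) throws IOException {
        Properties p = new Properties();
        p.load(versionFile);
        return p.getProperty("clusterID");
    }

    /** True when both VERSION files carry the same, non-null clusterID. */
    public static boolean sameCluster(Reader nameNodeVersion, Reader dataNodeVersion) throws IOException {
        String nn = clusterId(nameNodeVersion);
        return nn != null && nn.equals(clusterId(dataNodeVersion));
    }

    public static void main(String[] args) throws IOException {
        // Assumed paths: pass the real ones as arguments
        Path nn = Paths.get(args.length > 0 ? args[0] : "/opt/hadoop-repo/name/current/VERSION");
        Path dn = Paths.get(args.length > 1 ? args[1] : "/opt/hadoop-repo/data/current/VERSION");
        try (Reader a = Files.newBufferedReader(nn); Reader b = Files.newBufferedReader(dn)) {
            System.out.println(sameCluster(a, b) ? "clusterID matches" : "clusterID MISMATCH");
        }
    }
}
```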
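Block: the most basic unit of storage. A file of length size is split, starting from offset 0, into consecutive fixed-size pieces that are numbered in order; each piece is called a Block. The default HDFS block size is 128 MB, so a 256 MB file has 256/128 = 2 blocks.
Unlike ordinary file systems, in HDFS a file smaller than one block does not take up a whole block of storage space; it uses only its actual size.
Replication: each block is stored as multiple replicas, three by default.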
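The block arithmetic generalises to ceil(size / blockSize) blocks, with the last block holding only the remainder; because a short block is not padded, the raw disk used across all DataNodes is simply size × replication. A small sketch assuming the 128 MB default block size and the default replication of 3 (`BlockMath` is a hypothetical helper):

```java
/** Block count, last-block length and raw storage for an HDFS file (simplified model). */
public class BlockMath {
    public static final long MB = 1024L * 1024;

    /** Number of blocks: ceil(size / blockSize); an empty file has none. */
    public static long blockCount(long size, long blockSize) {
        return (size + blockSize - 1) / blockSize;
    }

    /** Length of the final block; shorter than blockSize unless size is an exact multiple. */
    public static long lastBlockLength(long size, long blockSize) {
        if (size == 0) {
            return 0;
        }
        long rem = size % blockSize;
        return rem == 0 ? blockSize : rem;
    }

    /** Raw bytes on DataNode disks: each replica stores only the real bytes, not a full block. */
    public static long rawBytes(long size, int replication) {
        return size * replication;
    }

    public static void main(String[] args) {
        long block = 128 * MB;
        for (long size : new long[] {256 * MB, 300 * MB, 1 * MB}) {
            System.out.printf("%d MB -> %d block(s), last block %d MB, %d MB raw at 3 replicas%n",
                    size / MB, blockCount(size, block), lastBlockLength(size, block) / MB, rawBytes(size, 3) / MB);
        }
    }
}
```

For a 300 MB file this gives 3 blocks (128 + 128 + 44 MB) and 900 MB of raw storage.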

====================================================
HDFS shell
# hdfs dfs -appendToFile append.txt /hello    appends the contents of the local file append.txt to /hello
# hdfs dfs -cp /hello /hello1    copies /hello to /hello1
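HDFS Java API

import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class HDFStest {
    /* List a directory's contents: listStatus
       Read a file: open
       Create a directory: mkdirs
       Create a file: create
       Show where a file's blocks are stored: getFileBlockLocations
       Delete a file or directory: delete
    */
    // testListStatus prints: permission, replication, owner, group, size, modification time, name
    FileSystem fileSystem;
    Configuration configuration;

    @Before
    public void setUp() throws Exception {
        URI uri = new URI("hdfs://master:9000/");
        configuration = new Configuration();
        fileSystem = FileSystem.get(uri, configuration);
    }

    @After
    public void tearDown() throws IOException {
        fileSystem.close();
    }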
    // List the status of every entry under a path
    @Test
    public void testListStatus() throws IOException {
        // root path
        Path path = new Path("/");
        FileStatus[] fileStatuses = fileSystem.listStatus(path);
        for (FileStatus f: fileStatuses) {
            FsPermission permission = f.getPermission();
            String fp_prefix = "-";
            if (f.isDirectory()){
                fp_prefix = "d";
            }
            FsAction userAction=permission.getUserAction();
            FsAction groupAction = permission.getGroupAction();
            FsAction otherAction = permission.getOtherAction();
            String acl = fp_prefix+userAction.SYMBOL+groupAction.SYMBOL+otherAction.SYMBOL;
            String owner = f.getOwner();
            String group= f.getGroup();
            long size=f.getLen();
            short replication =f.getReplication();
            long blockSize = f.getBlockSize();
            String name = f.getPath().getName();
            String mTime = new SimpleDateFormat("yyyy-MM-dd HH:mm").format(new Date(f.getModificationTime()));
            System.out.println(acl + " " + replication + " " + owner + " " +
                    group + " " + size + " " + mTime + " " + name);
        }
    }
    // Read a file: open
    @Test
    public void testRead() throws IOException {
        Path path = new Path("/hello");
        FSDataInputStream fis = fileSystem.open(path);
        try {
            // First way: read line by line
            // BufferedReader br = new BufferedReader(new InputStreamReader(fis, StandardCharsets.UTF_8));
            // String line;
            // while ((line = br.readLine()) != null) {
            //     System.out.println(line);
            // }
            // Second way: copy the stream to stdout with a 1 KB buffer; false = do not close the streams
            IOUtils.copyBytes(fis, System.out, 1024, false);
        } finally {
            IOUtils.closeStream(fis);
        }
    }
    // Create a directory: mkdirs
    @Test
    public void testMkdir() throws IOException {
        Path path = new Path("/mutil-dir");
        boolean ret = fileSystem.mkdirs(path);
        Assert.assertTrue(ret);
    }
    // Create a file: create
    @Test
    public void testCreateFile() throws IOException {
        Path path = new Path("/mutil-dir1/readme.txt");
        // First way: createNewFile only creates an empty file
        // boolean ret = fileSystem.createNewFile(path);
        // Assert.assertTrue(ret);
        // Second way (recommended): create returns a stream for writing the file's content
        try (FSDataOutputStream fos = fileSystem.create(path)) {
            fos.write("你好三毛".getBytes(StandardCharsets.UTF_8));
        }
    }
    // Show where a file's blocks are stored: getFileBlockLocations
    @Test
    public void testLocation() throws IOException {
        Path path = new Path("/mutil-dir1/readme.txt");
        FileStatus fs=fileSystem.getFileStatus(path);
        long len = fs.getLen();
        BlockLocation[] locations = fileSystem.getFileBlockLocations(path,0,len);
        for (BlockLocation location: locations) {
            String[] hosts = location.getHosts();
            String[] names = location.getNames();
            long length = location.getLength();
            System.out.println(Arrays.toString(hosts));
            System.out.println(Arrays.toString(names));
            System.out.println("length "+ length);
        }
    }
    // Delete a file or directory (true = recursive)
    @Test
    public void testDelete() throws IOException {
        Path path = new Path("/mutil-dir1/");
        boolean ret = fileSystem.delete(path,true);
        Assert.assertEquals(true,ret);
    }
}
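testListStatus builds the `drwxr-xr-x` string from `FsAction.SYMBOL`. The same formatting can be sketched without a cluster: each of the user, group and other octal digits indexes a symbol table whose order mirrors the FsAction enum (NONE through ALL). `PermissionString` is a hypothetical helper, not part of Hadoop.

```java
/** Formats a 9-bit permission mode like `hdfs dfs -ls` does, e.g. 0755 -> rwxr-xr-x. */
public class PermissionString {
    // Index = 3-bit value; same order as FsAction: NONE, EXECUTE, WRITE, ..., ALL
    private static final String[] SYMBOLS = {"---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"};

    public static String format(int mode, boolean isDirectory) {
        return (isDirectory ? "d" : "-")
                + SYMBOLS[(mode >> 6) & 7]  // user bits
                + SYMBOLS[(mode >> 3) & 7]  // group bits
                + SYMBOLS[mode & 7];        // other bits
    }

    public static void main(String[] args) {
        System.out.println(format(0755, true));  // drwxr-xr-x
        System.out.println(format(0644, false)); // -rw-r--r--
    }
}
```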

=====================================================

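HDFS and RPC

The entire Hadoop architecture is built on RPC (org.apache.hadoop.ipc).
RPC (Remote Procedure Call) is a protocol for requesting a service from a program on a remote computer over the network without having to understand the underlying network technology. RPC assumes an existing transport protocol, such as TCP or UDP, to carry data between the communicating programs. In the OSI model, RPC spans the transport layer and the application layer. RPC makes it easier to develop applications, including distributed, multi-program network applications.
RPC uses a client/server model: the requesting program is the client and the service-providing program is the server. First, the client process sends a call message with parameters to the server process and then waits for a reply. On the server side, the process sleeps until a call message arrives. When one arrives, the server extracts the parameters, computes the result, sends a reply message, and waits for the next call. Finally, the client process receives the reply, obtains the result, and resumes execution.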
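The call/reply cycle described above can be sketched in plain Java without Hadoop: a dynamic proxy on the client turns a method call into a message (method name, parameter types, arguments), and a server thread sleeps in `accept()` until a message arrives, invokes the method, and sends back the result. This is a toy using Java serialization over a loopback TCP socket, one call per connection; it is not Hadoop's wire protocol, and `Greeter` is a hypothetical interface modeled on sayHello.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Toy RPC over TCP with Java serialization: shows the call/reply cycle only. */
public class ToyRpc {
    /** Hypothetical service interface. */
    public interface Greeter {
        String sayHello(String name);
    }

    /** Server: waits for a call message, invokes the method on impl, sends back the reply. */
    public static <T> ServerSocket serve(Class<T> iface, T impl) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress()); // any free port
        Thread loop = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket s = server.accept(); // sleeps until a client connects
                     ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    String name = (String) in.readObject();          // method name
                    Class<?>[] types = (Class<?>[]) in.readObject(); // parameter types
                    Object[] args = (Object[]) in.readObject();      // arguments
                    Method m = iface.getMethod(name, types);
                    out.writeObject(m.invoke(impl, args));           // reply message
                } catch (Exception e) {
                    // socket closed or call failed: drop this connection and keep serving
                }
            }
        });
        loop.setDaemon(true);
        loop.start();
        return server;
    }

    /** Client: a dynamic proxy that turns each method call into one request/reply exchange. */
    public static <T> T proxy(Class<T> iface, int port) {
        InvocationHandler handler = (p, method, args) -> {
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
                out.writeObject(method.getName());
                out.writeObject(method.getParameterTypes());
                out.writeObject(args);
                out.flush();                                         // send the call message
                ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                return in.readObject();                              // block until the reply arrives
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = serve(Greeter.class, name -> "hello " + name)) {
            Greeter greeter = proxy(Greeter.class, server.getLocalPort());
            System.out.println(greeter.sayHello("三毛"));
        }
    }
}
```

Hadoop's RPC.Builder and RPC.getProxy in the demo that follows play the same two roles, with versioned protocols and a production-grade transport.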


RPC code demo

Service interface

package com.sanmao.hadoop_02.rpc;

import org.apache.hadoop.ipc.VersionedProtocol;


// The protocol interface must extend VersionedProtocol
public interface IHelloService extends VersionedProtocol{
    long versionID=123456789L;
    public  String sayHello(String name);
    public String heartBeat(String status);
}

Service implementation

package com.sanmao.hadoop_02.rpc;

import org.apache.hadoop.ipc.ProtocolSignature;

import java.io.IOException;

/**
 * Created by kkk on 2016/10/21.
 */
public class HelloServiceImpl implements IHelloService{
    public String sayHello(String name) {
        System.out.println("hello method called");
        return  "hello "+name;
    }

    public String heartBeat(String status) {
        System.out.println("heartbeat check");
        return "heartbeat response " + status;
    }

    public long getProtocolVersion(String s, long l) throws IOException {
        return IHelloService.versionID;
    }

    public ProtocolSignature getProtocolSignature(String s, long l, int i) throws IOException {
        // create a new (empty) protocol signature
        return new ProtocolSignature();
    }
}

RPC server

package com.sanmao.hadoop_02.rpc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

import java.io.IOException;

public class RPCDriver {
    public static void main(String[] args) throws IOException {
        Configuration configuration = new Configuration();
        RPC.Builder builder = new RPC.Builder(configuration);
        RPC.Server server = builder.setBindAddress("localhost").setPort(8888)
                .setProtocol(IHelloService.class)
                .setInstance(new HelloServiceImpl()).build();
        server.start();
        System.out.println("RPC server started");
    }
}

RPC client

package com.sanmao.hadoop_02.rpc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

import java.io.IOException;
import java.net.InetSocketAddress;


public class RPCClient {
     /*Class<T> protocol,
        long clientVersion,
        InetSocketAddress addr,
        Configuration conf*/
    // Hadoop RPC runs over TCP/IP underneath
     public static void main(String[] args) throws IOException, InterruptedException {
         Configuration configuration= new Configuration();
         InetSocketAddress inetSocketAddress=new InetSocketAddress("localhost",8888);
         IHelloService proxy = RPC.getProxy(IHelloService.class, IHelloService.versionID, inetSocketAddress, configuration);
         // Calling a method on the proxy sends the call to the server and blocks until the reply arrives
         String result = proxy.sayHello("三毛");
         System.out.println(result);
         while (true){
             String ret = proxy.heartBeat(System.currentTimeMillis() + "");
             System.out.println(ret);
             Thread.sleep(3000);
         }
     }
}
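The client's heartBeat call every 3 seconds mirrors how DataNodes report to the NameNode. In HDFS the default dfs.heartbeat.interval is 3 seconds and the default dfs.namenode.heartbeat.recheck-interval is 5 minutes, and the NameNode declares a DataNode dead once it has been silent for 2 × recheck interval + 10 × heartbeat interval. A sketch of that arithmetic (`HeartbeatTimeout` is a hypothetical helper, not a Hadoop class):

```java
/** Dead-node timeout arithmetic for HDFS heartbeats (simplified). */
public class HeartbeatTimeout {
    /** Expire interval in ms: 2 * recheck interval (ms) + 10 * heartbeat interval (s). */
    public static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    /** A node counts as dead once its last heartbeat is older than the expire interval. */
    public static boolean isDead(long lastHeartbeatMs, long nowMs, long expireMs) {
        return lastHeartbeatMs < nowMs - expireMs;
    }

    public static void main(String[] args) {
        long expire = expireIntervalMs(300_000, 3); // default settings
        System.out.println(expire / 1000 + " s");   // 630 s
    }
}
```

With the defaults this comes to 630 s, i.e. 10 minutes 30 seconds.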

HDFS internals: how a file is read


HDFS internals: how a file is written

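Three key interfaces
ClientProtocol
The interface through which the client (FileSystem) talks to the NameNode.
DatanodeProtocol
The interface through which a DataNode talks to the NameNode.
NamenodeProtocol
The interface through which the SecondaryNameNode talks to the NameNode.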

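Common HDFS operations tasks

    Change how long the trash keeps deleted data before purging it
       On every node (not just the master), add the following to core-site.xml:
       <property>
          <name>fs.trash.interval</name>
          <value>1440</value>
          <description>In minutes (1440 = 1 day)</description>
       </property>
    hdfs dfsadmin -safemode leave    makes HDFS leave safe mode
    Check local disk usage
        df -lh /
        Check the size of each entry under a directory
        du -lh --max-depth=1 path
    Check HDFS usage
        hdfs dfs -du -h hdfs_path
        hdfs dfs -df -h hdfs_path
    Check HDFS health (file and block status)
        hdfs fsck hdfs_path -files -blocks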
