Python Series, Getting Started: HDFS

Introduction

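HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It is highly fault-tolerant and well suited to deployment on inexpensive commodity machines. Python
offers two client interfaces for it: hdfscli (REST API calls) and pyhdfs (RPC calls). This section covers hdfscli.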
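
To make "REST API call" concrete, here is a minimal sketch of the kind of WebHDFS URL such a client sends to the NameNode. The helper name `webhdfs_url` is hypothetical and is not part of hdfscli; the `/webhdfs/v1` prefix and the `op` query parameter come from the WebHDFS REST API, and the host and port are the ones used in the examples of this article.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, **params):
    """Build a WebHDFS REST URL (hypothetical helper, for illustration only)."""
    query = urlencode({'op': op, **params})
    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


# A status lookup is a plain HTTP GET on a URL like this one.
print(webhdfs_url('http://localhost:50070', '/test/test.txt', 'GETFILESTATUS'))
```

The client library hides this URL building, along with redirects and error handling, behind methods such as `status()` and `write()`.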

Code examples

  1. Installation

    pip install hdfs
  2. Import the required modules

    from hdfs import Client, InsecureClient
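  3. Create a client

    """
    There are two kinds of client here, Client and InsecureClient.
    Client: cannot specify the file owner
    InsecureClient: can specify the file owner through `user`, default None
    """
    # 50070 is the NameNode web port in Hadoop 2.x (Hadoop 3.x uses 9870).
    hdfs_root_path = 'http://localhost:50070'
    # Create one of the two; the second assignment replaces the first.
    fs = Client(hdfs_root_path)
    fs = InsecureClient(hdfs_root_path, user='hdfs')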
  4. Create a directory

    """
    Set the directory permission to 777 (octal); the default is None.
    """
    fs.makedirs('/test', permission=777)
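  5. Write a file

    """
    Whether to append depends on whether the file already exists.
    strict: If `False`, return `None` rather than raise an exception if
          the path doesn't exist.
    """
    hdfs_file_path = '/test/test.txt'
    data = b'hello hdfs\n'  # example payload
    content = fs.content(hdfs_file_path, strict=False)
    if content is None:
        # The file does not exist yet: create it with permission 777.
        fs.write(hdfs_file_path, data=data, permission=777)
    else:
        # The file already exists: append to it.
        fs.write(hdfs_file_path, data=data, append=True)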
  6. Upload a file

    """
    overwrite defaults to False. Unless it is set to True, uploading a file that
    already exists in HDFS raises an HdfsError.
    """
    # hdfs_path: target path in HDFS; local_path: file or directory on the local machine.
    fs.upload(hdfs_path, local_path, overwrite=True)
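  7. Summary
    I have not yet found a method dedicated to checking whether a file exists, so the write example uses fs.content() as a substitute. If you know a better way, please share it with me.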
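
One possible answer to the existence question, offered as a sketch rather than as the library's recommended approach: `status()` accepts the same `strict` flag as `content()` and returns `None` for a missing path when `strict=False`. The helper names `hdfs_exists` and `write_or_append` and the `_StubClient` class are hypothetical; the stub exists only so the snippet runs without a cluster, and it mimics just the two calls used here.

```python
def hdfs_exists(client, path):
    """Return True if `path` exists on HDFS.

    Relies on client.status(path, strict=False) returning None for a missing path.
    """
    return client.status(path, strict=False) is not None


def write_or_append(client, path, data):
    """Create `path` if it is missing, otherwise append `data` to it."""
    if hdfs_exists(client, path):
        client.write(path, data=data, append=True)
    else:
        client.write(path, data=data)


class _StubClient:
    """Hypothetical in-memory stand-in for a real client (illustration only)."""

    def __init__(self):
        self.files = {}

    def status(self, path, strict=True):
        if path in self.files:
            return {'type': 'FILE', 'length': len(self.files[path])}
        if strict:
            # The real library raises its own error type here.
            raise FileNotFoundError(path)
        return None

    def write(self, path, data, append=False):
        previous = self.files.get(path, b'') if append else b''
        self.files[path] = previous + data


stub = _StubClient()
write_or_append(stub, '/test/test.txt', b'first\n')   # file is created
write_or_append(stub, '/test/test.txt', b'second\n')  # file is appended to
print(hdfs_exists(stub, '/test/test.txt'), hdfs_exists(stub, '/missing'))
```

With a real client, the call would be `hdfs_exists(fs, '/test/test.txt')`, passing the `fs` object created in step 3 in place of the stub.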
