Python Series: Getting Started with HDFS





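HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It is highly fault tolerant and well suited to deployment on inexpensive machines.
Python has two interface options for it: hdfscli (RESTful API calls) and pyhdfs (RPC calls). This section covers how to use hdfscli.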
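Because hdfscli works through REST calls, each client method ends up as an HTTP request against the NameNode's WebHDFS endpoint. The sketch below only illustrates what such a URL looks like, using the standard library. The helper name `webhdfs_url` is hypothetical and not part of hdfscli, and the host, port and user are placeholder values.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, **params):
    """Build a WebHDFS REST URL (illustrative helper, not part of hdfscli)."""
    # WebHDFS exposes the file system under the /webhdfs/v1 prefix.
    path = quote(hdfs_path)
    # The operation name and its options travel as query parameters.
    query = urlencode({"op": op, **params})
    return f"{base_url.rstrip('/')}/webhdfs/v1{path}?{query}"


# Placeholder NameNode address and user, chosen to match this article.
url = webhdfs_url("http://localhost:50070", "/test", "MKDIRS",
                  permission="777", **{"user.name": "hdfs"})
print(url)
```

The client library hides this URL handling, so in practice you only call methods such as `makedirs` or `write`.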


  1. Installation

    pip install hdfs
  2. Import the required modules

    from hdfs import Client, InsecureClient
  3. Create a client

    # Two kinds of client are used here: Client and InsecureClient.
    # Client: cannot specify the file owner.
    # InsecureClient: can specify the file owner via `user` (default None).
    # 50070 is the NameNode web port in Hadoop 2.x (Hadoop 3.x defaults to 9870).
    hdfs_root_path = 'http://localhost:50070'
    fs = Client(hdfs_root_path)
    fs = InsecureClient(hdfs_root_path, user='hdfs')
  4. Create a directory

    # Set the permission of the new directory to 777 (octal); the default is None.
    fs.makedirs('/test', permission=777)
  5. Write a file

    # Whether to create or append depends on whether the file already exists.
    # strict: if False, content() returns None instead of raising an
    #         exception when the path doesn't exist.
    hdfs_file_path = '/test/test.txt'
    content = fs.content(hdfs_file_path, strict=False)
    if content is None:
        # The file does not exist yet: create it.
        fs.write(hdfs_file_path, data=data, permission=777)
    else:
        # The file already exists: append to it.
        fs.write(hdfs_file_path, data=data, append=True)
  6. Upload a file

    # overwrite defaults to False. Unless it is set to True, uploading a file
    # that already exists in HDFS raises a "file already exists" error.
    fs.upload(hdfs_path, local_path, overwrite=True)
  7. Summary
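The directory and write steps above can be pulled together in one small helper. This is a minimal sketch, not part of hdfscli. The function name `write_or_append` and the `FakeClient` stand-in are hypothetical, introduced so the example runs without a cluster. The fake only mimics the `makedirs`, `content` and `write` calls used in this article and raises a different exception type than the real client.

```python
import posixpath


def write_or_append(fs, hdfs_path, data, permission=777):
    """Create hdfs_path with data, or append to it if it already exists."""
    # Make sure the parent directory exists first.
    fs.makedirs(posixpath.dirname(hdfs_path), permission=permission)
    # strict=False makes content() return None for a missing path.
    if fs.content(hdfs_path, strict=False) is None:
        fs.write(hdfs_path, data=data, permission=permission)
        return "created"
    fs.write(hdfs_path, data=data, append=True)
    return "appended"


class FakeClient:
    """In-memory stand-in (hypothetical) that mimics the calls used above."""

    def __init__(self):
        self.dirs = set()
        self.files = {}

    def makedirs(self, hdfs_path, permission=None):
        self.dirs.add(hdfs_path)

    def content(self, hdfs_path, strict=True):
        if hdfs_path in self.files:
            return {"length": len(self.files[hdfs_path])}
        if strict:
            raise FileNotFoundError(hdfs_path)
        return None

    def write(self, hdfs_path, data=None, permission=None, append=False):
        # Appending keeps the previous contents; a plain write replaces them.
        previous = self.files.get(hdfs_path, "") if append else ""
        self.files[hdfs_path] = previous + data


fs = FakeClient()
first = write_or_append(fs, "/test/test.txt", "hello\n")
second = write_or_append(fs, "/test/test.txt", "world\n")
print(first, second)
```

With a real cluster, an `InsecureClient` created as in step 3 would take the place of `FakeClient()`.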
