Python Series: Getting Started with HDFS
阿新 | Published: 2018-01-22
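Introduction
HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It is highly fault-tolerant and is designed to run on low-cost commodity machines. Python offers two ways to access it: hdfscli (the hdfs package, which makes RESTful API calls through WebHDFS) and pyhdfs (RPC calls). This section covers hdfscli.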
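Because hdfscli goes through the WebHDFS REST API, every call ends up as an HTTP request against a URL of the form http://<namenode>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The sketch below builds such URLs with the standard library only. `webhdfs_url` is a hypothetical helper for illustration and is not part of the hdfs package.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, **params):
    """Build a WebHDFS REST URL (hypothetical helper, for illustration only)."""
    # WebHDFS serves the whole file system under the /webhdfs/v1 prefix.
    if not hdfs_path.startswith('/'):
        hdfs_path = '/' + hdfs_path
    # The operation name goes in the `op` query parameter; extra parameters follow it.
    query = urlencode({'op': op, **params})
    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Metadata of /test (sent as an HTTP GET)
print(webhdfs_url('http://localhost:50070', '/test', 'GETFILESTATUS'))
# Create /test with permission 777 (sent as an HTTP PUT)
print(webhdfs_url('http://localhost:50070', '/test', 'MKDIRS', permission='777'))
```

The client methods used in the rest of this section hide this URL construction, along with the choice of HTTP method, behind ordinary Python calls.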
Code examples
Installation
pip install hdfs
Import the module
from hdfs import Client, InsecureClient
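Create a client
# There are two kinds of client: Client and InsecureClient.
# Client: does not let you specify the file owner.
# InsecureClient: lets you specify the file owner (the user requests run as) with `user`, which defaults to None.
hdfs_root_path = 'http://localhost:50070'  # NameNode HTTP address; Hadoop 3.x uses port 9870 by default
fs = Client(hdfs_root_path)                       # option 1
fs = InsecureClient(hdfs_root_path, user='hdfs')  # option 2, used in the examples below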
Create a directory
# Create /test with permission 777 (an octal string); if permission is omitted (None), HDFS applies its default permissions.
fs.makedirs('/test', permission='777')
Write a file
# Create the file if it does not exist yet, otherwise append to it.
# strict=False: content() returns None instead of raising an exception when the path doesn't exist.
hdfs_file_path = '/test/test.txt'
data = 'hello hdfs\n'  # sample text to write
content = fs.content(hdfs_file_path, strict=False)
if content is None:
    fs.write(hdfs_file_path, data=data, encoding='utf-8', permission='777')
else:
    fs.write(hdfs_file_path, data=data, encoding='utf-8', append=True)
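Upload a file
# overwrite defaults to False; without overwrite=True, uploading a file that already exists on HDFS raises a "file already exists" error.
hdfs_path = '/test/test.txt'  # destination on HDFS
local_path = 'test.txt'       # local file to upload (example name)
fs.upload(hdfs_path, local_path, overwrite=True)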
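Summary
I have not yet found a way to check whether a file exists, so the code examples use fs.content() as a stand-in. If you know a better approach, please share it with me.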
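On the open question above: the hdfs Client also has a status() method that accepts the same strict flag. It returns the path's FileStatus as a dict (whose 'type' is 'FILE' or 'DIRECTORY'), or None when the path does not exist. Unlike a content summary, it does not have to total up a whole directory subtree. Below is a minimal sketch; `hdfs_exists` is a hypothetical helper name, and a SimpleNamespace stands in for a real client.

```python
from types import SimpleNamespace


def hdfs_exists(client, hdfs_path):
    """Return True if hdfs_path exists on HDFS.

    Assumes `client` is an hdfs Client; status(path, strict=False) returns the
    path's FileStatus dict, or None when the path does not exist.
    """
    return client.status(hdfs_path, strict=False) is not None


# Stand-in client: only /test/test.txt "exists".
fake = SimpleNamespace(
    status=lambda path, strict=True: {'type': 'FILE'} if path == '/test/test.txt' else None
)
print(hdfs_exists(fake, '/test/test.txt'))     # True
print(hdfs_exists(fake, '/test/missing.txt'))  # False
```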