Python cannot read HDFS files: requests.exceptions.ConnectionError: HTTPConnectionPool

1. Problem 1: When using Python's hdfs library to work with HDFS, listing the HDFS directory works as expected:

from hdfs import Client

# Connect to the WebHDFS endpoint and list the HDFS root directory
client = Client("http://10.0.30.9:50070")
print(client.list('/'))
# Output:
['test.txt']

Reading the file, however, fails with hdfs.util.HdfsError: File /user/dr.who/test.txt not found. Switching to pyhdfs gives the same result, and the same is true of the second problem described below.

from hdfs import Client

client = Client("http://10.0.30.9:50070")
print(client.list('/'))
# 'test.txt' is a relative path, and no root path has been configured
with client.read('test.txt') as reader:
    content = reader.read()
    print(content)
Traceback (most recent call last):
  File "E:/pycharm/workspace/hadoopforwin/myhdfs.py", line 5, in <module>
    with client.read('test.txt') as reader:
  File "D:\python3.6\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 678, in read
    buffersize=buffer_size,
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 112, in api_handler
    raise err
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 107, in api_handler
    **self.kwargs
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 210, in _request
    _on_error(response)
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 50, in _on_error
    raise HdfsError(message, exception=exception)
hdfs.util.HdfsError: File /user/dr.who/test.txt not found.

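2. Solution to problem 1: The error occurs because no root path was specified. Without one, the relative path test.txt is looked up under /user/dr.who, as the error message shows, rather than under /. Specify the root path when calling Client to connect to HDFS: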

from hdfs import Client

# root='/' makes relative paths resolve against the HDFS root directory
client = Client("http://10.0.30.9:50070", root='/')
print(client.list('/'))
with client.read('test.txt') as reader:
    content = reader.read()
    print(content)
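
To make the effect of root concrete, here is a minimal sketch of how a relative path can end up under /user/dr.who. The function resolve_hdfs_path is hypothetical and only models the behavior seen in the error above. It is not the hdfs library's actual implementation, and the default user dr.who is taken from the error message.

```python
import posixpath


def resolve_hdfs_path(path, root=None, user="dr.who"):
    """Hypothetical model of path resolution; not the hdfs library's code."""
    home = posixpath.join("/user", user)
    if root is not None:
        # The root is prefixed to the path; an absolute path overrides it.
        path = posixpath.join(root, path)
    # Anything still relative is resolved against the home directory.
    return posixpath.normpath(posixpath.join(home, path))


print(resolve_hdfs_path("test.txt"))            # /user/dr.who/test.txt
print(resolve_hdfs_path("test.txt", root="/"))  # /test.txt
print(resolve_hdfs_path("/test.txt"))           # /test.txt
```

Under this model, passing the absolute path /test.txt to client.read should avoid the problem as well, without setting root.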

Running the code with the root path set then produced a new problem.

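3. Problem 2: The last line of the error output is shown below. Here hmaster is the hostname of the Hadoop host, and "getaddrinfo failed" means the hostname lookup failed, so the program could not map the hostname to the correct IP address: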

requests.exceptions.ConnectionError: HTTPConnectionPool(host='hmaster', port=50075): Max retries exceeded with url: /webhdfs/v1/test.txt?op=OPEN&namenoderpcaddress=hMaster:9000&offset=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000035BAB38>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
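
The error message already names the host the client failed to reach. As a small illustration, the sketch below rebuilds the request URL from the host, port, and path in the message and pulls out the pieces with the standard library. The http:// scheme and the variable names are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

# URL reconstructed from the error message; the scheme is assumed to be http.
failed_url = (
    "http://hmaster:50075/webhdfs/v1/test.txt"
    "?op=OPEN&namenoderpcaddress=hMaster:9000&offset=0"
)

parts = urlsplit(failed_url)
query = parse_qs(parts.query)

print(parts.hostname)                  # hmaster
print(parts.port)                      # 50075
print(query["op"][0])                  # OPEN
print(query["namenoderpcaddress"][0])  # hMaster:9000
```

The request targets hmaster by name rather than 10.0.30.9, so the machine running the client has to be able to resolve that name.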

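4. Solution to problem 2: Add a hostname-to-IP mapping to the hosts file on the machine that runs the Python program. On the Windows system I use, the hosts file is at C:\Windows\System32\drivers\etc\hosts. Append a line of the following form to the end of the file: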

ip hostname

For the setup in this article, that is

10.0.30.9 hmaster

Remember to save the file after editing. Running the program again then reads the file successfully.
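
One way to confirm that the hosts entry took effect is to ask the local resolver directly before rerunning the HDFS code. This is a minimal sketch using only the standard library. The helper name can_resolve is made up for illustration, and hmaster and 50075 are the host and port from the error message above.

```python
import socket


def can_resolve(hostname, port):
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Same kind of failure as "getaddrinfo failed" in the error above.
        return False


# Expected to print False before the hosts file is edited, True afterwards.
print(can_resolve("hmaster", 50075))
```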

5. With both the hdfs and pyhdfs libraries, some methods other than reading a file can run into the same error, and the fix is the same.