Installing pyspider on Linux (system: CentOS 7) 【Summary version】

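Over the National Day holiday I rented a new Alibaba Cloud server and had to install pyspider for crawling, but this time the installation did not go smoothly. Here I record the installation process, along with the fixes for the errors I hit.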

【1】First make sure pip is installed on the system. If it isn't, you can search Baidu for the details; here I only paste the commands I used:

       wget https://pypi.python.org/packages/source/p/pip/pip-7.1.2.tar.gz#md5=3823d2343d9f3aaab21cf9c917710196
       tar -xvf pip-7.1.2.tar.gz
       cd pip-7.1.2
       python setup.py install
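
The `#md5=…` fragment at the end of the wget URL is the checksum PyPI published for the tarball, but wget does not check it for you. A minimal sketch for checking it by hand before running setup.py, assuming GNU coreutils `md5sum` is available (it is on CentOS); `verify_md5` is a hypothetical helper name, not part of pip or wget:

```shell
# verify_md5 FILE EXPECTED: succeed only if FILE's MD5 digest equals EXPECTED.
# Hypothetical helper; relies on GNU coreutils md5sum.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest field
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Usage with the tarball and checksum from the wget URL above:
# verify_md5 pip-7.1.2.tar.gz 3823d2343d9f3aaab21cf9c917710196 && tar -xvf pip-7.1.2.tar.gz
```

A missing or truncated download makes the digests differ, so the helper returns non-zero and the `&&` stops before extraction.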

【2】Once pip is installed, you can install pyspider directly. Enter the command: pip install pyspider

It failed! Below I record each error I ran into, one by one:

(1) Error 1: pip itself misbehaves, and installing Flask fails, as follows:

# pip install pyspider
Collecting pyspider
/usr/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading pyspider-0.3.5.tar.gz (94kB)
    100% |████████████████████████████████| 98kB 41kB/s
Collecting Flask>=0.10 (from pyspider)
  Downloading Flask-0.10.1.tar.gz (544kB)
    15% |████▉                           | 81kB 250bytes/s eta 0:30:44
  Hash of the package https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz#md5=378670fe456957eb3c27ddaef60b2b24 (from https://pypi.python.org/simple/flask/) (e11c5569eb68d582ce1c85154b9b48c9) doesn't match the expected hash 378670fe456957eb3c27ddaef60b2b24!
Bad md5 hash for package https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz#md5=378670fe456957eb3c27ddaef60b2b24 (from https://pypi.python.org/simple/flask/)

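The cause is an SSL connection failure in urllib3 (the HTTP library bundled inside pip): as the warning says, this Python has no real SSLContext object. The fix is to install a few dependency libraries.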

Commands:

yum install python-devel libffi-devel openssl-devel  
pip install pyopenssl ndg-httpsclient pyasn1

(Note: Ubuntu doesn't use yum; use apt-get instead, and the package names differ there, e.g. python-dev, libffi-dev, libssl-dev.)

Even after this, pip install pyspider still doesn't go through, because the step above only gets rid of the InsecurePlatformWarning that pip prints.
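
urllib3 raises InsecurePlatformWarning when the running Python's `ssl` module has no `SSLContext` class (Python 2 older than 2.7.9), which is what the log above reports. A small sketch for checking a given interpreter; `has_ssl_context` is a hypothetical helper, and the default interpreter name `python` is an assumption about your PATH. Note that the pip packages above do not add `SSLContext` to `ssl`; they give urllib3 an alternative SSL backend (pyOpenSSL), so the import check in the usage comment is the one that should change after the fix:

```shell
# has_ssl_context [PYTHON]: exit 0 if the interpreter's ssl module provides
# SSLContext. urllib3 emits InsecurePlatformWarning when it does not.
# Hypothetical helper; PYTHON defaults to "python".
has_ssl_context() {
    "${1:-python}" -c 'import ssl, sys; sys.exit(0 if hasattr(ssl, "SSLContext") else 1)'
}

# Usage:
# has_ssl_context python || echo "no SSLContext: expect InsecurePlatformWarning"
# After the pip install above, this import should succeed (these are the
# module names of pyopenssl, ndg-httpsclient and pyasn1):
# python -c 'import OpenSSL, ndg.httpsclient, pyasn1'
```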

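Flask still would not install, so at this point the only option was to install Flask by hand. Of course, I took a detour first and searched for "bad md5 hash for package"; I won't paste that here.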

easy_install flask 

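That installed Flask successfully. It also suggests that once pip install flask has failed, a retry seems to read only from the cache, so it keeps failing even after the dependencies are installed; in that case, installing with easy_install may be a good alternative.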
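
The fallback above can be scripted. This is a minimal sketch, not the author's exact procedure: it first retries pip with `--no-cache-dir` (an option supported by the pip 7.1.2 installed above, assumed here to sidestep the cached download the author suspected), then falls back to `easy_install`. `install_with_fallback` and the `PIP`/`EASY_INSTALL` overrides are hypothetical names added for illustration and testing:

```shell
# install_with_fallback PKG: try pip without its download cache, then
# easy_install. PIP and EASY_INSTALL may be overridden (hypothetical knobs,
# used here so the logic can be tested without network access).
install_with_fallback() {
    pkg=$1
    if ${PIP:-pip} install --no-cache-dir "$pkg"; then
        echo "installed $pkg with pip"
    elif ${EASY_INSTALL:-easy_install} "$pkg"; then
        echo "installed $pkg with easy_install"
    else
        echo "could not install $pkg" >&2
        return 1
    fi
}

# Usage (as root, like the rest of this post):
# install_with_fallback Flask
```

Trying pip first keeps the package under pip's management when the retry works; easy_install is only the last resort.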

【3】Run pip install pyspider again.

Everything went smoothly until lxml failed to install. Here are the key parts of the error output:

#Message 1#:

Installing collected packages: chardet, cssselect, lxml, pyquery, requests, certifi, tornado, Flask-Login, u-msgpack-python, click, pyspider
  Found existing installation: chardet 2.0.1
    DEPRECATION: Uninstalling a distutils installed project (chardet) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling chardet-2.0.1:
      Successfully uninstalled chardet-2.0.1
  Running setup.py install for chardet
  Running setup.py install for cssselect
  Running setup.py install for lxml
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-dtraef/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5Tyn0R-record/install-record.txt --single-version-externally-managed --compile:
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
    Building lxml version 3.4.4.
    Building without Cython.
    ERROR: /bin/sh: xslt-config: command not found

    ** make sure the development packages of libxml2 and libxslt are installed **

#Message 2#:

gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/tmp/pip-build-dtraef/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
    In file included from src/lxml/lxml.etree.c:239:0:
    /tmp/pip-build-dtraef/lxml/src/lxml/includes/etree_defs.h:14:31: fatal error: libxml/xmlversion.h: No such file or directory
     #include "libxml/xmlversion.h"
                                   ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-dtraef/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5Tyn0R-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-dtraef/lxml

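Message 1 shows that all of pyspider's dependencies have been downloaded and installation has started; chardet and cssselect installed successfully, and it is lxml that fails. Judging by the error text, libxml2 and libxslt are not installed properly. Message 2 suggests gcc might also be the problem. I started with Message 2: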

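So I reinstalled python-dev and gcc with yum install python-dev gcc (note that on CentOS the Python headers package is called python-devel; python-dev is the Debian/Ubuntu name). Running pip install lxml still produced the same messages, which meant the error had to be the one in Message 1. (Thank goodness for installation logs like this to consult; otherwise I really wouldn't have known where it went wrong!)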

Enter the command:

 yum install libxslt-devel libxml2-devel

Then enter:

 pip install lxml

and it installed successfully!
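
The lxml failure comes down to build tools missing from PATH: lxml's setup.py runs `xslt-config` (and `xml2-config`), which on CentOS come from libxslt-devel and libxml2-devel, and it compiles C code with gcc. A hypothetical pre-flight check, sketched under the assumption that these three commands are the ones that matter, can catch this before pip starts building:

```shell
# check_lxml_build_deps: report which commands needed to compile lxml from
# source are missing. Hypothetical helper; the yum hints below use CentOS names.
check_lxml_build_deps() {
    missing=""
    for tool in xml2-config xslt-config gcc; do
        # command -v succeeds only if the tool can be found
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Usage:
# check_lxml_build_deps || yum install libxml2-devel libxslt-devel gcc
# check_lxml_build_deps && pip install lxml
```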

【4】At this point, running pip install pyspider again finally succeeded!!! Go enjoy your crawling!

If you want to see the full details of my installation process, see this blog post of mine:

【Summary】

     1. Pay close attention to the messages printed during installation; they hold the clues for tracking down bugs.

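     2. It's best to understand the underlying principles and what each command does; otherwise you sometimes end up installing a pile of useless things on your system.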

     3. In fact, you can search the output with a command to find the error messages; that seems more efficient and more targeted.
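
The fixes in this post can also be collected into a lookup keyed on the error text, so that a search through the pip output leads straight to a known fix. A hypothetical helper, sketched only from the errors recorded above (it knows nothing else), using the commands exactly as they appear in this post:

```shell
# suggest_fix "ERROR TEXT": print the fix this post used for a known pip error.
# Hypothetical helper covering only the errors recorded in this post;
# the first matching pattern wins, in the order the errors were fixed.
suggest_fix() {
    case "$1" in
        *InsecurePlatformWarning*)
            echo "yum install python-devel libffi-devel openssl-devel && pip install pyopenssl ndg-httpsclient pyasn1" ;;
        *"Bad md5 hash for package"*Flask*)
            echo "easy_install flask" ;;
        *"xslt-config: command not found"*|*"libxml/xmlversion.h"*)
            echo "yum install libxslt-devel libxml2-devel" ;;
        *)
            echo "no known fix; search for the exact error message" ;;
    esac
}

# Usage: save the output, search it, then look up a fix
# pip install pyspider 2>&1 | tee pip.log
# grep -n -i -E 'error|warning' pip.log
# suggest_fix "$(cat pip.log)"
```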