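Python 3 is only the programming language used to write crawlers; developing one also requires other tooling, such as an IDE and commonly used libraries. Based on my own experience, I recommend the setup steps below. The desktop environment is Windows 10.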

  • Install Anaconda

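Anaconda is a highly integrated, Python-based data science platform that works well for both crawler development and machine learning. It bundles more than 250 data science packages along with its own package manager, conda, so a single command installs most dependencies, such as Scikit-Learn, SciPy, and TensorFlow.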



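These three applications are the ones used most often. Once installed, Anaconda has already configured its own system environment and a Python 3 environment, so installing a dependency usually just means running a conda command in the Anaconda Prompt terminal.

List the existing conda environments: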


>conda env list
# conda environments:
base                  *  D:\ProgramFiles\Anaconda
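
If the environment list needs to be consumed by a script, the text output above can be parsed. The following is a minimal sketch, not part of conda itself: `parse_conda_envs` is a hypothetical helper name, and it assumes every row has the form `name [*] path`, with `*` marking the active environment, as in the output shown above.

```python
def parse_conda_envs(text):
    """Parse the plain-text output of `conda env list` into a list of dicts.

    Assumption: each data row is "name [*] path"; rows that contain only
    a path (unnamed environments) are not handled by this sketch.
    """
    envs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# conda environments:".
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        name = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        # A leading "*" marks the currently active environment.
        active = rest.startswith("*")
        if active:
            rest = rest[1:].strip()
        envs.append({"name": name, "path": rest, "active": active})
    return envs


if __name__ == "__main__":
    sample = r"""# conda environments:
base                  *  D:\ProgramFiles\Anaconda
"""
    for env in parse_conda_envs(sample):
        print(env)
```

Splitting only on the first run of whitespace keeps paths that contain spaces in one piece.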

Use the following command to list the Python executables found on the PATH:

>where python

Check the version of the Python currently in use:

>python --version
Python 3.6.3 :: Anaconda custom (64-bit)
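
The same two checks can be made from inside Python, which helps confirm that a script is running under the Anaconda interpreter rather than another installation. This is a small sketch using only the standard library; `interpreter_info` is a hypothetical helper name.

```python
import sys


def interpreter_info():
    """Return the running interpreter's path and its (major, minor, micro) version."""
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
    }


if __name__ == "__main__":
    info = interpreter_info()
    # Path of the interpreter running this script (compare with `where python`).
    print(info["executable"])
    # Version of that interpreter (compare with `python --version`).
    print("Python %d.%d.%d" % info["version"])
```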


  • Install Scrapy

Scrapy is one of the most commonly used crawler frameworks. The official site gives the following installation command:

conda install -c conda-forge scrapy

On my machine, this command failed with repeated connection errors:


CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/libssh2-1.8.0-vc14_2.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/noarch/hyperlink-17.3.1-py_0.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/pydispatcher-2.0.5-py36_0.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/yaml-0.1.7-vc14_0.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/qt-5.6.2-vc14_1.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
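
The error message itself suggests that a simple retry is often enough. As a rough sketch of that idea, assuming the failures really are intermittent, a small retry helper can wrap the install command. The names `retry` and `install_scrapy_from_conda_forge`, the attempt count, and the delay are all illustrative choices, not anything conda provides.

```python
import subprocess
import time


def retry(action, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` tries are used up.

    Returns the result of the first successful call and re-raises the
    last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # Wait briefly before trying again.
            sleep(delay_seconds)


def install_scrapy_from_conda_forge():
    """Run the install command non-interactively (hypothetical wrapper).

    Assumes `conda` is on the PATH, as it is inside Anaconda Prompt.
    check=True makes a non-zero exit status raise CalledProcessError.
    """
    subprocess.run(
        ["conda", "install", "-y", "-c", "conda-forge", "scrapy"],
        check=True,
    )
```

Calling `retry(install_scrapy_from_conda_forge)` would attempt the installation up to three times. If the connection to the channel is blocked rather than flaky, retrying will not help.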


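First, check whether conda offers a scrapy package for the current Python version.

My Python version is 3.6, and the last row of the list is the newest scrapy version for Python 3.6, so I installed it with the following command: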

>conda install scrapy
Solving environment: done

## Package Plan ##

  environment location: D:\ProgramFiles\Anaconda

  added / updated specs:
    - scrapy

The following packages will be downloaded:

The following packages will be UPDATED:

    ca-certificates:  2017.08.26-h94faf87_0      --> 2018.03.07-0
    certifi:          2017.7.27.1-py36h043bc9e_0 --> 2018.4.16-py36_0
    openssl:          1.0.2l-vc14hcac20b0_2      --> 1.0.2o-h8ea7d77_0

Proceed ([y]/n)? y

After answering y, the installation continues:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done


Finally, use the following command to check whether Scrapy was installed successfully:

>conda list
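
A complementary check can be made from Python itself by asking whether the interpreter can locate the module. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `is_installed` is a hypothetical helper name, and it is meant for top-level module names such as `scrapy`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate `module_name`.

    Intended for top-level names; find_spec returns None when the
    module cannot be found.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("scrapy installed:", is_installed("scrapy"))
```

Unlike `conda list`, this only reports on the interpreter that runs the script, so it also catches the case where the package went into a different environment.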