Python Web Scraping Series: Scraping Baidu Wenku (Part 1)

阿新 • Published: 2019-02-06

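1. What is Selenium

To scrape Baidu Wenku we need a tool called Selenium, a browser automation and testing framework. Selenium is a tool for testing web applications: its tests run directly in a real browser, just as when we browse the web ourselves, and it supports IE (7, 8, 9, 10, 11), Firefox, Safari, Chrome, Opera and others. That also makes it a good scraping tool: it can collect data that is loaded via Ajax, and it can simulate a user logging in so that pages available only after login can be scraped too.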

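2. How to install Selenium

On Windows, press Win+R, type cmd and press Enter to open a Command Prompt window, then run pip install selenium. Once it is installed, download chromedriver.exe and unzip it. The ChromeDriver version matters: I ended up using 2.32 (see item (2) in section 4).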
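Before writing any scraping code, it can help to confirm which Selenium version pip actually installed, because the way the driver path and options are passed to webdriver.Chrome changed in Selenium 4. The sketch below uses only the standard library; the helper names installed_selenium_version and chromedriver_on_path are hypothetical, not part of Selenium.

```python
from importlib import metadata
import shutil


def installed_selenium_version():
    """Return the installed selenium version string, or None if it is missing."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


def chromedriver_on_path(name="chromedriver"):
    """Return the full path of the driver if it is found on PATH, else None.

    On Windows, shutil.which also tries the extensions listed in PATHEXT,
    so "chromedriver" matches chromedriver.exe.
    """
    return shutil.which(name)


if __name__ == "__main__":
    print("selenium:", installed_selenium_version() or "not installed")
    print("chromedriver:", chromedriver_on_path() or "not found on PATH")
```

Run it with the same interpreter PyCharm uses for the project, otherwise the answer may describe a different Python installation.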

3. Open PyCharm and test with the following code

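from selenium import webdriver
from selenium.webdriver.chrome.service import Service

if __name__ == "__main__":
    # Raw string keeps the Windows backslashes literal
    executable_path = r"F:\python\chromedriver_win32\chromedriver.exe"
    # Selenium 4 takes the driver path via Service; Selenium 3 used webdriver.Chrome(executable_path=...)
    browser = webdriver.Chrome(service=Service(executable_path=executable_path))
    browser.get("https://www.baidu.com/")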

Here executable_path is the location where I unzipped the downloaded chromedriver.exe.
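Windows paths in Python are safest written as raw strings (r"..."). In a plain literal, a path such as "F:\python\chromedriver_win32\chromedriver.exe" only survives because \p and \c happen not to be escape sequences (recent Python versions warn about them), and a folder whose name starts with t or n would be silently corrupted. A small illustration, using the hypothetical folder C:\tools\new:

```python
from pathlib import PureWindowsPath

# Hypothetical folder names chosen to show the pitfall: in a plain string
# literal "\t" turns into a TAB and "\n" into a newline character.
broken = "C:\tools\new"
fixed = r"C:\tools\new"  # raw string: every backslash is kept as typed

print(len(broken), len(fixed))        # 10 12
print("\\" in broken)                 # False: no path separators survived
print(PureWindowsPath(fixed).parts)   # ('C:\\', 'tools', 'new')

# The driver path from this post, written safely as a raw string.
driver = r"F:\python\chromedriver_win32\chromedriver.exe"
print(PureWindowsPath(driver).name)   # chromedriver.exe
```

Doubling every backslash ("F:\\python\\...") works too, but raw strings are easier to copy from Explorer's address bar.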

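4. Problems encountered while running

(1) Chrome shows the warning "You are using an unsupported command-line flag: --ignore-certificate-errors. Stability and security will suffer."

Fix: change the Python code above as follows, adding a ChromeOptions setting: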

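    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    options = webdriver.ChromeOptions()
    # Keep ChromeDriver from launching Chrome with --ignore-certificate-errors
    options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors"])
    service = Service(executable_path=r"F:\python\chromedriver_win32\chromedriver.exe")
    browser = webdriver.Chrome(service=service, options=options)  # Selenium 4: options=, not chrome_options=
    browser.get("https://www.baidu.com/")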

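(2) If the address bar shows only data; (together with the symptom in the top-right corner of the window), replace chromedriver.exe with version 2.32. I had originally downloaded version 2.9, which is the older release despite the bigger-looking number; after switching to 2.32 these problems went away.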
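Version numbers like these are easy to misread: compared as text, "2.9" sorts after "2.32". A minimal sketch that compares ChromeDriver versions numerically, assuming chromedriver --version prints something like "ChromeDriver 2.32.498550 (...)"; the helper names parse_driver_version and driver_version are hypothetical.

```python
import re
import subprocess


def parse_driver_version(text):
    """Extract the version from `chromedriver --version` output as a tuple of ints.

    Assumes output shaped like "ChromeDriver 2.32.498550 (...)".
    """
    match = re.search(r"ChromeDriver\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognised version output: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def driver_version(executable_path):
    """Run the driver with --version and parse what it prints."""
    result = subprocess.run(
        [executable_path, "--version"], capture_output=True, text=True, check=True
    )
    return parse_driver_version(result.stdout)


if __name__ == "__main__":
    # Text comparison looks at "9" vs "3" and wrongly ranks 2.9 higher.
    print("2.9" > "2.32")  # True
    # Tuple comparison is numeric: (2, 9) < (2, 32).
    print(parse_driver_version("ChromeDriver 2.9") < parse_driver_version("ChromeDriver 2.32"))  # True
```

Comparing tuples of ints avoids the text-ordering trap and also works for the longer version numbers of newer drivers.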

(3) Running the program fails with selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home. This means Selenium cannot find chromedriver.exe. After downloading and unzipping chromedriver, pass the absolute path of chromedriver.exe, as follows:

    # Selenium 4 style; Service comes from selenium.webdriver.chrome.service
    browser = webdriver.Chrome(service=Service(executable_path=r"F:\python\chromedriver_win32\chromedriver.exe"), options=options)
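That error simply means the driver could not be located. A minimal sketch of a lookup order that avoids the surprise, assuming an explicit path should win over PATH: use the given path if the file exists, otherwise fall back to shutil.which, and otherwise fail with a clear message. find_chromedriver is a hypothetical helper, not part of Selenium.

```python
import os
import shutil


def find_chromedriver(explicit_path=None, name="chromedriver"):
    """Locate chromedriver: an explicit path first, then a PATH search.

    Raises FileNotFoundError with a hint instead of letting Selenium fail later.
    """
    if explicit_path is not None:
        if os.path.isfile(explicit_path):
            return explicit_path
        raise FileNotFoundError(f"chromedriver not found at {explicit_path!r}")
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(
            f"{name!r} is not on PATH; pass its absolute path or add its folder to PATH"
        )
    return found


if __name__ == "__main__":
    try:
        print("using", find_chromedriver())
    except FileNotFoundError as err:
        print(err)
```

The returned path can then be passed as executable_path (Selenium 3) or as Service(executable_path=...) (Selenium 4).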

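(4) If you don't want the hassle of passing the absolute path of chromedriver.exe every time, place chromedriver.exe in the directory of the Python installation you are using. For example, I have two Anaconda installations, Anaconda2 and Anaconda3, in different directories, and this program runs in the Anaconda3 environment, so I only need to copy chromedriver.exe into Anaconda3's Scripts directory. If you installed plain Python instead, copy chromedriver.exe into the python27 directory.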
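Which folder counts as "the directory of the Python you are using" depends on the interpreter that runs the script, which is easy to mix up with two Anaconda installations side by side. A stdlib-only sketch that lists the running interpreter's own folder and its scripts folder, and reports whether each one is on PATH; the helper names candidate_driver_dirs and is_on_path are hypothetical.

```python
import os
import sys
import sysconfig


def candidate_driver_dirs():
    """Folders of the running interpreter where chromedriver.exe could be placed."""
    dirs = [
        os.path.dirname(sys.executable),  # e.g. the python27 or anaconda3 root
        sysconfig.get_path("scripts"),    # e.g. anaconda3\Scripts
    ]
    unique = []
    for d in dirs:
        if d and d not in unique:
            unique.append(d)
    return unique


def is_on_path(directory):
    """True if directory appears in PATH, comparing normcase/normpath forms."""
    wanted = os.path.normcase(os.path.normpath(directory))
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry and os.path.normcase(os.path.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    for d in candidate_driver_dirs():
        print(f"{d}  on PATH: {is_on_path(d)}")
```

Run it from the same environment PyCharm uses; if the folder you copied chromedriver.exe into is reported as not on PATH, that may be why Selenium still cannot find the driver.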

5. Results

[Figure: screenshots of the results]
