
Automated Testing: Log In with Selenium, Then Fetch Content with Requests

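This post walks through logging in to a website automatically with Selenium, taking a screenshot, and then using Requests to fetch page content that is only available after login.

* Selenium: an umbrella project for a range of tools and libraries that support web browser automation.
* Requests: in its own words, the only Non-GMO HTTP library for Python, safe for human consumption.

![](https://img2020.cnblogs.com/blog/2049757/202005/2049757-20200531212056026-1450360563.gif)

## Why Selenium for the automated login?

Selenium drives a real browser, so the script does what a user would do by hand: open the browser and log in.

Compared with logging in through raw HTTP requests, this has several advantages:

1. It sidesteps the complexity of the login form (iframes, AJAX, and so on), so there is no need to analyze those details.
    * With Selenium, the script simply follows the user's steps.
2. It avoids forging headers, tracking cookies, and the other HTTP details that a login involves.
    * With Selenium, the browser handles them itself.
3. It makes it easy to wait for pages to load, detect special cases (such as login verification), and add further logic.

As a bonus, watching the browser log in by itself tends to impress non-technical onlookers.

## Why Requests for fetching page content?

The goal is to fetch a few pages after login, not to crawl the whole site, so Requests is sufficient and pleasant to use.

## 1) Set up Selenium

Base environment: Python 3.7.4 (anaconda3-2019.10)

Install Selenium with pip:

```bash
pip install selenium
```

Check the Selenium version:

```bash
$ python
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import selenium
>>> print('Selenium version is {}'.format(selenium.__version__))
Selenium version is 3.141.0
```

## 2) Set up the browser and its driver

Download and install Google Chrome:

https://www.google.com/chrome/

Download the Chromium/Chrome WebDriver, picking the release that matches your installed Chrome version:

https://chromedriver.storage.googleapis.com/index.html

Then add the WebDriver directory to PATH, for example:

```bash
# macOS, Linux: append the export line to ~/.profile so new shells pick it up
echo 'export PATH=$PATH:/opt/WebDriver/bin' >> ~/.profile

# Windows
setx /m path "%path%;C:\WebDriver\bin"
```

## 3) Go coding!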
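Before writing the login script, it can help to confirm that the driver from step 2 is actually discoverable on PATH, since a missing driver is a common reason `webdriver.Chrome()` fails to start. The following is a minimal sketch using only the standard library. The helper name `find_driver` is made up for illustration, and it assumes the driver executable is named `chromedriver`.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the absolute path of the driver executable on PATH, or None.

    `name` is an assumption: adjust it if your driver binary is named differently.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver not found on PATH; revisit step 2")
    else:
        print("using driver at {}".format(path))
```

`shutil.which` follows the same lookup rules as the shell, so a `None` result here usually means the PATH change has not taken effect in the current terminal yet.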
### Read the login configuration

The login credentials are private, so we read them from a JSON config file:

```py
# load config
import json
from types import SimpleNamespace as Namespace

secret_file = 'secrets/douban.json'

# {
#   "url": {
#     "login": "https://www.douban.com/",
#     "target": "https://www.douban.com/mine/"
#   },
#   "account": {
#     "username": "username",
#     "password": "password"
#   }
# }
with open(secret_file, 'r', encoding='utf-8') as f:
    # object_hook turns every JSON object into a namespace,
    # so values can be read as attributes (config.url.login)
    config = json.load(f, object_hook=lambda d: Namespace(**d))

login_url = config.url.login
target_url = config.url.target
username = config.account.username
password = config.account.password
```

### Log in automatically with Selenium

> Implemented with Chrome WebDriver; the test site is Douban.

Open the login page, fill in the username and password automatically, and log in:

```py
# automated testing
import sys  # needed for sys.exit() below

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Chrome Start
opt = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=opt)
# Chrome opens with “Data;” with selenium
#  https://stackoverflow.com/questions/37159684/chrome-opens-with-data-with-selenium
# Chrome End

# driver.implicitly_wait(5)

wait = WebDriverWait(driver, 5)

print('open login page ...')
driver.get(login_url)

# The login form lives in the first iframe on the page.
# find_element(By...) works in Selenium 3.141.0 and also in Selenium 4,
# where the find_element_by_* helpers were removed.
driver.switch_to.frame(driver.find_elements(By.TAG_NAME, 'iframe')[0])
driver.find_element(By.CSS_SELECTOR, 'li.account-tab-account').click()
driver.find_element(By.NAME, 'username').send_keys(username)
driver.find_element(By.NAME, 'password').send_keys(password)
driver.find_element(By.CSS_SELECTOR, '.account-form .btn').click()

# Wait for an element that only appears once the login has succeeded
try:
    wait.until(EC.presence_of_element_located((By.ID, "content")))
except TimeoutException:
    driver.quit()
    sys.exit('open login page timeout')
```

If you use Internet Explorer instead, create the driver like this:

```py
# Ie Start
# Selenium Click is not working with IE11 in Windows 10
#  https://github.com/SeleniumHQ/selenium/issues/4292
opt = webdriver.IeOptions()
opt.ensure_clean_session = True
opt.ignore_protected_mode_settings = True
opt.ignore_zoom_level = True
opt.initial_browser_url = login_url
opt.native_events = False
opt.persistent_hover = True
opt.require_window_focus = True
driver = webdriver.Ie(options=opt)
# Ie End
```

To set further capabilities:

```py
cap = opt.to_capabilities()
cap['acceptInsecureCerts'] = True
cap['javascriptEnabled'] = True
```

### Open the target page and take a screenshot

```py
print('open target page ...')
driver.get(target_url)
try:
    wait.until(EC.presence_of_element_located((By.ID, "board")))
except TimeoutException:
    driver.quit()
    sys.exit('open target page timeout')

# save screenshot
driver.save_screenshot('target.png')
print('saved to target.png')
```

### Copy the cookies into Requests and fetch the HTML

```py
# save html
import requests

requests_session = requests.Session()

# Reuse the browser's User-Agent so the session looks like the same client
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
requests_session.headers.update({"user-agent": selenium_user_agent})

# Copy every cookie from the logged-in browser into the Requests session
for cookie in driver.get_cookies():
    requests_session.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])

# driver.delete_all_cookies()
driver.quit()

resp = requests_session.get(target_url)
resp.encoding = resp.apparent_encoding
# resp.encoding = 'utf-8'

print('status_code = {0}'.format(resp.status_code))
# Write as UTF-8 explicitly so the result does not depend on the platform default
with open('target.html', 'w', encoding='utf-8') as fout:
    fout.write(resp.text)

print('saved to target.html')
```

## 4) Run the test

You can also add the WebDriver directory to PATH for the current shell only:

```bash
# macOS, Linux
export PATH=$(pwd)/drivers:$PATH

# Windows
set PATH=%cd%\drivers;%PATH%
```

Run the Python script. The output looks like this:

```bash
$ python douban.py
Selenium version is 3.141.0
--------------------------------------------------------------------------------
open login page ...
open target page ...
saved to target.png
status_code = 200
saved to target.html
```

The screenshot `target.png` and the HTML content `target.html` look like this:

![](https://img2020.cnblogs.com/blog/2049757/202005/2049757-20200531212114890-1117506365.png)

## Closing notes

What if the login flow hits a verification challenge?

1. Slider verification can be simulated with Selenium.
    * The slide distance can be estimated with an image-gradient algorithm.
2. Image and text CAPTCHAs can be recognized with Python AI libraries.

## References

Code for this post (Gist):

https://gist.github.com/ikuokuo/1160862c154d550900fb80110828c94c

* Selenium: https://www.selenium.dev/documentation/en/
* WebDriver: https://www.selenium.dev/documentation/en/webdriver/driver_requirements/#quick-reference
* requests: https://requests.readthedocs.io/en/latest/
* requestium: https://github.com/tryolabs/requestium
* Selenium Requests: https://github.com/cryzed/Selenium-R