A Selenium example: logging in to a website and getting its cookies
Test site (航天雲網, casicloud.com):
http://cas.casicloud.com/login?service=http%3A%2F%2Fin.casicloud.com%2Floginc%3Fservice%3D%252Fsso%252Flogin.jsp%253Fredirect%253Dhttp%25253A%25252F%25252Fwww.casicloud.com%25252Floginc%25253Fret%25253Dhttp%2525253A%2525252F%2525252Fwww.casicloud.com%2525252F
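The login page looks like this:
First, about the CAPTCHA:
Fortunately, analysis shows that this site's CAPTCHA does not need OCR. Once the page's JS has loaded, the CAPTCHA value is available in a hidden field of the form <input type="hidden" id="randomString" value="...">. So we only need to open the login page, let the JS finish loading, and then pull that value out with a regular expression or an XPath query.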
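To make the idea concrete, here is a minimal sketch that pulls the value out of an HTML string using only the standard library. The sample markup is made up, and `HiddenInputFinder` and `find_hidden_value` are hypothetical names; the real page is only assumed to contain a similar `<input>` tag. Unlike a fixed regular expression, this variant does not depend on the order of the attributes.

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Remembers the value attribute of the <input> tag with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("id") == self.target_id:
            self.value = attributes.get("value")


def find_hidden_value(html, target_id="randomString"):
    """Return the value of the matching input, or None if it is absent."""
    finder = HiddenInputFinder(target_id)
    finder.feed(html)
    return finder.value


# Made-up sample markup standing in for the loaded login page.
sample = '<form><input value="4821" id="randomString" type="hidden"></form>'
print(find_hidden_value(sample))
```

Returning None when the field is missing makes it easy to detect that the page has not finished loading yet.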
Now, let's begin:
1. Set up the browser and open the login page:
from selenium import webdriver

url = '***'
driver = webdriver.Chrome()
driver.get(url)
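2. I recommend setting a wait so that the JS has time to load (I usually use 3-5 seconds). Note that implicitly_wait does not pause the script; it sets how long element lookups keep retrying before they give up.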
driver.implicitly_wait(5)
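Because the implicit wait only applies to element lookups, reading `driver.page_source` right away may still happen before the JS has filled in the hidden field. Selenium ships `WebDriverWait` for explicit waits; the sketch below only illustrates the same polling idea in plain Python so it can be run without a browser. `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.2):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Possible use with a live driver (assumes `driver` already exists):
# wait_until(lambda: 'id="randomString"' in driver.page_source)
```

The condition is always checked at least once, so a page that is already loaded returns immediately.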
3. Enter the account name and password in the corresponding form fields:
from selenium.webdriver.common.by import By

driver.find_element(By.XPATH, '//*[@id="shortAccount"]').send_keys('account name')
driver.find_element(By.XPATH, '//*[@id="password"]').send_keys('password')
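4. Read the CAPTCHA value from the page after the JS has loaded:
import re

html = driver.page_source
# The pattern assumes a 4-digit value and exactly this attribute order.
check_value = re.search(r'<input type="hidden" id="randomString" value="(\d\d\d\d)"', html).group(1)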
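As an alternative to matching the page source with a regular expression, the value can probably be read straight from the element itself. This is a sketch under the assumption that the hidden field keeps the id `randomString`; `read_captcha_value` is a hypothetical helper. It passes the locator strategy as the plain string "id", which is the value behind Selenium's `By.ID`, so the block has no imports.

```python
def read_captcha_value(driver, element_id="randomString"):
    """Return the value attribute of the hidden CAPTCHA field."""
    # "id" is the string value of selenium's By.ID locator strategy.
    element = driver.find_element("id", element_id)
    return element.get_attribute("value")


# Possible use with a live driver (assumes `driver` already exists):
# check_value = read_captcha_value(driver)
```

Reading the attribute this way does not depend on how the browser serializes the tag, so attribute order and quoting no longer matter.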
5. Enter the CAPTCHA, log in, and collect the cookies:
key = str(check_value)
# By is imported from selenium.webdriver.common.by.
driver.find_element(By.XPATH, '//*[@id="code0"]').send_keys(key)
driver.find_element(By.XPATH, '//*[@id="loginForm"]/div[6]/input').click()
driver.refresh()
cookies = driver.get_cookies()
# Build a "name=value; name=value; " string from the cookie dicts.
ret = ''
for cookie in cookies:
    cookie_name = cookie['name']
    cookie_value = cookie['value']
    ret = ret + cookie_name + '=' + cookie_value + '; '
print(ret)
driver.quit()
The page refresh (refresh) above is just a personal habit.
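The cookie loop can also be written as a single join, which avoids the trailing "; ". This is only a sketch: `cookies_to_header` is a hypothetical name, and the sample cookies are made up to mimic the list of dicts that `driver.get_cookies()` returns.

```python
def cookies_to_header(cookies):
    """Join selenium-style cookie dicts into 'name=value; name=value'."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Made-up cookies in the same shape as driver.get_cookies() output.
sample_cookies = [
    {"name": "JSESSIONID", "value": "abc123", "domain": "example.com"},
    {"name": "token", "value": "xyz"},
]
print(cookies_to_header(sample_cookies))
```

Extra keys such as domain or path are ignored, since only the name and value belong in a Cookie header.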
Then, with the code tidied up, it looks like this:
# coding: utf-8
import re

from selenium import webdriver
from selenium.webdriver.common.by import By


def login_get_cookie(url):
    """Log in through the browser and return the cookies as a 'name=value; ' string."""
    driver = webdriver.Chrome()
    driver.get(url)
    # Element lookups below retry for up to 5 seconds.
    driver.implicitly_wait(5)
    driver.find_element(By.XPATH, '//*[@id="shortAccount"]').send_keys('account name')
    driver.find_element(By.XPATH, '//*[@id="password"]').send_keys('password')
    # The CAPTCHA value sits in a hidden input once the JS has run.
    html = driver.page_source
    check_value = re.search(r'<input type="hidden" id="randomString" value="(\d\d\d\d)"', html).group(1)
    key = str(check_value)
    driver.find_element(By.XPATH, '//*[@id="code0"]').send_keys(key)
    driver.find_element(By.XPATH, '//*[@id="loginForm"]/div[6]/input').click()
    driver.refresh()
    cookies = driver.get_cookies()
    ret = ''
    for cookie in cookies:
        cookie_name = cookie['name']
        cookie_value = cookie['value']
        ret = ret + cookie_name + '=' + cookie_value + '; '
    print(ret)
    driver.quit()
    return ret


url = 'http://cas.casicloud.com/login?service=http%3A%2F%2Fin.casicloud.com%2Floginc%3Fservice%3D%252Fsso%252Flogin.jsp%253Fredirect%253Dhttp%25253A%25252F%25252Fwww.casicloud.com%25252Floginc%25253Fret%25253Dhttp%2525253A%2525252F%2525252Fwww.casicloud.com%2525252F'
cookies = login_get_cookie(url)
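One likely use of the returned string is to send it as the Cookie header of later requests made without the browser. The sketch below only builds such a request with the standard library and does not send it. `build_request` is a hypothetical helper, the cookie values are made up, and the target URL is an assumption taken from the redirect parameter of the login URL.

```python
import urllib.request


def build_request(target_url, cookie_string):
    """Create a request that carries the cookies collected by selenium."""
    # Drop the trailing "; " that the string-building loop leaves behind.
    header_value = cookie_string.strip().rstrip(";")
    return urllib.request.Request(target_url, headers={"Cookie": header_value})


# Made-up cookie string in the format produced by login_get_cookie.
request = build_request("http://www.casicloud.com/", "JSESSIONID=abc123; token=xyz; ")
print(request.get_header("Cookie"))

# To actually send it: urllib.request.urlopen(request)
```

Whether the site accepts the session outside the browser may depend on other checks, such as the User-Agent header, so this should be treated as a starting point.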