Capturing a CAPTCHA Image with selenium

阿新 • Published: 2019-02-20

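There are two ways to get a CAPTCHA image:

1. Fetch the page source and extract the CAPTCHA image

2. Take a screenshot of the page with selenium, locate the CAPTCHA element, and use Image to crop out the CAPTCHA region

Each is examined below: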

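1. Fetch the page source and extract the CAPTCHA image

I won't go over how to fetch the source and extract the image; if you are reading this article, that part should be no trouble.
Only the drawback is discussed here: after extracting the CAPTCHA URL, you will find that the image content changes every time the URL is opened.
Take the Sogou CAPTCHA as an example: http://weixin.sogou.com/antispider/util/seccode.php?tc=1486691901. The CAPTCHA is loaded into the page by a separate request rather than embedded in it, so the image you get by opening the extracted URL differs from the one currently shown on the page. A simpler, more convenient method is needed.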

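2. Take a screenshot of the page with selenium
selenium.webdriver has a built-in way to capture the current page:

a. The method provided by webdriver.Chrome can only capture the current window; if the area you need extends beyond one screen, you have to find another way.

b. The method provided by webdriver.PhantomJS can capture the whole page.

Either works here, because CAPTCHA pages are usually simple.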

    from PIL import Image
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Open the CAPTCHA page
    driver = webdriver.Chrome()
    url = ("http://weixin.sogou.com/antispider/?from=%2fweixinwap%3Fpage"
           "%3d2%26_rtype%3djson%26ie%3dutf8%26type%3d2%26query%3d%E6%91%A9%E6%8B%9C%E5%8D%95%E8%BD%A6"
           "%26pg%3dwebSearchList%26_sug_%3dn%26_sug_type_%3d%26")
    driver.set_window_size(1200, 800)
    # `info` comes from earlier crawler code that is not shown here;
    # info['cookies'] is used as a dict of cookie name -> value
    cookies = info['cookies']

    # Handle cookies: they can only be added for the domain that is
    # currently loaded, so open the page, add them, then reload
    driver.get(url)
    for k, v in cookies.items():
        cookie_dict = {'name': k, 'value': v}
        driver.add_cookie(cookie_dict)
    driver.get(url)

    # Take the screenshot
    driver.get_screenshot_as_file('CrawlResult/screenshot.png')

    # Get the position of the target element
    element = driver.find_element(By.ID, 'seccodeImage')
    left = int(element.location['x'])
    top = int(element.location['y'])
    right = int(element.location['x'] + element.size['width'])
    bottom = int(element.location['y'] + element.size['height'])

    # Crop the CAPTCHA out of the screenshot with Image
    im = Image.open('CrawlResult/screenshot.png')
    im = im.crop((left, top, right, bottom))
    im.save('CrawlResult/code.png')
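The crop coordinates above assume that one CSS pixel in the page equals one pixel in the screenshot. On high-DPI displays the screenshot can be larger than the layout by the device pixel ratio, and the crop then lands in the wrong place. The sketch below shows one way to scale the box; `crop_box` and its `scale` parameter are hypothetical names introduced for illustration, and the ratio is assumed to be read from the browser with something like `driver.execute_script('return window.devicePixelRatio')`.

```python
def crop_box(location, size, scale=1.0):
    """Build a (left, top, right, bottom) box for Image.crop.

    `location` and `size` have the same shape as selenium's
    element.location and element.size dicts; `scale` is the assumed
    ratio of screenshot pixels to CSS pixels (1.0 means no scaling).
    """
    left = int(location['x'] * scale)
    top = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    bottom = int((location['y'] + size['height']) * scale)
    return (left, top, right, bottom)


# Example values standing in for element.location / element.size
box_1x = crop_box({'x': 100, 'y': 50}, {'width': 120, 'height': 40})
box_2x = crop_box({'x': 100, 'y': 50}, {'width': 120, 'height': 40}, scale=2.0)
```

With a ratio of 1.0 this yields the same box as the code above, so it can be passed straight to `im.crop(...)`.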

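At this point we have the CAPTCHA image. How do we process it?

1. Use libraries such as pytesser, tesseract, or other OCR tools

2. Since there were not many CAPTCHAs, and to improve recognition efficiency and keep things simple, I called the API of a CAPTCHA-solving platform (ruokuai). The price is roughly 1 yuan per 100-150 CAPTCHAs (depending on the number of characters and whether digits and letters are mixed)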
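For the OCR route, the raw recognized string usually needs some cleanup before it can be submitted, since OCR engines tend to return stray whitespace and punctuation. The following is a minimal sketch of that cleanup step only: `clean_ocr_text` is a hypothetical helper that is not part of the original article, and the raw string is assumed to come from an OCR call such as `pytesseract.image_to_string(...)`, which is not invoked here.

```python
import re


def clean_ocr_text(raw, expected_len=None):
    """Keep only ASCII letters and digits from an OCR result.

    Returns None when `expected_len` is given and the cleaned text has a
    different length, which is a cheap signal that recognition failed.
    """
    text = re.sub(r'[^0-9A-Za-z]', '', raw)
    if expected_len is not None and len(text) != expected_len:
        return None
    return text


# Example: an OCR result with stray whitespace for a 4-character CAPTCHA
cleaned = clean_ocr_text(' a7 Kq\n', expected_len=4)
rejected = clean_ocr_text('a7-K', expected_len=4)
```

A None result is a reasonable trigger for refreshing the CAPTCHA and trying again rather than submitting a guess.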

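Here is how to use the CAPTCHA-solving platform:

2. The official calling method (there are two versions, DOS and regular; the regular one is pasted below, and the basic principle is the same)

Principle: call the API (request a URL) with the CAPTCHA image, the platform account, the password, and so on in the specified format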

from hashlib import md5

import requests


class RClient(object):

    def __init__(self, username, password, soft_id, soft_key):
        self.username = username
        # The password is sent as an MD5 hex digest; md5() needs bytes in Python 3
        self.password = md5(password.encode('utf-8')).hexdigest()
        self.soft_id = soft_id
        self.soft_key = soft_key
        self.base_params = {
            'username': self.username,
            'password': self.password,
            'softid': self.soft_id,
            'softkey': self.soft_key,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'Expect': '100-continue',
            'User-Agent': 'ben',
        }

    def rk_create(self, im, im_type, timeout=60):
        """
        im: image bytes
        im_type: question (CAPTCHA) type
        """
        params = {
            'typeid': im_type,
            'timeout': timeout,
        }
        params.update(self.base_params)
        files = {'image': ('a.jpg', im)}
        r = requests.post('http://api.ruokuai.com/create.json', data=params, files=files, headers=self.headers)
        return r.json()

    def rk_report_error(self, im_id):
        """
        im_id: ID of the question being reported as wrong
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://api.ruokuai.com/reporterror.json', data=params, headers=self.headers)
        return r.json()

rc = RClient('username', 'password', 'soft_id', 'soft_key')
imagePath = 'CrawlResult/code.png'
with open(imagePath, 'rb') as f:
    im = f.read()
# Replace 'captcha_type' with a real type ID; types and prices: http://www.ruokuai.com/home/pricetype
code_json = rc.rk_create(im, 'captcha_type')
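The article stops at the raw JSON, so the sketch below shows one way the response might be unpacked. It rests on an assumption that is not confirmed by the text above: that a successful create.json response carries the recognized text under 'Result' and the question ID under 'Id', and that a failure carries a message under 'Error'. `parse_rk_response` is a hypothetical helper, and the key names should be checked against the platform's documentation.

```python
def parse_rk_response(resp):
    """Return (code, question_id) from a create.json response dict.

    ASSUMPTION: success responses contain 'Result' and 'Id', failures
    contain 'Error'. These key names are not given in the article.
    """
    if 'Result' in resp:
        return resp['Result'], resp.get('Id')
    raise RuntimeError(resp.get('Error', 'unexpected response: %r' % (resp,)))


# Example with a hand-written dict standing in for a real response
code, question_id = parse_rk_response({'Result': 'ab12', 'Id': 'q-001'})
```

In the flow above this would be called as `parse_rk_response(code_json)`; if the site then rejects the returned code, the saved ID can be passed to `rc.rk_report_error(...)`.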
