Scraping bank names and official website URLs with Python

阿新 • Published: 2018-10-09


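Goal: scrape the name and official website URL of every bank (a missing official website is simply ignored) and write the results to a database.
Target URL: http://www.cbrc.gov.cn/chinese/jrjg/index.html
(The site has anti-crawler measures in place, so the crawler has to disguise itself as a browser when it requests the page.)
For background on making a crawler look like a browser, see this article:
https://blog.csdn.net/a877415861/article/details/79468878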
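
As a rough illustration of what the disguise amounts to, the sketch below compares the User-Agent that urllib announces by default with the one set on a custom request. It sends nothing over the network. The helper name build_browser_request is hypothetical and not part of the original script.

```python
from urllib import request


def build_browser_request(url, user_agent):
    """Hypothetical helper: return a Request carrying a browser-like User-Agent."""
    return request.Request(url, headers={"User-Agent": user_agent})


# Headers that a default urllib opener adds when nothing is overridden.
default_headers = request.build_opener().addheaders

req = build_browser_request(
    "http://www.cbrc.gov.cn/chinese/jrjg/index.html",
    "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0",
)

print(default_headers)               # one User-agent entry beginning with "Python-urllib/"
print(req.get_header("User-agent"))  # Request stores header names in capitalized form
```

A server that filters on the User-Agent header can easily recognize the default "Python-urllib/..." value, which is why the script replaces it.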

Without further ado, here is the code:

import re
from urllib import request
from urllib.request import urlopen
import pymysql as mysql

# MySQL credentials, database name, and the insert statement
u = 'root'
p = 'root'
d = 'python'
sql = 'insert into bank_info values(%s,%s)'

url = 'http://www.cbrc.gov.cn/chinese/jrjg/index.html'

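# Steps for disguising the crawler as a browser:

# 1. Define the User-Agent string of a real browser
myAgent = "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"    # the User-Agent of my current Firefox browser

# 2. Put that User-Agent into the headers of the page request
myrequest = request.Request(url, headers={'User-Agent': myAgent})

# 3. Open the page and read its content
content = urlopen(myrequest).read().decode('utf-8')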

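# Target element: <a href="http://www.icbc.com.cn/" target="_blank" style="color:#08619D">中國工商銀行</a>
# First alternative: a list item with a link (groups 1 and 2); second alternative: a list item with a bare name (group 3)
pattern = r'<li style="margin.*inline;">\s*<a href="(http://.+?)" target="_blank" style="color:#08619D">\s*?([\S]*?)\s*?</a>|<li style="margin.*inline;">\s*?([\S]*?)\s*?</li>'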

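def main():
    res = re.findall(pattern, content)
    # Each match is a 3-tuple (url, linked name, unlinked name), e.g.
    # [('http://www.hsbc.com.cn', '匯豐中國', ''), ...('', '', '蒙特利爾銀行（中國）有限公司')...]
    conn = mysql.connect(user=u, passwd=p, db=d, charset='utf8', autocommit=True)
    try:
        cur = conn.cursor()
        for info in res:
            if info[0]:
                row = (info[1], info[0])    # has an official site: (name, url)
            else:
                row = (info[2], info[1])    # no official site: (name, '')
            cur.execute(sql, row)           # autocommit=True, so no explicit commit is needed
    finally:
        conn.close()                        # release the connection even if an insert fails

if __name__ == "__main__":
    main()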
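
To see how the regular expression behaves without touching the network or the database, here is a small offline sketch. The two list items in sample_html are made up to fit the pattern (the real page markup may differ), and the helper name to_row is hypothetical; it only mirrors the reordering done inside main().

```python
import re

# Same pattern as in the script above.
pattern = r'<li style="margin.*inline;">\s*<a href="(http://.+?)" target="_blank" style="color:#08619D">\s*?([\S]*?)\s*?</a>|<li style="margin.*inline;">\s*?([\S]*?)\s*?</li>'

# Assumed sample markup: one bank with a link, one without.
sample_html = '''
<li style="margin:0 5px;display:inline;"><a href="http://www.icbc.com.cn/" target="_blank" style="color:#08619D">中國工商銀行</a></li>
<li style="margin:0 5px;display:inline;">蒙特利爾銀行（中國）有限公司</li>
'''


def to_row(match):
    """Hypothetical helper: turn one findall tuple into a (name, url) pair."""
    url, linked_name, plain_name = match
    return (linked_name, url) if url else (plain_name, "")


matches = re.findall(pattern, sample_html)
rows = [to_row(m) for m in matches]

for name, site in rows:
    print(name, site or "(no official site)")
```

Because the pattern has three groups split over two alternatives, every match leaves the groups of the unused alternative empty, which is what the if/else in main() relies on.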

Run result:
(screenshot not preserved)
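
For readers without a MySQL server at hand, the storage step can be tried with Python's built-in sqlite3 module instead. This is a swapped-in stand-in for pymysql, not what the script above uses. The column names name and url are an assumption, since the original only shows a two-value insert into bank_info.

```python
import sqlite3

# (name, url) pairs in the same shape the script inserts; the values are illustrative.
rows = [
    ("中國工商銀行", "http://www.icbc.com.cn/"),
    ("蒙特利爾銀行（中國）有限公司", ""),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE bank_info (name TEXT, url TEXT)")  # assumed schema
conn.executemany("INSERT INTO bank_info VALUES (?, ?)", rows)  # sqlite3 uses ? placeholders
conn.commit()

stored = conn.execute("SELECT name, url FROM bank_info ORDER BY rowid").fetchall()
print(stored)
conn.close()
```

Note that sqlite3 uses ? as its placeholder, whereas pymysql uses %s, so the insert statement is not interchangeable between the two.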
