Web Scraping (1): Fetching Web Pages with the Requests Module

Call the get method of the requests library to fetch the page, read page.text to get the page source as a string, and then print that source with print.
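import requests

# Download the page; page is a requests.Response object
page = requests.get('https://blog.csdn.net/zt_0910/article/details/80075742')
# page.text is the response body decoded to a str
text = page.text
# Encode to UTF-8 bytes to avoid console encoding errors (Python 3 prints a bytes literal)
print(text.encode("utf-8"))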
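The snippet above prints whatever the server returns, even an error page, and it can hang if the server never answers. A minimal sketch of a more defensive fetch follows. The helper name `fetch_text` and its default values are assumptions made for illustration, not part of the original post.

```python
import requests


def fetch_text(url, encoding=None, timeout=10):
    """Return the decoded body of url; raise on network or HTTP errors.

    fetch_text is a hypothetical helper name, not part of requests.
    """
    # timeout (in seconds) keeps the call from waiting forever
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into requests.HTTPError instead of
    # quietly returning the body of an error page
    response.raise_for_status()
    if encoding is not None:
        # Override the encoding that requests inferred from the headers
        response.encoding = encoding
    return response.text
```

A call such as `fetch_text('https://blog.csdn.net/zt_0910/article/details/80075742')` would then either return the page source or raise an exception that the caller can handle.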
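The second example sends a browser-like User-Agent header, sets a timeout, decodes the response as gb2312, and uses a regular expression to pull the matching links out of the page source.

import requests
import re

# Send a browser-like User-Agent header with every request
head = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
TimeOut = 30  # seconds


def requestpageText(url):
    """Return the decoded page source, or None if the request fails."""
    try:
        Page = requests.session().get(url, headers=head, timeout=TimeOut)
        Page.encoding = "gb2312"  # decode the body as GB2312
        return Page.text
    except requests.RequestException as e:
        print("Network request failed...", e)
        return None


site = "http://www.meizitu.com/a/qingchun_3_1.html"
text = requestpageText(site)  # fetch the page source
patterns = re.compile(r'http:.*?/\d*?\.html')  # match the links we need; \. is a literal dot
if text is not None:  # skip matching when the request failed
    istp = re.findall(patterns, text)
    for photo in istp:
        print(photo)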
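The link-extraction step can be tried without any network access. The sketch below runs a slightly stricter variant of the pattern against a made-up HTML fragment. `SAMPLE_HTML`, `LINK_PATTERN`, `extract_links`, and the example.com URLs are all assumptions for illustration. The character class `[^"\s]` is a swapped-in alternative to the original `.*?`, chosen so that a match cannot run past the end of one attribute value.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page
SAMPLE_HTML = '''
<a href="http://www.example.com/a/101.html">one</a>
<a href="http://www.example.com/a/102.html">two</a>
<a href="http://www.example.com/a/101.html">one again</a>
<a href="http://www.example.com/about.html">about</a>
'''

# [^"\s]*? stays inside one quoted attribute value,
# \d+ requires a numeric file name, and \. matches a literal dot
LINK_PATTERN = re.compile(r'http://[^"\s]*?/\d+\.html')


def extract_links(html):
    """Return matching URLs in first-seen order, without duplicates."""
    # dict.fromkeys keeps insertion order and drops repeated keys
    return list(dict.fromkeys(LINK_PATTERN.findall(html)))


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

Here the two numbered pages are printed once each, and `about.html` is skipped because its file name is not numeric.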