
Fixing the 403 error when scraping Maoyan Movies with requests

The original code is as follows:

import requests
from requests.exceptions import RequestException


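def one_page_code(url):
    """Return the HTML of url, or None if the request fails."""
    try:
        page = requests.get(url)
        if page.status_code == 200:
            return page.text
        # Any other status (e.g. 403) is reported; the function then returns None.
        print("Failed\nStatus code: %d" % page.status_code)
    except RequestException:
        print("Exception")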

def main():
    url = 'http://maoyan.com'
    print(one_page_code(url))

if __name__ == '__main__':
    main()

This code prints the page source without problems when requesting Baidu, Taobao, or Douban, but requesting Maoyan returns a 403 error.

 

It turns out the request was missing something important: the request headers.
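To see why the headers matter, it helps to look at what requests sends when none are given. The short check below prints the library's default User-Agent, which identifies the client as python-requests rather than a browser. That Maoyan rejects requests on this basis is an assumption here; the post only shows that supplying a browser User-Agent makes the 403 go away.

```python
import requests

# The User-Agent string requests uses when no headers are passed.
print(requests.utils.default_user_agent())

# The full set of default headers (User-Agent, Accept, Accept-Encoding, Connection).
default_headers = requests.utils.default_headers()
for name, value in default_headers.items():
    print("%s: %s" % (name, value))
```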

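In the browser's developer tools (Inspect Element), we copy the request headers from the Network tab and pass them to the request function. The code below sends the User-Agent header: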

import requests
from requests.exceptions import RequestException


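def one_page_code(url):
    """Return the HTML of url, or None if the request fails."""
    try:
        # User-Agent copied from the browser's Network tab, so the request looks like Chrome.
        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'}
        page = requests.get(url, headers=header)
        if page.status_code == 200:
            return page.text
        # Any other status (e.g. 403) is reported; the function then returns None.
        print("Failed\nStatus code: %d" % page.status_code)
    except RequestException:
        print("Exception")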

def main():
    url = 'http://maoyan.com/board/4'
    print(one_page_code(url))

if __name__ == '__main__':
    main()

With this change, the page source is retrieved normally.
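If several pages are fetched, the header can be set once on a requests.Session instead of on every call. The following is only a sketch of that variation, not the method used above: make_session, fetch, and the 10-second timeout are hypothetical names and values chosen for illustration. The last lines build the request without sending it, so the header can be checked offline.

```python
import requests

# Assumption: the same browser User-Agent string used in the code above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
}


def make_session():
    """Create a session that sends the browser User-Agent on every request."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def fetch(session, url, timeout=10):
    """Return the page text, or None on a network error or non-200 status."""
    try:
        page = session.get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        print("Exception: %s" % exc)
        return None
    if page.status_code == 200:
        return page.text
    print("Failed\nStatus code: %d" % page.status_code)
    return None


session = make_session()

# Prepare the request without sending it, to confirm the header is attached.
prepared = session.prepare_request(
    requests.Request("GET", "http://maoyan.com/board/4")
)
print(prepared.headers["User-Agent"])
```

Calling fetch(session, 'http://maoyan.com/board/4') would then perform the actual request with the same header.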