Python: fixing gzip-garbled output when downloading the page returned by a URL with a GET/POST request. With Accept-Encoding set to gzip, deflate, the returned page is garbled

1. The script

# -*- coding: utf-8 -*-

import string
import urllib
import urllib2
import ssl

def getpicyanzhengma():  # Request the page from the server and save the response body as an HTML file
    urlget = "https://xianzhi.aliyun.com/forum/topic/1805/"
    #ctl = {"ctl":"code"}
    #ctldata = urllib.urlencode(ctl)
    #reqget = urllib2.Request(urlget+'?'+ctldata)  # build a GET request with query parameters
    reqget = urllib2.Request(urlget)  # build the GET request

    # Add the GET request headers
    reqget.add_header("Host","xianzhi.aliyun.com")
    reqget.add_header("Cache-Control","max-age=0")
    reqget.add_header("Upgrade-Insecure-Requests","1")
    reqget.add_header("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
    reqget.add_header("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
    reqget.add_header("Accept-Language","zh-CN,zh;q=0.8")
    reqget.add_header("Accept-Encoding","gzip, deflate, sdch, br")
    reqget.add_header("Cookie","cnz=X6ejEtcUBVMCAfJ77XgdkdPH; cna=YKejEpKOFU0CAXjte/LuiuWB; UM_distinctid=16000343ca4183-0e8093cc5e7b3-64191279-15f900-16000343ca575a; _uab_collina=151183659981086744617448; _ga=GA1.2.668866163.1511831906; aliyun_country=CN; aliyun_site=CN; isg=ApmZtNphJydPxfuAkp4Fb9c1qIWzjqX8QOIT1rtOAUA_wrlUA3adqAfSsrFO; _umdata=ED82BDCEC1AA6EB94F984760A4C6465E6DD138CC3777AF0CB131A783FCB0E006227E021A199C6A8DCD43AD3E795C914C3303D9E6CB380052D470743247B79D15; acw_tc=AQAAAJMuFXttQgkA8nvteBqARscCdcug; csrftoken=CkpJbhBYBvg6oTBvrwTrsrYcsF1SJXC4mdv0A0k1BmX6mDFT0K2izVlfJkaZI4zx; CNZZDATA1260716569=1195371503-1511830276-https%253A%252F%252Fwww.baidu.com%252F%7C1515457887")
    reqget.add_header("Connection","keep-alive")

    # Route traffic through a local proxy to capture and inspect the packets
    #proxy_handler = urllib2.ProxyHandler({'http': '192.168.40.36:4455'})
    #opener = urllib2.build_opener(proxy_handler)
    #urllib2.install_opener(opener)

    context = ssl._create_unverified_context()  # HTTPS without certificate verification; remove this line for plain HTTP
    resget = urllib2.urlopen(reqget,context=context)  # open the request with the SSL context; for plain HTTP, omit the context argument
    resgetdata = resget.read()
    print resgetdata

    # Save the response body to an HTML file
    f = open("e:/pic/downloadxianzhi.html","wb")
    f.write(resgetdata)
    f.close()

getpicyanzhengma()

2. Running the script produces garbled output

When Python requests the page with GET, the returned page content is garbled:

��<鵶壑�?�3�? �4OQW$礩'鄶蚽移澗懍�(�%+mf鐲謒蓫,_�!踨':\'柆��%@顓�� 奛獫dv9嘟飣魺 x踅脀櫖憮N翎F鏀窿R"�餑�賤r揉!薸:2�##胿�z螑 榗妍+邇嫣N_�;釞琾9��.hR迱T%�猙 鄖鐍�7C氹撴鬲5U礀6瑭菮糰 嶄U蛨�3翦�慏#�/[email protected],鵴JR$C鈊V8�'S98�+浼G閣uG :O#婈�.K��!�?" 槩瑔2龖XF� 箻np�$釀橷�茻Qx�0苃P梤� 姖g蒐洸譟杫1�1*#漚Yz個FZ匴UC74.偄偖G(^T!肶崇\ L$J囉Esb噘縭⒒@Sx擣�7b� ��%醜pa觵@€溼��肏摴褟餚�楚i斀*尲\�4OFy鮸燔_ H�:�=b|e�?�)3Ja礌挘ガ嗶吉枰0jΠ甎麵�0瞾橑辝��<�{�&尞 龖琣鋥c1AQ�&VPs6輑"欻DSd眘€p_孨u颫Hヌ�搒謡w�<�⒊淕瓜q�=鴫>�;�'M�籵淚D� �憅ZU�$撮L靠h溳 絬窶^)6錮I聖]�)
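
The garbled text above is most likely the compressed response body printed as-is, since urllib2 does not decompress responses on its own. Below is a minimal sketch for checking this, assuming the server chose gzip. `looks_like_gzip` is a hypothetical helper name, not part of the original script. Because the script also advertises `sdch` and `br`, the server may have picked Brotli instead, which this check does not detect.

```python
def looks_like_gzip(data):
    """Return True if data starts with the two-byte gzip magic number."""
    # Every gzip stream begins with the bytes 0x1f 0x8b.
    return data[:2] == b"\x1f\x8b"
```

Calling `looks_like_gzip(resgetdata)` right after `resget.read()` in the script would show whether the body is gzip data rather than plain HTML.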
3. Comment out the Accept-Encoding header:
#reqget.add_header("Accept-Encoding","gzip, deflate, sdch, br")

Output after running the script again:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">

    <title>先知社群</title>

    <meta name="description" content="先知社群,先知安全技術社群">
    <meta name="viewport"
          content="width=device-width,initial-scale=1.0,minimum-scale=1.0,maximum-scale=1.0,user-scalable=no">

    <link rel="icon" href="/forum/static/icon/favicon.ico" type="image/x-icon">
    <!-- Le styles -->
    ....................
    .....................
    .....................
    ......................

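4. How to fix the garbled HTML response:

To get the correct page content instead of garbled text, there are two options. The snippets below are C# (HttpWebRequest); the same idea applies to the Python script above.

1. Do not set the Accept-Encoding header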

//req.Headers.Add("Accept-Encoding", "gzip,deflate");

2. Set the Accept-Encoding header and also enable the matching automatic decompression mode

req.Headers["Accept-Encoding"] = "gzip,deflate";
req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

Which option to use depends on your needs.
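
For the Python script in section 1, option 2 has no single switch like `AutomaticDecompression`, because urllib2 leaves decompression to the caller. The following is a sketch under stated assumptions, not the author's method. It advertises only `gzip, deflate` (dropping `sdch` and `br`, which the standard library cannot decode) and decompresses according to the response's `Content-Encoding` header. `decode_body` and `fetch` are hypothetical names, and the import fallback is an assumption so that the same code runs on both Python 2 and Python 3.

```python
import zlib

try:                     # Python 3
    from urllib.request import Request, urlopen
except ImportError:      # Python 2, as used by the script above
    from urllib2 import Request, urlopen


def decode_body(raw, content_encoding):
    """Decompress a response body according to its Content-Encoding value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("", "identity"):
        # The server sent the body uncompressed.
        return raw
    if encoding in ("gzip", "x-gzip"):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            # In HTTP, "deflate" normally means a zlib-wrapped stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)


def fetch(url):
    """Download url, asking only for encodings that decode_body can handle."""
    req = Request(url)
    req.add_header("Accept-Encoding", "gzip, deflate")
    res = urlopen(req)
    return decode_body(res.read(), res.headers.get("Content-Encoding"))
```

In the original script, the equivalent change would be to set the header to `"gzip, deflate"` and pass `resget.read()` through `decode_body` before printing or saving it. Unlike the script, `fetch` keeps the default certificate verification.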