URL Encoding and Decoding
阿新 • Published: 2017-06-13
0. References
[Summary] Encoding and decoding URLs in HTTP (GET or POST) requests
How does urlopen in Python 3 handle URLs containing Chinese characters?
The encoding problem of Chinese URLs
1. RFC 1738
2.1. The main parts of URLs

A full BNF description of the URL syntax is given in Section 5.

In general, URLs are written as follows:

    <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
Note that scheme names are case-insensitive.
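The case rule quoted from the RFC can be observed with Python 3's `urllib.parse.urlsplit`, which lowercases the scheme while parsing. This is only a small sketch; the URL below is made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical URL written with an upper-case scheme.
parts = urlsplit("HTTP://web.example.com/index.html")

# urlsplit normalizes the scheme to lower case; the rest is left as written.
scheme = parts.scheme
netloc = parts.netloc
print(scheme, netloc)
```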
2. Python 2
2.1
>>> import urllib
>>> url = 'http://web page.com'
>>> url_en = urllib.quote(url)            # spaces are encoded as "%20"
>>> url_plus = urllib.quote_plus(url)     # spaces are encoded as "+"
>>> url_en_twice = urllib.quote(url_en)
>>> url
'http://web page.com'
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> url_en_twice
'http%253A//web%2520page.com'             # a "%25" means the string was encoded twice
>>> # the corresponding decoding
>>> urllib.unquote(url_en)
'http://web page.com'
>>> urllib.unquote_plus(url_plus)
'http://web page.com'
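One way to recover from accidental double encoding is to decode repeatedly until the string stops changing. The sketch below is an illustration, not part of the original post: `fully_unquote` and its `max_rounds` parameter are hypothetical names, and the import fallback is there only so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote        # Python 2


def fully_unquote(text, max_rounds=5):
    """Hypothetical helper: unquote until the text no longer changes."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


once = quote("http://web page.com")   # encoded one time
twice = quote(once)                   # encoded again, "%" becomes "%25"
restored = fully_unquote(twice)
print(restored)
```

Be aware that this approach over-decodes any string whose intended content really contains a literal sequence such as "%20", so it is best reserved for cases where double encoding is known to have happened.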
2.2 URLs containing Chinese characters
>>> import urllib
>>> url_zh = u'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.quote(url_zh.encode('utf-8'))   # the argument must be a byte string (str)
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> print urllib.unquote(url_zh_en).decode('utf-8')
http://movie.douban.com/tag/美国
3. Python 3
3.1
>>> import urllib.parse
>>> url = 'http://web page.com'
>>> url_en = urllib.parse.quote(url)      # note: the function is urllib.parse.quote
>>> url_plus = urllib.parse.quote_plus(url)
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> urllib.parse.unquote(url_en)
'http://web page.com'
>>> urllib.parse.unquote_plus(url_plus)
'http://web page.com'
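The reason `quote` leaves "/" alone while `quote_plus` encodes it is the `safe` parameter: it defaults to "/" for `quote` and to the empty string for `quote_plus`. A short sketch of the difference:

```python
from urllib.parse import quote, quote_plus

url = "http://web page.com"

default_quote = quote(url)           # safe="/" by default, so "/" is kept
strict_quote = quote(url, safe="")   # no extra safe characters, so "/" becomes %2F
plus_quote = quote_plus(url)         # safe="" by default, and spaces become "+"

print(default_quote)
print(strict_quote)
print(plus_quote)
```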
3.2 URLs containing Chinese characters
>>> import urllib.parse
>>> url_zh = 'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.parse.quote(url_zh)
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> urllib.parse.unquote(url_zh_en)
'http://movie.douban.com/tag/美国'
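Quoting the whole URL also encodes the colon after the scheme ("http%3A"), so the result can no longer be used directly as a request URL. One possible workaround, sketched here under the assumption that only the path contains non-ASCII characters, is to split the URL and quote just the path. `quote_url_path` is a hypothetical helper name, not something from the standard library.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Hypothetical helper: percent-encode only the path component of a URL."""
    parts = urlsplit(url)
    encoded_path = quote(parts.path)  # UTF-8 by default; "/" is left untouched
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


encoded = quote_url_path("http://movie.douban.com/tag/美国")
print(encoded)
```

A non-ASCII host name or query string would need separate handling, which this sketch leaves out.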
4. Miscellaneous
>>> help(urllib.urlencode)
Help on function urlencode in module urllib:

urlencode(query, doseq=0)
    Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.

>>>
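The help text above comes from Python 2; in Python 3 the same function lives at `urllib.parse.urlencode`. The sketch below shows the effect of `doseq` described in the docstring. The parameter names and values are made up for the example.

```python
from urllib.parse import urlencode

# Hypothetical query parameters; a list of tuples preserves the order.
params = [("q", "web page"), ("tag", ["a", "b"])]

# Without doseq the list is converted with str() and encoded as one value.
flat = urlencode(params)

# With doseq=True each list element becomes its own parameter.
expanded = urlencode(params, doseq=True)

print(flat)
print(expanded)
```

Note that `urlencode` encodes spaces as "+", the same convention as `quote_plus`.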