
URL Encoding and Decoding


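0. References

[Summary] Encoding and decoding of URL addresses in HTTP (GET or POST) requests

How does urlopen in Python 3 handle URLs that contain Chinese characters?

The encoding problem of Chinese URLs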

1. RFC 1738

2.1. The main parts of URLs

   A full BNF description of the URL syntax is given in Section 5.

   In general, URLs are written as follows:

       <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed. For resiliency, programs
   interpreting URLs should treat upper case letters as equivalent to
   lower case in scheme names (e.g., allow "HTTP" as well as "http").

Note that the letters in a scheme name are case-insensitive.
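The case-insensitivity rule can be observed with Python 3's standard urllib.parse.urlsplit, which normalizes the scheme to lower case. This is only a minimal sketch; the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URLs that differ only in the case of the scheme name.
upper = urlsplit("HTTP://example.com/index.html")
lower = urlsplit("http://example.com/index.html")

# urlsplit lower-cases the scheme, so both URLs parse to the same value.
print(upper.scheme)                  # http
print(upper.scheme == lower.scheme)  # True
```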

2. Python 2

2.1

>>> import urllib
>>> url = 'http://web page.com'
>>> url_en = urllib.quote(url)              # a space is encoded as "%20"
>>> url_plus = urllib.quote_plus(url)       # a space is encoded as "+"
>>> url_en_twice = urllib.quote(url_en)
>>> url
'http://web page.com'
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> url_en_twice                            # a "%25" in the result means the string was encoded twice
'http%253A//web%2520page.com'
>>> # the corresponding decoding
>>> urllib.unquote(url_en)
'http://web page.com'
>>> urllib.unquote_plus(url_plus)
'http://web page.com'

2.2 URLs containing Chinese characters

>>> import urllib
>>> url_zh = u'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.quote(url_zh.encode('utf-8'))    # quote() expects a byte string (str), so encode first
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> print urllib.unquote(url_zh_en).decode('utf-8')
http://movie.douban.com/tag/美国

3. Python 3

3.1

>>> import urllib.parse
>>> url = 'http://web page.com'
>>> url_en = urllib.parse.quote(url)        # note: in Python 3 it is urllib.parse.quote
>>> url_plus = urllib.parse.quote_plus(url)
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> urllib.parse.unquote(url_en)
'http://web page.com'
>>> urllib.parse.unquote_plus(url_plus)
'http://web page.com'
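Double encoding behaves the same way in Python 3 as in the Python 2 session earlier. The sketch below, which assumes nothing beyond the standard urllib.parse module, shows that decoding has to be applied as many times as encoding was, and how the safe parameter of quote() can leave the ":" after the scheme untouched.

```python
from urllib.parse import quote, unquote

url = "http://web page.com"

once = quote(url)     # 'http%3A//web%20page.com'
twice = quote(once)   # every '%' is itself encoded as '%25'
print(twice)          # http%253A//web%2520page.com

# Decoding must be applied the same number of times as encoding.
print(unquote(twice) == once)          # True
print(unquote(unquote(twice)) == url)  # True

# safe lists the characters quote() leaves alone (the default is "/").
# Adding ":" keeps the scheme separator readable.
print(quote(url, safe=":/"))           # http://web%20page.com
```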

3.2 URLs containing Chinese characters

>>> import urllib.parse
>>> url_zh = 'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.parse.quote(url_zh)  # str is encoded as UTF-8 by default
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> urllib.parse.unquote(url_zh_en)
'http://movie.douban.com/tag/美国'
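Quoting the whole URL also encodes the ":" after the scheme, so the result cannot be requested directly. One possible approach, sketched here with the standard urlsplit and urlunsplit functions, is to percent-encode only the path component. The URL is a Douban tag URL like the one above; whether a given server expects UTF-8 in the path is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url_zh = "http://movie.douban.com/tag/美国"

parts = urlsplit(url_zh)
# Percent-encode only the path (UTF-8 by default); scheme and host stay untouched.
encoded = urlunsplit(parts._replace(path=quote(parts.path)))
print(encoded)  # http://movie.douban.com/tag/%E7%BE%8E%E5%9B%BD
```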

4. Miscellaneous

>>> help(urllib.urlencode)
Help on function urlencode in module urllib:

urlencode(query, doseq=0)
    Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.

>>>
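The help text above comes from Python 2; in Python 3 the same function is available as urllib.parse.urlencode. The following sketch illustrates the doseq behaviour described in the docstring. The parameter names q and tag are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical query parameters, given as a list of tuples so that
# the output order matches the input order.
params = [("q", "web page"), ("tag", ["a", "b"])]

# Without doseq, the list value is converted to a string and encoded as a whole.
plain = urlencode(params)
print(plain)

# With doseq=True, each list element becomes its own parameter.
expanded = urlencode(params, doseq=True)
print(expanded)  # q=web+page&tag=a&tag=b
```

Note that urlencode encodes spaces as "+", in the same way as quote_plus.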
