Using Last-Modified and ETag in Python to Avoid Downloading Duplicate Content

HTTP/1.1 headers for avoiding repeated downloads

HTTP/1.1 already defines the headers needed to avoid repeated downloads. See HTTP/1.1 Section 14, Header Field Definitions: 14.19 ETag, 14.26 If-None-Match, 14.29 Last-Modified and 14.25 If-Modified-Since.

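By combining the Last-Modified and ETag response headers with the matching conditional request headers, a developer can make effective use of the local cache and cut down on needless repeat downloads.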
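As a rough sketch of how the two pairs of headers relate (written for Python 3; the helper name conditional_headers is hypothetical and not part of any library), the validators a server sent earlier in ETag and Last-Modified are echoed back by the client in If-None-Match and If-Modified-Since:

```python
def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET.

    etag / last_modified are the values of the ETag and Last-Modified
    response headers saved from an earlier download (None if absent).
    """
    headers = {}
    if etag:
        # The saved ETag goes back to the server in If-None-Match.
        headers["If-None-Match"] = etag
    if last_modified:
        # The saved Last-Modified date goes back in If-Modified-Since.
        headers["If-Modified-Since"] = last_modified
    return headers
```

On the very first download nothing has been saved yet, so the function returns an empty dict and the request is an ordinary GET.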

Logic of the sample code

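1. The client downloads a link (Sample);
2. The server returns the content, and Sample records the Last-Modified/ETag values from the response;
3. The client downloads the same link again, sending the Last-Modified/ETag values the server returned last time back to it (in the If-Modified-Since and If-None-Match request headers);
4. The server checks the Last-Modified or ETag value, determines that the page has not changed since the client's last request, and returns a 304 response with an empty body.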
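The check the server makes in step 4 can be sketched roughly as follows. This is a simplified illustration in Python 3, not the logic of any particular server: the function name is_not_modified is hypothetical, ETags are compared as plain strings (weak "W/" validators are not handled), and If-None-Match is given precedence over If-Modified-Since, as RFC 7232 describes.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Return True if the server may answer 304 Not Modified.

    request_headers: dict of headers sent by the client.
    etag / last_modified: current validators of the resource on the server
    (last_modified is an HTTP date string such as
    "Wed, 09 Nov 2011 10:45:55 GMT").
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or etag in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since and last_modified:
        # Unchanged if the resource is not newer than the client's copy.
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return modified <= since

    # No validators were sent: the full content must be returned.
    return False
```

When the function returns True the server can reply with status 304 and no body; otherwise it sends the full content with status 200 and fresh validators.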

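In fact, Dive Into Python already contains quite detailed example code for this. Python programmers who have not read the book are strongly encouraged to study it carefully; it will improve both your object-oriented programming and your network programming skills.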

Sample code

#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on Nov 9, 2011

@author: li3huo
'''
# Note: this listing targets Python 2 (urllib2, print statement).
import urllib2


class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):
    """Makes sure the HTTP status is recorded on the response."""

    def http_error_default(self, req, fp, code, msg, headers):
        # Return the error (e.g. 304) as a response object instead of raising it.
        result = urllib2.HTTPError(
            req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result


class Sample(object):
    """A sample is the URL I want to download."""

    def __init__(self, url, contentLength=0, etag=None, lastModified=None):
        self.url = url
        self.contentLength = contentLength
        self.etag = etag
        self.lastModified = lastModified
        self.status = 200
        self.data = None

    def __repr__(self):
        return repr("Http Status=%d; Length=%d; Last Modified Time=%s; eTag=%s" % (
            self.status, self.contentLength, self.lastModified, self.etag))

    def downloadSample(self):
        request = urllib2.Request(self.url)
        request.add_header('User-Agent',
                           "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en)")
        # Send back the validators saved from the previous download, if any.
        if self.lastModified:
            request.add_header('If-Modified-Since', self.lastModified)
        if self.etag:
            request.add_header('If-None-Match', self.etag)
        conn = urllib2.build_opener(DefaultErrorHandler()).open(request)

        if hasattr(conn, 'headers'):
            # save ETag, if the server sent one
            self.etag = conn.headers.get('ETag')
            # save Last-Modified header, if the server sent one
            self.lastModified = conn.headers.get('Last-Modified')
            # header values are strings; convert so that %d in __repr__ works
            self.contentLength = int(conn.headers.get("content-length") or 0)

        if hasattr(conn, 'status'):
            self.status = conn.status
            print "status=%d" % self.status

        self.data = conn.read()

        if self.status == 304:
            print "the content is same, so return nothing!"

        # Fall back to the size of the body when Content-Length is missing.
        if not self.contentLength:
            self.contentLength = len(self.data)

        conn.close()


if __name__ == '__main__':
    url = 'http://www.sina.com.cn'
    sample = Sample(url)
    sample.downloadSample()
    print sample
    sample.downloadSample()
    print sample

Output

'Http Status=200; Length=589988; Last Modified Time=Wed, 09 Nov 2011 10:45:55 GMT; eTag=None'
status=304
the content is same, so return nothing!
'Http Status=304; Length=0; Last Modified Time=Wed, 09 Nov 2011 10:45:55 GMT; eTag=None'
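
The sample code relies on urllib2, which exists only in Python 2. A minimal sketch of the same conditional GET for Python 3, using urllib.request from the standard library, might look like the following. The function name fetch and the layout of the returned dict are illustrative choices and do not come from the original code. Note that urllib.request reports a 304 response by raising HTTPError, so the sketch catches it rather than installing a custom error handler.

```python
import urllib.error
import urllib.request


def fetch(url, etag=None, last_modified=None):
    """Download url, sending saved validators if there are any.

    Returns a dict with the status code, the body (b"" on 304) and the
    validators to remember for the next request.
    """
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)

    try:
        response = urllib.request.urlopen(request)
        status = response.status
    except urllib.error.HTTPError as err:
        if err.code != 304:
            raise
        # 304 arrives as an HTTPError; it still carries headers and a body.
        response, status = err, err.code

    try:
        data = response.read()
        headers = response.headers
    finally:
        response.close()

    return {
        "status": status,
        "data": data,
        # Keep the old validators if the server did not repeat them.
        "etag": headers.get("ETag") or etag,
        "last_modified": headers.get("Last-Modified") or last_modified,
    }
```

A caller would invoke fetch(url) once, keep the returned etag and last_modified values, and pass them to the next call; a status of 304 then means the previously downloaded data can be reused.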