
Python Crawler - Scraping Women's Email Addresses from Jobbole (伯樂在線)


Scraping women's email addresses from Jobbole

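1. Enter through the login page: set the url, cookie, data, and headers.

2. Go to the home page and click the email link. This requires a new url, cookie (loaded from the cookie file saved after logging in), data, and headers.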
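Step 2 only works if the cookie collected in step 1 survives being written to and read back from the file. If the site issues a session cookie (one with no expiry date), MozillaCookieJar silently skips it on both save and load unless ignore_discard=True is passed. That is why the code below passes this flag in both places. The following offline sketch shows the round trip. The cookie name "demo_session" and its value are made-up placeholders, not the cookies date.jobbole.com actually issues.

```python
import os
import tempfile
from http import cookiejar


def make_session_cookie(name, value, domain="date.jobbole.com"):
    """Build a session cookie (no expiry, discard=True).

    The name/value passed in below are hypothetical placeholders.
    """
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def round_trip(ignore_discard):
    """Save a session cookie to a Mozilla-format file, then load it into a fresh jar."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobbole_cookie.txt")

        # Step 1 stand-in: in the article the login response fills this jar.
        saved = cookiejar.MozillaCookieJar(path)
        saved.set_cookie(make_session_cookie("demo_session", "abc123"))
        saved.save(ignore_discard=True, ignore_expires=True)

        # Step 2 stand-in: a brand-new jar loaded from the saved file.
        loaded = cookiejar.MozillaCookieJar()
        loaded.load(path, ignore_discard=ignore_discard, ignore_expires=True)
        return {c.name: c.value for c in loaded}


if __name__ == "__main__":
    print(round_trip(ignore_discard=True))   # {'demo_session': 'abc123'}
    print(round_trip(ignore_discard=False))  # {}  (session cookie dropped on load)
```

Without ignore_discard=True on the load side, the second request would go out with no login cookie even though the file contains one.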


"""
Scrape the contact details of women on Jobbole (伯樂在線).
Requirements:
1. Log in.
2. Once logged in, and with enough reputation points, extract the other person's email address.
"""

from urllib import request, error, parse
from http import cookiejar
import json


def login():
    """
    Submit the username and password,
    obtain the login cookie,
    and write the cookie to a file.
    :return: None
    """

    # 1. Find the login entry point
    url = "http://date.jobbole.com/wp-login.php"

    # 2. Prepare the login form data
    data = {
        "log": "augsnano",
        "pwd": "123456789",
        # Where to redirect after logging in
        "redirect_to": "http://date.jobbole.com/4965/",
        "rememberme": "on"
    }

    data = parse.urlencode(data).encode()

    # 3. Prepare the file that stores the cookie
    # The r prefix makes this a raw string (no escape processing)
    f = r"jobbole_cookie.txt"

    # 4. Prepare the request headers
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36",
        "Connection": "keep-alive"
    }

    # 5. Prepare the cookie jar (cookie handler)
    cookie_handler = cookiejar.MozillaCookieJar(f)

    # 6. Prepare the HTTP handler that sends and collects cookies
    http_handler = request.HTTPCookieProcessor(cookie_handler)

    # 7. Build the opener
    opener = request.build_opener(http_handler)

    # 8. Build the request object (passing data= makes it a POST)
    req = request.Request(url, data=data, headers=headers)

    # 9. Send the request
    try:
        rsp = opener.open(req)

        # Also save session and expired cookies so getInfo() can reuse them
        cookie_handler.save(f, ignore_discard=True, ignore_expires=True)

        html = rsp.read().decode()
        print(html)
    except error.URLError as e:
        print(e)


def getInfo():
    # 1. Set the URL (the WordPress AJAX endpoint)
    url = "http://date.jobbole.com/wp-admin/admin-ajax.php"

    # 2. Load the cookie saved by login()
    f = r"jobbole_cookie.txt"
    cookie = cookiejar.MozillaCookieJar()
    cookie.load(f, ignore_expires=True, ignore_discard=True)

    # 3. Build the http_handler
    http_handler = request.HTTPCookieProcessor(cookie)

    # 4. Build the opener
    opener = request.build_opener(http_handler)

    # The following steps prepare the request object

    # 5. Build the form data
    data = {
        "action": "get_date_contact",
        "postId": "4965"
    }

    data = parse.urlencode(data).encode()

    # 6. Build the request headers
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36",
        "Connection": "keep-alive"
    }

    # 7. Build the request
    req = request.Request(url, data=data, headers=headers)

    # 8. Open it with the cookie-aware opener
    try:
        rsp = opener.open(req)
        html = rsp.read().decode()

        # The endpoint answers with JSON; parse it for printing
        info = json.loads(html)
        print(info)

        # Write the raw JSON text (a str); writing the parsed dict would raise TypeError
        with open("rsp.html", "w", encoding="utf-8") as out:
            out.write(html)

    except Exception as e:
        print(e)


if __name__ == "__main__":
    # login() must run first so that jobbole_cookie.txt exists for getInfo()
    login()
    getInfo()
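Both functions repeat the same two chores. Each encodes a form dict into a POST body, and getInfo() also has to handle the JSON that admin-ajax.php returns. A small refactoring sketch follows. The helper names build_form_post and json_body_to_text are hypothetical and do not come from the original article. The response body in the demo is made up, because the real response format is not shown.

```python
import json
from urllib import parse, request

# Same desktop User-Agent and Connection headers as the article's code.
DEFAULT_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36"),
    "Connection": "keep-alive",
}


def build_form_post(url, form, headers=None):
    """Return a urllib Request whose body is the URL-encoded form dict.

    Passing data= is what makes urllib send POST rather than GET.
    """
    body = parse.urlencode(form).encode("utf-8")
    return request.Request(url, data=body, headers=headers or DEFAULT_HEADERS)


def json_body_to_text(raw):
    """Decode a JSON response body (bytes) into pretty-printed text.

    json.loads yields a dict or list, which file.write() does not accept,
    so the result is serialized back to a str with json.dumps.
    """
    parsed = json.loads(raw.decode("utf-8"))
    return json.dumps(parsed, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    req = build_form_post(
        "http://date.jobbole.com/wp-admin/admin-ajax.php",
        {"action": "get_date_contact", "postId": "4965"},
    )
    print(req.get_method(), req.data)  # POST b'action=get_date_contact&postId=4965'

    # Hypothetical response body, only to show the decoding step.
    print(json_body_to_text('{"name": "小明"}'.encode("utf-8")))
```

Passing the resulting request to the cookie-aware opener built in getInfo() would send the same POST as the original code. With ensure_ascii=False, Chinese text stays readable in the saved file instead of becoming \uXXXX escapes.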
