
Getting the 8-15 day weather forecast for a city from http://www.weather.com.cn with Python


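Working from code written by a more experienced developer, I adapted one example as a way to start learning to write scrapers with BeautifulSoup and fetch weather information. The original code fetched the 7-day forecast; I noticed the 8-15 day forecast next to it and modified the code to fetch that instead. Nothing else really changed, only the part that picks out the tags.

I did, however, run into a problem fetching one span. In my case the page source for each day looks like this: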

<li class="t">
<span class="time">周五(19日)</span>
<big class="png30 d301"></big>
<big class="png30 n301"></big>
<span class="wea">雨</span>
<span class="tem"><em>36℃</em>/22℃</span>
<span class="wind">東南風</span>
<span class="wind1">微風</span>
</li>

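Of all the span tags above, the date, the weather, and the wind direction can each be fetched by matching the tag with BeautifulSoup. Only the temperature could not be fetched: the value I got back was None. This puzzled me for a long time. span.em does return 36℃, but that is incomplete and not what I need.

In the end, the only thing I could do was take the whole content of this span: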

<span class="tem"><em>36℃</em>/22℃</span>

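I then strip the extra characters with string replacement, which leaves 36℃/22℃.

That gives the result I wanted, which I store in a variable and write to the CSV file.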
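The None result described above is consistent with how BeautifulSoup's .string attribute behaves: it only returns a value when the tag has a single child, and this span has two (the <em> tag and the trailing text). The following is a minimal sketch, not the original author's approach, that reproduces the behaviour on the quoted markup and shows get_text() as one possible alternative to string replacement. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# One <li> from the 8-15 day page, as quoted in the post (the <big> icons are omitted).
html = (
    '<li class="t">'
    '<span class="time">周五(19日)</span>'
    '<span class="wea">雨</span>'
    '<span class="tem"><em>36℃</em>/22℃</span>'
    '<span class="wind">東南風</span>'
    '<span class="wind1">微風</span>'
    '</li>'
)

soup = BeautifulSoup(html, "html.parser")
tem = soup.find("span", class_="tem")

# .string is None here: the span has two children, the <em> tag and the text "/22℃".
string_value = tem.string

# .em.string only covers the text inside the <em> child.
em_value = tem.em.string

# get_text() joins every text node under the tag, so no manual replace() is needed.
full_value = tem.get_text()

print(string_value, em_value, full_value)
```

With get_text() the result does not depend on the exact attribute text of the span, whereas the replace() approach stops working if the class name or attribute order in the markup changes.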

The full code follows. Corrections are welcome if anything is wrong.

'''
Created on 2017-05-10

@author: bekey qq:402151718
'''

# coding: UTF-8

import requests
import csv
import random
import time
import socket
import http.client
# import urllib.request
from bs4 import BeautifulSoup


def get_content(url, data=None):
    # Browser-like request headers
    header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    }
    timeout = random.choice(range(80, 180))
    # Keep retrying until the request succeeds
    while True:
        try:
            rep = requests.get(url, headers=header, timeout=timeout)
            rep.encoding = 'utf-8'
            # req = urllib.request.Request(url, data, header)
            # response = urllib.request.urlopen(req, timeout=timeout)
            # html1 = response.read().decode('UTF-8', errors='ignore')
            # response.close()
            break
        # except urllib.request.HTTPError as e:
        #     print('1:', e)
        #     time.sleep(random.choice(range(5, 10)))
        #
        # except urllib.request.URLError as e:
        #     print('2:', e)
        #     time.sleep(random.choice(range(5, 10)))
        except socket.timeout as e:
            print('3:', e)
            time.sleep(random.choice(range(8, 15)))

        except socket.error as e:
            print('4:', e)
            time.sleep(random.choice(range(20, 60)))

        except http.client.BadStatusLine as e:
            print('5:', e)
            time.sleep(random.choice(range(30, 80)))

        except http.client.IncompleteRead as e:
            print('6:', e)
            time.sleep(random.choice(range(5, 15)))

    return rep.text
    # return html_text


def get_data(html_text):
    final = []
    bs = BeautifulSoup(html_text, "html.parser")  # create the BeautifulSoup object
    body = bs.body  # get the body
    data = body.find('div', {'id': '15d'})  # find the div whose id is 15d
    ul = data.find('ul')  # get the ul
    li = ul.find_all('li')  # get all li tags

    for day in li:  # iterate over the content of each li tag
        temp = []
        # print(day)
        span = day.find_all('span')  # find all span tags
        # print(span)
        date = span[0].string  # date
        temp.append(date)  # add to temp
        wea1 = span[1].string  # weather
        temp.append(wea1)  # add to the list
        # Temperature: take the whole span as a string and strip the markup
        tem = str(span[2])
        tem = tem.replace('<span class="tem"><em>', '')
        tem = tem.replace('</span>', '')
        tem = tem.replace('</em>', '')
        # tem = tem.find('span').string  # get the temperature
        temp.append(tem)  # add the temperature to the list
        windy = span[3].string
        temp.append(windy)  # add to the list
        windy1 = span[4].string
        temp.append(windy1)  # add to the list
        final.append(temp)

    return final


def write_data(data, name):
    file_name = name
    with open(file_name, 'a', errors='ignore', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':
    url = 'http://www.weather.com.cn/weather15d/101180101.shtml'
    html = get_content(url)
    # print(html)
    result = get_data(html)
    # print(result)
    write_data(result, 'weather7.csv')
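The write_data function above opens the file with the platform's default encoding and errors='ignore', which can silently drop characters that the default encoding cannot represent. As a hedged alternative, here is a small sketch that writes the rows as UTF-8 and adds a header row only when the file is new. The function name write_rows, the column names, and the temporary file path are all assumptions made for illustration; none of them come from the original project.

```python
import csv
import os
import tempfile


def write_rows(rows, file_name, header=None):
    """Append rows to a CSV file as UTF-8; write the header only if the file is new or empty."""
    is_new = not os.path.exists(file_name) or os.path.getsize(file_name) == 0
    with open(file_name, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        if header and is_new:
            writer.writerow(header)
        writer.writerows(rows)


# Usage with one row shaped like the output of get_data (sample values from the post).
path = os.path.join(tempfile.mkdtemp(), "weather_demo.csv")
header = ["date", "weather", "temperature", "wind_direction", "wind_force"]  # hypothetical names
rows = [["周五(19日)", "雨", "36℃/22℃", "東南風", "微風"]]

write_rows(rows, path, header)
write_rows(rows, path, header)  # second call appends data without repeating the header

with open(path, encoding="utf-8", newline="") as f:
    saved = list(csv.reader(f))

print(saved)
```

If the file is meant to be opened in a spreadsheet program that expects a byte order mark, encoding="utf-8-sig" may be a better fit than plain "utf-8".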

The result looks like this:

[Figure: screenshot of the result]

Project repository: [email protected]:zhangbei59/weather_get.git
