Learning Python Web Scraping: Fetching Weather Information from a Web Page
阿新 • Published: 2019-02-14
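Today I learned how to write a web scraper in Python and used it to fetch Hangzhou's weather forecast from the China Weather Network (中國天氣網, www.weather.com.cn). The script uses the urllib library and bs4 (BeautifulSoup). bs4 provides parsing built specifically for HTML, which is far more convenient than working with regular expressions.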
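To make the comparison with regular expressions concrete, here is a minimal sketch that pulls the dates out of a small HTML fragment both ways. `SAMPLE_HTML` is a hand-written, hypothetical fragment that only imitates the nesting the script relies on (a `div` with id `7d` containing a `ul` of `li` items); it is not copied from the real page, and the two function names are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment, written by hand to imitate the forecast list.
SAMPLE_HTML = """
<div id="7d">
  <ul>
    <li><h1>29日(明天)</h1><p>小雨轉陰</p><p><span>15℃</span>/<i>12℃</i></p></li>
    <li><h1>30日(後天)</h1><p>多雲</p><p><span>19℃</span>/<i>14℃</i></p></li>
  </ul>
</div>
"""


def dates_with_regex(html):
    # Matches only while every <h1> is written exactly like this:
    # no attributes, no nested tags, no line breaks inside the element.
    return re.findall(r"<h1>(.*?)</h1>", html)


def dates_with_bs4(html):
    # Walks the parsed tree, so the markup details around each tag matter less.
    soup = BeautifulSoup(html, "html.parser")
    forecast = soup.find("div", {"id": "7d"})
    return [str(li.find("h1").string) for li in forecast.find_all("li")]


print(dates_with_regex(SAMPLE_HTML))
print(dates_with_bs4(SAMPLE_HTML))
```

Both calls print the same two dates for this fragment. The difference shows up once the markup changes: the regular expression has to be rewritten when an attribute is added to the `h1` tag, while the bs4 version keeps working.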
# -*- coding: utf-8 -*-
import csv
import urllib.request

from bs4 import BeautifulSoup


def get_html(url):
    """Download the page and return its raw bytes."""
    html = urllib.request.urlopen(url)
    return html.read()


def get_data(html_text):
    """Return one [date, weather, highest, lowest] row per forecast day."""
    final = []
    bs = BeautifulSoup(html_text, "html.parser")
    body = bs.body
    data = body.find('div', {'id': '7d'})  # container of the 7-day forecast
    ul = data.find('ul')
    li = ul.find_all('li')  # one <li> per day
    for day in li:
        temp = []
        date = day.find('h1').string
        temp.append(date)
        inf = day.find_all('p')
        temp.append(inf[0].string)  # weather description
        # The highest temperature can be missing from the page, in which
        # case there is no <span> and the CSV field is left empty.
        if inf[1].find('span') is None:
            temperature_highest = None
        else:
            temperature_highest = inf[1].find('span').string
        temperature_lowest = inf[1].find('i').string
        # Both values keep their unit suffix, e.g. '15℃'.
        temp.append(temperature_highest)
        temp.append(temperature_lowest)
        final.append(temp)
    return final


def write_data(data, name):
    """Append the rows to a CSV file."""
    file_name = name
    # newline='' is what the csv module expects for files it writes;
    # utf-8 keeps the Chinese text intact.
    with open(file_name, 'a', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':
    # The numeric code in the URL selects the city.
    html_doc = get_html('http://www.weather.com.cn/weather/101190401.shtml')
    result = get_data(html_doc)
    write_data(result, 'weather.csv')
    print(result)
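As a side note on `get_html`: a bare `urlopen` call has no timeout and sends urllib's default User-Agent. The sketch below is one possible variation, not a requirement of the site. The helper `build_request`, the User-Agent string, the 10-second timeout, and the assumption that the page is UTF-8 encoded are all illustrative choices of mine.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (weather-tutorial)"):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def get_html(url, timeout=10):
    """Fetch url and return the body as text (assumes the page is UTF-8)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # errors="replace" keeps going if a few bytes are not valid UTF-8.
        return response.read().decode("utf-8", errors="replace")


# Example call (needs network access):
# page = get_html("http://www.weather.com.cn/weather/101190401.shtml")
```

The returned text can be passed to `get_data` unchanged, since BeautifulSoup accepts both bytes and str.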
The results are written to weather.csv, as shown below:
28日(今天),小雨,,13℃
29日(明天),小雨轉陰,15℃,12℃
30日(後天),多雲,19℃,14℃
31日(週一),小雨,16℃,14℃
1日(週二),陰轉多雲,16℃,10℃
2日(週三),多雲轉晴,17℃,10℃
3日(週四),多雲轉晴,18℃,11℃
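Since the temperatures are stored as text with a `℃` suffix, and the first row has an empty highest-temperature field, a follow-up step is needed before doing any arithmetic on them. The sketch below shows one way to read the rows back; `parse_temperature` and `load_forecast` are hypothetical helpers, and the sample uses the first two rows of the output above.

```python
import csv
import io


def parse_temperature(text):
    """Turn '15℃' into 15; return None for an empty field."""
    text = text.strip()
    if not text:
        return None
    return int(text.replace("℃", ""))


def load_forecast(csv_file):
    """Read rows of date, weather, highest, lowest into a list of dicts."""
    rows = []
    for date, weather, high, low in csv.reader(csv_file):
        rows.append({
            "date": date,
            "weather": weather,
            "high": parse_temperature(high),
            "low": parse_temperature(low),
        })
    return rows


# Two rows taken from the output above, fed in through an in-memory file.
SAMPLE = "28日(今天),小雨,,13℃\n29日(明天),小雨轉陰,15℃,12℃\n"
forecast = load_forecast(io.StringIO(SAMPLE))
print(forecast[0]["high"], forecast[0]["low"])
print(forecast[1]["high"] - forecast[1]["low"])
```

To read the real file, pass `open('weather.csv', newline='', encoding='utf-8')` instead of the `io.StringIO` object.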