2017.08.04 Python Web Crawling: Scrapy Hands-on Project 2, Weather Forecast


1. Project preparation. Target website: http://quanzhou.tianqi.com/


2. Create the project and generate the spider:

scrapy startproject weather

scrapy genspider HQUSpider quanzhou.tianqi.com


The project file structure is shown in the figure:

[Figure: project file structure]

3. Modify items.py:


4. Modify the spider file HQUSpider.py:

(1) First run scrapy shell http://quanzhou.tianqi.com/ to test and work out the selectors:


(2) Try out the selectors: open the page in Chrome and view its source:


(3) Run commands in the shell to inspect the response:


(4) Write the HQUSpider.py file:

# -*- coding: utf-8 -*-
import scrapy
from weather.items import WeatherItem


class HquspiderSpider(scrapy.Spider):
    name = 'HQUSpider'
    # The parent domain covers every city subdomain (quanzhou., datong., ...)
    allowed_domains = ['tianqi.com']
    citys = ['quanzhou', 'datong']
    # One start URL per city, e.g. http://quanzhou.tianqi.com/
    start_urls = ['http://' + city + '.tianqi.com/' for city in citys]

    def parse(self, response):
        # Each div.tqshow1 block holds one forecast entry
        for sub in response.xpath('//div[@class="tqshow1"]'):
            item = WeatherItem()
            # The heading may be split across several text nodes; join them
            item['cityDate'] = ''.join(sub.xpath('./h3//text()').getall())
            item['week'] = sub.xpath('./p//text()').get(default='')
            # URL of the weather icon image
            item['img'] = sub.xpath('./ul/li[1]/img/@src').get(default='')
            item['temperature'] = ''.join(sub.xpath('./ul/li[2]//text()').getall())
            item['weather'] = sub.xpath('./ul/li[3]//text()').get(default='')
            item['wind'] = sub.xpath('./ul/li[4]//text()').get(default='')
            yield item


(5) Modify pipelines.py to process the Spider's results:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os.path
import time
import urllib.request  # Python 3 (urllib2 in Python 2)


class WeatherPipeline:
    def process_item(self, item, spider):
        # Records go into one file per day, e.g. 20170804.txt, opened in append mode
        today = time.strftime('%Y%m%d', time.localtime())
        fileName = today + '.txt'
        # Text mode with an explicit encoding writes the str fields as UTF-8
        with open(fileName, 'a', encoding='utf-8') as fp:
            fp.write(item['cityDate'] + '\t')
            fp.write(item['week'] + '\t')
            # Only the icon's file name is written to the text file
            imgName = os.path.basename(item['img'])
            fp.write(imgName + '\t')
            # Download each icon once. The image file gets its own name (imgFile):
            # reusing fp would rebind it to the image file, which is closed when the
            # inner block exits, so the remaining writes below would fail.
            if imgName and not os.path.exists(imgName):
                with open(imgName, 'wb') as imgFile:
                    response = urllib.request.urlopen(item['img'])
                    imgFile.write(response.read())
            fp.write(item['temperature'] + '\t')
            fp.write(item['weather'] + '\t')
            fp.write(item['wind'] + '\n\n')
        # Pause one second per item (note: this blocks Scrapy's event loop)
        time.sleep(1)
        return item



(6) Modify settings.py to tell Scrapy which pipeline should process the scraped data:

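The settings.py screenshot is not recoverable either. A pipeline is enabled through the ITEM_PIPELINES setting, which maps the pipeline's import path to an order value (items pass through lower values first; values are customarily in the 0 to 1000 range). A minimal sketch; 300 is the value in the commented-out example of Scrapy's generated settings.py, not necessarily the one the author used:

```python
# settings.py (excerpt): send every scraped item through WeatherPipeline
ITEM_PIPELINES = {
    'weather.pipelines.WeatherPipeline': 300,
}
```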

(7) Run the command: scrapy crawl HQUSpider

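After the crawl, the pipeline from step (5) leaves a file named after the run date (for example 20170804.txt) in the directory Scrapy was started from, with six tab-separated fields per record and a blank line between records. A minimal sketch for loading it back; the helper names parse_weather_lines and read_weather_file are hypothetical, and the sketch assumes no field contains a tab or newline (the joined <h3> and temperature text might, depending on the page markup):

```python
# Field order matches the fp.write() calls in WeatherPipeline; the third column
# holds the icon's file name, not its URL.
FIELDS = ['cityDate', 'week', 'imgName', 'temperature', 'weather', 'wind']


def parse_weather_lines(lines):
    """Turn tab-separated lines into dicts, skipping blank separator lines.

    Assumes no field itself contains a tab or a newline.
    """
    records = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # blank line between records
        records.append(dict(zip(FIELDS, line.split('\t'))))
    return records


def read_weather_file(path):
    """Read one of the YYYYMMDD.txt files written by the pipeline."""
    with open(path, encoding='utf-8') as fp:
        return parse_weather_lines(fp)
```

For the current day's file, call read_weather_file(time.strftime('%Y%m%d') + '.txt') after importing time. Keeping the parsing separate from the file I/O makes it easy to check against a few sample lines.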

That completes a working Scrapy crawler.