
Saving scraped data to sqlite3 in Scrapy


The example crawls Ganji rental listings: http://bj.ganji.com/fang1/chaoyang/

The title and price of each listing are extracted with XPath.

The code for the spider, items, and pipelines is shown below.

# -*- coding: utf-8 -*-
import scrapy
from ..items import RenthouseItem


class GanjiSpider(scrapy.Spider):
    name = 'ganji'
    # allowed_domains = ['bj.ganji.com']
    start_urls = ['http://bj.ganji.com/fang1/chaoyang/']

    def parse(self, response):
        rh = RenthouseItem()
        # pull the listing titles and prices out of the result list
        title_list = response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[1]/a/text()').extract()
        price_list = response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[5]/div[1]/span[1]/text()').extract()
        for i, j in zip(title_list, price_list):
            rh['title'] = i
            rh['price'] = j
            yield rh
            # yielding a plain dict would also work:
            # yield {'title': i, 'price': j}
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class RenthouseItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import sqlite3


class RenthousePipeline(object):
    def open_spider(self, spider):
        # open the database connection once, when the spider starts
        self.con = sqlite3.connect('renthouse.sqlite')
        self.cu = self.con.cursor()

    def process_item(self, item, spider):
        # parameterized query: safer than string formatting, and it
        # handles quotes inside titles correctly
        insert_sql = 'insert into renthouse (title, price) values (?, ?)'
        self.cu.execute(insert_sql, (item['title'], item['price']))
        self.con.commit()
        return item

    def close_spider(self, spider):
        # the hook must be named close_spider; a method called
        # spider_close would never be invoked by Scrapy
        self.con.close()
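As the header comment reminds us, the pipeline only runs if it is enabled in settings.py. A minimal sketch, assuming the Scrapy project module is named renthouse:

# settings.py -- the number is the pipeline's order (lower runs first)
ITEM_PIPELINES = {
    'renthouse.pipelines.RenthousePipeline': 300,
}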

In the spider, the line rh = RenthouseItem() creates an item instance, and yielding rh is what hands the data over to the pipeline for processing.

So for each listing we pass the pipeline a dict-like item through rh (title and price), and the pipeline inserts it into sqlite3 with an SQL statement.
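scrapy.Item instances support dict-style access, restricted to the declared fields, which is why rh['title'] = i works in the spider and item['title'] works in the pipeline. A quick illustration (the values are made up, and the import path assumes the project is named renthouse):

from renthouse.items import RenthouseItem

rh = RenthouseItem()
rh['title'] = 'Some listing'
rh['price'] = '3500'
print(dict(rh))      # {'title': 'Some listing', 'price': '3500'}
# rh['area'] = 90    # would raise KeyError: 'area' is not a declared Field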

open_spider runs when the spider starts, so that is the right place to connect to the database. Personally I found this article's explanation of cursors and sqlite usage very clear: https://www.cnblogs.com/qq78292959/archive/2013/04/01/2993327.html
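One thing the pipeline above takes for granted is that the renthouse table already exists. If it does not, a sketch like this in open_spider would create it on first run (the two-text-column schema is an assumption, matching the item fields):

def open_spider(self, spider):
    self.con = sqlite3.connect('renthouse.sqlite')
    self.cu = self.con.cursor()
    # assumed schema: one text column per item field
    self.cu.execute('create table if not exists renthouse (title text, price text)')
    self.con.commit()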

Note that after you execute() a statement that modifies data, such as an insert, you must commit()!
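This is easy to verify outside Scrapy: Python's sqlite3 module opens an implicit transaction for insert/update/delete, and closing the connection without committing rolls it back. A minimal sketch (assumes the renthouse table from above exists):

import sqlite3

con = sqlite3.connect('renthouse.sqlite')
con.execute('insert into renthouse (title, price) values (?, ?)', ('demo', '1000'))
con.close()  # no commit(), so this insert is rolled back and never reaches the file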

close_spider runs when the spider shuts down, so that is where we close the database connection.
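After running scrapy crawl ganji from the project directory, a few lines of Python are enough to check what was saved (a quick sanity check, not part of the project code):

import sqlite3

con = sqlite3.connect('renthouse.sqlite')
for title, price in con.execute('select title, price from renthouse limit 5'):
    print(title, ':', price)
con.close()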
