Technical notes on saving scrapy crawler output as a CSV file
阿新 • Published: 2019-02-10
For work I needed to save crawler output as CSV; previously I had only saved JSON. Most of the methods found online did not work for me. There are mainly the following two:
The second, a pipeline built on scrapy's CsvItemExporter:

```python
from scrapy import signals
# scrapy.contrib.exporter has been removed; the exporter now lives in scrapy.exporters
from scrapy.exporters import CsvItemExporter


class CSVPipeline(object):
    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        # List the names of the fields to export - the order is important
        self.exporter.fields_to_export = ['names', 'stars', 'subjects', 'reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
```
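Either pipeline must also be registered in the project settings before scrapy will call it. A minimal sketch, assuming the pipeline class lives in `myproject/pipelines.py` (the `myproject` module name is a placeholder for your own project):

```python
# settings.py - register the pipeline; the number (0-1000) sets its run order
ITEM_PIPELINES = {
    'myproject.pipelines.CSVPipeline': 300,
}
```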
The other, writing rows directly with the csv module:

```python
import csv


class CSVPipeline(object):
    def __init__(self):
        # In Python 3, open in text mode with newline='' rather than 'wb'
        self.csvwriter = csv.writer(open('items.csv', 'w', newline=''), delimiter=',')
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # Assumes each item field is a list; one CSV row per list position
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item
```
Neither of these worked for me; nothing was saved. After some research I found that the root cause was a mismatch between the format of the data the spider produced and the format the pipeline expected when writing the file. After fixing the format, saving succeeded. If you need help, contact me on QQ: 1241296318.
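One common version of this mismatch, shown here as a hedged sketch (the field names and the `normalize` helper are illustrative, not the author's actual fix): the `zip`-based pipeline above assumes every item field is a list, but if the spider yields plain strings, `zip` iterates over their individual characters. Wrapping scalar fields in lists first avoids that:

```python
# Example field names matching the pipelines above
FIELDS = ['names', 'stars', 'subjects', 'reviews']


def normalize(value):
    # Wrap scalar values so that string fields are not split char by char
    return value if isinstance(value, (list, tuple)) else [value]


def item_to_rows(item):
    # Turn one scraped item into one or more CSV rows
    columns = [normalize(item.get(field, '')) for field in FIELDS]
    return list(zip(*columns))


# A spider that yields one string per field now produces one clean row
item = {'names': 'Alice', 'stars': '5', 'subjects': 'math', 'reviews': 'great'}
rows = item_to_rows(item)
```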
One more catch: opening the saved file directly in Excel shows garbled text. Open it with another editor such as EditPlus and save it again in a BOM-prefixed encoding (UTF-8 with BOM). Opening it once more, the file displays correctly.
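The Excel mojibake can also be avoided at write time: opening the file with Python's `utf-8-sig` codec prepends the BOM automatically, so no round trip through EditPlus is needed. A minimal sketch (the file name and row values are illustrative):

```python
import csv

# 'utf-8-sig' writes a UTF-8 BOM at the start of the file,
# which tells Excel to decode the file as UTF-8
with open('items.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['names', 'stars', 'subjects', 'reviews'])
    writer.writerow(['張三', '5', '數學', '很好'])
```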