scrapy 爬取京東例子

阿新 • • 發佈：2019-01-20

#-*- encoding: UTF-8 -*-
#---------------------------------import------------------------------------
import scrapy
import re
from tutorial.items import TutorialItem
from scrapy import Request
#---------------------------------------------------------------------------
class JdSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ["jd.com"]
    start_urls = [
        "http://wap.jd.com/category/all.html"
    ]

    def parse(self, response):
        '獲取全部分類商品'
        req = []
        for sel in response.xpath('/html/body/div[5]/div[2]/a'):
            for i in sel.xpath('@href').extract():
                if 'category' in i:
                    url = "http://wap.jd.com" + i
                    r = Request(url, callback=self.parse_category)
                    req.append(r)
        return req

    def parse_category(self,response):
        '獲取分類頁'
        req = []
        for sel in response.xpath('/html/body/div[5]/div/a'):
            for i in sel.xpath('@href').extract():
                url = "http://wap.jd.com" + i
                r = Request(url, callback=self.parse_list)
                req.append(r)
        return req

    def parse_list(self,response):
        '分別獲得商品的地址和下一頁地址'
        req = []

        '下一頁地址'
        next_list = response.xpath('/html/body/div[21]/a[1]/@href').extract()
        if next_list:
            url = "http://wap.jd.com" + next_list[0]
            r = Request(url, callback=self.parse_list)
            req.append(r)

        '商品地址'
        for sel in response.xpath('/html/body/div[contains(@class, "pmc")]/div[1]/a'):
            for i in sel.xpath('@href').extract():
                url = "http://wap.jd.com" + i
                r = Request(url, callback=self.parse_product)
                req.append(r)
        return req

    def parse_product(self,response):
        '商品頁獲取title,price,product_id'
        title = response.xpath('//title/text()').extract()[0][:-7]
        price = response.xpath('/html/body/div[4]/div[4]/font/text()').extract()[0][1:]
        product_id = response.url.split('/')[-1][:-5]

        item = TutorialItem()
        item['title'] = title
        item['price'] = price
        item['product_id'] = product_id
        r.meta['item'] = item

        r = Request(re.sub('product','comments',response.url),callback=self.parse_comments) # 評論頁地址
        return r

    def parse_comments(self,response):
        '獲取商品comment數'
        comment_5 = response.xpath('/html/body/div[4]/div[2]/a[1]/font[2]/text()').extract()
        comment_3 = response.xpath('/html/body/div[4]/div[2]/a[2]/font/text()').extract()
        comment_1 = response.xpath('/html/body/div[4]/div[2]/a[3]/font/text()').extract()
        comment = comment_5 + comment_3 + comment_1
        totle_comment = sum([int(i.strip()) for i in comment])
        item = response.meta['item']
        item['comment'] = totle_comment
        print( item['comment'])
        #print response.body.decode('UTF-8')
        return item
############################################################################

scrapy 爬取京東例子

#-*- encoding: UTF-8 -*- #---------------------------------import------------------------------------ import scrapy import re from tutoria

用scrapy爬取京東商城的商品信息

keywords XML 1.5 rom toc ons lines open 3.6 軟件環境： 1 gevent (1.2.2) 2 greenlet (0.4.12) 3 lxml (4.1.1) 4 pymongo (3.6.0) 5 pyO

用scrapy爬取京東的數據

identify allow 9.png spider main %d 網頁 pro fyi 本文目的是使用scrapy爬取京東上所有的手機數據,並將數據保存到MongoDB中。一、項目介紹主要目標 1、使用scrapy爬取京東上所有的手機數據 2、將爬取的數據

scrapy爬取京東商城某一類商品的資訊和評論（二）

2、任務二：爬取商品評論資訊如果不需要爬取使用者的地域資訊，那麼用這個網址爬就好： http://club.jd.com/review/10321370917-1-1-0.html 其中10321370917是商品的ID，評論的第一頁就是 -1-1-0.htm

scrapy爬取京東商城某一類商品的資訊和評論（一）

剛寫完京東爬蟲，趁著記憶還深刻，寫點總結吧。一、前提預設已用scrapy爬取過網站，有爬蟲基礎，有爬蟲環境二、以爬取電子煙為例 1、任務一：爬取商品資訊在搜尋框裡面直接搜尋電子煙，搜出來的介面，你會發現它是動態載入的。即一開始原始碼裡面只

用scrapy爬取京東的資料

# -*- coding: utf-8 -*- import scrapy from ..items import JdphoneItem import sys reload(sys) sys.setdefaultencoding("utf-8") class JdSpider(scrapy.Spid

Scrapy爬取京東商城華為全系列手機評論

本文轉自：https://mp.weixin.qq.com/s?__biz=MzA4MTk3ODI2OA==&mid=2650342004&idx=1&sn=4d270ab7ca54f6f2f7ec7aca113993f4&chksm=87811487b0f

scrapy爬取京東

new tro allow 錯誤 head spa hone strong esp 京東對於爬蟲來說太友好了，不向天貓跟淘寶那樣的喪心病狂，本次爬蟲來爬取下京東，研究下京東的數據是如何獲取的。 1 # 目標網址： jd.com 2 # 關鍵字：手機（任意關鍵字，本

使用scrapy框架,用模擬瀏覽器的方法爬取京東上面膜資訊,並存入mysql,sqlite,mongodb資料庫

因為京東的頁面是由JavaScript動態載入的所以使用模擬瀏覽器的方法進行爬取,具體程式碼如下 : spider.py # -*- coding: utf-8 -*- import scrapy from scrapy import Request from jdpro.items

Scrapy框架基於crawl爬取京東商品資訊爬蟲

Items.py檔案 # -*- coding: utf-8 -*- # Define here the models for your scraped items # See documentation in: # https://doc.scrapy.org/en/latest/topics

Scrapy+Splash爬取京東python書本資訊（遇到的問題記錄）

今天用splash進行京東的圖書的爬蟲。有了以下幾點的錯誤總結: （1）按照參考書上的方式，寫好lua_script檔案。但是自己在lua_script檔案後面加了幾個中文註釋，結果執行時一直出錯，後來意識到了問題，將這些中文註釋給刪除了，這時候才沒有提示剛剛出現的錯誤。

python3 scrapy框架crawl模版爬取京東產品並寫入mysql

crawl將自動對所有連結進行分析，將符合的連結資料爬取。官方文件，其中價格，好評率需要用瀏覽器抓包分析真實地址，本文所用的基礎技術包括：sql語句，re表示式,xpath表示式，基本的網路知識和python基礎 jd.py # -*- codi

scrapy框架爬取京東商城商品的評論

一、Scrapy介紹 Scrapy是一個為了爬取網站資料，提取結構性資料而編寫的應用框架。可以應用在包括資料探勘，資訊處理或儲存歷史資料等一系列的程式中。所謂網路爬蟲，就是一個在網上到處或定向抓取資料的程式，當然，這種說法不夠專業，更專業的描述就是，抓取特定網站網頁的H

scrapy爬取中關村在線手機頻道

tex ice extract base .section title .html release nbsp 1 # -*- coding: utf-8 -*- 2 import scrapy 3 from pyquery import PyQuery as pq

scrapy爬取豆瓣電影top250

imp port 爬取 all lba item text request top 1 # -*- coding: utf-8 -*- 2 # scrapy爬取豆瓣電影top250 3 4 import scrapy 5 from douban.items i

scrapy爬取小說盜墓筆記

xtra pipeline odin trac items style ict ref open # -*- coding: utf-8 -*- import scrapy import requests from daomu.items import DaomuItem

scrapy爬取西刺網站ip

close mon ins css pro bject esp res first # scrapy爬取西刺網站ip # -*- coding: utf-8 -*- import scrapy from xici.items import XiciItem clas

python制作爬蟲爬取京東商品評論教程

頭文件天津 ref back 文字 eai 目的格式 open 作者：藍鯨類型：轉載本文是繼前2篇Python爬蟲系列文章的後續篇，給大家介紹的是如何使用Python爬取京東商品評論信息的方法，並根據數據繪制成各種統計圖表，非常的細致，有需要的小夥伴可以參考下

Python爬蟲從入門到放棄（十八）之 Scrapy爬取所有知乎用戶信息(上)

user 說過 -c convert 方式 bsp 配置文件 https 爬蟲爬取的思路首先我們應該找到一個賬號，這個賬號被關註的人和關註的人都相對比較多的，就是下圖中金字塔頂端的人，然後通過爬取這個賬號的信息後，再爬取他關註的人和被關註的人的賬號信息，然後爬取被關註人

Scrapy爬取慕課網(imooc)所有課程數據並存入MySQL數據庫

start table ise utf-8 action jpg yield star root 爬取目標：使用scrapy爬取所有課程數據，分別為 1.課程名 2.課程簡介 3.課程等級 4.學習人數並存入MySQL數據庫（目標網址 http://www.imoo

scrapy 爬取京東例子

相關推薦