Scraping Data with a Python Crawler and Storing It in MongoDB

阿新 • Published: 2018-12-24

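from bs4 import BeautifulSoup
import requests
import time
import pymongo

# Connect to the local MongoDB server and pick the database and collections.
client = pymongo.MongoClient('localhost', 27017)
ceshi = client['ceshi']
url_list = ceshi['url_list3']    # item URLs collected from listing pages
item_info = ceshi['item_info3']  # details scraped from each item page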
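

def get_links_form(channel, pages, who_sells=0):
    """Collect item links from one listing page and store them in url_list."""
    # Listing URL pattern: <channel><who_sells>/pn<page>/
    list_view = '{}{}/pn{}/'.format(channel, str(who_sells), str(pages))
    wb_data = requests.get(list_view)
    time.sleep(1)  # pause after each request
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Only parse pages that contain listing cells (<td class="t">).
    if soup.find('td', 't'):
        for link in soup.select('td.t a.t'):
            # Drop the query string and keep only the base item URL.
            item_link = link.get('href').split('?')[0]
            url_list.insert_one({'url': item_link})
            print(item_link)

# get_links_form('http://bj.58.com/shuma/', 2)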
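

def get_item_info(url):
    """Scrape one item page and store its details in item_info."""
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Treat the page as removed when the first script's src path has a '404' segment.
    script = soup.find('script', type='text/javascript')
    src = script.get('src') if script else None
    no_longer_exist = src is not None and '404' in src.split('/')
    if not no_longer_exist:
        title = soup.title.text
        price = soup.select('span.price.c_f50')[0].text
        date = soup.select('.time')[0].text
        # Store None for the area when the page has no span.c_25d element.
        area = list(soup.select('.c_25d a')[0].stripped_strings) if soup.find_all('span', 'c_25d') else None
        data = {'title': title, 'price': price, 'date': date, 'area': area}
        # Insert a copy, because insert_one adds an '_id' key to the dict it receives.
        item_info.insert_one(dict(data))
        print(data)


get_item_info("http://bj.58.com/diannao/31994026546616x.shtml")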
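
The string handling in the script (building the listing URL, trimming the query string, spotting a removed page) can be tried out without a network connection or a running MongoDB. Below is a minimal sketch under that assumption. The helper names `build_list_url`, `strip_query`, and `is_removed` are hypothetical and do not appear in the original script; they only mirror its logic, and the `example.com` script paths are made up for illustration.

```python
def build_list_url(channel, page, who_sells=0):
    """Build a listing URL in the form <channel><who_sells>/pn<page>/."""
    return '{}{}/pn{}/'.format(channel, who_sells, page)


def strip_query(href):
    """Return href without its query string, or None if href is missing."""
    if not href:
        return None
    return href.split('?')[0]


def is_removed(script_src):
    """Mirror the script's check: a '404' path segment marks a removed page."""
    if script_src is None:
        return False
    return '404' in script_src.split('/')


if __name__ == '__main__':
    print(build_list_url('http://bj.58.com/shuma/', 2))
    print(strip_query('http://bj.58.com/diannao/31994026546616x.shtml?psid=1'))
    print(is_removed('http://example.com/js/404/top.js'))  # made-up path
```

Keeping these pieces as pure functions means the parsing rules can be checked on their own, and a missing `href` or `src` attribute returns a value instead of raising an `AttributeError` in the middle of a crawl.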
