Python Web Scraping in Practice (1): Scraping All Course Names, Prices, and Class Hours from CSDN Academy

阿新 • Published: 2019-01-11

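import datetime
import re
import urllib.request

import xlwt


class csdn_spider():
    def __init__(self):
        self.c = 0  # number of courses collected by data()

    def sava_data(self, name, class_num, price):
        # Create the workbook and sheet objects
        workbook = xlwt.Workbook()
        sheet1 = workbook.add_sheet('sheet1', cell_overwrite_ok=True)

        # Initialize the Excel cell style
        style = xlwt.XFStyle()

        # Create a font for the style
        font = xlwt.Font()
        font.name = 'Times New Roman'
        font.bold = True

        # Attach the font to the style
        style.font = font

        # Write the column headers into the first row of sheet1
        sheet1.write(0, 0, "No.", style)
        sheet1.write(0, 1, "Course name", style)
        sheet1.write(0, 2, "Class hours", style)
        sheet1.write(0, 3, "Price", style)

        row = 0  # index of the last data row written; row 0 holds the headers
        # zip() stops at the shortest list, so unequal lengths cannot raise IndexError
        for row, (course, hours, cost) in enumerate(zip(name, class_num, price), start=1):
            sheet1.write(row, 0, row, style)      # column 1: sequence number
            sheet1.write(row, 1, course, style)   # column 2: course name
            sheet1.write(row, 2, hours, style)    # column 3: class hours
            sheet1.write(row, 3, cost, style)     # column 4: course price

        # After all rows are written, record the collection time once at the end of the sheet
        now = datetime.datetime.now()
        t = now.strftime("%Y-%m-%d %H:%M:%S")
        t1 = now.strftime("%Y%m%d%H%M%S")
        sheet1.write(row + 2, 1, "Collected at", style)
        sheet1.write(row + 2, 2, t, style)

        # Save the workbook; the timestamp suffix avoids clashing with an earlier export
        workbook.save("E:/csdn_academy_course_summary_" + t1 + ".xls")

        print("Finished writing data to the Excel file!")
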
    def data(self):
        html = "https://edu.csdn.net/courses"
        name = []
        class_num = []
        price = []
        for page in range(1, 299):
            url = html + '/p' + str(page)
            print(url)
            # Request the page and decode the response as UTF-8
            data = urllib.request.urlopen(url).read().decode('utf-8')
            # Course names are the alt text of each course thumbnail
            pat1 = r'<img src="(.*?)" width="179" height="120" alt="(.*?)">'
            name += [alt for _src, alt in re.findall(pat1, data)]
            # Class hours
            pat2 = r'<p><em>(.*?)</em>'
            class_num += re.findall(pat2, data)
            # Prices: capture the text after <i>, then keep only the numeric part
            pat3 = r'<p class="clearfix">\s+<i>\s+(.*?)\s+'
            for raw in re.findall(pat3, data):
                numbers = re.findall(r'-?\d+\.?\d*e?-?\d*?', raw)
                # Fall back to the raw text when the price contains no digits
                price.append(numbers[0] if numbers else raw)
            print(name, class_num, price)
        self.c = len(class_num)
        print(self.c, name, class_num, price)
        self.sava_data(name, class_num, price)


if __name__ == '__main__':
    saveinfo = csdn_spider()  # Instantiate the spider class
    saveinfo.data()
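
Before crawling all 298 pages, it can help to try the three regular expressions from data() offline. The following is a minimal sketch under stated assumptions: the HTML fragment is hand-written to fit the patterns and is not copied from the live CSDN page, and `parse_courses`, `SAMPLE_HTML`, and the `*_PAT` names are hypothetical helpers introduced only for illustration.

```python
import re

# Hypothetical markup, shaped only to satisfy the three patterns used in data().
SAMPLE_HTML = """
<img src="a.png" width="179" height="120" alt="Python Basics">
<p><em>12</em> lessons</p>
<p class="clearfix">
    <i>
        ￥199.00
    </i>
</p>
<img src="b.png" width="179" height="120" alt="Regex in Depth">
<p><em>8</em> lessons</p>
<p class="clearfix">
    <i>
        ￥49.90
    </i>
</p>
"""

# Same patterns as in data(), compiled once as raw strings.
NAME_PAT = re.compile(r'<img src="(.*?)" width="179" height="120" alt="(.*?)">')
HOURS_PAT = re.compile(r'<p><em>(.*?)</em>')
PRICE_PAT = re.compile(r'<p class="clearfix">\s+<i>\s+(.*?)\s+')
NUMBER_PAT = re.compile(r'-?\d+\.?\d*e?-?\d*?')


def parse_courses(page):
    """Return (name, class_hours, price) tuples found in one page of HTML."""
    names = [alt for _src, alt in NAME_PAT.findall(page)]
    hours = HOURS_PAT.findall(page)
    prices = []
    for raw in PRICE_PAT.findall(page):
        match = NUMBER_PAT.search(raw)
        # Keep the raw text when no number is present in the price cell.
        prices.append(match.group() if match else raw)
    # zip() pairs the three columns and stops at the shortest one.
    return list(zip(names, hours, prices))


if __name__ == "__main__":
    for course in parse_courses(SAMPLE_HTML):
        print(course)
```

Keeping the parsing in a function that takes a string makes it possible to check the patterns against a saved page without sending any requests; if the site's markup differs from the assumed fragment, the patterns would need adjusting.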
