python爬蟲:使用selenium + ChromeDriver爬取途家網
阿新 • • 發佈:2019-02-15
說明
本站(途家網https://www.tujia.com)通過常規抓頁面的方法不能獲取資料,可以使用selenium + ChromeDriver來獲取頁面資料。
0 指令碼執行順序與說明
0.1 先執行craw_url.py,獲得所有房子詳情頁的url
0.2 執行slice_url.py,把所有的url等份,便於後續作多執行緒爬取
0.3 執行craw.py,獲取每個房子的具體資料
1 注意
1.1 本站的資料為動態載入,用到了selenium + ChromeDriver來獲取頁面資料
1.2 專案中附有chromedriver.exe,需要安裝谷歌瀏覽器(如果執行不了,可能是瀏覽器和chromedriver.exe版本不對應,對應的瀏覽器版本為69)
1.3 注意driver模擬操作後,需要等待1-2s後才能獲取到資料
1.4 本站有反爬,每一次頁面操作設定睡眠6s即可
1.5 chrome_options.add_argument("headless") 設定為不開啟瀏覽器介面(注意使用直引號,彎引號會造成語法錯誤)
2 爬取內容
2.1 途家網https://www.tujia.com/unitlist?cityId=10
2.2 爬取欄位及說明見截圖
截圖
程式碼
1 craw_url.py (獲得所有房子詳情頁的url)
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from selenium import webdriver
import time
import os
# Start the browser driver.
def init_driver(url):
    """Create a headless Chrome driver and load *url*.

    Args:
        url: The page to open once the browser has started.

    Returns:
        A selenium ``webdriver.Chrome`` instance positioned on *url*.
    """
    chrome_options = webdriver.ChromeOptions()
    # The site renders its data with JavaScript, so a real (headless)
    # browser is required; "headless" keeps the window from opening.
    chrome_options.add_argument("headless")
    driver_path = "./bin/chromedriver.exe"
    # ``options=`` replaces the deprecated ``chrome_options=`` keyword and
    # matches the call style already used in craw.py.
    driver = webdriver.Chrome(options=chrome_options, executable_path=driver_path)
    driver.get(url)
    return driver
# Delete the file if it already exists.
def del_file(file_path):
    """Remove *file_path*; silently succeed when it does not exist."""
    if not os.path.exists(file_path):
        return
    os.remove(file_path)
# Collect the detail-page URL of every listing.
def get_url(driver):
    """Walk every result page and append each house detail-page URL to
    ./data/url/url.txt, one URL per line.

    Args:
        driver: A selenium driver already positioned on the unit-list page.

    Bug fixed: the parameter was named ``drive`` while the body used the
    module-level ``driver`` global; the parameter is now actually used.
    NOTE(review): the original post lost its indentation; the loop below
    scrapes the current page *before* clicking "next" so page 1 is not
    skipped — confirm against the live pager behavior.
    """
    # Total page count is exposed as the 'page-data' attribute of the
    # last pager item.
    total_str = driver.find_elements_by_class_name('pageItem')[-1].get_attribute('page-data')
    total = int(total_str)
    click_num = 0
    while click_num < total:
        # Number of listing items on the current page.
        item = driver.find_elements_by_class_name('searchresult-cont')
        item_num = len(item)
        # Grab the detail-page link of every item on this page.
        for i in range(item_num):
            xpath = '//*[@id="unitList-container"]/div/div[' + str(i + 1) + ']/div[2]/div[1]/h3/a'
            url = driver.find_element_by_xpath(xpath).get_attribute('href')
            print(str(i) + '\t' + url)
            # Append the URL to the local file.
            with open('./data/url/url.txt', 'a', encoding='utf-8') as f:
                f.write(url + '\n')
        # Advance to the next page; the site throttles, so sleep 6 s after
        # every pager click (see note 1.4 at the top of the post).
        driver.find_elements_by_class_name('pageItem')[-2].click()
        click_num += 1
        time.sleep(6)
    close_driver(driver)
def close_driver(driver):
    """Shut the browser down and release the chromedriver process."""
    driver.quit()
if __name__ == '__main__':
    # Fixed one-night date range on the city-10 unit list.
    root_url = 'https://www.tujia.com/unitlist?startDate=2018-12-10&endDate=2018-12-11&cityId=10&ssr=off'
    # Drop any URL file left over from a previous run before collecting anew.
    del_file('./data/url/url.txt')
    driver = init_driver(root_url)
    get_url(driver)
2 slice_url.py(把所有的url等份,便於後續作多執行緒爬取)
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import math
# There are many URLs; crawling them all at once is fragile, so split the
# list into several files and crawl them in steps.
def main(slice_num, src='./data/url/url.txt', out_dir='./data/url'):
    """Split the collected URL list into *slice_num* roughly equal files.

    Slice *i* (1-based) is written to ``<out_dir>/url_<i>.txt`` so later
    crawling can be spread over multiple threads.

    Args:
        slice_num: Number of slice files to produce.
        src: Path of the combined URL file, one URL per line.
        out_dir: Directory that receives the slice files.
    """
    # Read with the same encoding craw_url.py wrote the file with.
    with open(src, 'r', encoding='utf-8') as f:
        urls = f.readlines()
    step = math.ceil(len(urls) / slice_num)
    for i in range(slice_num):
        out_path = out_dir + '/url_' + str(i + 1) + '.txt'
        with open(out_path, 'w', encoding='utf-8') as f:
            # Slicing never runs past the end of the list, unlike the old
            # index-then-bare-except loop it replaces; trailing slices are
            # simply empty, matching the original output.
            f.writelines(urls[step * i:step * (i + 1)])
if __name__ == '__main__':
    # Split the URL list into 30 slices, one per crawler thread.
    SLICE_COUNT = 30
    main(SLICE_COUNT)
3 craw.py(獲取每個房子的具體資料)
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import os
import time
import threading
# Start the browser driver for one worker thread.
def init_driver(url, index):
    """Start a Chrome driver for worker *index* and load *url*.

    Bumps the per-thread progress counter in the module-level ``threads``
    dict (presumably initialized elsewhere in the file — not visible here)
    and prints it so each worker's progress shows on the console.

    Args:
        url: Detail-page URL to open.
        index: Worker number; used as the ``threads`` key suffix.

    Returns:
        A ``webdriver.Chrome`` instance. If ``driver.get`` failed, the
        error is logged and the driver is returned anyway; the caller's
        scraping code tolerates missing elements.
    """
    global threads
    key = 'Thread_' + str(index)
    threads[key] += 1
    print(key + '\t' + str(threads[key]))
    chrome_options = webdriver.ChromeOptions()
    # chrome_options.add_argument("headless")  # uncomment to hide the browser window
    driver_path = "./bin/chromedriver.exe"
    driver = webdriver.Chrome(options=chrome_options, executable_path=driver_path)
    try:
        driver.get(url)
    except Exception as e:
        # Best-effort navigation: a slow or failed page load must not kill
        # the worker thread, but the failure is no longer silent.
        print('driver.get failed for ' + url + ': ' + repr(e))
    return driver
def close_driver(driver):
    """Quit the given webdriver, closing all of its browser windows."""
    driver.quit()
# Delete the file if it already exists.
def del_file(file_path):
    """Delete *file_path* when present so a fresh run starts clean."""
    exists = os.path.exists(file_path)
    if exists:
        os.remove(file_path)
# Read the locally saved URLs.
def read_url(file_path):
    """Return the URL lines from *file_path* (trailing newlines kept).

    The file was written as UTF-8 by craw_url.py, so it is read back with
    an explicit UTF-8 encoding instead of the platform default, which is
    GBK on Chinese-locale Windows and could mis-decode the file.
    """
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.readlines()
# Scrape every field from one house detail page and append it to *file_path*.
# NOTE(review): the original indentation was lost when this code was pasted
# into the blog post; the nesting below must be reconstructed before it can
# run. Only the comments have been edited here — code is unchanged.
def get_data(driver, file_path, index):
try:
# Shop name, price, room tags, payment tags, advantage tags
name = driver.find_element_by_xpath('//div[@class="house-name"]').text
price = ''
try:
price = driver.find_element_by_xpath('//a[@class="present-price"]').text
except:
pass
# Floor area (shown in a tooltip, so hover first)
area = ''
try:
house_type_element = driver.find_element_by_xpath('//*[@id="houseInfo"]/div/div/div[1]/div[3]/ul/li[2]')
ActionChains(driver).move_to_element(house_type_element).perform()
area = driver.find_element_by_xpath('//*[@id="houseInfo"]/div/div/div[1]/div[3]/ul/li[2]/div').text
except:
pass
room_tag = ''
try:
room_tag = driver.find_element_by_xpath('//ul[@class="room-tag"]').text.replace('\n', ' ')
except:
pass
pay_tag = ''
try:
pay_tag = driver.find_element_by_xpath('//ul[@class="pay-tag"]').text.replace('\n', ' ')
except:
pass
advan_tag = ''
try:
advan_tag = driver.find_element_by_xpath('//div[@class="hotel-advan-tag"]').text.replace('\n', ' ')
except:
pass
# House rules: collect all rules, then strip the ones marked disabled
# (class "not") by text replacement.
house_rules = ''
try:
house_rules_all = driver.find_elements_by_xpath('//*[@id="unitcheckinneedtoknow"]/div[2]/div[2]/div[5]/ol/li')
house_rules_dis = driver.find_elements_by_xpath('//*[@id="unitcheckinneedtoknow"]/div[2]/div[2]/div[5]/ol/li[@class="not"]')
house_rules = ''
for item in house_rules_all:
house_rules += item.text + ' '
for item in house_rules_dis:
if item.text:
house_rules = house_rules.replace(item.text + ' ', '')
# print(house_rules.encode('gbk', 'ignore').decode('gbk'))
except:
pass
# Facilities & services
facility_service = ''
# try:
# Click "show more"; keep retrying until the link becomes clickable.
scrollTop = 800
success = False
while not success:
try:
js = "var q=document.documentElement.scrollTop=800"
driver.execute_script(js)
driver.find_element_by_xpath('//*[@id="facilityshowmore"]/a').click()
success = True
except:
scrollTop += 100
time.sleep(1)
# Category headings and their content lists.
try:
category_item = driver.find_elements_by_xpath('//*[@id="listWrap"]/h5')
# print(category_item)
content_item = driver.find_elements_by_xpath('//*[@id="listWrap"]/ul')
# print(content_item)
# NOTE(review): this loop variable shadows the function's *index* parameter.
for index, category_ in enumerate(category_item):
category = category_.text
content = content_item[index].text.replace('\n', ' ')
if category:
facility_service += category + '('
facility_service += content + ') '
except:
pass
# Remove the facilities marked unavailable (class "i-not").
try:
facility_dis = driver.find_elements_by_xpath('//*[@id="listWrap"]//li[@class="i-not"]')
for item in facility_dis:
# print(item)
if item.text:
facility_service = facility_service.replace(item.text + ' ', '')
# print(item.text.encode('gbk', 'ignore').decode('gbk'),end=' ')
# print(facility_service.encode('gbk', 'ignore').decode('gbk'))
except:
pass
# Landlord information
# Landlord type
landlord_type = ''
try:
landlord_type = driver.find_element_by_xpath('//*[@id="landlordInfo"]/div/div[2]/div/h2/span').text
except:
pass
# Landlord verification
landlord_authentication = ''
try:
landlord_authentication = driver.find_element_by_xpath('//*[@id="landlordInfo"]/div/div[2]/div/div[2]').text
except:
pass
# Number of the landlord's other listings
landlord_other_house_num = ''
try:
landlord_other_house_num = driver.find_element_by_xpath('//div[@class="landlord-other-house"]/h2/span').text
except:
pass
# print(landlord_type)
# print(landlord_authentication)
# print(landlord_other_house_num)
# # Reviews
# # Overall score, per-item score, comment count, comments-with-photos count
overall_score = ''
single_score = ''
comment_sum = ''
comment_photo_sum = ''
try:
overall_score = driver.find_element_by_xpath('//*[@id="overallScore"]').text
single_score = driver.find_element_by_xpath('//*[@id="comment-summary"]/div[2]/div[1]/div[2]').text.replace('分', '')
comment_sum = driver.find_element_by_xpath('//*[@id="comment_filter"]/li[1]/span').text.replace('(', '').replace(')', '')
comment_photo_sum = driver.find_element_by_xpath('//*[@id="comment_filter"]/li[2]/span').text.replace('(', '').replace(')', '')
except:
pass
# print('Thread_' + str(index) + '\t' + str(threads['Thread_' + str(index)]), end='\t')
# print('\tThread_' + str(index))
# # Encode as GBK first with 'ignore' to drop unencodable chars, then decode
# # (works around a GBK console on Windows).
print('\t----店名----\t' + name.encode('gbk', 'ignore').decode('gbk'))
# print('\t----價格----\t' + price.encode('gbk', 'ignore').decode('gbk'))
print('\t--建築面積--\t' + area.encode('gbk', 'ignore').decode('gbk'))
# print('\t----房屋----\t' + room_tag.encode('gbk', 'ignore').decode('gbk'))
# print('\t----支付----\t' + pay_tag.encode('gbk', 'ignore').decode('gbk'))
# print('\t----優勢----\t' + advan_tag.encode('gbk', 'ignore').decode('gbk'))
# print('\t--設施服務--\t' + facility_service.encode('gbk', 'ignore').decode('gbk'))
# print('\t--房屋守則--\t' + house_rules.encode('gbk', 'ignore').decode('gbk'))
# print('\t--房東型別--\t' + landlord_type.encode('gbk', 'ignore').decode('gbk'))
# print('\t--房東認證--\t' + landlord_authentication.encode('gbk', 'ignore').decode('gbk'))
# print('\t--其他房數--\t' + landlord_other_house_num.encode('gbk', 'ignore').decode('gbk'))
# print('\t--綜合評分--\t' + overall_score.encode('gbk', 'ignore').decode('gbk'))
# print('\t--單項評分--\t' + single_score.encode('gbk', 'ignore').decode('gbk'))
# print('\t---評論數---\t' + comment_sum.encode('gbk', 'ignore').decode('gbk'))
# print('\t--照評論數--\t' + comment_photo_sum.encode('gbk', 'ignore').decode('gbk'))
# Write the record to the local file.
with open(file_path, 'a', encoding='utf-8') as f:
f.write('--------------------------------------------------------------\n')
f.write('\t----店名----\t' + name.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t----價格----\t' + price.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--建築面積--\t' + area.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t----房屋----\t' + room_tag.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t----支付----\t' + pay_tag.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t----優勢----\t' + advan_tag.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--設施服務--\t' + facility_service.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--房屋守則--\t' + house_rules.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--房東型別--\t' + landlord_type.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--房東認證--\t' + landlord_authentication.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--其他房數--\t' + landlord_other_house_num.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--綜合評分--\t' + overall_score.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--單項評分--\t' + single_score.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t---評論數---\t' + comment_sum.encode('gbk', 'ignore').decode('gbk') + '\n')
f.write('\t--照評論數--\t' + comment_photo_sum.encode('gbk', 'ignore').decode('gbk') + '\n')
# Scrape the comments on the current page.
get_data_comment(driver, file_path)
# Comment content
# Total number of comment pages
comment_page_num = 1
try:
comment_page_num_str = driver.find_elements_by_xpath('//*[@id="comment_list"]/li[1]/div[2]/ul/li')[-1].get_attribute('page-data')
comment_page_num = int(comment_page_num_str)
except:
pass
# Click through to the next comment page.
if comment_page_num > 1:
click_num = 0
while click_num < comment_page_num:
# Timestamp of the last comment on the current page.
try:
last_item = driver.find_element_by_xpath('//*[@id="comment_list"]/li[1]/div[1]/ul/li[last()]/div[2]/div[1]/div/span[2]').text
date = last_item.replace('-', '')[:6]
# Stop once comments are older than September 2017.
if int(date) < 201709:
break
except:
pass
# print(date.encode('gbk', 'ignore').decode('gbk'))
# Scroll to the bottom so the pager is in view.
js = "var q=document.documentElement.scrollTop=10000"
driver.execute_script(js)
time.sleep(2)
try:
driver.find_elements_by_xpath('//*[@id="comment_list"]/li[1]/div[2]/ul/li')[-2].click()
except:
break
# NOTE(review): stray no-op string literal left over from development.
'//*[@id="comment_list"]/li[1]/div[2]/ul/li[7]'
click_num += 1
time.sleep(4)
# Scrape the comments on the (new) current page.
get_data_comment(driver, file_path)
close_driver(driver)
except:
print('error')
close_driver(driver)
# 獲取評論模組資料
def get_data_comment(driver, file_path):
try:
# 當前頁評論數
comment_curr_page = driver.find_elements_by_xpath('//*[@id="comment_list"]/li[1]/div[1]/ul/li')
comment_curr_page_num = len(comment_curr_page)
for index in range(comment_curr_page_num):
xpath_head = '//*[@id="comment_list"]/li[1]/div[1]/ul/li[' + str(index + 1) + ']'
# 評論人
comment_person = driver.find_element_by_xpath(xpath_head + '/div[2]/div[1]/div/span[1]').text
# 評論時間
comment_time = driver.find_element_by_xpath(xpath_head + '/div[2]/div[1]/div/span[2]').text.replace('點評', '')
# 評論內容
comment_content = driver.find_element_by_xpath(xpath_head + '/div[2]/div[2]').text
# 是否回覆
comment_replay = ''
try:
comment_replay = driver.find_element_by_xpath(xpath_head + '/div[2]/div[4]/div[1]/div[2]/p').text.replace(
':', '')