Using the selenium module to scrape every streamer room on the Panda TV (熊貓直播) live-streaming platform.

阿新 • Published: 2018-12-13

I'll just post the full code below. The key parts are commented, so I won't walk through them one by one outside the code:

# author: aspiring

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import json


class XiongmaoSpider:
    def __init__(self):
        self.start_url = "https://www.panda.tv/all"  # start URL: the "all rooms" listing page
        self.driver = webdriver.Chrome()  # instantiate a Chrome browser

    def get_content_list(self):  # extract the data from the current page
        # find_element(s)(By.XPATH, ...) is used instead of the find_element(s)_by_xpath
        # helpers, which newer Selenium 4 releases no longer provide
        li_list = self.driver.find_elements(By.XPATH, "//ul[@id='later-play-list']/li")  # one <li> per room
        content_list = []
        for li in li_list:
            item = {}
            item["name"] = li.find_element(By.XPATH, ".//span[@class='video-nickname']").get_attribute("title")
            item["title"] = li.find_element(By.XPATH, ".//span[@class='video-title']").text
            item["room_img"] = li.find_element(By.XPATH, ".//img[@class='video-img video-img-lazy']").get_attribute("data-original")
            item["watch_num"] = li.find_element(By.XPATH, ".//span[@class='video-number']").text
            print(item)
            content_list.append(item)  # append each room's dict to the list

        # look for the "next page" element
        next_url = self.driver.find_elements(By.XPATH, "//a[@class='j-page-next']")
        # find_elements returns an empty list on the last page instead of raising,
        # so next_url becomes None and serves as the stop condition of the while loop in run()
        next_url = next_url[0] if len(next_url) > 0 else None

        return content_list, next_url

    def save_content_list(self, content_list):
        with open("xiongmao.txt", "a", encoding="utf-8") as f:
            for content in content_list:
                f.write(json.dumps(content, ensure_ascii=False, indent=2))  # serialize each record as JSON and write it to the file
                f.write("\n")

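    def run(self):  # main logic
        # 1. start_url
        # 2. send the request and get the response
        self.driver.get(self.start_url)
        # 3. extract the data
        content_list, next_url = self.get_content_list()
        # 4. save
        self.save_content_list(content_list)
        # keep clicking the "next page" element until there is none
        while next_url is not None:
            next_url.click()  # go to the next page
            time.sleep(2)  # wait 2 s for the next page to load, so data is not extracted before the elements exist
            # 3. extract the data
            content_list, next_url = self.get_content_list()
            # 4. save
            self.save_content_list(content_list)
        self.driver.quit()  # close the browser once the last page has been saved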


if __name__ == '__main__':
    xiongmao = XiongmaoSpider()
    xiongmao.run()
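
A side note on the 2-second sleep in run(): it is only a guess at how long the next page takes to load. One alternative is to poll for a condition until it holds or a timeout expires. Below is a minimal, browser-independent sketch of that idea. The name wait_until and its parameters are made up for illustration and are not part of Selenium (Selenium ships its own WebDriverWait class for the same purpose).

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)  # pause briefly before checking again
```

In the spider, the predicate could be a lambda that asks the driver for the room li elements and returns the list once it is non-empty. That is still an approximation, since items from the previous page may remain in the DOM for a moment after the click.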

Below is the exported JSON-format file; here are the first three records:

{
  "name": "沐慈Kiki",
  "title": "Happy day香檳 啤酒抽獎",
  "room_img": "https://i.h2.pdim.gs/90/c0a8df56c6462e3882782f4fc22602ff/w338/h190.jpg",
  "watch_num": "1.3萬"
}
{
  "name": "芒果魚丶",
  "title": "韓服大師上王者",
  "room_img": "https://i.h2.pdim.gs/90/9c28dff6ad3d6bf1cef25e9062c0257e/w338/h190.jpg",
  "watch_num": "19.4萬"
}
{
  "name": "會旋轉的冬瓜丶",
  "title": "瓜式一刀流 開斬！",
  "room_img": "https://i.h2.pdim.gs/90/9fdc00fe5a2b9252765468ff2cd533dd/w338/h190.jpg",
  "watch_num": "18.0萬"
}
...
...
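
Because each record is written with indent=2, xiongmao.txt is a sequence of pretty-printed JSON objects rather than one JSON document, so calling json.load on the whole file would fail. Here is a minimal sketch of reading it back, assuming the format shown above. The helper names iter_json_objects and parse_watch_num are hypothetical, and treating a trailing 萬 (or 万) as "times 10,000" is my reading of the sample values, not something the site documents.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over the
        # newlines that separate one record from the next
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)  # returns (value, end index)
        yield obj


def parse_watch_num(value):
    """Convert a viewer count such as "1.3萬" or "8560" to an int.

    Assumption: a trailing 萬/万 means the number is in units of 10,000.
    """
    value = value.strip()
    if value[-1:] in ("萬", "万"):
        return round(float(value[:-1]) * 10000)
    return int(value)
```

To use it on the real output, read the file with open("xiongmao.txt", encoding="utf-8"), pass f.read() to iter_json_objects, and call parse_watch_num on each record's "watch_num" field.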
