
Python Crawler: Scraping Douyu Pages with selenium + bs

Scrape the Douyu listings page (selenium + chromedriver renders the page; Beautiful Soup extracts the information).
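That pipeline in miniature, a minimal sketch assuming chromedriver is installed and reachable on PATH (the headless flag and the fixed sleep are conveniences, not part of the original script):

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

opts = Options()
opts.add_argument('--headless')           # run Chrome without a visible window
driver = webdriver.Chrome(options=opts)   # assumes chromedriver is on PATH
driver.get('https://www.douyu.com/directory/all')
time.sleep(3)                             # crude wait for the JavaScript-rendered listings

soup = BeautifulSoup(driver.page_source, 'xml')  # hand the rendered HTML to Beautiful Soup
driver.quit()

A WebDriverWait on a known element would be a more robust wait than the fixed sleep, but the idea is the same: selenium renders the page, Beautiful Soup does the parsing.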


# self.driver.page_source gives the rendered page source; here it is parsed with the 'xml' parser
soup = BeautifulSoup(self.driver.page_source, 'xml')
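A note on the parser argument: besides 'xml' (lxml's strict XML parser), BeautifulSoup also accepts 'lxml' and the built-in 'html.parser', which treat the document as HTML; either can be swapped in if the XML parser drops parts of the page:

# same call with an HTML parser instead of the strict XML one
soup = BeautifulSoup(self.driver.page_source, 'lxml')
# find_all('h3', {'class': 'ellipsis'}) etc. work the same on the resulting tree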

Example output: one line per room, showing the room title followed by its current viewer count.

Full script:

'''
Task:
    Crawl the Douyu live-stream listings at
    https://www.douyu.com/directory/all
Approach:
    1. Use selenium to obtain the rendered page content
    2. Use xpath or BeautifulSoup (bs) to extract information from the page
'''

from selenium import webdriver
from bs4 import BeautifulSoup


class Douyu():
    # set-up: start Chrome and remember the target URL
    def setUp(self):
        self.driver = webdriver.Chrome()
        self.url = 'https://www.douyu.com/directory/all'

    def douyu(self):
        self.driver.get(self.url)

        # parse the rendered page source
        soup = BeautifulSoup(self.driver.page_source, 'xml')

        # all room titles and viewer counts on the current page
        titles = soup.find_all('h3', {'class': 'ellipsis'})
        nums = soup.find_all('span', {'class': 'dy-num fr'})

        for title, num in zip(titles, nums):
            print("Room {0}  viewers {1}".format(title.get_text().strip(),
                                                 num.get_text().strip()))

    # tear-down: close the browser
    def destr(self):
        self.driver.quit()


if __name__ == '__main__':
    douyu = Douyu()
    douyu.setUp()
    douyu.douyu()
    douyu.destr()
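The script above scrapes a single page of listings. To walk through every page, a loop that clicks Douyu's "next page" button could wrap the parsing step. The sketch below is only an outline under assumptions: the button selector '.dy-Pagination-next' and the stop condition (button missing on the last page) are guesses and would need to be checked against the site's current markup.

import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


def crawl_all_pages(driver):
    while True:
        # parse and print the page that is currently loaded
        soup = BeautifulSoup(driver.page_source, 'xml')
        titles = soup.find_all('h3', {'class': 'ellipsis'})
        nums = soup.find_all('span', {'class': 'dy-num fr'})
        for title, num in zip(titles, nums):
            print(title.get_text().strip(), num.get_text().strip())

        try:
            # hypothetical selector for the "next page" button
            next_btn = driver.find_element(By.CSS_SELECTOR, '.dy-Pagination-next')
        except NoSuchElementException:
            break  # no next-page button found: treat it as the last page
        next_btn.click()
        time.sleep(2)  # crude wait for the next page to render

The break on NoSuchElementException is what keeps the while True from running forever; on the real site the last page may instead mark the button with a "disabled" class, in which case the check would change accordingly.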