
Scraping 1000 Xiaozhu (小豬短租) listing pages


The code is as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup
import requests


def get_page_within(pages):
    # Loop over the listing pages, page 1 through the last page
    for page in range(1, pages + 1):
        wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
        soup = BeautifulSoup(wb.text, 'lxml')
        titles = soup.select('span.result_title')
        prices = soup.select('span.result_price > i')
        # Pair each title with its price and print them as a dictionary
        for title, price in zip(titles, prices):
            data = {
                'title': title.get_text(),
                'price': price.get_text()
            }
            print(data)


get_page_within(pages=1000)
Here is a walk-through of the code:
from bs4 import BeautifulSoup
import requests
Import the BeautifulSoup and requests libraries.
def get_page_within(pages):
Define a function that scrapes data from the given number of listing pages.
for page in range(1, pages+1):
Loop over page numbers starting at 1; range's end value is exclusive, so pages + 1 makes the loop cover pages 1 through pages.
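A quick illustration of this endpoint behaviour (pages = 3 is just an arbitrary example):

pages = 3
print(list(range(1, pages + 1)))  # [1, 2, 3] -- the last page number equals pages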

wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
Build the URL for each page with .format(), which substitutes the current page number into the {} placeholder, then download that page with requests.get(); the for loop repeats this for every page number.
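A small sketch of what .format() produces here:

url = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'
print(url.format(1))   # http://bj.xiaozhu.com/search-duanzufang-p1-0/
print(url.format(42))  # http://bj.xiaozhu.com/search-duanzufang-p42-0/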
soup = BeautifulSoup(wb.text, 'lxml')
Parse the downloaded HTML (wb.text) with BeautifulSoup, using the lxml parser.

titles = soup.select('span.result_title')
prices = soup.select('span.result_price > i')
Select the title and price elements with CSS selectors.
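To make the two selectors concrete, here is a self-contained sketch; the HTML snippet is a made-up stand-in for the real page markup, which may differ:

from bs4 import BeautifulSoup

html = '''
<li>
  <span class="result_title">Sample listing title</span>
  <span class="result_price">&yen;<i>458</i>/night</span>
</li>
'''
soup = BeautifulSoup(html, 'lxml')
print(soup.select('span.result_title')[0].get_text())      # Sample listing title
print(soup.select('span.result_price > i')[0].get_text())  # 458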
for title, price in zip(titles, prices):
    data = {
        'title': title.get_text(),
        'price': price.get_text()
    }
    print(data)
Pair each title with its price, pack the pair into a dictionary, and print it.
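A minimal illustration of how zip() pairs the two lists element by element (the values are made-up examples):

titles = ['Listing A', 'Listing B']
prices = ['328', '458']
for title, price in zip(titles, prices):
    print({'title': title, 'price': price})
# {'title': 'Listing A', 'price': '328'}
# {'title': 'Listing B', 'price': '458'}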
get_page_within(pages=1000)
Call the function with pages=1000 to run it and scrape all 1000 listing pages.
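One practical note: firing 1000 requests back to back can get the crawler blocked. A hedged sketch of a common adjustment, pausing between requests (the 1-second delay is an arbitrary choice, not part of the original code):

import time
import requests

def get_page_within(pages):
    for page in range(1, pages + 1):
        wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
        # ... parse wb.text with BeautifulSoup as shown above ...
        time.sleep(1)  # brief pause so the requests are spread out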







