
Web Scraping: Requests + lxml

This approach, which parses the page with lxml's etree module, is the more commonly used one:

# -*- coding: utf-8 -*-
import requests
from lxml import etree

# Fetch the page and parse the HTML into an element tree
url = "http://econpy.pythonanywhere.com/ex/001.html"
page = requests.get(url)
html = page.text
selector = etree.HTML(html)

# Each XPath query returns a list of the matching text nodes
buyer = selector.xpath('//div[@title="buyer-name"]/text()')
prices = selector.xpath('//span[@class="item-price"]/text()')

print(buyer)
print(prices)

This approach, which uses lxml.html instead, is used somewhat less often:

# -*- coding: utf-8 -*-
import requests
from lxml import html

# Fetch the page and build the tree directly from the response text
url = "http://econpy.pythonanywhere.com/ex/001.html"
page = requests.get(url)
tree = html.fromstring(page.text)

# Each XPath query returns a list of the matching text nodes
buyer = tree.xpath('//div[@title="buyer-name"]/text()')
prices = tree.xpath('//span[@class="item-price"]/text()')

print(buyer)
print(prices)

However, when scraping a Chinese-language page, the Chinese text comes out garbled:

import requests
req = requests.get("http://news.sina.com.cn/")
print(req.text)  # the Chinese characters may come out garbled here
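A likely cause is that requests decodes the response body with the wrong character set: when a text response's Content-Type header does not name a charset, requests falls back to ISO-8859-1. The sketch below reproduces the effect offline and shows one way to re-decode the bytes. The names `sample_bytes` and `decode_page` are hypothetical, and the sample string is made-up data standing in for `req.content`. It is a minimal illustration under those assumptions, not necessarily the fix the original author had in mind.

```python
# -*- coding: utf-8 -*-
# Sketch: why Chinese text gets garbled, and how re-decoding repairs it.
# sample_bytes is made-up data standing in for req.content (the raw bytes).

sample_bytes = "新浪新闻".encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 never raises, but yields mojibake.
garbled = sample_bytes.decode("iso-8859-1")


def decode_page(raw_bytes):
    """Hypothetical helper: try encodings common on Chinese pages, in order."""
    for encoding in ("utf-8", "gb18030"):
        try:
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return raw_bytes.decode("utf-8", errors="replace")


fixed = decode_page(sample_bytes)
print(garbled)
print(fixed)

# With requests, the equivalent fix is usually one of:
#   req.encoding = req.apparent_encoding  # let requests guess from the bytes
#   text = req.content.decode("utf-8")    # or decode the raw bytes yourself
```

Setting `req.encoding` before reading `req.text` changes how the text is decoded. Alternatively, passing `req.content` (bytes) to the lxml parser lets lxml attempt to detect the encoding itself.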