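Python Crawler Series (3.6: CSS Selectors)
This section continues the material from the previous one; I have simply split it out. If you are already comfortable with CSS, you can go straight to the selectors covered here.
1. Select elements directly by tag name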
print(soup.select('a'))
2. Select by class name, for example tags with class=sister
print(soup.select('.sister'))
3. Select by id
print(soup.select("#link1"))
4. Combine multiple conditions
print(soup.select("p #link1"))  # the element with id="link1" nested anywhere inside a p tag
5. Select direct child nodes
print(soup.select("head > title"))
6. Select by attribute
print(soup.select('a[href="xx"]'))
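The six selectors above can be tried together on a small document. The following is a minimal sketch: the HTML is a hypothetical sample that only borrows the class and id names used above, and the standard-library html.parser backend is used so the sketch runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (not from the original article).
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
<a href="http://example.com/tillie" id="link3">Tillie</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

by_tag = soup.select("a")               # 1. every <a> element
by_class = soup.select(".sister")       # 2. elements with class="sister"
by_id = soup.select("#link1")           # 3. the element with id="link1"
descendant = soup.select("p #link1")    # 4. id="link1" somewhere inside a <p>
compound = soup.select("p#link1")       #    a <p> that itself has id="link1"
child = soup.select("head > title")     # 5. <title> directly under <head>
by_attr = soup.select('a[href="http://example.com/lacie"]')  # 6. exact attribute match

for name, result in [("tag", by_tag), ("class", by_class), ("id", by_id),
                     ("descendant", descendant), ("compound", compound),
                     ("child", child), ("attribute", by_attr)]:
    print(name, [node.get_text() for node in result])
```

The space in "p #link1" matters: with the space it means "a descendant of p", while "p#link1" without the space means "a p tag that has this id".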
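Note that select always returns a list of nodes, even when only one node matches, so index into it (for example with [0]) to get a single node.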
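A short sketch of what the list return value means in practice. The one-line HTML snippet and the "#nope" id are made up for illustration. select_one is BeautifulSoup's companion method that returns the first match or None rather than a list.

```python
from bs4 import BeautifulSoup

# Hypothetical one-element document, for illustration only.
soup = BeautifulSoup('<div><a id="link1" href="/a">first</a></div>', "html.parser")

matches = soup.select("#link1")      # a list, even though only one node matches
first_text = matches[0].get_text()   # index into the list to reach the node

missing = soup.select("#nope")       # no match gives an empty list, not None

one = soup.select_one("#link1")      # first matching node
none = soup.select_one("#nope")      # None when nothing matches

print(isinstance(matches, list), first_text, len(missing), one is not None, none)
```

Because a failed select gives an empty list, indexing with [0] raises IndexError when nothing matches; checking the length first, or using select_one and testing for None, avoids that.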
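from bs4 import BeautifulSoup

# html_doc is assumed to already hold the HTML of the page that contains the table
soup = BeautifulSoup(html_doc, 'lxml')
position = []
trs = soup.select('tr')
for tr in trs:
    tds = tr.select('td')
    # Skip rows without the five expected cells (for example a header row)
    if len(tds) < 5:
        continue
    post = {}
    title = tds[0].select('a')[0].get_text()
    job_type = tds[1].get_text()  # renamed from type to avoid shadowing the builtin
    num = tds[2].get_text()
    city = tds[3].get_text()
    public_time = tds[4].get_text()
    post['title'] = title
    post['type'] = job_type
    post['num'] = num
    post['city'] = city
    post['public_time'] = public_time
    position.append(post)
print(position)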
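To try the loop above without fetching a real page, here is a self-contained sketch. The table contents are invented placeholder data, the header row is an assumption about the page layout, and html.parser stands in for lxml so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical table with made-up rows, shaped like the five columns used above.
html_doc = """
<table>
<tr><th>Title</th><th>Type</th><th>Openings</th><th>City</th><th>Posted</th></tr>
<tr><td><a href="/job/1">Backend Engineer</a></td><td>Tech</td><td>2</td><td>Shenzhen</td><td>2018-07-01</td></tr>
<tr><td><a href="/job/2">Product Manager</a></td><td>Product</td><td>1</td><td>Beijing</td><td>2018-07-02</td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
keys = ["title", "type", "num", "city", "public_time"]

positions = []
for tr in soup.select("tr"):
    tds = tr.select("td")
    # The header row uses <th>, so it has no <td> cells and is skipped here.
    if len(tds) < len(keys):
        continue
    # get_text(strip=True) trims surrounding whitespace from each cell.
    values = [td.get_text(strip=True) for td in tds[:len(keys)]]
    positions.append(dict(zip(keys, values)))

print(positions)
```

Pairing the column names with the cell values through zip is just one way to shorten the five separate assignments; the resulting dictionaries have the same keys as in the loop above.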