
Day 7 of Learning Python

Web Scraping, Part 2

First, install the beautifulsoup4 library: open a command prompt and run pip install beautifulsoup4
To check that the library installed correctly, run the following test:

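import requests
from bs4 import BeautifulSoup

# Fetch the demo page and keep its HTML text
r = requests.get("http://python123.io/ws/demo.html")
print(r.text)
demo = r.text

# Parse the HTML with Python's built-in parser and print it with indentation
soup = BeautifulSoup(demo, "html.parser")
print(soup.prettify())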

BeautifulSoup is a class.
Create an instance of it to parse the contents of demo as HTML:
from bs4 import BeautifulSoup
soup = BeautifulSoup(demo, "html.parser")
These two lines are all it takes to parse the page.
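The same constructor accepts any HTML string, so the parser can be tried without a network request. Here is a minimal sketch, assuming a small made-up snippet (the html_text content is hypothetical and is not the demo page):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html_text = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# BeautifulSoup is a class; calling it builds a parse tree from the markup.
# "html.parser" selects Python's built-in HTML parser.
soup = BeautifulSoup(html_text, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.title.string)    # text inside <title>
print(soup.p.string)        # text inside the first <p>
```

Working from a string like this is handy for experiments, since the result does not depend on the remote page staying online.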
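The BeautifulSoup library provides functions for parsing, traversing, and maintaining the tag tree. It can parse both HTML and XML.
Here is an example: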

import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.title)  # the <title> tag
tag = soup.a       # the first <a> tag in the document
print(tag)
The output is:
<title>This is a python demo page</title>
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>

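The code above fetches the page, then prints soup.title and the first <a> tag. Accessing a tag by name, as in soup.a, returns only the first matching tag in the document.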
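A tag object exposes more than its printed form. The sketch below is illustrative only: it assumes a made-up two-link snippet standing in for the demo page (the example.com URLs are hypothetical) and shows the tag name, an attribute lookup, the inner text, and find_all for retrieving every match rather than just the first:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the demo page, with two links.
html_text = (
    '<html><head><title>Sample page</title></head><body>'
    '<p><a class="py1" href="http://example.com/basic" id="link1">Basic Python</a> and '
    '<a class="py2" href="http://example.com/advanced" id="link2">Advanced Python</a>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(html_text, "html.parser")

first_link = soup.a              # only the first <a> in the document
print(first_link.name)           # the tag name
print(first_link.attrs["href"])  # attribute lookup by name
print(first_link.string)         # text inside the tag

# find_all returns every matching tag, not just the first one.
all_links = soup.find_all("a")
print([link["id"] for link in all_links])
```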

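# Siblings can be text nodes as well as tags, so next_sibling is applied twice here
tag = soup.a.next_sibling.next_sibling
print(tag)
# The node immediately before the first <a>
s = soup.a.previous_sibling
print(s)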

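next_sibling returns the next node at the same level of the tree
previous_sibling returns the previous node at the same level
.contents traverses downward: it is the list of a tag's child nodes
.parent traverses upward: it is the tag's parent node
.prettify() returns the HTML as neatly indented text, which makes it easier to read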
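The traversal attributes above can be tried on a small inline document. This is a rough sketch, assuming a made-up paragraph with two links (not the demo page itself); it shows that sibling traversal also visits plain text nodes:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one paragraph holding text and two links.
html_text = (
    '<p class="course">Courses: '
    '<a id="link1">Basic Python</a> and '
    '<a id="link2">Advanced Python</a>.</p>'
)
soup = BeautifulSoup(html_text, "html.parser")
first = soup.a

# Siblings include text nodes, so the node right after the first <a>
# is the string between the links, not the second <a>.
print(repr(first.next_sibling))         # the text node between the links
print(first.next_sibling.next_sibling)  # the second <a> tag
print(repr(first.previous_sibling))     # the text before the first link

print(first.parent.name)     # upward: name of the enclosing tag
print(len(soup.p.contents))  # downward: number of child nodes of <p>
```

Because text between tags counts as a node, it is often worth checking the type of a sibling before treating it as a tag.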