Removing Specified Tags and Comments from HTML with BeautifulSoup

Removing specified tags

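from bs4 import BeautifulSoup

# Assumes `html` holds the HTML source as a string
soup = BeautifulSoup(html, "html.parser")

# Remove all <ul> tags
for s in soup("ul"):
    s.extract()
# Remove all <svg> tags
for s in soup("svg"):
    s.extract()
# Remove all <script> tags
for s in soup("script"):
    s.extract()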
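The same removal can be wrapped in a small helper that takes the tag names as a list. This is only a sketch: the function name `strip_tags` and the sample HTML are made up for illustration, and it assumes the built-in "html.parser" backend is acceptable.

```python
from bs4 import BeautifulSoup


def strip_tags(html, tag_names):
    """Return `html` with every element named in `tag_names` removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all accepts a list of tag names and matches any of them
    for element in soup.find_all(list(tag_names)):
        # extract() detaches the element, together with its children, from the tree
        element.extract()
    return str(soup)


# Illustrative input, not taken from the original post
sample = "<div><p>keep</p><ul><li>drop</li></ul><script>var x = 1;</script></div>"
cleaned = strip_tags(sample, ["ul", "svg", "script"])
print(cleaned)  # <div><p>keep</p></div>
```

Passing a tag name that does not occur in the document (here "svg") is harmless, since find_all simply returns no matches for it.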

Removing comments

from bs4 import BeautifulSoup, Comment

# Remove comments (`soup` is an already-parsed BeautifulSoup object)
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()
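For a quick check that the comments are really gone, here is a minimal self-contained sketch. The helper name `strip_comments` and the sample HTML are hypothetical, and it assumes a bs4 version that supports the `string=` argument (older releases use `text=` instead).

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Return `html` with all HTML comments removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Comment is a subclass of NavigableString, so filtering on the string
    # type picks out comment nodes and leaves ordinary text alone
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


# Illustrative input, not taken from the original post
sample = "<div><!-- tracking note --><p>keep</p><!-- another --></div>"
without_comments = strip_comments(sample)
print(without_comments)  # <div><p>keep</p></div>
```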