[Python3 Web Scraping Study Notes] Using Parsing Libraries 4: Beautiful Soup 2

阿新 • Published: 2018-11-11

Parent and Ancestor Nodes

To get the parent of a node element, use the parent attribute:

html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p class="story">
			Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">
<span>Elsie</span>
</a>
</p>
<p class="story">...</p>
""" 

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# soup.a is the first <a> node; parent is its direct parent element
print(soup.a.parent)

The output is:

<p class="story">
            Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>

This selects the parent element of the first a node. Its parent is the p node, so the output is the p node together with everything inside it.
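As a small sketch of how parent behaves at different depths, the snippet below reads only the name of each parent instead of printing whole subtrees. The shortened markup and the variable names are assumptions made for illustration, and it uses the standard-library html.parser rather than lxml so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened markup, assumed for illustration only.
html = '<p class="story"><a href="http://example.com/elsie" id="link1"><span>Elsie</span></a></p>'

# html.parser ships with Python; unlike lxml it does not add <html>/<body> wrappers.
soup = BeautifulSoup(html, 'html.parser')

a_parent_name = soup.a.parent.name        # direct parent of <a>
span_parent_name = soup.span.parent.name  # direct parent of <span>
p_parent_name = soup.p.parent.name        # a top-level tag's parent is the BeautifulSoup object
soup_parent = soup.parent                 # the document object itself has no parent

print(a_parent_name, span_parent_name, p_parent_name, soup_parent)
# prints: p a [document] None
```

Each call to parent moves up exactly one level, and the chain ends at the BeautifulSoup object, whose own parent is None.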
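Note that parent returns only the direct parent of the a node; it does not continue outward to that parent's own ancestors. To get all ancestors, use the parents attribute: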

html = """
<html>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">
<span>Elsie</span>
</a>
</p>
""" 

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
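# parents is a generator that yields each ancestor, nearest first
print(type(soup.a.parents))
# Materialize it with list() and pair each ancestor with its index
print(list(enumerate(soup.a.parents)))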

The output is:

<class 'generator'>
[(0, <p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>), (1, <body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>
</body>), (2, <html>
<body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>
</body></html>), (3, <html>
<body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>
</body></html>)]

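As you can see, parents returns a generator. Here it is converted to a list so that each index and its content can be printed; the list elements are the ancestors of the a node, ordered from nearest to farthest. The last element is the BeautifulSoup object itself, which is why the html content appears twice.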
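Because each ancestor prints its whole subtree, the list above is hard to scan. A minimal sketch that collects only the ancestor names makes the order easier to see. The variable names are assumptions, and the standard-library html.parser stands in for lxml here so the snippet has no extra dependency; the html and body tags are written out in the markup because html.parser does not add them.

```python
from bs4 import BeautifulSoup

html = """
<html>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">
<span>Elsie</span>
</a>
</p>
"""

soup = BeautifulSoup(html, 'html.parser')

# Walk the generator once and keep only each ancestor's tag name.
ancestor_names = [node.name for node in soup.a.parents]

# The outermost "ancestor" is the BeautifulSoup object, not a real tag.
outermost_is_soup = list(soup.a.parents)[-1] is soup

print(ancestor_names)
# prints: ['p', 'body', 'html', '[document]']
print(outermost_is_soup)
# prints: True
```

Since parents is a generator, it has to be requested again (or stored in a list) if it is needed more than once.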

Sibling Nodes

Sibling nodes can be retrieved as follows:

html = """
<html>
<body>
<p class="story">
		Once upon a time there were little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">
<span>Elsie</span>
</a>
		Hello
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
		and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
		and they lived at the bottom of a well.
</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
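# Single neighbours: the node immediately after / before the first <a>
print('Next Sibling', soup.a.next_sibling)
print('Prev Sibling', soup.a.previous_sibling)
# Generators over every following / preceding sibling, wrapped in list() for printing
print('Next Siblings', list(enumerate(soup.a.next_siblings)))
print('Prev Siblings', list(enumerate(soup.a.previous_siblings)))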

The output is:

Next Sibling
        Hello

Prev Sibling
        Once upon a time there were little sisters; and their names were

Next Siblings [(0, '\n        Hello\n'), (1, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>), (2, '\n        and\n'), (3, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>), (4, '\n        and they lived at the bottom of a well.\n')]
Prev Siblings [(0, '\n        Once upon a time there were little sisters; and their names were\n')]

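As you can see, four attributes are used here. next_sibling and previous_sibling return the node's next and previous sibling respectively, while next_siblings and previous_siblings return generators over all following siblings and all preceding siblings respectively. Text between tags counts as a sibling too, which is why the next sibling of the first a node is the string containing Hello rather than the second a node.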
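Since text nodes show up among the siblings, a common follow-up is to keep only the element siblings. The sketch below is one possible way to do that, filtering the generators with isinstance against Tag; the variable names are assumptions, and html.parser is used in place of lxml so the snippet runs with the standard library parser.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<p class="story">
    Once upon a time there were little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    Hello
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    and they lived at the bottom of a well.
</p>
"""

soup = BeautifulSoup(html, 'html.parser')
first_link = soup.a
last_link = soup.find(id='link3')

# Keep only Tag siblings; the strings between the links are skipped.
next_ids = [s['id'] for s in first_link.next_siblings if isinstance(s, Tag)]

# previous_siblings walks backwards, so the nearest sibling comes first.
prev_ids = [s['id'] for s in last_link.previous_siblings if isinstance(s, Tag)]

print(next_ids)
# prints: ['link2', 'link3']
print(prev_ids)
# prints: ['link2', 'link1']
```

Note the order of prev_ids: previous_siblings starts at the node immediately before last_link and moves toward the beginning of the parent.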
