
03. Bookstore Treasure Hunt (Part 2)


Task: scrape the title, rating, and price of every book in the Travel category of the online bookstore Books to Scrape, and print the extracted information. Page URL: http://books.toscrape.com/catalogue/category/books/travel_2/index.html
# 3. Bookstore Treasure Hunt (Part 2)
#    Task: scrape the title, rating, and price of every book in the Travel category of the online bookstore Books to Scrape, and print the extracted information.
#    Page URL: http://books.toscrape.com/catalogue/category/books/travel_2/index.html

import requests
from bs4 import BeautifulSoup

res = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
# Assumption: the page is UTF-8. Without this line, requests may guess another
# encoding and garble characters such as the curly apostrophe and the pound sign.
res.encoding = 'utf-8'
html = res.text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('article', class_='product_pod')
for item in items:
    # The first <p> in each article is the rating tag; its class list is ['star-rating', '<rating word>'].
    print(item.find('h3').find('a')['title'] + '\t' + item.find('p')['class'][1], '\t', item.find('p', class_='price_color').text)
    # print(item.find('h3').find('a')['title'])
    # print(item.find('p')['class'][1])
    # print(item.find('p', class_='price_color').text)


'''
Output:
It's Only the Himalayas Two £45.17
Full Moon over Noah’s Ark: An Odyssey to Mount Ararat and Beyond Four £49.43
See America: A Celebration of Our National Parks & Treasured Sites Three £48.87
Vagabonding: An Uncommon Guide to the Art of Long-Term World Travel Two £36.94
Under the Tuscan Sun Three £37.33
A Summer In Europe Two £44.34
The Great Railway Bazaar One £30.54
A Year in Provence (Provence #1) Four £56.88
The Road to Little Dribbling: Adventures of an American in Britain (Notes From a Small Island #2) One £23.21
Neither Here nor There: Travels in Europe Three £38.95
1,000 Places to See Before You Die Five £26.08
'''

'''
The teacher's code

import requests
from bs4 import BeautifulSoup

res_bookstore = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
bs_bookstore = BeautifulSoup(res_bookstore.text, 'html.parser')
list_books = bs_bookstore.find_all(class_='product_pod')
for tag_books in list_books:
    # Reaching the <a> tag takes two lookups
    tag_name = tag_books.find('h3').find('a')
    # This <p> tag's class attribute has two values: "star-rating" and the specific rating, e.g. "Two". We select on the value every book shares: "star-rating"
    list_star = tag_books.find('p', class_="star-rating")
    # The price is easy to find: by attribute alone, or by tag plus attribute
    tag_price = tag_books.find('p', class_="price_color")
    # tag['attribute name'] extracts the attribute value
    print(tag_name['title'])
    # Again, extract the attribute value by attribute name
    print('star-rating:', list_star['class'][1])
    # list_star['class'] returns a list of two values, e.g. "['star-rating', 'Two']"; what we want is the element at index 1: "Two".
    # Why a list? Because the class attribute here has two values. In effect, we use the first class value to retrieve the second.
    # I added a separator line when printing so the records are clearly set apart; you can leave it out.
    print('Price:', tag_price.text, end='\n' + '------' + '\n')
'''
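Both versions print the rating as a word such as Two and the price as a string such as £45.17. If the values are to be sorted or averaged rather than just printed, they can be converted to numbers. The following is a minimal sketch, not part of the original exercise: RATING_WORDS, parse_rating, and parse_price are hypothetical names, and it assumes every price is a currency symbol followed by a decimal number.

```python
from decimal import Decimal

# Hypothetical lookup table: the site encodes ratings as the words One..Five.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_rating(word):
    """Convert a rating word such as 'Two' to an integer (KeyError if unknown)."""
    return RATING_WORDS[word]


def parse_price(text):
    """Drop everything before the first digit (e.g. the pound sign) and return a Decimal."""
    for index, char in enumerate(text):
        if char.isdigit():
            return Decimal(text[index:])
    raise ValueError(f"no digits found in price: {text!r}")


if __name__ == "__main__":
    print(parse_rating("Two"), parse_price("£45.17"))
```

Decimal is used instead of float here so that prices keep their exact two-digit form when summed or compared.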

Each Tag in items has the following content:
<article class="product_pod">
    <div class="image_container">
        <a href="../../../its-only-the-himalayas_981/index.html"><img alt="It's Only the Himalayas" class="thumbnail"
                src="../../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg" /></a>
    </div>
    <p class="star-rating Two">
        <i class="icon-star"></i>
        <i class="icon-star"></i>
        <i class="icon-star"></i>
        <i class="icon-star"></i>
        <i class="icon-star"></i>
    </p>
    <h3><a href="../../../its-only-the-himalayas_981/index.html" title="It's Only the Himalayas">It's Only the
            Himalayas</a></h3>
    <div class="product_price">
        <p class="price_color">£45.17</p>
        <p class="instock availability">
            <i class="icon-ok"></i>


            In stock


        </p>
        <form>
            <button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
        </form>
    </div>
</article>
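To check the extraction logic against this markup without touching the network, the article can be parsed straight from a string. The sketch below assumes bs4 is installed; SAMPLE_HTML is a trimmed copy of the article above, and extract_book is a hypothetical helper that bundles the three lookups used in the exercise.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <article> shown above (image, stock, and form parts omitted).
SAMPLE_HTML = """
<article class="product_pod">
    <p class="star-rating Two"><i class="icon-star"></i></p>
    <h3><a href="../../../its-only-the-himalayas_981/index.html"
           title="It's Only the Himalayas">It's Only the Himalayas</a></h3>
    <div class="product_price">
        <p class="price_color">£45.17</p>
    </div>
</article>
"""


def extract_book(article):
    """Return (title, rating, price) for one product_pod Tag."""
    title = article.find("h3").find("a")["title"]
    # class is a multi-valued attribute, so bs4 returns a list: ['star-rating', 'Two'].
    rating = article.find("p", class_="star-rating")["class"][1]
    price = article.find("p", class_="price_color").text
    return title, rating, price


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
books = [extract_book(tag) for tag in soup.find_all("article", class_="product_pod")]
print(books)
```

Selecting the rating tag by class_="star-rating" rather than taking the first p in the article keeps the lookup valid even if another p were to appear earlier in the markup.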
