[Crawler Example 1] Scraping data with BeautifulSoup under Python 3 and saving it to a txt file
阿新 • Published: 2018-12-16
1: Environment:
Python: 3.7.0
OS: Windows
IDE: PyCharm 2017
2: Required libraries:
requests and beautifulsoup4 (the code below also uses the lxml parser, so install lxml as well)
3: Complete code:
# coding:utf-8
import requests
from bs4 import BeautifulSoup
import bs4


def gethtml(url, headers):
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            print('Fetch succeeded, page length:', len(response.text))
            response.encoding = 'utf-8'
            return response.text
    except BaseException as e:
        print('Fetch failed:', e)


def getsoup(html, rows):
    soup = BeautifulSoup(html, 'lxml')
    for tr in soup.find('tbody').children:  # iterate over the tr tags under tbody
        if isinstance(tr, bs4.element.Tag):  # skip the whitespace strings between tags
            td = tr('td')  # list of the td tags inside this tr
            t = [td[0].string, td[1].string, ' ', td[2].string, ' ', td[3].string]  # first four td strings
            rows.append(t)


def write_data(rows):
    with open('daxue.txt', 'a') as data:  # open the file once, append mode
        for i in rows:  # write each extracted row on its own line
            print(i, file=data)


if __name__ == '__main__':
    rows = []
    url = 'http://www.zuihaodaxue.com/shengyuanzhiliangpaiming2018.html'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
    html = gethtml(url, headers)
    getsoup(html, rows)
    write_data(rows)
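Before running the full script against the live page, the core tbody/td extraction can be checked in isolation. The sketch below parses a hypothetical inline table with the same row structure (the names and scores are made up, not real ranking data), and uses the stdlib 'html.parser' backend so no lxml install is needed:

```python
from bs4 import BeautifulSoup
import bs4

# Hypothetical HTML mimicking the ranking page's table structure
html = """
<table><tbody>
<tr><td>1</td><td>Tsinghua</td><td>Beijing</td><td>100.0</td></tr>
<tr><td>2</td><td>Peking</td><td>Beijing</td><td>95.2</td></tr>
</tbody></table>
"""

rows = []
soup = BeautifulSoup(html, 'html.parser')
for tr in soup.find('tbody').children:
    # .children yields whitespace NavigableStrings too; keep only real tags
    if isinstance(tr, bs4.element.Tag):
        td = tr('td')  # shorthand for tr.find_all('td')
        rows.append([str(td[0].string), str(td[1].string),
                     str(td[2].string), str(td[3].string)])

print(rows)
# → [['1', 'Tsinghua', 'Beijing', '100.0'], ['2', 'Peking', 'Beijing', '95.2']]
```

The isinstance check matters: without it, the loop would crash on the newline text nodes that sit between the tr tags.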
4: Running result:
Note: this is just a learning demo, not polished code, and there is plenty of room for optimization. Understand the principle first, then refine it step by step.