1. 程式人生 > >【爬蟲例項1】python3下使用beautifulsoup爬取資料並存儲txt檔案

【爬蟲例項1】python3下使用beautifulsoup爬取資料並存儲txt檔案

1:執行環境:

python: 3.7.0
系統:Windows
IDE:pycharm 2017

2:需要安裝的庫:

requests 和 beautifulsoup

3:完整程式碼:

 # coding:utf-8
    import requests
    from bs4 import BeautifulSoup
    import  bs4
    
    
    def gethtml(url,headers):
        response =  requests.get(url,headers=headers)
        try:
            if response.status_code == 200:
                print('抓取成功網頁長度:',len(response.text))
                response.encoding = 'utf-8'
                return response.text
        except BaseException as e:
            print('抓取出現錯誤:',e)
    
    def getsoup(html):
        soup = BeautifulSoup(html,'lxml')
        for tr in soup.find('tbody').children:  #生成tr的tag列表
            if isinstance(tr,bs4.element.Tag):
                td = tr('td')          #迴圈獲取所有tr標籤下的td標籤,並生成tag列表
                t = [td[0].string, td[1].string,'    ',td[2].string,'   ',td[3].string]   #提取前四td字串
                list.append(t)
    
    def write_data(list):
       for i in list:   #迴圈提取list中的元素
        with open('daxue.txt','a') as  data:
                    print(i,file=data)          #寫入檔案
    
    
    if __name__ == '__main__':
        list = []
        url = 'http://www.zuihaodaxue.com/shengyuanzhiliangpaiming2018.html'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
        }
        html = gethtml(url,headers)
        getsoup(html)
        write_data(list)

4:執行結果:

在這裡插入圖片描述
注:這只是一個學習的demo,寫的不是很精美,還有很多優化的地方,先弄懂原理然後慢慢磨練吧。