Scraping Web Page Data with Python and Importing It into a Spreadsheet

阿新 • Published: 2018-12-18

import requests
import time
import random
from bs4 import BeautifulSoup
import csv

def getContent(url, data=None):
    header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',  # requests cannot decode 'sdch', so don't advertise it
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.235'
    }  # request headers
    timeout = random.choice(range(80, 180))  # timeout in seconds
    while True:
        try:
            rep = requests.get(url, headers=header, timeout=timeout)  # request the URL and get the response
            rep.encoding = 'utf-8'
            break
        # Exception handling. requests generally reports socket and http.client
        # errors through its own exception classes, so catch those instead of
        # socket.timeout / http.client.BadStatusLine / http.client.IncompleteRead.
        except requests.exceptions.Timeout as e:
            print('3:', e)
            time.sleep(random.choice(range(8, 15)))

        except requests.exceptions.ConnectionError as e:
            print('4:', e)
            time.sleep(random.choice(range(20, 60)))

        except requests.exceptions.RequestException as e:
            print('5:', e)
            time.sleep(random.choice(range(5, 15)))
    print('request success')
    return rep.text  # full HTML of the page

if __name__ == '__main__':
    url = 'http://wsb.wuhan.gov.cn/html/friendly/201602/t20160203_45633.shtml'
    html = getContent(url)  # fetch the page
    print('my frist python file')
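The loop in getContent retries forever, so a URL that never responds keeps the script hanging. Below is a minimal sketch of bounded retries. `fetch_with_retry` and its parameters are hypothetical names (not part of requests), and the request is passed in as a callable so the retry logic can be tried without a network.

```python
import random
import time


def fetch_with_retry(fetch, retry_on, max_attempts=5, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached (hypothetical helper).

    fetch        -- zero-argument callable that performs one request
    retry_on     -- tuple of exception types that should trigger a retry
    max_attempts -- upper bound on the number of calls
    sleep        -- injectable so a test can skip the real pause
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on as e:
            print(f'attempt {attempt} failed: {e}')
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            # random back-off, in the same spirit as the original random sleeps
            sleep(random.uniform(5, 15))
```

In this script the call could look like `fetch_with_retry(lambda: requests.get(url, headers=header, timeout=timeout), (requests.exceptions.RequestException,))`, which returns the response or re-raises the last error after five attempts.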

def getData(html_text):
    final = []
    bs = BeautifulSoup(html_text, "html.parser")  # create the BeautifulSoup object
    body = bs.body  # get <body>
    href = body.find('div', {'id': 'nav'})
    ul = href.find('ul')
    li = ul.find_all('li')

    for nav in li:
        temp = []
        href = nav.find('h1').string
        temp.append(href)
        inf = nav.find_all('p')
        weather = inf[0].string  # weather
        temp.append(weather)
        span = inf[1].find('span')  # high temperature; this element may be missing at night
        temperature_highest = span.string if span is not None else None
        temperature_low = inf[1].find('i').string  # low temperature
        temp.append(temperature_low)
        temp.append(temperature_highest)
        final.append(temp)  # inside the loop, so every entry is kept, not only the last one
    print('getData success')
    return final


if __name__ == '__main__':
    url = 'http://wsb.wuhan.gov.cn/html/friendly/201602/t20160203_45633.shtml'
    html = getContent(url)  # fetch the page
    result = getData(html)  # parse the page and extract the data we need
    print('my frist python file')
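getData assumes that every `<li>` contains an `<h1>` and at least two `<p>` tags. The sketch below keeps that same assumed layout (`div#nav > ul > li`) but skips items that don't match instead of raising AttributeError; `get_data_safe` is a hypothetical name, and the column order (date, weather, low, high) mirrors getData.

```python
from bs4 import BeautifulSoup


def get_data_safe(html_text, container_id='nav'):
    """Hypothetical defensive variant of getData: returns [] or skips entries
    when the expected tags are missing, instead of crashing."""
    rows = []
    soup = BeautifulSoup(html_text, 'html.parser')
    container = soup.find('div', {'id': container_id})
    ul = container.find('ul') if container is not None else None
    if ul is None:
        return rows  # the page does not have the assumed layout
    for item in ul.find_all('li'):
        date_tag = item.find('h1')
        paragraphs = item.find_all('p')
        if date_tag is None or len(paragraphs) < 2:
            continue  # not a forecast entry (e.g. a plain menu link); skip it
        high = paragraphs[1].find('span')  # may be missing at night
        low = paragraphs[1].find('i')
        rows.append([
            date_tag.get_text(strip=True),
            paragraphs[0].get_text(strip=True),
            low.get_text(strip=True) if low is not None else None,
            high.get_text(strip=True) if high is not None else None,
        ])
    return rows
```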

def writeData(data, name):
    with open(name, 'a', errors='ignore', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)
    print('write_csv success')

if __name__ == '__main__':
    url = 'http://www.weather.com.cn/weather/101210101.shtml'
    html = getContent(url)  # fetch the page
    result = getData(html)  # parse the page and extract the data we need
    writeData(result, r'E:\地理國情監測\e.csv')  # write the data to a CSV file; raw string so the backslashes are not escapes
    print('my frist python file')
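writeData appends rows without a header and relies on the system default encoding. Here is a sketch, under the assumption that a header row is wanted, that writes the header only when the file is new or empty and uses `utf-8-sig` so Excel recognizes the file as UTF-8. `write_rows` and the default column names are hypothetical, chosen to match the order in which getData appends values.

```python
import csv
import os


def write_rows(rows, path, header=('date', 'weather', 'low', 'high')):
    """Append rows to a CSV file; write the (hypothetical) header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # 'utf-8-sig' puts a BOM at the start of the file so Excel detects UTF-8;
    # in append mode Python does not write the BOM again once the file has content.
    with open(path, 'a', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)  # None values are written as empty fields
```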

Error output:

C:\Users\jpy\PycharmProjects\venv\Scripts\python.exe C:/Users/jpy/PycharmProjects/test1.py
request success
my frist python file
request success
Traceback (most recent call last):
  File "C:/Users/jpy/PycharmProjects/test1.py", line 73, in <module>
    result = getData(html)  # parse the page and extract the data we need
  File "C:/Users/jpy/PycharmProjects/test1.py", line 56, in getData
    href = nav.find('h1').string
AttributeError: 'NoneType' object has no attribute 'string'

Process finished with exit code 1
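BeautifulSoup's `find()` returns None when nothing matches, so `nav.find('h1').string` fails as soon as one `<li>` has no `<h1>`. The traceback therefore indicates that on the wuhan.gov.cn page, `div#nav` contains at least one list item without an `<h1>`; a plausible but unverified explanation is that this div is a navigation menu rather than a forecast list. The hypothetical helper `report_structure` below checks a page against the layout getData expects before parsing it.

```python
from bs4 import BeautifulSoup


def report_structure(html_text, container_id='nav'):
    """Describe how well a page matches the layout getData expects:
    div#<container_id> > ul > li, each li containing an <h1>."""
    soup = BeautifulSoup(html_text, 'html.parser')
    container = soup.find('div', {'id': container_id})
    if container is None:
        return f'no <div id="{container_id}"> on the page'
    ul = container.find('ul')
    if ul is None:
        return f'<div id="{container_id}"> has no <ul>'
    items = ul.find_all('li')
    # find() returns None when nothing matches; count the items that would crash getData
    missing = sum(1 for item in items if item.find('h1') is None)
    return f'{len(items)} <li> items, {missing} without <h1>'
```

For example, `print(report_structure(html))` right after `getContent(url)` shows whether the downloaded page has the layout getData was written for.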
