
Scraping ithome news with a Python crawler and saving it to a local database

This script crawls the news links on the IT之家 (ithome.com) homepage, reads each article, and stores it in a local database.
It is not very efficient; pointers from more experienced readers are welcome.
from bs4 import BeautifulSoup
import urllib.request
import re
import pymysql

# Connect to the local MySQL database.
conn = pymysql.connect(host='localhost', user='root', passwd='', db='myblog', charset='utf8')
cur = conn.cursor()

# Fetch the homepage with a browser User-Agent. (The original snippet used
# `url` without defining it; the homepage is what the post describes crawling.)
url = 'http://www.ithome.com/'
headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1)')
opener = urllib.request.build_opener()
opener.addheaders = [headers]
file = opener.open(url).read().decode('gbk')

# Collect every article URL linked from the homepage.
title1 = r'http://www\.ithome\.com/html/[a-z]+/[0-9]*\.htm'
a = re.findall(title1, file)

rehtml = re.compile('<[^<]*>')  # matches HTML tags, used to strip markup below

for i in range(0, len(a)):
    strdata = urllib.request.urlopen(a[i]).read().decode('gbk')
    soup = BeautifulSoup(strdata, 'html.parser')

    # The article title is the page's <h1>.
    title = str(soup.h1.string).encode('utf8')

    # The article body is the element with class "post_content".
    content = None
    for string in soup.find_all(class_='post_content'):
        content = string
    content = str(content)

    # Remove the in-article ad blocks (class "yj_d") before stripping tags.
    for adstring in soup.find_all(class_='yj_d'):
        content = content.replace(str(adstring), '')

    # Strip the remaining HTML tags and collapse the resulting blank lines.
    content = rehtml.sub('\n', content)
    content = re.compile('\n+').sub(' ', content)
    content = content.encode('utf8')

    sql = ("INSERT INTO `myblog`.`article` (`id`, `title`, `content`, `author`, `time`) "
           "VALUES (NULL, %s, %s, 'admin', '2016-03-08');")
    cur.execute(sql, [title, content])
    conn.commit()

conn.close()
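For anyone trying to run this: the post never shows the `article` table, so here is a minimal setup sketch with column types inferred from the INSERT statement above. The types and lengths are assumptions, not the author's actual schema.

import pymysql

conn = pymysql.connect(host='localhost', user='root', passwd='', db='myblog', charset='utf8')
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS `article` (
        `id`      INT AUTO_INCREMENT PRIMARY KEY,  -- the INSERT passes NULL, so id is assumed auto-increment
        `title`   VARCHAR(255) NOT NULL,           -- assumed length
        `content` TEXT NOT NULL,
        `author`  VARCHAR(64) NOT NULL,            -- assumed length
        `time`    DATE NOT NULL
    ) DEFAULT CHARSET=utf8
""")
conn.commit()
conn.close()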
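On the efficiency question: almost all of the time goes into downloading the article pages one at a time. A minimal sketch of fetching them concurrently with the standard-library concurrent.futures module, assuming the same list `a` of article URLs collected above (fetch_page and crawl_pages are hypothetical helper names, not part of the original post):

import concurrent.futures
import urllib.request

def fetch_page(article_url):
    # Download one article page; decoding as gbk matches the crawler above.
    with urllib.request.urlopen(article_url) as resp:
        return article_url, resp.read().decode('gbk')

def crawl_pages(urls, workers=8):
    # Download pages in parallel; parsing and the INSERTs can stay in a single
    # thread afterwards, since network wait dominates the total runtime.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_page, urls))

Committing once after the loop (or batching rows with cur.executemany) instead of calling conn.commit() per article would also cut the round trips to MySQL.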