Crawler Final Project: Guangshang Football Express (Scraping Football News)

阿新 • Published: 2018-04-28


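1. Choose a topic that interests you (no two people may pick the same one).

Topic: scraping information related to football news

2. Write a crawler in Python to scrape data on the chosen topic from the web.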

[image]

3. Run text analysis on the scraped data and generate a word cloud.

Scraped text (txt): [image]

Word cloud: [image]
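A word cloud sizes each token by how often it appears, so it can help to look at the raw counts first. The following is a minimal sketch of that counting step, assuming the titles are already available as a list of strings. The function name `count_tokens`, the regex split, and the sample titles are illustrative assumptions and are not part of the original project.

```python
import re
from collections import Counter


def count_tokens(titles):
    """Count how often each token appears across a list of titles."""
    counts = Counter()
    for title in titles:
        # \w+ keeps runs of letters, digits and CJK characters together,
        # so unsegmented Chinese text stays as one long token.
        counts.update(re.findall(r"\w+", title))
    return counts


# Hypothetical sample titles, for illustration only.
sample_titles = [
    "Barcelona win at home",
    "Real Madrid draw away",
    "Barcelona sign new striker",
]
frequencies = count_tokens(sample_titles)
print(frequencies.most_common(3))
```

A mapping like this could be passed to `WordCloud.generate_from_frequencies` instead of raw text. For Chinese titles, a word segmenter would probably be needed first to get meaningful tokens.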

4. Explain the results of the text analysis.

def getNewsDetail(Url):
    resd = requests.get(Url)
    resd.encoding = 'utf-8'
    soupd = BeautifulSoup(resd.text, 'html.parser')     # open the news detail page and parse it
    news = {}
    news['廣商好波'] = soupd.select('.headline')[0].text.rstrip().replace("\r\n", " ")
    # info = soupd.select('.artical-info')[0].text.replace("\r\n", " ")
    # news['內容'] = soupd.select('.artical-main-content')[0].text.strip().replace("\r\n", " ")
    print(news)
    return news

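The text comes from crawling a news site for stories about one particular team. Each story has a title, a source, and a body. The titles serve as the keywords for the word cloud.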
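To make the three fields concrete, here is a minimal sketch that pulls title, source, and body out of a page with BeautifulSoup. It runs on an inline HTML fragment rather than the live site. The fragment, the helper `first_text`, and the function `parse_news` are assumptions for illustration. The class names are the ones that appear in the code above, but the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real site markup may differ.
SAMPLE_HTML = """
<html><body>
  <h1 class="headline"> Sample headline </h1>
  <div class="artical-info">Source: Example Sports</div>
  <div class="artical-main-content">First paragraph of the story.</div>
</body></html>
"""


def first_text(soup, selector):
    """Return the stripped text of the first match, or '' if none."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else ""


def parse_news(html):
    """Extract title, source and content from one news page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": first_text(soup, ".headline"),
        "source": first_text(soup, ".artical-info"),
        "content": first_text(soup, ".artical-main-content"),
    }


news = parse_news(SAMPLE_HTML)
print(news["title"])
```

Returning an empty string for a missing element, instead of indexing `[0]` directly, keeps one malformed page from stopping the whole crawl.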

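5. Write a complete blog post describing the implementation process above, the problems encountered and how they were solved, and the data analysis approach and conclusions.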

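Problem 1: At first the code was not wrapped in functions. I just kept writing one long script, and partway through it became a mess. Solution: I went back over the code I had written and wrapped it into functions step by step.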

Problem 2: When reading and writing the txt file, using UTF produced garbled text. After asking other classmates for help, I got it working:

f = open('pynews.txt', 'r', encoding='GBK').read()
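A plausible cause of the garbled text, which is an assumption and not something the post confirms, is that the file was written with one encoding and read back with another. The sketch below illustrates the general point using a temporary file and a sample string, both chosen only for illustration: text round-trips cleanly when the write and the read name the same encoding.

```python
import os
import tempfile

text = "廣商足球快訊"  # sample text containing non-ASCII characters

# Temporary file so the sketch does not touch pynews.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write with an explicit encoding instead of the platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with the same encoding returns the original text.
with open(path, "r", encoding="utf-8") as f:
    same = f.read()

# Reading with a different encoding either fails or yields garbage.
try:
    with open(path, "r", encoding="gbk") as f:
        other = f.read()
except UnicodeDecodeError:
    other = None

print(same == text, other == text)
```

Passing `encoding=` on every `open` call avoids depending on the platform default, which differs between machines.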

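Data analysis approach and conclusion: Based on the data in this study, I believe Europe's elite clubs can compete for China's football commercial market. The analysis shows that most top European teams are popular with Chinese fans: news about them takes up 70% of the news home page. If these teams came to China to play friendlies, or played against the Chinese national team, it would raise the national team's level and also open up the Chinese jersey market.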
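The post does not show how the 70% figure was obtained. One way such a share could be computed, sketched here under assumptions, is to count the titles that mention any club from a chosen list. The function `share_mentioning`, the titles, and the club list below are all hypothetical and are not the scraped results.

```python
def share_mentioning(titles, keywords):
    """Return the fraction of titles that mention at least one keyword."""
    if not titles:
        return 0.0
    hits = sum(1 for t in titles if any(k in t for k in keywords))
    return hits / len(titles)


# Hypothetical data for illustration; not the scraped results.
sample_titles = [
    "Barcelona extend their lead",
    "Real Madrid rest key players",
    "Local derby ends in a draw",
    "Barcelona youth team wins cup",
]
top_clubs = ["Barcelona", "Real Madrid"]

print(f"{share_mentioning(sample_titles, top_clubs):.0%}")
```

With real data, the result depends on which clubs are put on the list, so that list would need to be stated alongside the percentage.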

6. Finally, submit all the scraped data along with the crawler and data analysis source code.

import numpy as np
import requests

from PIL import Image
from bs4 import BeautifulSoup
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt


def getNewsDetail(Url):
    """Fetch one news detail page and return its headline in a dict."""
    resd = requests.get(Url)
    resd.encoding = 'utf-8'
    soupd = BeautifulSoup(resd.text, 'html.parser')     # open the news detail page and parse it
    news = {}
    news['廣商好波'] = soupd.select('.headline')[0].text.rstrip().replace("\r\n", " ")
    # info = soupd.select('.artical-info')[0].text.replace("\r\n", " ")
    # news['內容'] = soupd.select('.artical-main-content')[0].text.strip().replace("\r\n", " ")
    print(news)
    return news


def getListPage(newsUrl):
    """Collect every news item linked from one news list page."""
    newslist = []
    res = requests.get(newsUrl)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    for news in soup.select('.england-cat-grid-r2 ul li'):
        Url = news.select('a')[0].attrs['href']
        print(Url)
        newslist.append(getNewsDetail(Url))
    return newslist


newstotal = []
firstPageUrl = 'https://soccer.hupu.com/spain/'
newstotal.extend(getListPage(firstPageUrl))

# Write and read with the same explicit encoding. The original opened the
# file twice (the second time with the platform default encoding) and then
# had to read it back as GBK, which is the garbling described in Problem 2.
txtName = 'pynews.txt'
with open(txtName, 'w', encoding='utf-8') as f:
    f.write(str(newstotal))
for news in newstotal:
    print(news)

with open(txtName, 'r', encoding='utf-8') as f:
    text = f.read()
font = r'C:\Windows\Fonts\simkai.ttf'      # a font with Chinese glyphs
a = np.array(Image.open("pdd.jpg"))        # image used as the word cloud mask
wordcloud = WordCloud(background_color="white", font_path=font, width=1000,
                      height=860, mask=a, margin=2).generate(text)
# Built from the mask image but not applied here; it could be passed to
# wordcloud.recolor(color_func=imagecolor) to take colors from the image.
imagecolor = ImageColorGenerator(a)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
wordcloud.to_file('1.jpg')
