Python 2.7 Crawler in Practice: Analyzing Douban Movie Reviews

阿新 • Published: 2019-01-17

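# -*- coding: utf-8 -*-
# The imports are not shown in the original listing; urllib2 is assumed for urlopen.
import re
import bs4
import jieba
import pandas as pd
from urllib2 import urlopen

def getCommentsById(movieId, pageNum):
    # To keep unicode out of the word cloud, the result is stored directly
    # as a UTF-8 encoded str.
    if pageNum <= 0:
        return False
    comments = ''
    for i in range(pageNum):
        # Douban serves 20 short comments per page; start is the offset.
        start = i * 20
        reqUrl = 'https://movie.douban.com/subject/' + movieId + '/comments' + '?' + 'start=' + str(start) + '&limit=20'
        print reqUrl
        resp = urlopen(reqUrl)
        html = resp.read().decode('utf-8')
        soup = bs4.BeautifulSoup(html, "html.parser")
        comContent = soup.find_all('div', id='comments')
        commentStr = comContent[0].find_all('div', class_='comment')
        for comment in commentStr:
            # .string is None when the <p> contains nested tags; skip those.
            c = comment.find_all('p')[0].string
            if c is not None:
                # str(c) raises UnicodeEncodeError on Chinese text under the
                # default ASCII codec, so encode explicitly.
                comments = comments + c.strip().encode('utf-8')
                print c
    return comments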
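
The paging logic in the listing above (20 comments per page, with `start` advancing by 20) can be isolated from the network call so that it is easy to check on its own. The following is a minimal sketch, not the author's code: it is written for Python 3 rather than Python 2.7, `build_comment_urls` is a hypothetical helper name, and the movie ID used in the usage line is a made-up placeholder.

```python
from urllib.parse import urlencode

# The listing above requests 20 comments per page.
COMMENTS_PER_PAGE = 20


def build_comment_urls(movie_id, page_count):
    """Return one comments-page URL per page, in page order.

    build_comment_urls is a hypothetical helper, not part of the original post.
    """
    if page_count <= 0:
        return []
    base = 'https://movie.douban.com/subject/{}/comments'.format(movie_id)
    urls = []
    for page in range(page_count):
        # A list of pairs keeps the query parameters in a fixed order.
        query = urlencode([('start', page * COMMENTS_PER_PAGE),
                           ('limit', COMMENTS_PER_PAGE)])
        urls.append(base + '?' + query)
    return urls


if __name__ == '__main__':
    # '12345' is a placeholder ID used only for illustration.
    for url in build_comment_urls('12345', 3):
        print(url)
```

Keeping URL construction separate from fetching makes it possible to verify the offsets without sending any requests to the site.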
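if __name__ == '__main__':
    print 'start ....'
    title = u'殺破狼·貪狼'
    movieId = getMovieId(title)
    print 'movie id is:'
    print movieId
    comments = getCommentsById(movieId, 10)
    comments = comments.replace(' ', '')
    print comments
    # Use a regular expression to strip punctuation by keeping only runs of
    # Chinese characters. In Python 2 the pattern must be a unicode literal
    # and the text must be decoded first; with a plain r'...' pattern,
    # \u4e00 is not read as a code point and the match is wrong.
    pattern = re.compile(u'[\u4e00-\u9fa5]+')
    filterdata = re.findall(pattern, comments.decode('utf-8'))
    cleaned_comments = u''.join(filterdata)
    # Segment the Chinese text with jieba
    segment = jieba.lcut(cleaned_comments)
    words_df = pd.DataFrame({'segment': segment})

    # Remove stop words (a movie-specific stop-word list would be even better)
    #stopwords=pd.read_csv('D:\python\stopwords.txt',index_col=False,sep="\t",names=['stopword'], encoding='utf-8')  # quoting=3 means no quoting (QUOTE_NONE)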
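
The cleaning and stop-word steps described in the comments above can be tried without a network connection or a stop-word file. The following is a rough sketch under stated assumptions: it targets Python 3 rather than Python 2.7, it uses `collections.Counter` in place of the pandas DataFrame the listing builds, and the function names, the sample sentence, the token list, and the stop-word set are all made up for illustration. In the real pipeline the tokens would come from `jieba.lcut`.

```python
import re
from collections import Counter

# Same character range as the listing above: keep only runs of Chinese characters.
CHINESE_RUN = re.compile('[\u4e00-\u9fa5]+')


def keep_chinese(text):
    """Drop punctuation, digits, Latin letters and whitespace."""
    return ''.join(CHINESE_RUN.findall(text))


def count_words(tokens, stopwords):
    """Count tokens that are not stop words, most frequent first."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common()


if __name__ == '__main__':
    # Hypothetical sample comment.
    print(keep_chinese('好看！very good, 10/10。推薦'))

    # Hypothetical tokens standing in for jieba output.
    tokens = ['電影', '的', '動作', '電影', '很', '好看']
    stopwords = {'的', '很'}
    for word, freq in count_words(tokens, stopwords):
        print(word, freq)
```

A set is used for the stop words because membership tests on a set stay fast even when the list grows to thousands of entries.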
