nltp APP - Analyzing buyer review ratings - high-frequency words: a two-dimensional relationship
阿新 • Published: 2017-05-11
# -*- coding: utf-8 -*-
import os

from nltk import FreqDist

# TO FIX : No such file or directory
os.chdir(r'E:\zpy')
f = open('reviews_text_lt_3.txt', 'r')
f_r = f.read()
f.close()
strList = f_r.split(' ')
fdist1 = FreqDist(strList)
# Total word count
print(fdist1)
# keys() gives us a list of all the distinct types in the text
vocabulary1 = list(fdist1.keys())
# Slice to look at the first 50 items of this list
res0_50 = vocabulary1[:50]
print(res0_50)
C:\>python E:\zpy\wltp.py
<FreqDist with 16789 samples and 180043 outcomes>
['', 'raining', 'disappointing.It', 'uncomfortable...', "lot's", 'uv.\nSo,', 'yellow', 'Seller', 'four',
 'vaporizers.I', 'Does', 'completely!!', 'hanging', 'Monday,', 'asap!!This', 'Until', 'instead.The',
 'malfunctioned.', 'Lately', 'looking', 'LAST', 'eligible', 'electricity', 'DISAPPOINTED', 'oneWorks',
 'powdery', 'unanswered', 'also.', 'refun 'sooooo', 'foul', 'on\nafter', 'fingers.', 'advice:',
 'fingers,', 'advice?', 'each),', 'month.I']

C:\>
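Splitting on a single space leaves punctuation and case attached to the tokens ('Monday,', 'fingers.', 'fingers,'), and taking the first 50 keys does not by itself rank words by frequency. The following is a minimal sketch of one way to normalize the tokens and ask for the most frequent ones. It assumes Python 3 and uses `collections.Counter` so that it runs without NLTK. In NLTK 3, `FreqDist` is a `Counter` subclass, so `most_common` is available there as well. The function name `top_words`, the regular expression, and the sample sentence are illustrative assumptions, not part of the original script.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent lowercase word tokens in text."""
    # Lowercase first, then keep runs of letters and apostrophes, so that
    # 'Monday,' and 'monday' are counted as the same token.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


# Made-up sample text, only to show the call; not taken from the review data.
sample = "Yellow case. The yellow one broke on Monday, sadly. Yellow again!"
print(top_words(sample, 3))
```

Applied to the contents of reviews_text_lt_3.txt, this would return (word, count) pairs ordered by count. Filtering out common function words such as "the" is left out here to keep the sketch short.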
SELECT amz_review_text FROM amz_reviews_grab_us WHERE amz_review_rating < 3 LIMIT 3000;
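For buyers on Amazon's US marketplace, within the time period (y-m-d) covered by the first 3000 rows returned from the database, and without considering factors such as category, price, or relative rating,
the following tentative inferences can be drawn:
0 - Items whose attribute is yellow may, other conditions being equal, be unpopular and receive relatively low ratings;
1 - Monday may give buyers a poor purchasing experience; Monday promotions should be planned with care, taking into account other factors such as local customs and news events;
Note: this is the dev's current perspective.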
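Both inferences rest on words seen in low-rated reviews only. One way to probe them, sketched here under assumptions, is to compare how often a word appears in reviews with a rating below 3 against how often it appears in the remaining reviews. The function name `word_share` and both review lists are hypothetical and invented for illustration; they are not drawn from amz_reviews_grab_us. The sketch assumes Python 3.

```python
import re


def word_share(reviews, word):
    """Fraction of reviews mentioning word at least once (case-insensitive)."""
    if not reviews:
        return 0.0
    hits = 0
    for text in reviews:
        # Same normalization idea as before: lowercase word tokens only.
        tokens = re.findall(r"[a-z']+", text.lower())
        if word in tokens:
            hits += 1
    return hits / len(reviews)


# Invented toy data standing in for rating < 3 and rating >= 3 reviews.
low_rated = [
    "The yellow one broke.",
    "Arrived Monday, already cracked.",
    "Yellow paint peeled off.",
]
other = [
    "Works great.",
    "Love the yellow color!",
    "Fast shipping.",
    "Great value.",
]

print(word_share(low_rated, "yellow"), word_share(other, "yellow"))
```

A word that is common in both groups says little about low ratings, so a gap between the two shares would be a more useful signal than the word merely appearing in the low-rated sample.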