Notes on "Python for Data Analysis", Chapter 2: the MovieLens 1M dataset

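A few words up front:

One more thing that must be pointed out:

I am using Python 2.7. Some of the code in the book contains errors, so the code below is the version I got working under my own Python 2.7 setup.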

# coding: utf-8
import pandas as pd

# Raw strings keep the backslashes in the Windows paths literal.
# The multi-character separator '::' is only supported by the Python parsing engine.
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\users.dat', sep='::', header=None, names=unames, engine='python')

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\ratings.dat', sep='::', header=None, names=rnames, engine='python')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\movies.dat', sep='::', header=None, names=mnames, engine='python')

# Peek at the tables (bare expressions only display when run interactively, e.g. in IPython)
users[:5]
ratings[:5]
movies[:5]
ratings

# Merge the three tables; pandas joins on the overlapping column names
data = pd.merge(pd.merge(ratings, users), movies)
data.iloc[0]

# Mean rating of each film, split by gender
mean_rating = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
mean_rating[:5]

# Keep only the films that received at least 250 ratings
ratings_by_title = data.groupby('title').size()
ratings_by_title[:10]
active_titles = ratings_by_title.index[ratings_by_title >= 250]
active_titles
mean_rating = mean_rating.loc[active_titles]
mean_rating

# Films rated highest by female viewers
top_female_rating = mean_rating.sort_values(by='F', ascending=False)
top_female_rating[:10]

# Rating gap between male and female viewers
mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
sorted_by_diff = mean_rating.sort_values(by='diff')
sorted_by_diff[:15]        # preferred by women
sorted_by_diff[::-1][:15]  # preferred by men

# Standard deviation of ratings per film, restricted to the active titles
ratings_std_by_title = data.groupby('title')['rating'].std()
ratings_std_by_title = ratings_std_by_title.loc[active_titles]
ratings_std_by_title.sort_values(ascending=False)[:10]
ratings_std_by_title
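
The pivot, filter, and sort steps above can be tried without the MovieLens files. What follows is only a minimal sketch on a made-up ten-row frame: the titles 'A', 'B', 'C', their ratings, and the threshold `min_ratings = 4` are assumptions chosen for illustration (the code above uses 250 on the real data).

```python
import pandas as pd

# Hypothetical miniature stand-in for the merged `data` frame.
# Titles and ratings are invented for illustration, not MovieLens values.
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
    'gender': ['F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'M'],
    'rating': [5, 4, 3, 2, 2, 3, 4, 5, 1, 5],
})

# Mean rating per title, one column per gender.
mean_rating = data.pivot_table(values='rating', index='title',
                               columns='gender', aggfunc='mean')

# Keep only titles with at least `min_ratings` ratings (assumed threshold).
min_ratings = 4
ratings_by_title = data.groupby('title').size()
active_titles = ratings_by_title.index[ratings_by_title >= min_ratings]
mean_rating = mean_rating.loc[active_titles]

# Negative diff: rated higher by women; positive diff: rated higher by men.
mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
sorted_by_diff = mean_rating.sort_values(by='diff')

# Spread of ratings per title, restricted to the same active titles.
ratings_std_by_title = data.groupby('title')['rating'].std()
ratings_std_by_title = ratings_std_by_title.loc[active_titles]

print(sorted_by_diff)
print(ratings_std_by_title.sort_values(ascending=False))
```

Title 'C' has only two ratings, so it is dropped by the threshold before the gender gap is computed, which is the same reason the 250-rating cutoff is applied to the real data.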