
Coursera open course: Recommender Systems assignment (Week 2)

I can't believe I wrote code this ugly. Still learning.
The Week 2 assignment:

  1. Mean Rating: Calculate the mean rating for each movie, order with the highest rating listed first, and submit the top 5.
  2. % of ratings 4+: Calculate the percentage of ratings for each movie that are 4 or higher. Order with the highest percentage first, and submit the top 5.
  3. Rating Count: Count the number of ratings for each movie, order with the most number of ratings first, and submit the top 5.
  4. Top 5 Star Wars: Calculate movies that most often occur with Star Wars: Episode IV - A New Hope (1977) using the (x+y)/x method described in class. In other words, for each movie, calculate the percentage of Star Wars raters who also rated that movie. Order with the highest percentage first, and submit the top 5.
#coding:utf-8
# Python 2 code (print statement, file() built-in)
import csv

#top n function
def topn(name,scores,n=5):
    tmpscores=scores[:] #create a new array
    tmpscores.sort()
    flag=[1 for i in range(len(name))] # flags: 1 = not printed yet
    for i in range(n):
        for j in range(len(name)):
            if scores[j]==tmpscores[-1-i]:
                if flag[j]:
                    flag[j]=0
                    print name[j],scores[j]

# element-wise array1/array2, skipping column 0 (the user ID column)
def caldiv(name,array1,array2):
    result=[0.0 for i in range(len(name))]
    for i in range(len(name)):
        if i!=0:
            result[i]=array1[i]*1.0/array2[i]
    return result

star_level=4
csvfile=file('A1Ratings.csv','rU')
reader=csv.reader(csvfile,dialect='excel')
for line in reader:
    if reader.line_num==1:
        # header row: movie names
        name=line
        scores=[0 for i in range(len(name))]
        totalcount=[0 for i in range(len(name))]
        star_count=[0 for i in range(len(name))]
    if reader.line_num!=1:
        for num in name:
            ff=name.index(num)
            if ff>0:
                temp=1
                item=line[ff]
                if not item.strip(): # to solve the problem of "" (empty cell = no rating)
                    item=0
                    temp=0
                scores[ff]=scores[ff]+int(item)
                totalcount[ff]=totalcount[ff]+temp
                if int(item)>=star_level:
                    star_count[ff]=star_count[ff]+1
average=caldiv(name,scores,totalcount)       # mean rating
average1=caldiv(name,star_count,totalcount)  # share of ratings >= star_level
topn(name,average)
topn(name,average1)
csvfile.close()

# co-occurrence with the movie in column sit (Star Wars)
csvfile=file('A1Ratings.csv','rU')
reader=csv.reader(csvfile,dialect='excel')
sit=1
count=[0.0 for i in range(len(name))]
for line in reader:
    if reader.line_num!=1:
        for i in range(len(name)):
            if not line[i].strip():
                line[i]=0
            if i>0 and i!=sit:
                if int(line[sit])*int(line[i]):
                    # 15 is hard-coded, presumably the number of Star Wars raters in this file
                    count[i]=count[i]+1.0/15
topn(name,count,5)
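
For comparison, the same four statistics can be computed more compactly. The following is only a minimal Python 3 sketch, not the code submitted for the assignment. The function name `movie_stats`, the inline sample rows, and the assumed layout (user ID in the first column, blank cell meaning "not rated") are all invented for illustration.

```python
import csv
import io

def movie_stats(rows, target, star_level=4):
    """Return {movie: (mean, share_at_or_above_star_level, count, cooccurrence)}.

    rows[0] is the header (user ID column first); blank cells mean "not rated".
    cooccurrence = share of the target movie's raters who also rated the movie.
    """
    header, data = rows[0], rows[1:]
    t = header.index(target)
    # Users who rated the target movie (the "x" in the (x+y)/x method).
    target_raters = [r for r in data if r[t].strip()]
    stats = {}
    for col in range(1, len(header)):
        ratings = [int(r[col]) for r in data if r[col].strip()]
        if not ratings:
            continue  # no ratings at all: skip instead of dividing by zero
        both = sum(1 for r in target_raters if r[col].strip())
        stats[header[col]] = (
            sum(ratings) / len(ratings),
            sum(1 for x in ratings if x >= star_level) / len(ratings),
            len(ratings),
            both / len(target_raters) if target_raters else 0.0,
        )
    return stats

# Made-up sample data standing in for A1Ratings.csv.
sample = io.StringIO("User,Star Wars,Movie B,Movie C\n1,5,4,\n2,4,,3\n3,,2,5\n")
stats = movie_stats(list(csv.reader(sample)), "Star Wars")
# Rank by mean rating, highest first.
for movie, values in sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True):
    print(movie, values)
```

Sorting on a different tuple position gives the other rankings. The target movie itself always gets a co-occurrence of 1.0, so it would need to be left out of that ranking.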

Problems encountered along the way:
1. Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Initial code:
csvfile1=file('A1Ratings.csv','rb')
Updated code:
csvfile1=file('A1Ratings.csv','rU')
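
In Python 3 the `file()` built-in no longer exists and the 'U' mode flag has been removed in recent versions. The `csv` module documentation recommends opening the file with `newline=''` instead. A minimal sketch, assuming a file that uses bare carriage returns as line endings (the temporary file and its contents are made up for the demonstration):

```python
import csv
import os
import tempfile

# Write a small CSV that uses old Mac-style "\r" line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"User,Movie A\r1,5\r2,\r")
    path = tmp.name

# newline='' passes line endings through untranslated so the csv module
# can handle them itself.
with open(path, newline="") as csvfile:
    rows = list(csv.reader(csvfile))

os.remove(path)
print(rows)
```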
2. The problematic code was as follows:

for line in reader:
    print line
    if reader.line_num==1:
        name=line
        scores=[0 for i in range(len(name))]
        totalcount=[0 for i in range(len(name))]
    if reader.line_num!=1:
        for num in name:
            if name.index(num)>0:
                for item in line:
                    if name.index(num)==line.index(item):
                        temp=1
                        # print item,num
                        if not item.strip():                
                            item=0
                            temp=0
                        scores[name.index(num)]=scores[name.index(num)]+int(item)
                        totalcount[name.index(num)]=totalcount[name.index(num)]+temp
print scores
print totalcount
csvfile.close()

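The flag issue: without flags, when several entries have the same score (for example 4), the lookup always lands on the first position that holds a 4.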
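
This is how `list.index` behaves: it always returns the position of the first matching value, which is what `line.index(item)` does in the code above. Below is a small Python 3 sketch of the pitfall and of one way around it, sorting positions rather than values. The helper `top_n` and the sample data are hypothetical.

```python
names = ["Movie A", "Movie B", "Movie C", "Movie D"]
scores = [4, 3, 4, 5]

# list.index always finds the FIRST 4, so Movie C is never reached this way.
first_hit = [names[scores.index(s)] for s in sorted(scores, reverse=True)[:3]]

def top_n(names, scores, n=5):
    """Return the n best (name, score) pairs; ties keep their original order."""
    # Sort the positions by score; each position is unique, so no flags are needed.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(names[i], scores[i]) for i in order[:n]]

print(first_hit)
print(top_n(names, scores, n=3))
```

With the value-based lookup, Movie A is reported twice and Movie C is lost. Sorting indices avoids the duplicate without any bookkeeping.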