Collaborative Filtering in Python (Finding Similar Users)
阿新 • Published: 2018-12-30
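The data consists of people's ratings of different films. We compute how strongly two people's ratings correlate in order to find people with similar taste, and then use those people's ratings to recommend films the target user is likely to enjoy.
The data is as follows: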
critics={'lisa rose':{'lady in the Water':2.5,'snakes on a plane':3.5,'just my luck':3.0,'superman returns':3.5,
'you ,me and dupree':2.5,'the night listener':3.0},
'gene seymour':{'lady in the Water':3.0 ,'snakes on a plane':3.5,'just my luck':1.5,'superman returns':5.0,
'you ,me and dupree':3.5,'the night listener':3.0},
'michael phillips':{'lady in the Water':2.5,'snakes on a plane':3.0,'superman returns':3.5,
'the night listener':4.0},
'claudia puig':{'snakes on a plane':3.5,'just my luck':3.0,'superman returns':4.0,
'you ,me and dupree':2.5,'the night listener':4.5},
'mick lasalle':{'lady in the Water':3.0,'snakes on a plane':4.0,'just my luck':2.0,'superman returns':3.0,
'you ,me and dupree':2.0,'the night listener':3.0 },
'jack mattews':{'lady in the Water':3.0,'snakes on a plane':4.0,'superman returns':5.0,
'you ,me and dupree':3.5,'the night listener':3.0},
'toby':{'snakes on a plane':4.5,'superman returns':4.0,'you ,me and dupree':1.0}}
Two ways to compute similarity (a Euclidean-distance score and the Pearson correlation coefficient):
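from math import sqrt

def sim_distance(prefs, person1, person2):
    # Similarity score based on the Euclidean distance over shared films.
    si = [item for item in prefs[person1] if item in prefs[person2]]
    # No films in common: treat the two people as having no similarity.
    if len(si) == 0:
        return 0
    sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2)
                         for item in si)
    # Map the distance into (0, 1]; identical ratings give 1.
    return 1 / (1 + sqrt(sum_of_squares))

def sim_pearson(prefs, p1, p2):
    # Pearson correlation coefficient over the films both people rated.
    si = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(si)
    # No films in common means no evidence of similarity, so return 0
    # (the original returned 1, which would rank strangers as perfect matches).
    if n == 0:
        return 0
    # Sums of the ratings and of their squares.
    sum1 = sum(prefs[p1][it] for it in si)
    sum2 = sum(prefs[p2][it] for it in si)
    sum1sq = sum(pow(prefs[p1][it], 2) for it in si)
    sum2sq = sum(pow(prefs[p2][it], 2) for it in si)
    # Sum of the products of paired ratings.
    psum = sum(prefs[p1][it] * prefs[p2][it] for it in si)
    num = psum - (sum1 * sum2 / n)
    den = sqrt((sum1sq - pow(sum1, 2) / n) * (sum2sq - pow(sum2, 2) / n))
    # A zero denominator means one person gave the same rating to every shared film.
    if den == 0:
        return 0
    return num / den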
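As a sanity check on the Pearson computation, the same coefficient can be written in its mean-centred form and evaluated by hand for one pair of critics. The sketch below is only an illustration: the names `shared_items` and `pearson_centered` are hypothetical helpers introduced here, and `sample` copies just the 'lisa rose' and 'toby' ratings from the data above.

```python
from math import sqrt

def shared_items(prefs, p1, p2):
    # Hypothetical helper: films rated by both p1 and p2.
    return [item for item in prefs[p1] if item in prefs[p2]]

def pearson_centered(prefs, p1, p2):
    # Mean-centred form of the Pearson coefficient (illustrative variant).
    common = shared_items(prefs, p1, p2)
    n = len(common)
    if n == 0:
        return 0
    xs = [prefs[p1][it] for it in common]
    ys = [prefs[p2][it] for it in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation and the two variations (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    return cov / den if den else 0

# Ratings copied from the data above; only these two critics are needed here.
sample = {
    'lisa rose': {'lady in the Water': 2.5, 'snakes on a plane': 3.5,
                  'just my luck': 3.0, 'superman returns': 3.5,
                  'you ,me and dupree': 2.5, 'the night listener': 3.0},
    'toby': {'snakes on a plane': 4.5, 'superman returns': 4.0,
             'you ,me and dupree': 1.0},
}

r = pearson_centered(sample, 'lisa rose', 'toby')
print(round(r, 4))  # 0.9912
```

Only the three films that both critics rated enter the calculation, which is why a critic with few ratings can still correlate very strongly with someone else.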
Test code:
from recommendations import critics
from distance import sim_pearson

def topMatches(prefs, person, n=5, Similarity=sim_pearson):
    # Score every other person against `person`.
    scores = [(Similarity(prefs, person, other), other)
              for other in prefs if other != person]
    # Highest similarity first.
    scores.sort(reverse=True)
    return scores[0:n]

print(topMatches(critics, 'toby', n=3))
Results:
[(0.9912407071619299, 'lisa rose'), (0.9244734516419049, 'mick lasalle'), (0.8934051474415647, 'claudia puig')]
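Because the similarity function is passed in as a parameter, the ranking depends on which measure is used. The following is a minimal sketch, not the post's own code: `euclidean_similarity` and `top_matches` are hypothetical names, `heapq.nlargest` is swapped in for the sort-and-slice approach, and `ratings` holds only the three films that 'toby' rated, copied from the data above for the three critics in the result.

```python
import heapq
from math import sqrt

# Subset of the data above: only the films that 'toby' has rated.
ratings = {
    'lisa rose': {'snakes on a plane': 3.5, 'superman returns': 3.5,
                  'you ,me and dupree': 2.5},
    'mick lasalle': {'snakes on a plane': 4.0, 'superman returns': 3.0,
                     'you ,me and dupree': 2.0},
    'claudia puig': {'snakes on a plane': 3.5, 'superman returns': 4.0,
                     'you ,me and dupree': 2.5},
    'toby': {'snakes on a plane': 4.5, 'superman returns': 4.0,
             'you ,me and dupree': 1.0},
}

def euclidean_similarity(prefs, p1, p2):
    # 1 / (1 + Euclidean distance) over the films both people rated.
    common = [item for item in prefs[p1] if item in prefs[p2]]
    if not common:
        return 0
    dist = sqrt(sum((prefs[p1][it] - prefs[p2][it]) ** 2 for it in common))
    return 1 / (1 + dist)

def top_matches(prefs, person, n, similarity):
    # Pick the n highest (score, name) pairs without sorting the whole list.
    candidates = ((similarity(prefs, person, other), other)
                  for other in prefs if other != person)
    return heapq.nlargest(n, candidates)

top = top_matches(ratings, 'toby', 3, euclidean_similarity)
for score, name in top:
    print(round(score, 4), name)
```

On this subset the Euclidean score puts 'mick lasalle' first, whereas the Pearson result above puts 'lisa rose' first. Pearson rewards ratings that rise and fall together even when one critic scores consistently higher or lower, while the Euclidean score penalises any gap in absolute ratings.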
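Finding people whose taste matches ours is not enough on its own; what we want is a predicted rating for each film. The more similar a person's taste is to ours, the more weight their rating should carry, so we use the correlation coefficient as the weight when computing a film's score.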
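One way to write this weighted score down is shown here. The notation is introduced only for illustration and does not come from the original post: $u$ is the target user, $i$ a film that $u$ has not rated, $s(u,v)$ the similarity between $u$ and another critic $v$, $r_{v,i}$ the rating $v$ gave to $i$, and $N_i$ the set of critics other than $u$ who rated $i$ and have $s(u,v) > 0$.

```latex
\hat{r}_{u,i} \;=\; \frac{\sum_{v \in N_i} s(u,v)\, r_{v,i}}{\sum_{v \in N_i} s(u,v)}
```

Dividing by the sum of the similarities keeps the result on the same scale as the original ratings, so a film does not score higher merely because more people rated it.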
The code is as follows:
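def getRecommendations(prefs, person, Similarity=sim_pearson):
    totals = {}    # film -> sum of (similarity * rating)
    simSums = {}   # film -> sum of similarities of the people who rated it
    for other in prefs:
        # Do not compare the person with themselves.
        if other == person:
            continue
        sim = Similarity(prefs, person, other)
        # Ignore people with zero or negative similarity; this also keeps
        # simSums from ending up at 0, which would cause a division by zero.
        if sim <= 0:
            continue
        for item in prefs[other]:
            # Only score films the person has not rated yet.
            if item not in prefs[person] or prefs[person][item] == 0:
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                simSums.setdefault(item, 0)
                # Add the similarity value (the original added the function object).
                simSums[item] += sim
    # Normalise each total by the sum of the similarities.
    rankings = [(total / simSums[item], item) for item, total in totals.items()]
    rankings.sort(reverse=True)
    return rankings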
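To see why the weighted total is divided by the sum of the similarities, here is a small hand-checkable sketch. Everything in it is made up for illustration: the critics 'a', 'b' and 'c', their similarity values, the two films, and the helper name `weighted_score` are all hypothetical and not part of the data above.

```python
def weighted_score(film_ratings, sims):
    # film_ratings: {critic: rating of one film}; sims: {critic: similarity}.
    # Only critics with positive similarity contribute.
    total = sum(sims[c] * r for c, r in film_ratings.items() if sims[c] > 0)
    weight = sum(sims[c] for c in film_ratings if sims[c] > 0)
    return total / weight if weight else None

# Hypothetical similarities between the target user and three critics.
sims = {'a': 0.9, 'b': 0.3, 'c': 0.6}

film_x = {'a': 4.0, 'b': 2.0, 'c': 3.0}  # rated by all three critics
film_y = {'a': 4.0}                      # rated only by the closest critic

# Unnormalised totals favour the film with more raters.
raw_x = sum(sims[c] * r for c, r in film_x.items())
raw_y = sum(sims[c] * r for c, r in film_y.items())

score_x = weighted_score(film_x, sims)
score_y = weighted_score(film_y, sims)
print(round(raw_x, 2), round(raw_y, 2))      # 6.0 3.6
print(round(score_x, 2), round(score_y, 2))  # 3.33 4.0
```

Without normalisation film_x wins simply because three people rated it; after dividing by the similarity sum, film_y comes out ahead, since the only critic who rated it is the one closest to the target user and gave it a high rating.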