
Practical Applications of Mahout Recommendation Algorithms (Part 2)

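Recommendations from Wikipedia link relationships

Data size: 130,160,392 links from 5,706,070 articles to 3,773,865 articles. There are no rating values (a link only indicates that two articles are related), so LogLikelihoodSimilarity can be used.

Distributed (MapReduce) recommenders are generally slow to run, so they are usually not suitable for online recommendation.

Implementation:

Using the item-based recommendation algorithm in practice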

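|     | 101 | 102 | 103 | 104 | 105 | 106 | 107 | U3  | R    |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | ---- |
| 101 | 5   | 3   | 4   | 4   | 2   | 2   | 1   | 2.0 | 40.0 |
| 102 | 3   | 3   | 3   | 2   | 1   | 1   | 0   | 0.0 | 18.5 |
| 103 | 4   | 3   | 4   | 3   | 1   | 2   | 0   | 0.0 | 24.5 |
| 104 | 4   | 2   | 3   | 4   | 2   | 2   | 1   | 4.0 | 38.0 |
| 105 | 2   | 1   | 1   | 2   | 2   | 1   | 1   | 4.5 | 26.0 |
| 106 | 2   | 1   | 2   | 2   | 1   | 2   | 0   | 0.0 | 16.5 |
| 107 | 1   | 0   | 0   | 1   | 1   | 0   | 1   | 5.0 | 15.5 |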

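The table above shows the recommendation result R, computed by multiplying the item co-occurrence (similarity) matrix by user U3's preferences for some of the items.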
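The calculation behind the table can be reproduced in a few lines. This is a minimal Python sketch, not Mahout code: the names `ITEMS`, `CO_OCCURRENCE`, `U3` and `recommend_scores` are made up for illustration, and it assumes that a preference of 0.0 means the user has not rated the item.

```python
# Item IDs in the order used by the rows and columns of the table.
ITEMS = [101, 102, 103, 104, 105, 106, 107]

# Co-occurrence matrix taken from the table (it is symmetric).
CO_OCCURRENCE = [
    [5, 3, 4, 4, 2, 2, 1],
    [3, 3, 3, 2, 1, 1, 0],
    [4, 3, 4, 3, 1, 2, 0],
    [4, 2, 3, 4, 2, 2, 1],
    [2, 1, 1, 2, 2, 1, 1],
    [2, 1, 2, 2, 1, 2, 0],
    [1, 0, 0, 1, 1, 0, 1],
]

# U3's preferences; 0.0 is assumed to mean "not rated".
U3 = [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0]


def recommend_scores(matrix, user_vector):
    """Return R, the product of the matrix and the user vector."""
    return [sum(c * p for c, p in zip(row, user_vector)) for row in matrix]


R = recommend_scores(CO_OCCURRENCE, U3)
scores = dict(zip(ITEMS, R))

# Only items the user has not rated yet are candidates for recommendation.
candidates = sorted(
    (item for item, pref in zip(ITEMS, U3) if pref == 0.0),
    key=lambda item: scores[item],
    reverse=True,
)
print(scores)
print(candidates)
```

Ranking only the unrated items is a design choice of this sketch: items the user already has a preference for would otherwise dominate the top of the list.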

Hadoop implementation of the Mahout recommender: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Implementation steps

1. Generate user vectors

1)  Input files are treated as (Long,String) pairs by the framework, where the Long key is a position in the file and String value is the line of the text file. Example: 239 / 98955: 590 22 9059

2)  Each line is parsed into user ID and several item IDs by a map function. The function emits new key-value pairs: user ID mapped to item ID, for each item ID. Example: 98955 / 590

3)  The framework collects all item IDs that were mapped to each user ID together.

4)  A reduce function constructs a Vector from all item IDs for the user, and outputs the user ID mapped to the user’s preference vector. All values in this vector are 0 or 1. Example: 98955 / [590:1.0, 22:1.0, 9059:1.0]

This keeps, for each user, a list of the items associated with that user.
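The four steps above can be imitated in plain Python to see the data flow. This is only a sketch under assumptions: the functions `map_line`, `reduce_user` and `build_user_vectors` are hypothetical names, a dictionary stands in for the framework's grouping step, and the input line format follows the example `98955: 590 22 9059`.

```python
from collections import defaultdict


def map_line(line):
    """Parse 'userID: itemID itemID ...' and emit (userID, itemID) pairs."""
    user_part, items_part = line.split(":")
    user_id = int(user_part)
    for token in items_part.split():
        yield user_id, int(token)


def reduce_user(user_id, item_ids):
    """Build the user's preference vector; every stored value is 1.0."""
    return user_id, {item_id: 1.0 for item_id in item_ids}


def build_user_vectors(lines):
    grouped = defaultdict(list)  # stands in for the framework's shuffle
    for line in lines:
        for user_id, item_id in map_line(line):
            grouped[user_id].append(item_id)
    return dict(reduce_user(user, items) for user, items in grouped.items())


vectors = build_user_vectors(["98955: 590 22 9059"])
print(vectors)
```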

2. Compute the co-occurrence (similarity) matrix

1)  Input is user IDs mapped to Vectors of user preferences -- the output of the last MapReduce. Example: 98955 / [590:1.0,22:1.0,9059:1.0]

2)  The map function determines all co-occurrences from one user’s preferences, and emits one pair of item IDs for each co-occurrence -- item ID mapped to item ID. Both mappings, from one item ID to the other and vice versa, are recorded. Example: 590 / 22

The map step records every pair of items that occur together within a user vector.

3)  The framework collects, for each item, all co-occurrences mapped from that item.

4)  The reducer counts, for each item ID, all co-occurrences that it receives and constructs a new Vector, which represents all co-occurrences for one item with count of number of times they have co-occurred. These can be used as the rows -- or columns -- of the co-occurrence matrix. Example: 590 / [22:3.0,95:1.0,…,9059:1.0,…]

This produces the co-occurrence matrix (the stored weights are the counts obtained from the item pairs).
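A small Python sketch of this counting step follows. It is not the Mahout implementation: `map_cooccurrences` and `build_cooccurrence_matrix` are hypothetical names, the user vectors are made-up data, and the sketch assumes that an item is also paired with itself, which is what the non-zero diagonal of the matrix shown earlier suggests.

```python
from collections import defaultdict
from itertools import product


def map_cooccurrences(user_vector):
    """Emit an (item, other item) pair for every co-occurrence in one user vector."""
    items = list(user_vector)
    # product() yields both (a, b) and (b, a), and also (a, a).
    yield from product(items, repeat=2)


def build_cooccurrence_matrix(user_vectors):
    counts = defaultdict(lambda: defaultdict(float))  # shuffle + reduce stand-in
    for vector in user_vectors.values():
        for item, other in map_cooccurrences(vector):
            counts[item][other] += 1.0
    return {item: dict(row) for item, row in counts.items()}


# Hypothetical user vectors, in the shape produced by step 1.
user_vectors = {
    1: {590: 1.0, 22: 1.0},
    2: {590: 1.0, 22: 1.0, 9059: 1.0},
    3: {22: 1.0, 95: 1.0},
}
matrix = build_cooccurrence_matrix(user_vectors)
print(matrix[590])
```

Because each pair is emitted in both directions, the resulting matrix is symmetric, so a row and the corresponding column hold the same values.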

3. Multiply the matrix from step 2 by the user vector from step 1 to get the recommendations

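The conventional way to compute the product:

    for each row i in the co-occurrence matrix
        compute dot product of row vector i with the user vector
        assign dot product to ith element of R

An equivalent formulation works column by column. Both compute the same matrix-vector product, and because the co-occurrence matrix is symmetric about its diagonal, the rows produced in step 2 can be used directly as its columns:

    assign R to be the zero vector
    for each column i in the co-occurrence matrix
        multiply column vector i by the ith element of the user vector
        add this vector to R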
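To check that the row-oriented and column-oriented procedures give the same R, both can be written out directly. The following Python sketch uses a small made-up 3x3 matrix and user vector; `by_rows` and `by_columns` are illustrative names, not part of Mahout.

```python
def by_rows(matrix, user_vector):
    """R[i] is the dot product of row i with the user vector."""
    return [sum(c * p for c, p in zip(row, user_vector)) for row in matrix]


def by_columns(matrix, user_vector):
    """Start from the zero vector and add column i scaled by the ith preference."""
    result = [0.0] * len(matrix)
    for i, pref in enumerate(user_vector):
        column = [row[i] for row in matrix]
        for k, value in enumerate(column):
            result[k] += value * pref
    return result


# Hypothetical 3x3 symmetric co-occurrence matrix and user vector.
matrix = [
    [3, 2, 1],
    [2, 4, 0],
    [1, 0, 2],
]
user = [2.0, 0.0, 4.5]

print(by_rows(matrix, user))
print(by_columns(matrix, user))
```

The column version is the one that distributes well: each scaled column is an independent piece of work, and the pieces only need to be added together at the end.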

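The actual computation (each column of the matrix is multiplied by the corresponding element of U3):

|     | 101 | 102 | 103 | 104 | 105 | 106 | 107 | U3  | R    |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | ---- |
| 101 | 10  | 0   | 0   | 16  | 9   | 0   | 5   | 2.0 | 40.0 |
| 102 | 6   | 0   | 0   | 8   | 4.5 | 0   | 0   | 0.0 | 18.5 |
| 103 | 8   | 0   | 0   | 12  | 4.5 | 0   | 0   | 0.0 | 24.5 |
| 104 | 8   | 0   | 0   | 16  | 9   | 0   | 5   | 4.0 | 38.0 |
| 105 | 4   | 0   | 0   | 8   | 9   | 0   | 5   | 4.5 | 26.0 |
| 106 | 4   | 0   | 0   | 8   | 4.5 | 0   | 0   | 0.0 | 16.5 |
| 107 | 2   | 0   | 0   | 4   | 4.5 | 0   | 5   | 5.0 | 15.5 |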

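Notes:

1. Items that are not in the user vector (items the user has not rated) do not affect the final result.

(In the table above, U3's preference for 102 is 0, so column 102 is multiplied by 0 and contributes nothing to the final result.)

Because the total number of items is far larger than the number of non-zero entries in a user vector (the items the user has expressed a preference for), this greatly reduces the amount of computation.

2. The column vectors are completely independent of one another, which makes them well suited to distributed storage.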
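Both notes can be seen in a sparse version of the calculation. In this Python sketch the matrix is stored as separate columns keyed by item ID, and U3 is stored as a dictionary that holds only the rated items; `COLUMNS`, `U3` and `partial_products` are illustrative names, and the numbers are the ones from the tables above.

```python
ITEMS = [101, 102, 103, 104, 105, 106, 107]

# Columns of the co-occurrence matrix, keyed by item ID. Each column can be
# stored and processed on its own.
COLUMNS = {
    101: [5, 3, 4, 4, 2, 2, 1],
    102: [3, 3, 3, 2, 1, 1, 0],
    103: [4, 3, 4, 3, 1, 2, 0],
    104: [4, 2, 3, 4, 2, 2, 1],
    105: [2, 1, 1, 2, 2, 1, 1],
    106: [2, 1, 2, 2, 1, 2, 0],
    107: [1, 0, 0, 1, 1, 0, 1],
}

# Sparse user vector: only the items U3 has expressed a preference for.
U3 = {101: 2.0, 104: 4.0, 105: 4.5, 107: 5.0}


def partial_products(columns, user_prefs):
    """Scale the column of each rated item by the user's preference for it."""
    return {
        item: [value * pref for value in columns[item]]
        for item, pref in user_prefs.items()
    }


# Columns 102, 103 and 106 are never read.
partials = partial_products(COLUMNS, U3)
R = [sum(values) for values in zip(*partials.values())]
print(dict(zip(ITEMS, R)))
```

Only four of the seven columns are touched here; with millions of items and a handful of preferences per user, the saving is correspondingly larger.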

Mapper 1:

1)  Input for mapper 1 is the co-occurrence matrix: item IDs as keys, mapped to columns as Vectors. Example: 590 / [22:3.0,95:1.0,…,9059:1.0,…]

2)  The map function simply echoes its input, but with the Vector wrapped in a VectorOrPrefWritable.

Mapper 2:

1)  Input for mapper 2 is again the user vectors: user IDs as keys, mapped to preference Vectors. Example: 98955 / [590:1.0,22:1.0,9059:1.0]

2)  For each non-zero value in the user vector, the map function outputs item ID mapped to the user ID and preference value, wrapped in a VectorOrPrefWritable. Example: 590 / [98955:1.0]

3)  The framework collects together, by item ID, the co-occurrence column and all user ID / preference value pairs.
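The two mappers effectively perform a join on item ID. A rough Python sketch of that idea is shown below. A tagged tuple, `("vector", ...)` or `("pref", ...)`, stands in for VectorOrPrefWritable here; it is not the real class, and `mapper_1`, `mapper_2` and `group_by_item` are hypothetical helper names.

```python
from collections import defaultdict


def mapper_1(item_id, column):
    """Echo a co-occurrence column, tagged as a vector."""
    yield item_id, ("vector", column)


def mapper_2(user_id, user_vector):
    """Emit one (user ID, preference) record per non-zero entry, keyed by item."""
    for item_id, pref in user_vector.items():
        if pref != 0.0:
            yield item_id, ("pref", (user_id, pref))


def group_by_item(records):
    """Stand-in for the shuffle: collect the column and all prefs per item ID."""
    grouped = defaultdict(lambda: {"column": None, "prefs": []})
    for item_id, (kind, payload) in records:
        if kind == "vector":
            grouped[item_id]["column"] = payload
        else:
            grouped[item_id]["prefs"].append(payload)
    return dict(grouped)


# Hypothetical inputs in the shape of the examples above.
columns = {590: {22: 3.0, 95: 1.0, 9059: 1.0}}
user_vectors = {98955: {590: 1.0, 22: 1.0, 9059: 1.0}}

records = []
for item_id, column in columns.items():
    records.extend(mapper_1(item_id, column))
for user_id, vector in user_vectors.items():
    records.extend(mapper_2(user_id, vector))

joined = group_by_item(records)
print(joined[590])
```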

Steps for computing the final preference value of each item

1)  Input to the mapper is all co-occurrence column / user records. Example: 590 / [22:3.0,95:1.0,…,9059:1.0,…] and 590 / [98955:1.0]

2)  Mapper outputs the co-occurrence column for each associated user times the preference value. Example: 98955 / [22:3.0,95:1.0,…,9059:1.0,…]

3)  The framework collects these partial products together, by user.

4)  The reducer unpacks this input and sums all the vectors, which gives the user’s final recommendation vector (call it R). Example: 98955 / [22:4.0,45:3.0,95:11.0,…,9059:1.0,…]

Sorting this output by value yields the recommendations.
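The last stage can be sketched in Python as follows. The joined records are made-up data, the function names are hypothetical, and dropping items the user already has from the ranking is an assumption of this sketch rather than something stated above.

```python
from collections import defaultdict


def map_partial_products(column, prefs):
    """For each user attached to this item, emit the column times the preference."""
    for user_id, pref in prefs:
        yield user_id, {item: count * pref for item, count in column.items()}


def reduce_sum(partials):
    """Sum all partial product vectors of one user into R."""
    total = defaultdict(float)
    for vector in partials:
        for item, value in vector.items():
            total[item] += value
    return dict(total)


def top_n(recommendation_vector, already_seen, n):
    """Sort by score, highest first, skipping items the user already has."""
    ranked = sorted(
        ((item, score) for item, score in recommendation_vector.items()
         if item not in already_seen),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


# Hypothetical joined records: item ID -> (co-occurrence column, [(user, pref)]).
joined = {
    590: ({22: 3.0, 95: 1.0, 9059: 1.0}, [(98955, 1.0)]),
    22: ({590: 3.0, 95: 2.0, 45: 1.0}, [(98955, 1.0)]),
}

by_user = defaultdict(list)  # stand-in for the shuffle by user ID
for column, prefs in joined.values():
    for user_id, partial in map_partial_products(column, prefs):
        by_user[user_id].append(partial)

R = reduce_sum(by_user[98955])
print(top_n(R, already_seen={590, 22, 9059}, n=2))
```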

[Figure: execution structure of RecommenderJob on Hadoop]

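Another way to use Mahout with Hadoop: run the same recommender engine on multiple machines

(Copy the data to every machine, which limits how much data can be handled, and run the recommender on each machine for a subset of the users.)

Advantage: existing recommender implementations can be used without modification.

Limitation: the data size is still bounded, because it must fit within the processing capacity of a single machine.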

Usage example:

    bin/hadoop jar target/mahout-core-0.4-SNAPSHOT.job \
      org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
      -Dmapred.input.dir=input/ua.base \
      -Dmapred.output.dir=output \
      --recommenderClassName org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender