Practical Applications of Mahout Recommendation Algorithms (Part 2)

阿新 • Published: 2019-01-06

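Making recommendations from Wikipedia link relationships

Data volume: 130,160,392 links from 5,706,070 articles, pointing to 3,773,865 articles. The links carry no rating values (a link only indicates that two articles are related), so LogLikelihoodSimilarity can be used.

Distributed (MapReduce) recommenders generally run slowly, so they are usually not suited to online recommendation systems.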

Implementation:

Item-based recommendation in practice

Item	101	102	103	104	105	106	107	U3	R
101	5	3	4	4	2	2	1	2.0	40.0
102	3	3	3	2	1	1	0	0.0	18.5
103	4	3	4	3	1	2	0	0.0	24.5
104	4	2	3	4	2	2	1	4.0	38.0
105	2	1	1	2	2	1	1	4.5	26.0
106	2	1	2	2	1	2	0	0.0	16.5
107	1	0	0	1	1	0	1	5.0	15.5

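The table above shows the recommendation result R, computed from the item similarity matrix and U3's preferences for some of the items.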
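The R column can be reproduced as an ordinary matrix-vector product. The following is a minimal Python sketch rather than Mahout code; the names `ITEMS`, `CO_OCCURRENCE`, `U3` and `recommend` are made up for illustration, and the numbers are copied from the table.

```python
# Item IDs in the order used by the table's rows and columns.
ITEMS = [101, 102, 103, 104, 105, 106, 107]

# Item similarity (co-occurrence) matrix copied from the table.
CO_OCCURRENCE = [
    [5, 3, 4, 4, 2, 2, 1],
    [3, 3, 3, 2, 1, 1, 0],
    [4, 3, 4, 3, 1, 2, 0],
    [4, 2, 3, 4, 2, 2, 1],
    [2, 1, 1, 2, 2, 1, 1],
    [2, 1, 2, 2, 1, 2, 0],
    [1, 0, 0, 1, 1, 0, 1],
]

# U3's preference vector; 0.0 means no preference was expressed.
U3 = [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0]


def recommend(matrix, user_vector):
    """Return R, where R[i] is the dot product of row i with the user vector."""
    return [sum(m * u for m, u in zip(row, user_vector)) for row in matrix]


R = recommend(CO_OCCURRENCE, U3)

# Only items U3 has not rated yet are candidates, ranked by score.
candidates = sorted(
    ((item, score) for item, score, pref in zip(ITEMS, R, U3) if pref == 0.0),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Among the items U3 has not rated (102, 103 and 106), item 103 ranks first with a score of 24.5.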

Mahout's Hadoop implementation of the recommendation algorithm: org.apache.mahout.cf.taste.hadoop.RecommenderJob

Implementation steps

1. Generate the user vectors

1） Input files are treated as (Long,String) pairs by the framework, where the Long key is a position in the file and String value is the line of the text file. Example: 239 / 98955: 590 22 9059

2） Each line is parsed into user ID and several item IDs by a map function. The function emits new key-value pairs: user ID mapped to item ID, for each item ID. Example: 98955 / 590

3） The framework collects all item IDs that were mapped to each user ID together.

4） A reduce function constructs a Vector from all item IDs for the user, and outputs the user ID mapped to the user’s preference vector. All values in this vector are 0 or 1. Example: 98955 / [590:1.0, 22:1.0, 9059:1.0]

In other words, a list of related items is kept for every user.
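The four steps above can be imitated in memory with plain Python. This is only a sketch of the data flow, not the Hadoop job itself: `map_line`, `shuffle` and `reduce_user` are hypothetical names, and the dict stands in for Mahout's Vector.

```python
from collections import defaultdict


def map_line(offset, line):
    """Map step: parse 'userID: item item ...' and emit (userID, itemID) pairs."""
    user_part, items_part = line.split(":")
    user_id = int(user_part)
    for token in items_part.split():
        yield user_id, int(token)


def shuffle(pairs):
    """Stand-in for the framework: group all values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_user(user_id, item_ids):
    """Reduce step: build a sparse 0/1 preference vector for one user."""
    return user_id, {item_id: 1.0 for item_id in item_ids}


# (file position, line) pairs, using the example line from step 1.
lines = [(239, "98955: 590 22 9059")]

pairs = [pair for offset, line in lines for pair in map_line(offset, line)]
user_vectors = dict(reduce_user(u, items) for u, items in shuffle(pairs).items())
```

Here `user_vectors[98955]` holds a 1.0 for each of the items 590, 22 and 9059, matching the example in step 4.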

2. Compute the similarity matrix

1） Input is user IDs mapped to Vectors of user preferences -- the output of the last MapReduce. Example: 98955 / [590:1.0,22:1.0,9059:1.0]

2） The map function determines all co-occurrences from one user’s preferences, and emits one pair of item IDs for each co-occurrence -- item ID mapped to item ID. Both mappings, from one item ID to the other and vice versa, are recorded. Example: 590 / 22

The map step emits every pair of related items found inside each user vector.

3） The framework collects, for each item, all co-occurrences mapped from that item.

4） The reducer counts, for each item ID, all co-occurrences that it receives and constructs a new Vector, which represents all co-occurrences for one item with count of number of times they have co-occurred. These can be used as the rows -- or columns -- of the co-occurrence matrix. Example: 590 / [22:3.0,95:1.0,…,9059:1.0,…]

This produces the co-occurrence matrix (the weights are obtained by counting the item pairs).
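A small in-memory sketch of this counting step follows. The function names are hypothetical, and only user 98955 comes from the examples above; the other two users are invented so that the counts are not all 1. This sketch does not pair an item with itself.

```python
from collections import Counter, defaultdict
from itertools import permutations


def map_cooccurrences(user_vector):
    """Map step: emit (itemA, itemB) for every ordered pair in one user's vector."""
    # permutations(..., 2) yields both (a, b) and (b, a),
    # so the mapping is recorded in both directions.
    yield from permutations(sorted(user_vector), 2)


def reduce_counts(pairs):
    """Shuffle + reduce: count, per item, how often each other item co-occurred."""
    columns = defaultdict(Counter)
    for item_a, item_b in pairs:
        columns[item_a][item_b] += 1.0
    return columns


# Hypothetical input: user 98955 is from the example, users 1 and 2 are made up.
user_vectors = {
    98955: {590: 1.0, 22: 1.0, 9059: 1.0},
    1: {590: 1.0, 22: 1.0},
    2: {590: 1.0, 22: 1.0, 95: 1.0},
}

pairs = [p for vector in user_vectors.values() for p in map_cooccurrences(vector)]
cooccurrence = reduce_counts(pairs)
```

With this made-up input, items 590 and 22 co-occur for three users, so `cooccurrence[590][22]` and `cooccurrence[22][590]` are both 3.0.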

3. Multiply the vectors from step 1 by the matrix from step 2 to obtain the recommendations

for each row i in the co-occurrence matrix

compute dot product of row vector i with the user vector

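assign dot product to ith element of R (the usual formulation of the recommendation algorithm)

=>

Because the similarity matrix is symmetric about its diagonal, the algorithm above is equivalent to the one below: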

assign R to be the zero vector

for each column i in the co-occurrence matrix

multiply column vector i by the ith element of the user vector

add this vector to R
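To check that the two formulations agree, here is a minimal Python sketch. The 3×3 matrix `M` and the vector `u` are hypothetical values chosen only to keep the arithmetic small.

```python
def by_rows(matrix, user):
    """R[i] = dot product of row i with the user vector."""
    return [sum(m * u for m, u in zip(row, user)) for row in matrix]


def by_columns(matrix, user):
    """Start from the zero vector, then add column i scaled by user[i]."""
    result = [0.0] * len(matrix)
    for i, weight in enumerate(user):
        for row_index, row in enumerate(matrix):
            result[row_index] += row[i] * weight
    return result


# Hypothetical symmetric matrix and user vector.
M = [
    [2, 1, 0],
    [1, 3, 1],
    [0, 1, 1],
]
u = [1.0, 0.0, 2.0]

rows_result = by_rows(M, u)
columns_result = by_columns(M, u)
```

Both functions return the same vector. Strictly speaking, summing the scaled columns equals the matrix-vector product for any matrix; the symmetry is what lets the vectors produced in step 2 serve as either rows or columns.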

The computation in practice (each column of the matrix multiplied by the corresponding element of U3):

Item	101	102	103	104	105	106	107	U3	R
101	10	0	0	16	9	0	5	2.0	40.0
102	6	0	0	8	4.5	0	0	0.0	18.5
103	8	0	0	12	4.5	0	0	0.0	24.5
104	8	0	0	16	9	0	5	4.0	38.0
105	4	0	0	8	9	0	5	4.5	26.0
106	4	0	0	8	4.5	0	0	0.0	16.5
107	2	0	0	4	4.5	0	5	5.0	15.5

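Notes:

1. Items that are not in the user vector (items the user has not rated) do not affect the final output.

(In the table above, U3's preference for item 102 is 0, so column 102 is multiplied by 0 and contributes nothing to the result.)

Because the total number of items is far larger than the number of non-zero entries in a user vector (the items for which the user has expressed a preference), the amount of computation is greatly reduced.

2. The column vectors are completely independent of one another, which makes them well suited to distributed storage.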
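Both notes can be seen in a sparse version of the computation. The sketch below is illustrative Python, not Mahout code: it stores each column as a dict keyed by item ID (values copied from the first table) and visits only the columns for which U3 has a non-zero preference. The names `COLUMNS`, `U3_SPARSE` and `recommend_sparse` are made up.

```python
ITEMS = [101, 102, 103, 104, 105, 106, 107]

# Matrix values copied from the first table.
ROWS = [
    [5, 3, 4, 4, 2, 2, 1],
    [3, 3, 3, 2, 1, 1, 0],
    [4, 3, 4, 3, 1, 2, 0],
    [4, 2, 3, 4, 2, 2, 1],
    [2, 1, 1, 2, 2, 1, 1],
    [2, 1, 2, 2, 1, 2, 0],
    [1, 0, 0, 1, 1, 0, 1],
]

# One independent sparse column per item: item ID -> {row item ID: count}.
COLUMNS = {
    col_item: {ITEMS[r]: ROWS[r][c] for r in range(len(ITEMS)) if ROWS[r][c] != 0}
    for c, col_item in enumerate(ITEMS)
}

# Only the items U3 actually rated are stored.
U3_SPARSE = {101: 2.0, 104: 4.0, 105: 4.5, 107: 5.0}


def recommend_sparse(columns, user_prefs):
    """Sum the scaled columns, visiting only the user's non-zero preferences."""
    totals = {}
    columns_used = 0
    for item_id, pref in user_prefs.items():
        columns_used += 1
        for other, count in columns[item_id].items():
            totals[other] = totals.get(other, 0.0) + count * pref
    return totals, columns_used


totals, columns_used = recommend_sparse(COLUMNS, U3_SPARSE)
```

Only 4 of the 7 columns are touched, and each of them could be fetched from a different machine because no column depends on another.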

Mapper 1:

1） Input for mapper 1 is the co-occurrence matrix: item IDs as keys, mapped to columns as Vectors. Example: 590 / [22:3.0,95:1.0,…,9059:1.0,…]

2） The map function simply echoes its input, but with the Vector wrapped in a VectorOrPrefWritable.

Mapper 2:

1） Input for mapper 2 is again the user vectors: user IDs as keys, mapped to preference Vectors. Example: 98955 / [590:1.0,22:1.0,9059:1.0]

2） For each non-zero value in the user vector, the map function outputs item ID mapped to the user ID and preference value, wrapped in a VectorOrPrefWritable. Example: 590 / [98955:1.0]

3） The framework collects together, by item ID, the co-occurrence column and all user ID / preference value pairs.
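The two mappers and the grouping by item ID can be sketched in memory as follows. This is a hypothetical illustration: a tagged tuple such as `("column", ...)` or `("pref", ...)` stands in for the VectorOrPrefWritable wrapper, and `join_by_item` plays the role of the framework.

```python
from collections import defaultdict


def mapper_1(item_id, column):
    """Echo the co-occurrence column, tagged so the join can recognise it."""
    yield item_id, ("column", column)


def mapper_2(user_id, user_vector):
    """Emit item ID -> (user ID, preference) for every non-zero preference."""
    for item_id, pref in user_vector.items():
        if pref != 0.0:
            yield item_id, ("pref", (user_id, pref))


def join_by_item(records):
    """Stand-in for the shuffle: gather the column and all user prefs per item."""
    joined = defaultdict(lambda: {"column": None, "prefs": []})
    for item_id, (kind, payload) in records:
        if kind == "column":
            joined[item_id]["column"] = payload
        else:
            joined[item_id]["prefs"].append(payload)
    return joined


# Only the column for item 590 is given in the examples, so only it is listed.
columns = {590: {22: 3.0, 95: 1.0, 9059: 1.0}}
user_vectors = {98955: {590: 1.0, 22: 1.0, 9059: 1.0}}

records = []
for item_id, column in columns.items():
    records.extend(mapper_1(item_id, column))
for user_id, vector in user_vectors.items():
    records.extend(mapper_2(user_id, vector))

joined = join_by_item(records)
```

After the join, `joined[590]` holds both the co-occurrence column for item 590 and the pair (98955, 1.0), which is the input the next stage expects.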

Steps for computing the final preference value of each item

1） Input to the mapper is all co-occurrence column / user records. Example: 590 / [22:3.0,95:1.0,…,9059:1.0,…] and 590 / [98955:1.0]

2） The mapper outputs, for each associated user, the co-occurrence column multiplied by that user's preference value, keyed by user ID. Example: 98955 / [22:3.0,95:1.0,…,9059:1.0,…]

3） The framework collects these partial products together, by user.

4） The reducer unpacks this input and sums all the vectors, which gives the user’s final recommendation vector (call it R). Example: 98955 / [22:4.0,45:3.0,95:11.0,…,9059:1.0,…]

Once sorted, this output can be used as the recommendation result.
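The last stage, including the final sort, might look like the sketch below. It is an in-memory illustration with hypothetical names; the column for item 22 is invented so that there is more than one partial product to sum, and filtering out items the user already has is an assumption of this sketch.

```python
from collections import defaultdict


def map_partial_products(column, prefs):
    """For each (user, pref) attached to an item, emit user -> column * pref."""
    for user_id, pref in prefs:
        yield user_id, {item: count * pref for item, count in column.items()}


def reduce_sum(partials):
    """Sum all partial product vectors for one user into R."""
    totals = defaultdict(float)
    for vector in partials:
        for item, value in vector.items():
            totals[item] += value
    return dict(totals)


def top_n(recommendation, already_seen, n):
    """Sort R by descending score, skipping items the user already has."""
    unseen = [
        (item, score)
        for item, score in recommendation.items()
        if item not in already_seen
    ]
    unseen.sort(key=lambda pair: (-pair[1], pair[0]))
    return unseen[:n]


# Hypothetical joined records: item ID -> (co-occurrence column, [(user, pref)]).
joined = {
    590: ({22: 3.0, 95: 1.0, 9059: 1.0}, [(98955, 1.0)]),
    22: ({590: 3.0, 45: 2.0, 95: 4.0}, [(98955, 1.0)]),
}

by_user = defaultdict(list)
for column, prefs in joined.values():
    for user_id, partial in map_partial_products(column, prefs):
        by_user[user_id].append(partial)

R = {user_id: reduce_sum(partials) for user_id, partials in by_user.items()}
best = top_n(R[98955], already_seen={590, 22, 9059}, n=2)
```

With this made-up input, item 95 receives 1.0 from the column of 590 and 4.0 from the column of 22, so it ranks first for user 98955.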

Figure: structure of the Recommender running on Hadoop

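Another way to use Mahout with Hadoop: run the same recommender engine on several machines

(The data is copied to every machine, which limits how much data can be handled, and each machine runs the recommendation algorithm for a subset of the users.)

Advantage: existing recommender implementations can be used without modification.

Limitation: the data volume is still bounded, since it must stay within the processing capacity of a single machine.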
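The idea can be sketched in a few lines of Python. This is not how Mahout assigns users internally; the modulo rule, the function names and `toy_recommender` are all assumptions made for illustration.

```python
def assign_users(user_ids, worker_count):
    """Split user IDs into disjoint subsets, one per worker (simple modulo rule)."""
    subsets = [[] for _ in range(worker_count)]
    for user_id in user_ids:
        subsets[user_id % worker_count].append(user_id)
    return subsets


def run_worker(recommender, user_subset):
    """Each worker holds the full data set but recommends only for its own users."""
    return {user_id: recommender(user_id) for user_id in user_subset}


def toy_recommender(user_id):
    """Hypothetical stand-in for an unmodified, non-distributed recommender."""
    return [user_id + 1000]


subsets = assign_users([1, 2, 3, 4, 5], worker_count=2)

results = {}
for subset in subsets:
    results.update(run_worker(toy_recommender, subset))
```

The recommender itself is untouched; only the list of users is divided, which is why every worker still needs the complete data set.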

Usage example:

    bin/hadoop jar target/mahout-core-0.4-SNAPSHOT.job \
      org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
      -Dmapred.input.dir=input/ua.base \
      -Dmapred.output.dir=output \
      --recommenderClassName \
      org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
