Learning Recommender Systems 05: libFM

阿新 • Published: 2019-02-14

Introduction

Factorization machines (FM) are a generic approach that can mimic most factorization models through feature engineering. In the words of its manual, libFM is a software implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least squares (ALS) optimization as well as Bayesian inference using Markov Chain Monte Carlo (MCMC).

File overview:

Compiling

Change into the libfm-1.42.src directory and run "make all".

Afterwards the bin directory contains three executables: convert, transpose, and libFM.

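convert: converts a libFM text file into the binary format

libFM: the main tool (it takes many parameters, covered below)

transpose: transposes a binary design matrix; for MCMC and ALS learning, a transposed design matrix is used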

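Parameters

The libFM command accepts many parameters. They fall into mandatory parameters, which must be given on every run, and optional parameters.

Mandatory parameters

task must be specified: classification or regression.

The train and test datasets must be specified.

dim must be specified as k0, k1, k2: whether to use the bias term, whether to use 1-way interactions, and the number of factors used for pairwise interactions.

An example: an FM for a regression task using bias, 1-way interactions and a factorization of k = 8 for pairwise interactions: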

  ./libFM -task r -train ml1m-train.libfm -test ml1m-test.libfm -dim '1,1,8'
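
To make the three dim values concrete, here is a minimal Python sketch of the second-order FM model equation, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, with k = 8 factors as in the command above. It illustrates the model only and is not libFM's code: the function names, the dict-based sparse vector, and the random parameter values are assumptions made for this example.

```python
import random


def fm_predict(x, w0, w, V, k):
    """Second-order FM prediction for a sparse vector x given as {index: value}.

    The pairwise term uses the linear-time identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    """
    y = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        y += 0.5 * (s * s - s_sq)
    return y


def fm_predict_naive(x, w0, w, V, k):
    """Same model, written as the explicit double sum over pairs i < j."""
    y = w0 + sum(w[i] * xi for i, xi in x.items())
    idx = sorted(x)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            y += dot * x[i] * x[j]
    return y


# Hypothetical parameters: 7 input variables, k = 8 pairwise factors.
rng = random.Random(0)
n, k = 7, 8
w0 = 0.1
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]

x = {0: 1.5, 3: -7.9, 5: 2.0}  # sparse input, only non-zero entries
fast = fm_predict(x, w0, w, V, k)
naive = fm_predict_naive(x, w0, w, V, k)
print(fast, naive)
```

Both functions compute the same value; the first form is the one that keeps prediction cost linear in the number of non-zero entries, which is why FMs scale to sparse recommender data.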

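Optional parameters are further split into basic and advanced parameters.

Basic parameters

out: writes the predictions for the test dataset to the specified file.

rlog: a log file with statistics about each iteration.

verbosity: with verbosity set to 1, libFM prints more information. This is useful for checking that the data is read correctly and for tracking down errors.

Advanced parameters

The meta option groups input variables. Grouping can be used in MCMC, SGDA, and ALS to define more complex regularization structures.

There is also a cache option: if the data does not fit into memory, a cache size can be set.

The learning methods are described next (my own runs used the ijcnn1 dataset).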

By default MCMC inference is used for learning because MCMC is the easiest to handle (no learning rate, no regularization). In libFM you can choose from the following learning methods: SGD, ALS, MCMC and SGDA. For all learning methods, the number of iterations iter has to be specified.

Usage

libFM supports two input file formats: a text format and a binary format. The text format is recommended for new users.

The data format is the same as in SVMlight and LIBSVM: each row contains a training case (x, y) for the real-valued feature vector x with the target y. The row states first the value y and then the non-zero values of x. For binary classification, cases with y > 0 are regarded as the positive class and cases with y ≤ 0 as the negative class.

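Example

4 0:1.5 3:-7.9
2 1:1e-5 3:2
-1 6:1 ...

This file contains three cases. The first column gives the target of each case: 4 for the first, 2 for the second, and -1 for the third. After the target, each line lists the non-zero elements of x. An entry 0:1.5 means x0 = 1.5 and 3:-7.9 means x3 = -7.9; in general INDEX:VALUE means xINDEX = VALUE.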

In matrix form, the example is a design matrix X with one row per case and a target vector y with one entry per case.
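
The mapping from text lines to X and y can be sketched in a few lines of Python. This is an illustrative parser, not part of libFM; the function names are made up for the example, and the entries elided by "..." in the third line are simply left out.

```python
def parse_libfm_line(line):
    """Parse 'y idx:val idx:val ...' into (target, {index: value})."""
    parts = line.split()
    target = float(parts[0])
    features = {}
    for token in parts[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return target, features


def to_dense(rows):
    """Build a dense design matrix X and target vector y from parsed rows."""
    n_cols = max(max(feats, default=-1) for _, feats in rows) + 1
    X = [[0.0] * n_cols for _ in rows]
    for r, (_, feats) in enumerate(rows):
        for index, value in feats.items():
            X[r][index] = value
    y = [target for target, _ in rows]
    return X, y


lines = [
    "4 0:1.5 3:-7.9",
    "2 1:1e-5 3:2",
    "-1 6:1",  # trailing entries elided in the original example
]
rows = [parse_libfm_line(line) for line in lines]
X, y = to_dense(rows)

# For binary classification, y > 0 is the positive class, y <= 0 the negative.
labels = [1 if target > 0 else -1 for target in y]

for row, target in zip(X, y):
    print(target, row)
```

A dense matrix is only practical for a toy example like this one; real recommender data is kept sparse, which is why the file format lists non-zero entries only.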

Some common operations follow.

Converting Recommender Files

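In recommender systems, a file format like userid itemid rating is often used. A Perl script that converts such datasets (or more complex ones, as in context-aware settings) into the libFM file format is included in the scripts directory. Usage: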

 ./triple_format_to_libfm.pl -in ratings.dat -target 2 -delete_column 3 -separator "::"

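The output is written to a file with the suffix .libfm appended. In the example above, the output is written to ratings.dat.libfm.

If a single dataset consists of multiple files, for example a train split and a test split, the conversion script should be called with both files at once: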

 ./triple_format_to_libfm.pl -in train.txt,test.txt -target 2 -separator "\t"

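Warning: if you run the conversion script separately for each file, the variables (ids) will not match. For example, the n-th variable of the first file will be different from the n-th variable of the second file.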
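
The reason for the warning is easier to see in code. The following Python sketch is not the Perl script itself; it is a simplified stand-in that assumes each (column, value) pair receives the next free index in order of first appearance. The function name and the tiny in-memory "files" are hypothetical.

```python
def convert_triples(files, target_col=2, separator="\t"):
    """Convert several 'user item rating' files with ONE shared id mapping.

    files maps a file name to its list of lines; returns the converted lines
    per file plus the mapping (column, raw value) -> libFM index.
    """
    feature_ids = {}
    converted = {}
    for name, lines in files.items():
        out = []
        for line in lines:
            cols = line.rstrip("\n").split(separator)
            feats = []
            for c, raw in enumerate(cols):
                if c == target_col:
                    continue
                # New (column, value) pairs get the next unused index.
                index = feature_ids.setdefault((c, raw), len(feature_ids))
                feats.append(f"{index}:1")
            out.append(" ".join([cols[target_col]] + feats))
        converted[name] = out
    return converted, feature_ids


train = ["u1\ti1\t5", "u2\ti1\t3"]
test = ["u2\ti2\t4"]

# Converted together: user u2 has the same index in both files.
joint, ids = convert_triples({"train": train, "test": test})

# Converted separately: the test file restarts at index 0, so its
# index 0 means u2 while index 0 in the train file means u1.
separate, _ = convert_triples({"test": test})

print(joint["train"], joint["test"])
print(separate["test"])
```

With the shared mapping, user u2 has the same index in the train and test output. Converting the test file alone assigns u2 the index that u1 holds in the train file, so a model trained on one file would be applied to the wrong variables in the other.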

For the remaining conversions (to the binary .x, .y, and .xt files), see the README in the project.

Examples

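Note that ready-made train and test datasets are fairly hard to find online, because the datasets MovieLens currently provides have to be split by hand. Fortunately, train and test datasets are available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ in LIBSVM format, which libFM supports.

I used ijcnn1, a binary classification dataset.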

Stochastic Gradient Descent (SGD)

Example

./libFM -task r -train ml1m-train.libfm -test ml1m-test.libfm -dim '1,1,8' -iter 1000 -method sgd \
    -learn_rate 0.01 -regular '0,0,0.01' -init_stdev 0.1
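
The roles of learn_rate, regular, and init_stdev can be illustrated with one SGD update for a regression FM. The sketch below is an assumption-laden illustration, not libFM's implementation: it assumes the squared loss 0.5 * (prediction - y)^2, plain L2 regularization with one value each for the bias, the 1-way weights, and the pairwise factors, zero-initialized bias and weights, and factors drawn from a normal distribution with standard deviation init_stdev. All function and variable names are made up for the example.

```python
import random


def predict(x, w0, w, V, k):
    """Second-order FM prediction for sparse x given as {index: value}."""
    y = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        y += 0.5 * (s * s - s_sq)
    return y


def sgd_step(x, y, w0, w, V, k, learn_rate, regular):
    """One SGD update on a single case; updates w and V in place, returns w0."""
    reg0, reg_w, reg_v = regular
    err = predict(x, w0, w, V, k) - y  # derivative of 0.5 * (pred - y)^2
    # Per-factor sums over the non-zero inputs, computed before any update.
    sums = [sum(V[i][f] * xi for i, xi in x.items()) for f in range(k)]
    w0 -= learn_rate * (err + reg0 * w0)
    for i, xi in x.items():
        w[i] -= learn_rate * (err * xi + reg_w * w[i])
        for f in range(k):
            grad = xi * sums[f] - V[i][f] * xi * xi
            V[i][f] -= learn_rate * (err * grad + reg_v * V[i][f])
    return w0


rng = random.Random(0)
n, k = 2, 8
init_stdev = 0.1
w0, w = 0.0, [0.0] * n
V = [[rng.gauss(0.0, init_stdev) for _ in range(k)] for _ in range(n)]

x, y = {0: 1.0, 1: 1.0}, 4.0  # a single toy training case
error_before = abs(predict(x, w0, w, V, k) - y)
for _ in range(200):
    w0 = sgd_step(x, y, w0, w, V, k, learn_rate=0.01, regular=(0.0, 0.0, 0.01))
error_after = abs(predict(x, w0, w, V, k) - y)
print(error_before, error_after)
```

A larger learn_rate takes bigger steps and can diverge, while larger regular values pull the parameters toward zero. ALS and MCMC need no learning rate, which is part of why MCMC is described above as the easiest method to handle.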

Alternating Least Squares (ALS)

Example

./libFM -task r -train ml1m-train.libfm -test ml1m-test.libfm -dim '1,1,8' -iter 1000 -method als \
    -regular '0,0,10' -init_stdev 0.1

Markov Chain Monte Carlo (MCMC)

Example

./libFM -task r -train ml1m-train.libfm -test ml1m-test.libfm -dim '1,1,8' -iter 1000 \
    -method mcmc -init_stdev 0.1

More examples can be found in the project's README.
