【Machine Learning】LFM (Latent Factor Model)

阿新 • Published: 2019-01-01


I adapted the latent matrix factorization code from [Key_Ky's blog](http://www.cnblogs.com/Key-Ky/p/3579363.html) and tried it out myself. [The figures and formulas are taken from Harry Huang's blog](http://blog.csdn.net/harryhuang1990/article/details/9924377).

Figure: matrix factorization diagram
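In this model the rating matrix R (users × items) is approximated by the product of a user-factor matrix P and an item-factor matrix Q, so the predicted rating of user i for item j is the dot product of row i of P with column j of Q. A minimal NumPy sketch of the shapes involved, assuming 6 users, 7 items and K = 3 latent factors (the same sizes as the test data later in this post); the random values are placeholders for illustration only:

```python
import numpy as np

m, n, K = 6, 7, 3              # users, items, latent factors (illustrative sizes)
rng = np.random.default_rng(0)
P = rng.random((m, K))         # user-factor matrix, one row per user
Q = rng.random((K, n))         # item-factor matrix, one column per item

R_hat = P @ Q                  # full predicted rating matrix, shape (m, n)

# A single prediction is the dot product of row i of P and column j of Q
i, j = 1, 3
r_ij = sum(P[i, k] * Q[k, j] for k in range(K))
assert np.isclose(R_hat[i, j], r_ij)
print(R_hat.shape)
```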

Figure: objective function (least squared error)
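Written out so that it agrees with the update rule in `iterator` (P is `decompose_u`, Q is `decompose_v`, K is `factor`, λ is `Lambda`), the regularized least-squares objective presumably has the following form. It is reconstructed from the code rather than copied from the original figure, so the notation there may differ. Only observed ratings (r_ij ≠ 0) contribute, and the 1/2 is the factor the code applies in `0.5 * self.c_error_cacl()`:

```latex
E = \frac{1}{2} \sum_{(i,j):\, r_{ij} \neq 0}
    \left[ \left( r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj} \right)^{2}
    + \lambda \left( \sum_{k=1}^{K} p_{ik}^{2} + \sum_{k=1}^{K} q_{kj}^{2} \right) \right]
```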

Solving for the parameters of the objective with stochastic gradient descent

Figure: gradients
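For a single observed rating, let E_ij be its term in the regularized squared error (including the 1/2 factor) and e_ij the prediction error. Differentiating gives the gradients below, and stepping against them with learning rate α (`alpha`) yields the two update lines in `iterator`, with `decompose_u[i][f]` playing the role of p_ik and `decompose_v[f][j]` of q_kj. This is a derivation consistent with the code, not a copy of the original figure:

```latex
E_{ij} = \frac{1}{2} \left[ \left( r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj} \right)^{2}
       + \lambda \left( \sum_{k=1}^{K} p_{ik}^{2} + \sum_{k=1}^{K} q_{kj}^{2} \right) \right],
\qquad
e_{ij} = r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj}

\frac{\partial E_{ij}}{\partial p_{ik}} = -e_{ij}\, q_{kj} + \lambda\, p_{ik},
\qquad
\frac{\partial E_{ij}}{\partial q_{kj}} = -e_{ij}\, p_{ik} + \lambda\, q_{kj}

p_{ik} \leftarrow p_{ik} + \alpha \left( e_{ij}\, q_{kj} - \lambda\, p_{ik} \right),
\qquad
q_{kj} \leftarrow q_{kj} + \alpha \left( e_{ij}\, p_{ik} - \lambda\, q_{kj} \right)
```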

import numpy as np
import matplotlib.pyplot as plt
import random
import math

class LFM:
    '''
    LFM: solve for the LFM parameters with stochastic gradient descent
    '''
    data_address = ''
    np_training_datasets = np.zeros(1)

    decompose_u = np.zeros(1)
    decompose_v = np.zeros(1)

    factor = 0
    size_training_datasets_x = 0
    size_training_datasets_y = 0

    alpha = 0.1
    iter_num = 20
    Lambda = 0.1
    epsilon = 0.01

    def __init__(self, data_address, factor, iter_num=20, alpha=0.1, Lambda=0.1, epsilon=0.01):
        '''
            @summary: initialize the parameters
        '''

        self.data_address = data_address
        self.factor = factor
        self.alpha = alpha
        self.iter_num = iter_num
        self.Lambda = Lambda
        self.epsilon = epsilon
        # per-instance lists (class-level lists would be shared by every LFM object)
        self.datasets = []
        self.delta_error = []


    def loadData(self):
        '''
        @summary: load the raw data (one user per line, whitespace-separated ratings)
        '''
        with open(self.data_address, 'r') as input_file:
            for line in input_file:
                tmp = line.split()
                if tmp:  # skip blank lines
                    self.datasets.append([int(i) for i in tmp])
        self.np_training_datasets = np.array(self.datasets)

    def initModel(self):
        '''
        @summary: initialize the latent factor matrices U and V
        '''

        [x, y] = self.np_training_datasets.shape
        self.size_training_datasets_x = x
        self.size_training_datasets_y = y
        self.decompose_u = np.ones([x, self.factor])
        self.decompose_v = np.ones([self.factor, y])

    def fNormcalc(self,matrix):
        '''
        @summary: compute the Frobenius norm of a matrix, i.e. the square root of the sum of squares of all its elements
        '''
        [x,y] = matrix.shape
        f_norm = 0
        for i in range(x):
            for j in range(y):
                f_norm += pow(matrix[i][j],2)
        f_norm = math.sqrt(f_norm)
        return f_norm

    # build the objective function
    def c_error_cacl(self):
        '''
        @summary: objective function: sum of squared errors over the observed ratings,
                  plus an L2 regularization term to prevent overfitting
        '''
        error_sum = 0
        for i in range(self.size_training_datasets_x):
            for j in range(self.size_training_datasets_y):
                if self.np_training_datasets[i][j] != 0:
                    # i.e. user i has rated item j
                    eui = 0
                    for m in range(self.factor):
                        eui += self.decompose_u[i][m] * self.decompose_v[m][j]   # predicted rating
                    # penalize only user i's row of U and item j's column of V,
                    # matching the per-rating regularization used in the SGD update
                    error_sum += pow(self.np_training_datasets[i][j] - eui, 2) \
                        + self.Lambda * pow(self.fNormcalc(self.decompose_u[i:i + 1, :]), 2) \
                        + self.Lambda * pow(self.fNormcalc(self.decompose_v[:, j:j + 1]), 2)
        return error_sum

    # stochastic gradient descent iterations
    def iterator(self):
        for step in range(self.iter_num):
            # the 1/2 factor cancels the 2 produced by differentiating the square,
            # which keeps the gradient simple
            old_error = 0.5 * self.c_error_cacl()
            print('old_error ', old_error)
            for i in range(self.size_training_datasets_x):
                for j in range(self.size_training_datasets_y):

                    if self.np_training_datasets[i][j] != 0:
                        for f in range(self.factor):
                            # current prediction for (i, j), recomputed after each factor update
                            eui = 0
                            for m in range(self.factor):
                                eui = eui + self.decompose_u[i][m] * self.decompose_v[m][j]
                            err = self.np_training_datasets[i][j] - eui
                            # keep the old u value so the v update does not use the already-updated one
                            u_if = self.decompose_u[i][f]
                            self.decompose_u[i][f] += self.alpha * (err * self.decompose_v[f][j] - self.Lambda * u_if)
                            self.decompose_v[f][j] += self.alpha * (err * u_if - self.Lambda * self.decompose_v[f][j])
            new_error = 0.5 * self.c_error_cacl()
            print('new_error ', new_error)
            self.delta_error.append(abs(new_error - old_error))  # record the error change of each iteration
            if abs(new_error - old_error) < self.epsilon:
                break

if __name__ == '__main__':
    lfm = LFM(r'F:\rating1.txt', 3, 100)   # 3 latent factors, at most 100 iterations
    lfm.loadData()
    lfm.initModel()
    lfm.iterator()
    print(lfm.decompose_u)
    print(lfm.decompose_v)

    ex = range(len(lfm.delta_error))
    plt.figure(1)
    plt.plot(ex, lfm.delta_error)
    plt.show()
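The triple loops above recompute each prediction factor by factor. As a sketch of the same model in vectorized NumPy (not the author's code: the function name `lfm_sgd`, the random initialization, and the defaults `alpha=0.01` and `epochs=500` are assumptions for illustration), each SGD step can update the whole row p_i and column q_j at once, using the pre-update values for both:

```python
import numpy as np


def lfm_sgd(R, K=3, alpha=0.01, lam=0.1, epochs=500, seed=0):
    """Hypothetical helper: factor R (0 = unrated) as P @ Q using per-rating SGD.

    Returns P (users x K), Q (K x items) and the regularized loss after each epoch.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    # Random (not all-ones) init so the K factors do not start out identical
    P = rng.random((m, K))
    Q = rng.random((K, n))
    rated = np.argwhere(R != 0)          # (i, j) pairs of observed ratings
    history = []
    for _ in range(epochs):
        for i, j in rated:
            e = R[i, j] - P[i] @ Q[:, j]     # error of the current prediction
            p_old = P[i].copy()              # pre-update row, used for the Q step
            P[i] += alpha * (e * Q[:, j] - lam * P[i])
            Q[:, j] += alpha * (e * p_old - lam * Q[:, j])
        # 1/2 * (squared error + per-rating L2 penalty), observed ratings only
        loss = 0.0
        for i, j in rated:
            loss += (R[i, j] - P[i] @ Q[:, j]) ** 2 + lam * (P[i] @ P[i] + Q[:, j] @ Q[:, j])
        history.append(0.5 * loss)
    return P, Q, history


# The small test dataset used in this post (rows = users, columns = items)
R = np.array([[0, 1, 2, 0, 0, 4, 0],
              [0, 0, 0, 5, 0, 6, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 9, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [10, 0, 9, 8, 0, 0, 0]])
P, Q, history = lfm_sgd(R)
print(np.round(P @ Q, 2))
```

Updating all K factors together from the same error is the textbook form of the step; the random start matters here because with an all-ones start and simultaneous updates every factor would receive identical updates and stay equal.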

A simple test dataset (each row is a user, each column an item, 0 means unrated):

0 1 2 0 0 4 0
0 0 0 5 0 6 0
0 0 0 0 0 0 0
0 0 0 0 9 0 0
0 0 0 0 0 0 0
10 0 9 8 0 0 0
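This whitespace-separated layout is what `loadData` parses line by line. As an alternative sketch (not used by the class above), NumPy's `np.loadtxt` reads the same format in one call; here the file is simulated with `io.StringIO` so the example runs without `F:\rating1.txt`:

```python
import io

import numpy as np

# The test dataset, as it would appear in the ratings file
text = """0 1 2 0 0 4 0
0 0 0 5 0 6 0
0 0 0 0 0 0 0
0 0 0 0 9 0 0
0 0 0 0 0 0 0
10 0 9 8 0 0 0
"""

R = np.loadtxt(io.StringIO(text), dtype=int)   # shape (users, items)
observed = np.argwhere(R != 0)                  # (user, item) pairs that have a rating
print(R.shape, len(observed))
```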

Factorized P matrix (decompose_u):
[[ 0.36717743 1.07801356 0.83258288]
[ 1.22030144 1.19854111 1.16571532]
[ 1. 1. 1. ]
[ 2.05226677 1.46459895 1.31047188]
[ 1. 1. 1. ]
[ 2.75487483 1.50212454 1.32204588]]

Factorized Q matrix (decompose_v):
[[ 2.04842494 0.47546457 2.41670178 1.63844667 2.33548542 1.71443626 1. ]
[ 1.61164606 0.42750604 0.79754534 1.23739612 1.5793569 1.76219035 1. ]
[ 1.38761605 0.50476132 0.77759064 1.15922173 1.36918913 1.47754311 1. ]]
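Multiplying the two printed factor matrices gives a predicted score for every user-item pair, including the cells that were 0 (unrated) in the test data; those are the candidates a recommender would rank. A short sketch using the values printed above (the "highest unrated score" rule is an illustrative choice, not something the original post does):

```python
import numpy as np

# P and Q as printed above (rows of P = users, columns of Q = items)
P = np.array([[0.36717743, 1.07801356, 0.83258288],
              [1.22030144, 1.19854111, 1.16571532],
              [1.0, 1.0, 1.0],
              [2.05226677, 1.46459895, 1.31047188],
              [1.0, 1.0, 1.0],
              [2.75487483, 1.50212454, 1.32204588]])
Q = np.array([[2.04842494, 0.47546457, 2.41670178, 1.63844667, 2.33548542, 1.71443626, 1.0],
              [1.61164606, 0.42750604, 0.79754534, 1.23739612, 1.5793569, 1.76219035, 1.0],
              [1.38761605, 0.50476132, 0.77759064, 1.15922173, 1.36918913, 1.47754311, 1.0]])
# Original ratings (0 = unrated)
R = np.array([[0, 1, 2, 0, 0, 4, 0],
              [0, 0, 0, 5, 0, 6, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 9, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [10, 0, 9, 8, 0, 0, 0]])

R_hat = P @ Q   # predicted score for every (user, item) pair

# For each user, pick the unrated item with the highest predicted score
scores = np.where(R == 0, R_hat, -np.inf)   # mask out items the user already rated
top_item = scores.argmax(axis=1)
for user, item in enumerate(top_item):
    print(f"user {user}: recommend item {item} (score {R_hat[user, item]:.2f})")
```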

Figure: error-function curve (per-iteration change in the error)
