Finding the Optimal Logistic Regression Coefficients with Newton's Method

阿新 • Published: 2019-01-11

Logistic Regression

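Gradient ascent converges slowly for logistic regression, so this article solves for the parameters with Newton's method instead. In experiments, Newton's method usually converged in about 10 iterations, whereas gradient ascent needed hundreds or even thousands. The actual number of iterations of course depends on the data.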

Newton's Method

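Newton's method serves the same purpose as gradient descent: both are commonly used optimization methods.
Suppose we want the value of x at which a function f(x) equals 0, as shown in the figure:

Start from a randomly chosen point x_k and take the tangent line at that point, whose slope is the derivative f'(x_k). Extend the tangent until it crosses the horizontal axis, and use the x value at the crossing as the next iterate. The update rule is

x_{k+1} = x_k - f(x_k) / f'(x_k)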
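The tangent-line iteration described above can be sketched in a few lines of Python. This is only an illustration: the function name `newton_root`, the tolerance, and the example function f(x) = x² - 2 are assumptions chosen for the demonstration, not part of the original article.

```python
def newton_root(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Find x with f(x) = 0 by repeatedly following the tangent line."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)   # offset from x to where the tangent crosses the x-axis
        x -= step
        if abs(step) < tol:        # stop once the iterate no longer moves
            break
    return x

# Illustrative example: the positive root of f(x) = x^2 - 2 is sqrt(2).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Starting from x0 = 1.0, the iterates settle on the square root of 2 after only a handful of steps, which is the fast convergence the article relies on.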

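For logistic regression we want the maximum of the likelihood function L(θ). The maximum is attained where the derivative L'(θ) equals 0, so the task is to find the θ that solves L'(θ) = 0, and Newton's method can be applied to L'(θ). The parameter update rule then becomes

θ := θ - L'(θ) / L''(θ)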
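To make the maximization form of the update concrete, here is a minimal one-dimensional sketch. The helper name `newton_maximize` and the concave test function f(t) = t - exp(t) are assumptions made for illustration. The test function is not a logistic regression likelihood, but it has a single maximum at t = 0, which makes the result easy to check.

```python
import math

def newton_maximize(d1, d2, theta0, tol=1e-12, max_iter=50):
    """Maximize a concave 1-D function given its first (d1) and second (d2) derivatives."""
    theta = theta0
    for _ in range(max_iter):
        step = d1(theta) / d2(theta)   # L'(theta) / L''(theta)
        theta -= step                  # theta := theta - L'(theta) / L''(theta)
        if abs(step) < tol:
            break
    return theta

# Illustrative concave function f(t) = t - exp(t):
# f'(t) = 1 - exp(t), f''(t) = -exp(t), and the maximum is at t = 0.
theta_hat = newton_maximize(lambda t: 1.0 - math.exp(t),
                            lambda t: -math.exp(t),
                            theta0=1.0)
print(theta_hat)
```

Note that no learning rate appears anywhere: the second derivative sets the step length.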

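Another advantage of Newton's method is that, unlike gradient methods, it does not need a learning rate (step size). Its drawback is that the matrix of second derivatives (the Hessian) has to be inverted. Quasi-Newton methods such as BFGS and limited-memory BFGS (L-BFGS) avoid the explicit inverse and greatly reduce this cost, but this article still uses plain Newton's method to solve for the parameters.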
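Even with plain Newton's method, the step H^-1 g is commonly obtained by solving the linear system H d = g instead of forming the inverse. The sketch below assumes NumPy and uses a small made-up H and g (not values from the article's data) to show that both routes give the same step.

```python
import numpy as np

# Hypothetical 2x2 Hessian and gradient, chosen only for illustration.
H = np.array([[-4.0, 1.0],
              [1.0, -3.0]])
g = np.array([1.0, 2.0])

step_via_inverse = np.linalg.inv(H) @ g   # forms H^-1 explicitly, then multiplies
step_via_solve = np.linalg.solve(H, g)    # solves H d = g without forming H^-1

print(step_via_inverse)
print(step_via_solve)
```

For the 3x3 matrices in this article the difference is negligible. Solving the system tends to matter more as the number of features grows.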

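Solving for the Logistic Regression Parameters with Newton's Method

The main steps of the iteration are:

(1) Initialize the parameters θ

(2) Get the data x

(3) Compute the prediction h for the data

(4) Form the log-likelihood function L(θ)

(5) Compute the gradient g from L(θ)

(6) Compute the Hessian matrix H from L(θ)

(7) Update the parameters θ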

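Detailed Computation

(1) Initialize θ = (b, θ⁽¹⁾, θ⁽²⁾, …, θ⁽ⁿ⁾)^T, starting from θ = (0, 0, 0, …, 0)^T

(2) Get x = (1, x⁽¹⁾, x⁽²⁾, …, x⁽ⁿ⁾)^T, where x⁽¹⁾ through x⁽ⁿ⁾ are the values of the sample's n features

(3) Compute the prediction h with the logistic function

h = 1 / (1 + e^(-θ^T x))

(4) Form the likelihood of a sample with label y ∈ {0, 1}

h^y (1 - h)^(1 - y)

The log-likelihood is

L(θ) = y log h + (1 - y) log(1 - h)

(5) Compute the gradient from L(θ). The gradient along a single dimension j is

∂L(θ) / ∂θ⁽ʲ⁾ = (y - h) x⁽ʲ⁾

where x⁽⁰⁾ = 1 corresponds to the intercept b. The (n+1)-dimensional gradient vector is

g = (y - h) x

(6) Compute the Hessian matrix H from L(θ). The entry in row j, column k is

H_jk = -h (1 - h) x⁽ʲ⁾ x⁽ᵏ⁾

The (n+1)×(n+1) Hessian matrix is

H = -h (1 - h) x x^T

(7) Update the parameters θ: θ = θ - H^-1 g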
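Steps (3), (5) and (6) for a single sample can be sketched with NumPy as follows. The function name `sample_gradient_hessian` and the sample values are hypothetical. The formulas are the standard per-sample gradient (y - h) x and Hessian -h (1 - h) x x^T of the logistic log-likelihood.

```python
import numpy as np

def sample_gradient_hessian(x, y, theta):
    """Prediction, gradient and Hessian of one sample's log-likelihood."""
    h = 1.0 / (1.0 + np.exp(-np.dot(theta, x)))   # step (3): prediction
    g = (y - h) * x                               # step (5): gradient vector
    H = -h * (1.0 - h) * np.outer(x, x)           # step (6): Hessian matrix
    return h, g, H

# Hypothetical sample with n = 2 features; x[0] = 1 is the intercept term.
x = np.array([1.0, 2.0, -1.0])
theta = np.zeros(3)
h, g, H = sample_gradient_hessian(x, y=1, theta=theta)
print(h)   # 0.5, because theta is all zeros
print(g)
print(H)
```

A single sample's H is a rank-1 matrix and cannot be inverted on its own, so the update in step (7) only becomes usable once the contributions of several samples are combined.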

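The iteration rule used in this article is to compute the gradient g and the Hessian matrix H for each of the m samples, average the m values of g and of H, and then update θ. In pseudocode: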

while(iterNum)                       // number of iterations
{
    g = 0, H = 0                     // reset the accumulators
    for i in 1..m                    // m samples
    {
        (2) (3) (4) (5) (6)          // step numbers from the detailed computation; add each sample's g and H
    }
    g = (1/m)g
    H = (1/m)H
    θ = θ - H^-1 g
    iterNum--
}
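The pseudocode can also be written in vectorized form. The following is a sketch under stated assumptions rather than the article's implementation: it assumes NumPy, a design matrix X whose first column is all ones, and a tiny made-up data set. It also replaces the explicit inverse with `np.linalg.solve`.

```python
import numpy as np

def newton_logistic(X, y, num_iters=10):
    """Newton's method for logistic regression with averaged g and H."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))            # predictions for all m samples
        g = X.T @ (y - h) / m                            # averaged gradient
        H = -(X * (h * (1.0 - h))[:, None]).T @ X / m    # averaged Hessian
        theta = theta - np.linalg.solve(H, g)            # theta = theta - H^-1 g
    return theta

# Made-up toy data whose classes overlap: intercept column plus one feature.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 2.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta = newton_logistic(X, y)
print(theta)
```

Because g and H are scaled by the same factor 1/m, averaging gives the same update as summing. The toy labels overlap on purpose: with perfectly separable data the maximum-likelihood coefficients are unbounded and the Hessian becomes nearly singular.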

Test

The test results are as follows:

Code

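The dependencies are the numpy and matplotlib libraries.

The code is rather messy. It did not use numpy at first, so vector operations such as dot products and addition are implemented by hand. Later, Newton's method required a matrix inverse, which I was too lazy to write myself, so I brought in numpy. Some of the remaining computations could also have been done directly with numpy, but, possibly because I initialized the matrices incorrectly, I never got numpy's matrix multiplication to work and had to fall back on my own clumsy implementations. The code is messy, so brace yourself. Ready, go!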

# -*- coding: utf-8 -*-
__author__ = 'jinyu'
from numpy import *
import math
 
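##
# Load the training data
# #
def loadDataSet(dataFileName):
    dataMat = []
    labelMat = []
    # Each line holds two feature values followed by an integer label.
    with open(dataFileName) as fr:
        for line in fr:
            lineArr = line.strip().split()
            # Prepend 1.0 for the intercept term and scale both features by 10.
            dataMat.append([1.0, 10.0*float(lineArr[0]), 10.0*float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
 
    return dataMat, labelMat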
##
# Sigmoid function
# #
def sigmoid(x):
    return 1.0 / (1+math.exp(-x))
 
##
# Gradient ascent
# #
def gradientAscent(dataMat, labelMat, alpha):
    m = len(dataMat)        # number of training samples
    n = len(dataMat[0])     # number of feature dimensions
    theta = [0] * n
 
    numIter = 1000          # fixed number of passes over the data
    while(numIter):
        for i in range(m):
            # theta is updated after every sample (stochastic gradient ascent)
            hypothesis = sigmoid(computeDotProduct(dataMat[i], theta))
            error = labelMat[i] - hypothesis
            gradient = computeTimesVect(dataMat[i], error)
            theta = computeVectPlus(theta, computeTimesVect(gradient, alpha))
        numIter -= 1
    return theta
 
##
# Newton's method
# #
def newtonMethod(dataMat, labelMat, iterNum=10):
    m = len(dataMat)        # number of training samples
    n = len(dataMat[0])     # number of feature dimensions
    theta = [0.0] * n
 
    while(iterNum):
        # Count the iteration up front so the `continue` below cannot loop forever
        iterNum -= 1
        gradientSum = [0.0] * n
        hessianMatSum = [[0.0] * n for _ in range(n)]
        for i in range(m):
            try:
                hypothesis = sigmoid(computeDotProduct(dataMat[i], theta))
            except OverflowError:
                # math.exp overflowed for this sample; skip it
                continue
            error = labelMat[i] - hypothesis
            gradient = computeTimesVect(dataMat[i], error/m)
            gradientSum = computeVectPlus(gradientSum, gradient)
            # Use the unscaled prediction for h*(1-h), then average with 1/m
            hessian = computeHessianMatrix(dataMat[i], hypothesis)
            for j in range(n):
                hessianMatSum[j] = computeVectPlus(hessianMatSum[j], computeTimesVect(hessian[j], 1.0/m))
 
        # Inverting the Hessian can fail; if it does, skip this iteration
        try:
            hessianMatInv = mat(hessianMatSum).I.tolist()
        except Exception:
            continue
        for k in range(n):
            theta[k] -= computeDotProduct(hessianMatInv[k], gradientSum)
 
    return theta
 
##
# Compute the Hessian matrix of one sample
# #
def computeHessianMatrix(data, hypothesis):
    hessianMatrix = []
    n = len(data)
 
    for i in range(n):
        row = []
        for j in range(n):
            row.append(-data[i]*data[j]*(1-hypothesis)*hypothesis)
        hessianMatrix.append(row)
    return hessianMatrix
 
##
# Compute the dot product of two vectors
# #
def computeDotProduct(a, b):
    if len(a) != len(b):
        return False
    n = len(a)
    dotProduct = 0
    for i in range(n):
        dotProduct += a[i] * b[i]
    return dotProduct
 
##
# Compute the sum of two vectors
# #
def computeVectPlus(a, b):
    if len(a) != len(b):
        return False
    n = len(a)
    sum = []
    for i in range(n):
        sum.append(a[i]+b[i])
    return sum
 
##
# Multiply a vector by the scalar n
# #
def computeTimesVect(vect, n):
    nTimesVect = []
    for i in range(len(vect)):
        nTimesVect.append(n * vect[i])
    return nTimesVect
 
def plotBestFit(dataMat, labelMat, weights):
    import matplotlib.pyplot as plt
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(0.0, 70.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
 
def classifyVector(inX, weights):
    prob = sigmoid(sum(inX*weights))
    if prob > 0.5: return 1.0
    else: return 0.0
 
def colicTest(trainFileName, testFileName):
    frTrain = open(trainFileName); frTest = open(testFileName)
    trainingSet = []; trainingLabels = []
    for line in frTrain.readlines():
        currLine = line.strip().split('\t')
        lineArr =[]
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    trainWeights = newtonMethod(trainingSet, trainingLabels, 10)
    errorCount = 0; numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip().split('\t')
        lineArr =[]
        for i in range(21):
            lineArr.append(float(currLine[i]))
        if int(classifyVector(array(lineArr), trainWeights))!= int(currLine[21]):
            errorCount += 1
    errorRate = (float(errorCount)/numTestVec)
    print("the error rate of this test is: %f" % errorRate)
    return errorRate
 
dataMat, labelMat = loadDataSet("ex4x.dat")
theta = newtonMethod(dataMat, labelMat, 10)
print(theta)
plotBestFit(dataMat, labelMat, array(theta))
 
#colicTest('horseColicTraining.txt', 'horseColicTest.txt')
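The hand-written vector helpers in the listing above correspond to standard NumPy operations. The following sketch assumes the Python lists are first converted with `np.array`. The sample values and the prediction `h` are made up for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
h = 0.3  # a hypothetical prediction value

dot = np.dot(a, b)                           # computeDotProduct(a, b)
vec_sum = a + b                              # computeVectPlus(a, b)
scaled = 2.0 * a                             # computeTimesVect(a, 2.0)
hessian = -h * (1.0 - h) * np.outer(a, a)    # computeHessianMatrix(a, h)

print(dot)   # 32.0
print(vec_sum)
print(scaled)
print(hessian)
```

One possible source of the matrix multiplication trouble mentioned earlier is that `*` on NumPy arrays multiplies element by element. The matrix product is `np.dot` or the `@` operator.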

Code and data
