
Building a Financial Loan Default Model, Part 3: Model Evaluation



Goal: record a scoring table of accuracy, precision, recall, F1-score, and AUC for seven models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM), and plot the ROC curves.

I. Evaluation Metrics

1. Basic Concepts

For a binary classification problem, comparing the predictions with the ground truth gives four possible outcomes.

| Ground truth \ Prediction | Positive | Negative |
| --- | --- | --- |
| Positive | TP (True Positive) | FN (False Negative) |
| Negative | FP (False Positive) | TN (True Negative) |

My mnemonic: the first letter tells you whether the prediction is correct (T, True) or wrong (F, False); the second letter tells you whether the predicted class is positive (P) or negative (N).
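As a quick sanity check, these four counts can be read off with scikit-learn's confusion_matrix. The labels below are made up purely for illustration; note that scikit-learn orders the classes as [0, 1], so ravel() returns the counts as (TN, FP, FN, TP).

## Toy example (made-up labels) to check the TP/FN/FP/TN layout
from sklearn.metrics import confusion_matrix

y_true_demo = [1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 0, 1, 1, 0, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print('TP:', tp, 'FN:', fn, 'FP:', fp, 'TN:', tn)   # TP: 2 FN: 1 FP: 1 TN: 3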

2. Accuracy

Accuracy is the proportion of all samples that are predicted correctly.
$accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$

3. Precision

Precision: among all samples predicted as positive, the proportion that are truly positive.
$precision = \dfrac{TP}{TP + FP}$

4. Recall

Recall: among all samples that are actually positive, the proportion that are correctly predicted as positive.

For example, recall is particularly important in medical screening, where failing to detect an actual positive case is costly.
$recall = \dfrac{TP}{TP + FN}$

5. F1 Score

F1 score: the harmonic mean of precision and recall; the larger, the better.
$\dfrac{2}{F_1} = \dfrac{1}{precision} + \dfrac{1}{recall}$
which gives $F_1 = \dfrac{2PR}{P + R} = \dfrac{2TP}{2TP + FP + FN}$
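To make the formulas concrete, here is a toy check (same made-up labels as above, giving TP = 2, FN = 1, FP = 1, TN = 3) against scikit-learn's metric functions:

## Hand-computed vs. scikit-learn values for the toy labels above
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true_demo = [1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 0, 1, 1, 0, 0, 0]
# By the formulas: accuracy = 5/7, precision = 2/3, recall = 2/3, F1 = 2*2/(2*2+1+1) = 2/3
print(accuracy_score(y_true_demo, y_pred_demo))    # 0.7142857...
print(precision_score(y_true_demo, y_pred_demo))   # 0.6666...
print(recall_score(y_true_demo, y_pred_demo))      # 0.6666...
print(f1_score(y_true_demo, y_pred_demo))          # 0.6666...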

6. ROC Curve and AUC

ROC curve: the receiver operating characteristic curve is a composite indicator that reflects sensitivity and specificity as continuously varying quantities; each point on the curve reflects the sensitivity obtained for the same signal stimulus at one decision threshold. The figure below shows an example ROC curve.
(Figure: example ROC curve)

x-axis: 1 - Specificity, the false positive rate (FPR, FPR = FP / (FP + TN)), i.e. the proportion of all actual negative samples that are predicted as positive;

y-axis: Sensitivity, the true positive rate (TPR, TPR = TP / (TP + FN)), i.e. the proportion of all actual positive samples that are predicted as positive.

In the ideal case, TPR approaches 1 and FPR approaches 0, i.e. the point (0, 1) in the plot. The closer the ROC curve is to the point (0, 1), and the farther it is from the 45-degree diagonal, the better.

AUC (Area Under Curve) is defined as the area under the ROC curve. For any classifier better than random guessing it lies in [0.5, 1]; the larger the AUC, the better the classifier.
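In code, roc_curve returns the (FPR, TPR) pairs that trace the curve and roc_auc_score returns the area under it. A minimal sketch with made-up scores:

## Toy ROC/AUC check with made-up scores
from sklearn.metrics import roc_curve, roc_auc_score

y_true_demo  = [0, 0, 1, 1]
y_score_demo = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve(y_true_demo, y_score_demo)
print(fpr, tpr)                                    # the points of the ROC curve
print(roc_auc_score(y_true_demo, y_score_demo))    # 0.75 for these toy scores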

II. Model Evaluation

Goal: report the accuracy, precision, recall, F1-score, and AUC of each model, and plot the ROC curves.
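The snippets below reuse the standardized train/test splits (x_train_stand, x_test_stand, y_train, y_test) prepared in the earlier parts of this series, so they are assumed to already be in memory. A minimal set of imports for everything that follows, assuming scikit-learn, XGBoost, LightGBM, and matplotlib are installed, would look roughly like this:

## Assumed imports for all of the snippets below
import matplotlib.pyplot as plt
import xgboost as xgb
import lightgbm as lgb
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)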

1. Logistic Regression

## Logistic Regression
lr = LogisticRegression()
lr.fit(x_train_stand, y_train)
y_pre_lr = lr.predict(x_test_stand)
y_score_lr = lr.predict_proba(x_test_stand)[:,1]
lr_accuracy = accuracy_score(y_test, y_pre_lr)
print('The accuracy of LR', lr_accuracy)
lr_precision = precision_score(y_test, y_pre_lr)
print('The precision of LR', lr_precision)
lr_recall = recall_score(y_test, y_pre_lr)
print('The recall of LR', lr_recall)
lr_f1_score = f1_score(y_test, y_pre_lr)
print('The F1 score of LR', lr_f1_score)
lr_roc_auc_score = roc_auc_score(y_test, y_pre_lr)
print('The AUC of LR', lr_roc_auc_score)
## ROC curve
test_fprs,test_tprs,test_thresholds = roc_curve(y_test, y_score_lr)
plt.plot(test_fprs, test_tprs)
plt.plot([0,1], [0,1],"--")
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(labels=["Test AUC:"+str(round(lr_roc_auc_score,5))], loc="lower right")
plt.show()

Output:

The accuracy of LR 0.7876664330763841
The precision of LR 0.6609195402298851
The recall of LR 0.3203342618384401
The F1 score of LR 0.3203342618384401
The AUC of LR 0.6325454080727781

(Figure: ROC curve for the LR model)
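The six model sections below repeat the same metric code almost verbatim. A hypothetical helper such as evaluate_model (not part of the original code) could replace the copy-and-paste; it follows the same convention as above, computing AUC from the hard class predictions:

## Hypothetical helper to avoid repeating the metric code for every model
def evaluate_model(name, clf, x_test, y_test):
    y_pre = clf.predict(x_test)
    print('The accuracy of %s' % name, accuracy_score(y_test, y_pre))
    print('The precision of %s' % name, precision_score(y_test, y_pre))
    print('The recall of %s' % name, recall_score(y_test, y_pre))
    print('The F1 score of %s' % name, f1_score(y_test, y_pre))
    print('The AUC of %s' % name, roc_auc_score(y_test, y_pre))

## e.g. evaluate_model('LR', lr, x_test_stand, y_test)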

2. SVM

## SVM
svm = SVC(random_state=2018, probability=True)
svm.fit(x_train_stand, y_train)
y_pre_svm = svm.predict(x_test_stand)
y_score_svm = svm.predict_proba(x_test_stand)[:,1]
svm_accuracy = accuracy_score(y_test, y_pre_svm)
print('The accuracy of SVM', svm_accuracy)
svm_precision = precision_score(y_test, y_pre_svm)
print('The precision of SVM', svm_precision)
svm_recall = recall_score(y_test, y_pre_svm)
print('The recall of SVM', svm_recall)
svm_f1_score = f1_score(y_test, y_pre_svm)
print('The F1 score of SVM', svm_f1_score)
svm_roc_auc_score = roc_auc_score(y_test, y_pre_svm)
print('The AUC of SVM', svm_roc_auc_score)
## ROC curve
test_fprs,test_tprs,test_thresholds = roc_curve(y_test, y_score_svm)
plt.plot(test_fprs, test_tprs)
plt.plot([0,1], [0,1],"--")
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(labels=["Test AUC:"+str(round(svm_roc_auc_score,5))], loc="lower right")
plt.show()

Output:

The accuracy of SVM 0.7806587245970568
The precision of SVM 0.7017543859649122
The recall of SVM 0.22284122562674094
The F1 score of SVM 0.22284122562674094
The AUC of SVM 0.5955030098171158

(Figure: ROC curve for the SVM model)

3. Decision Tree

## DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=2018)
dt.fit(x_train_stand, y_train)
y_pre_dt = dt.predict(x_test_stand)
dt_accuracy = accuracy_score(y_test, y_pre_dt)
print('The accuracy of DecisionTree', dt_accuracy)
dt_precision = precision_score(y_test, y_pre_dt)
print('The precision of DecisionTree', dt_precision)
dt_recall = recall_score(y_test, y_pre_dt)
print('The recall of DecisionTree', dt_recall)
dt_f1_score = f1_score(y_test, y_pre_dt)
print('The F1 score of DecisionTree', dt_f1_score)
dt_roc_auc_score = roc_auc_score(y_test, y_pre_dt)
print('The AUC of DecisionTree', dt_roc_auc_score)

Output:

The accuracy of DecisionTree 0.7806587245970568
The precision of DecisionTree 0.7017543859649122
The recall of DecisionTree 0.22284122562674094
The F1 score of DecisionTree 0.22284122562674094
The AUC of DecisionTree 0.5955030098171158

4. Random Forest

## Random forest model
rfc = RandomForestClassifier()
rfc.fit(x_train_stand, y_train)
y_pre_rf = rfc.predict(x_test_stand)
rf_accuracy = accuracy_score(y_test, y_pre_rf)
print('The accuracy of Random Forest', rf_accuracy)
rf_precision = precision_score(y_test, y_pre_rf)
print('The precision of Random Forest', rf_precision)
rf_recall = recall_score(y_test, y_pre_rf)
print('The recall of Random Forest', rf_recall)
rf_f1_score = f1_score(y_test, y_pre_rf)
print('The F1 score of Random Forest', rf_f1_score)
rf_roc_auc_score = roc_auc_score(y_test, y_pre_rf)
print('The AUC of Random Forest', rf_roc_auc_score)

Output:

The accuracy of Random Forest 0.7638402242466713
The precision of Random Forest 0.5846153846153846
The recall of Random Forest 0.2116991643454039
The F1 score of Random Forest 0.2116991643454039
The AUC of Random Forest 0.5805686832962974

5. GBDT

## GBDT model
gbdt = GradientBoostingClassifier()
gbdt.fit(x_train_stand, y_train)
y_pre_gbdt = gbdt.predict(x_test_stand)
gbdt_accuracy = accuracy_score(y_test, y_pre_gbdt)
print('The accuracy of GBDT', gbdt_accuracy)
gbdt_precision = precision_score(y_test, y_pre_gbdt)
print('The precision of GBDT', gbdt_precision)
gbdt_recall = recall_score(y_test, y_pre_gbdt)
print('The recall of GBDT', gbdt_recall)
gbdt_f1_score = f1_score(y_test, y_pre_gbdt)
print('The F1 score of GBDT', gbdt_f1_score)
gbdt_roc_auc_score = roc_auc_score(y_test, y_pre_gbdt)
print('The AUC of GBDT', gbdt_roc_auc_score)

Output:

The accuracy of GBDT 0.7792571829011913
The precision of GBDT 0.6057692307692307
The recall of GBDT 0.35097493036211697
The F1 score of GBDT 0.35097493036211697
The AUC of GBDT 0.6370979520724442

6. XGBoost

## XGBoost model
xgb_clf = xgb.XGBClassifier()   # use a separate name so the xgboost module is not shadowed
xgb_clf.fit(x_train_stand, y_train)
y_pre_xgb = xgb_clf.predict(x_test_stand)
xgb_accuracy = accuracy_score(y_test, y_pre_xgb)
print('The accuracy of XGBoost', xgb_accuracy)
xgb_precision = precision_score(y_test, y_pre_xgb)
print('The precision of XGBoost', xgb_precision)
xgb_recall = recall_score(y_test, y_pre_xgb)
print('The recall of XGBoost', xgb_recall)
xgb_f1_score = f1_score(y_test, y_pre_xgb)
print('The F1 score of XGBoost', xgb_f1_score)
xgb_roc_auc_score = roc_auc_score(y_test, y_pre_xgb)
print('The AUC of XGBoost', xgb_roc_auc_score)

Output:

The accuracy of XGBoost 0.7841625788367204
The precision of XGBoost 0.624390243902439
The recall of XGBoost 0.3565459610027855
The F1 score of XGBoost 0.3565459610027855
The AUC of XGBoost 0.642224291362816

7. LightGBM

## lightGBM
gbm = lgb.LGBMClassifier()
gbm.fit(x_train_stand, y_train)
y_pre_gbm = gbm.predict(x_test_stand)
gbm_accuracy = accuracy_score(y_test, y_pre_gbm)
print('The accuracy of lightGBM', gbm_accuracy)
gbm_precision = precision_score(y_test, y_pre_gbm)
print('The precision of lightGBM', gbm_precision)
gbm_recall = recall_score(y_test, y_pre_gbm)
print('The recall of lightGBM', gbm_recall)
gbm_f1_score = f1_score(y_test, y_pre_gbm)
print('The F1 score of lightGBM', gbm_f1_score)
gbm_roc_auc_score = roc_auc_score(y_test, y_pre_gbm)
print('The AUC of lightGBM', gbm_roc_auc_score)

Output:

The accuracy of lightGBM 0.7701471618780659
The precision of lightGBM 0.5688888888888889
The recall of lightGBM 0.3565459610027855
The F1 score of lightGBM 0.3565459610027855
The AUC of lightGBM 0.6328609954826662

8. Plotting the ROC Curves

y_score_lr = lr.predict_proba(x_test_stand)[:,1]
y_score_svm = svm.predict_proba(x_test_stand)[:,1]
y_score_rf = rfc.predict_proba(x_test_stand)[:,1]
y_score_dt = dt.predict_proba(x_test_stand)[:,1]
y_score_gbdt = gbdt.predict_proba(x_test_stand)[:,1]
y_score_xgb = xgb_clf.predict_proba(x_test_stand)[:,1]
y_score_gbm = gbm.predict_proba(x_test_stand)[:,1]
fpr_lr,tpr_lr,thresholds_lr = roc_curve(y_test,y_score_lr,pos_label=1)
fpr_svm,tpr_svm,thresholds_svm = roc_curve(y_test,y_score_svm,pos_label=1)
fpr_rf,tpr_rf,thresholds_rf = roc_curve(y_test,y_score_rf,pos_label=1)
fpr_dt,tpr_dt,thresholds_dt = roc_curve(y_test,y_score_dt,pos_label=1)
fpr_gbdt,tpr_gbdt,thresholds_gbdt = roc_curve(y_test,y_score_gbdt,pos_label=1)
fpr_xgb,tpr_xgb,thresholds_xgb = roc_curve(y_test,y_score_xgb,pos_label=1)
fpr_gbm,tpr_gbm,thresholds_gbm = roc_curve(y_test,y_score_gbm,pos_label=1)
## ROC curves
plt.figure(figsize=[6,6])
plt.plot(fpr_lr,tpr_lr, color='black')
plt.plot(fpr_svm,tpr_svm, color='red')
plt.plot(fpr_rf,tpr_rf, color='green')
plt.plot(fpr_dt,tpr_dt, color='blue')
plt.plot(fpr_gbdt,tpr_gbdt, color='yellow')
plt.plot(fpr_xgb,tpr_xgb, color='brown')
plt.plot(fpr_gbm,tpr_gbm, color='purple')
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
label = [ "LR Test - AUC:" + str(round(lr_roc_auc_score, 5)),
          "SVM Test - AUC:" + str(round(svm_roc_auc_score, 5)),
          "RF Test - AUC:" + str(round(rf_roc_auc_score, 5)),
          "DT Test - AUC:" + str(round(dt_roc_auc_score, 5)),
          "GBDT Test - AUC:" + str(round(gbdt_roc_auc_score, 5)),
          "XGBoost Test - AUC:" + str(round(xgb_roc_auc_score, 5)),
          "lightGBM Test - AUC:" + str(round(gbm_roc_auc_score, 5))]
plt.legend(labels=label, loc="lower right")
plt.show()