
Building a Financial Loan Default Model, Part 3: Model Evaluation



Goal: record a scoring table of accuracy, precision, recall, F1-score, and AUC for seven models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM), and plot the ROC curves.

I. Evaluation Metrics

1. Basic Concepts

For a binary classification problem, comparing the predictions with the ground truth gives four possible outcomes.

| Ground truth \ Prediction | Positive | Negative |
| --- | --- | --- |
| Positive | TP (True Positive) | FN (False Negative) |
| Negative | FP (False Positive) | TN (True Negative) |

My mnemonic: the first letter tells you whether the prediction is correct (T, True) or wrong (F, False); the second letter tells you whether the predicted class is positive (P) or negative (N).
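As a quick sanity check, these four counts can be read off with scikit-learn's confusion_matrix. The labels below are made up purely for illustration; note that scikit-learn orders the classes as [0, 1], so ravel() returns the counts as (TN, FP, FN, TP).

## Toy example (made-up labels) to check the TP/FN/FP/TN layout
from sklearn.metrics import confusion_matrix

y_true_demo = [1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 0, 1, 1, 0, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print('TP:', tp, 'FN:', fn, 'FP:', fp, 'TN:', tn)   # TP: 2 FN: 1 FP: 1 TN: 3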

2. Accuracy

Accuracy is the proportion of all samples that are predicted correctly.
$accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$

3. Precision

Precision: among all samples predicted as positive, the proportion that are truly positive.
$precision = \dfrac{TP}{TP + FP}$

4. Recall

Recall: among all samples that are actually positive, the proportion that are correctly predicted as positive.

For example, recall is particularly important in medical screening, where failing to detect an actual positive case is costly.
$recall = \dfrac{TP}{TP + FN}$

5. F1 Score

F1 score: the harmonic mean of precision and recall; the larger, the better.
$\dfrac{2}{F_1} = \dfrac{1}{precision} + \dfrac{1}{recall}$
which gives $F_1 = \dfrac{2PR}{P + R} = \dfrac{2TP}{2TP + FP + FN}$
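To make the formulas concrete, here is a toy check (same made-up labels as above, giving TP = 2, FN = 1, FP = 1, TN = 3) against scikit-learn's metric functions:

## Hand-computed vs. scikit-learn values for the toy labels above
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true_demo = [1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 0, 1, 1, 0, 0, 0]
# By the formulas: accuracy = 5/7, precision = 2/3, recall = 2/3, F1 = 2*2/(2*2+1+1) = 2/3
print(accuracy_score(y_true_demo, y_pred_demo))    # 0.7142857...
print(precision_score(y_true_demo, y_pred_demo))   # 0.6666...
print(recall_score(y_true_demo, y_pred_demo))      # 0.6666...
print(f1_score(y_true_demo, y_pred_demo))          # 0.6666...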

6. ROC Curve and AUC

ROC curve: the receiver operating characteristic curve is a composite indicator that reflects sensitivity and specificity as continuously varying quantities; each point on the curve reflects the sensitivity obtained for the same signal stimulus at one decision threshold. The figure below shows an example ROC curve.
(Figure: example ROC curve)

x-axis: 1 - Specificity, the false positive rate (FPR, FPR = FP / (FP + TN)), i.e. the proportion of all actual negative samples that are predicted as positive;

y-axis: Sensitivity, the true positive rate (TPR, TPR = TP / (TP + FN)), i.e. the proportion of all actual positive samples that are predicted as positive.

In the ideal case, TPR approaches 1 and FPR approaches 0, i.e. the point (0, 1) in the plot. The closer the ROC curve is to the point (0, 1), and the farther it is from the 45-degree diagonal, the better.

AUC (Area Under Curve) is defined as the area under the ROC curve. For any classifier better than random guessing it lies in [0.5, 1]; the larger the AUC, the better the classifier.
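In code, roc_curve returns the (FPR, TPR) pairs that trace the curve and roc_auc_score returns the area under it. A minimal sketch with made-up scores:

## Toy ROC/AUC check with made-up scores
from sklearn.metrics import roc_curve, roc_auc_score

y_true_demo  = [0, 0, 1, 1]
y_score_demo = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve(y_true_demo, y_score_demo)
print(fpr, tpr)                                    # the points of the ROC curve
print(roc_auc_score(y_true_demo, y_score_demo))    # 0.75 for these toy scores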

II. Model Evaluation

Goal: report the accuracy, precision, recall, F1-score, and AUC of each model, and plot the ROC curves.
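The snippets below reuse the standardized train/test splits (x_train_stand, x_test_stand, y_train, y_test) prepared in the earlier parts of this series, so they are assumed to already be in memory. A minimal set of imports for everything that follows, assuming scikit-learn, XGBoost, LightGBM, and matplotlib are installed, would look roughly like this:

## Assumed imports for all of the snippets below
import matplotlib.pyplot as plt
import xgboost as xgb
import lightgbm as lgb
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)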

1. Logistic Regression

## Logistic Regression
lr = LogisticRegression()
lr.fit(x_train_stand, y_train)
y_pre_lr = lr.predict(x_test_stand)
y_score_lr = lr.predict_proba(x_test_stand)[:,1]
lr_accuracy = accuracy_score(y_test, y_pre_lr)
print('The accuracy of LR', lr_accuracy)
lr_precision = precision_score(y_test, y_pre_lr)
print('The precision of LR', lr_precision)
lr_recall = recall_score(y_test, y_pre_lr)
print('The recall of LR', lr_recall)
lr_f1_score = f1_score(y_test, y_pre_lr)
print('The F1 score of LR', lr_f1_score)
lr_roc_auc_score = roc_auc_score(y_test, y_pre_lr)
print('The AUC of LR', lr_roc_auc_score)
## ROC curve
test_fprs,test_tprs,test_thresholds = roc_curve(y_test, y_score_lr)
plt.plot(test_fprs, test_tprs)
plt.plot([0,1], [0,1],"--")
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(labels=["Test AUC:"+str(round(lr_roc_auc_score,5))], loc="lower right")
plt.show()

Output:

The accuracy of LR 0.7876664330763841
The precision of LR 0.6609195402298851
The recall of LR 0.3203342618384401
The F1 score of LR 0.3203342618384401
The AUC of LR 0.6325454080727781

(Figure: ROC curve for the LR model)
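The six model sections below repeat the same metric code almost verbatim. A hypothetical helper such as evaluate_model (not part of the original code) could replace the copy-and-paste; it follows the same convention as above, computing AUC from the hard class predictions:

## Hypothetical helper to avoid repeating the metric code for every model
def evaluate_model(name, clf, x_test, y_test):
    y_pre = clf.predict(x_test)
    print('The accuracy of %s' % name, accuracy_score(y_test, y_pre))
    print('The precision of %s' % name, precision_score(y_test, y_pre))
    print('The recall of %s' % name, recall_score(y_test, y_pre))
    print('The F1 score of %s' % name, f1_score(y_test, y_pre))
    print('The AUC of %s' % name, roc_auc_score(y_test, y_pre))

## e.g. evaluate_model('LR', lr, x_test_stand, y_test)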

2. SVM

## SVM
svm = SVC(random_state=2018, probability=True)
svm.fit(x_train_stand, y_train)
y_pre_svm = svm.predict(x_test_stand)
y_score_svm = svm.predict_proba(x_test_stand)[:,1]
svm_accuracy = accuracy_score(y_test, y_pre_svm)
print('The accuracy of SVM', svm_accuracy)
svm_precision = precision_score(y_test, y_pre_svm)
print('The precision of SVM', svm_precision)
svm_recall = recall_score(y_test, y_pre_svm)
print('The recall of SVM', svm_recall)
svm_f1_score = f1_score(y_test, y_pre_svm)
print('The F1 score of SVM', svm_f1_score)
svm_roc_auc_score = roc_auc_score(y_test, y_pre_svm)
print('The AUC of SVM', svm_roc_auc_score)
## ROC curve
test_fprs,test_tprs,test_thresholds = roc_curve(y_test, y_score_svm)
plt.plot(test_fprs, test_tprs)
plt.plot([0,1], [0,1],"--")
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(labels=["Test AUC:"+str(round(svm_roc_auc_score,5))], loc="lower right")
plt.show()

Output:

The accuracy of SVM 0.7806587245970568
The precision of SVM 0.7017543859649122
The recall of SVM 0.22284122562674094
The F1 score of SVM 0.22284122562674094
The AUC of SVM 0.5955030098171158

(Figure: ROC curve for the SVM model)

3. Decision Tree

## DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=2018)
dt.fit(x_train_stand, y_train)
y_pre_dt = dt.predict(x_test_stand)
dt_accuracy = accuracy_score(y_test, y_pre_dt)
print('The accuracy of DecisionTree', dt_accuracy)
dt_precision = precision_score(y_test, y_pre_dt)
print('The precision of DecisionTree', dt_precision)
dt_recall = recall_score(y_test, y_pre_dt)
print('The recall of DecisionTree', dt_recall)
dt_f1_score = f1_score(y_test, y_pre_dt)
print('The F1 score of DecisionTree', dt_f1_score)
dt_roc_auc_score = roc_auc_score(y_test, y_pre_dt)
print('The AUC of DecisionTree', dt_roc_auc_score)

Output:

The accuracy of DecisionTree 0.7806587245970568
The precision of DecisionTree 0.7017543859649122
The recall of DecisionTree 0.22284122562674094
The F1 score of DecisionTree 0.22284122562674094
The AUC of DecisionTree 0.5955030098171158

4. Random Forest

## Random forest model
rfc = RandomForestClassifier()
rfc.fit(x_train_stand, y_train)
y_pre_rf = rfc.predict(x_test_stand)
rf_accuracy = accuracy_score(y_test, y_pre_rf)
print('The accuracy of Random Forest', rf_accuracy)
rf_precision = precision_score(y_test, y_pre_rf)
print('The precision of Random Forest', rf_precision)
rf_recall = recall_score(y_test, y_pre_rf)
print('The recall of Random Forest', rf_recall)
rf_f1_score = f1_score(y_test, y_pre_rf)
print('The F1 score of Random Forest', rf_f1_score)
rf_roc_auc_score = roc_auc_score(y_test, y_pre_rf)
print('The AUC of Random Forest', rf_roc_auc_score)

Output:

The accuracy of Random Forest 0.7638402242466713
The precision of Random Forest 0.5846153846153846
The recall of Random Forest 0.2116991643454039
The F1 score of Random Forest 0.2116991643454039
The AUC of Random Forest 0.5805686832962974

5. GBDT

## GBDT model
gbdt = GradientBoostingClassifier()
gbdt.fit(x_train_stand, y_train)
y_pre_gbdt = gbdt.predict(x_test_stand)
gbdt_accuracy = accuracy_score(y_test, y_pre_gbdt)
print('The accuracy of GBDT', gbdt_accuracy)
gbdt_precision = precision_score(y_test, y_pre_gbdt)
print('The precision of GBDT', gbdt_precision)
gbdt_recall = recall_score(y_test, y_pre_gbdt)
print('The recall of GBDT', gbdt_recall)
gbdt_f1_score = f1_score(y_test, y_pre_gbdt)
print('The F1 score of GBDT', gbdt_f1_score)
gbdt_roc_auc_score = roc_auc_score(y_test, y_pre_gbdt)
print('The AUC of GBDT', gbdt_roc_auc_score)

Output:

The accuracy of GBDT 0.7792571829011913
The precision of GBDT 0.6057692307692307
The recall of GBDT 0.35097493036211697
The F1 score of GBDT 0.35097493036211697
The AUC of GBDT 0.6370979520724442

6. XGBoost

## XGBoost model
xgb_clf = xgb.XGBClassifier()   # use a separate name so the xgboost module is not shadowed
xgb_clf.fit(x_train_stand, y_train)
y_pre_xgb = xgb_clf.predict(x_test_stand)
xgb_accuracy = accuracy_score(y_test, y_pre_xgb)
print('The accuracy of XGBoost', xgb_accuracy)
xgb_precision = precision_score(y_test, y_pre_xgb)
print('The precision of XGBoost', xgb_precision)
xgb_recall = recall_score(y_test, y_pre_xgb)
print('The recall of XGBoost', xgb_recall)
xgb_f1_score = f1_score(y_test, y_pre_xgb)
print('The F1 score of XGBoost', xgb_f1_score)
xgb_roc_auc_score = roc_auc_score(y_test, y_pre_xgb)
print('The AUC of XGBoost', xgb_roc_auc_score)

Output:

The accuracy of XGBoost 0.7841625788367204
The precision of XGBoost 0.624390243902439
The recall of XGBoost 0.3565459610027855
The F1 score of XGBoost 0.3565459610027855
The AUC of XGBoost 0.642224291362816

7. LightGBM

## lightGBM
gbm = lgb.LGBMClassifier()
gbm.fit(x_train_stand, y_train)
y_pre_gbm = gbm.predict(x_test_stand)
gbm_accuracy = accuracy_score(y_test, y_pre_gbm)
print('The accuracy of lightGBM', gbm_accuracy)
gbm_precision = precision_score(y_test, y_pre_gbm)
print('The precision of lightGBM', gbm_precision)
gbm_recall = recall_score(y_test, y_pre_gbm)
print('The recall of lightGBM', gbm_recall)
gbm_f1_score = f1_score(y_test, y_pre_gbm)
print('The F1 score of lightGBM', gbm_f1_score)
gbm_roc_auc_score = roc_auc_score(y_test, y_pre_gbm)
print('The AUC of lightGBM', gbm_roc_auc_score)

Output:

The accuracy of lightGBM 0.7701471618780659
The precision of lightGBM 0.5688888888888889
The recall of lightGBM 0.3565459610027855
The F1 score of lightGBM 0.3565459610027855
The AUC of lightGBM 0.6328609954826662

8. Plotting the ROC Curves

y_score_lr = lr.predict_proba(x_test_stand)[:,1]
y_score_svm = svm.predict_proba(x_test_stand)[:,1]
y_score_rf = rfc.predict_proba(x_test_stand)[:,1]
y_score_dt = dt.predict_proba(x_test_stand)[:,1]
y_score_gbdt = gbdt.predict_proba(x_test_stand)[:,1]
y_score_xgb = xgb_clf.predict_proba(x_test_stand)[:,1]
y_score_gbm = gbm.predict_proba(x_test_stand)[:,1]
fpr_lr,tpr_lr,thresholds_lr = roc_curve(y_test,y_score_lr,pos_label=1)
fpr_svm,tpr_svm,thresholds_svm = roc_curve(y_test,y_score_svm,pos_label=1)
fpr_rf,tpr_rf,thresholds_rf = roc_curve(y_test,y_score_rf,pos_label=1)
fpr_dt,tpr_dt,thresholds_dt = roc_curve(y_test,y_score_dt,pos_label=1)
fpr_gbdt,tpr_gbdt,thresholds_gbdt = roc_curve(y_test,y_score_gbdt,pos_label=1)
fpr_xgb,tpr_xgb,thresholds_xgb = roc_curve(y_test,y_score_xgb,pos_label=1)
fpr_gbm,tpr_gbm,thresholds_gbm = roc_curve(y_test,y_score_gbm,pos_label=1)
## ROC curves
plt.figure(figsize=[6,6])
plt.plot(fpr_lr,tpr_lr, color='black')
plt.plot(fpr_svm,tpr_svm, color='red')
plt.plot(fpr_rf,tpr_rf, color='green')
plt.plot(fpr_dt,tpr_dt, color='blue')
plt.plot(fpr_gbdt,tpr_gbdt, color='yellow')
plt.plot(fpr_xgb,tpr_xgb, color='brown')
plt.plot(fpr_gbm,tpr_gbm, color='purple')
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
label = [ "LR Test - AUC:" + str(round(lr_roc_auc_score, 5)),
          "SVM Test - AUC:" + str(round(svm_roc_auc_score, 5)),
          "RF Test - AUC:" + str(round(rf_roc_auc_score, 5)),
          "DT Test - AUC:" + str(round(dt_roc_auc_score, 5)),
          "GBDT Test - AUC:" + str(round(gbdt_roc_auc_score, 5)),
          "XGBoost Test - AUC:" + str(round(xgb_roc_auc_score, 5)),
          "lightGBM Test - AUC:" + str(round(gbm_roc_auc_score, 5))]
plt.legend(labels=label, loc="lower right")
plt.show()