
Building Models for Loan Default Prediction, Part 2: Ensemble Models

Task: Model Building

Build four models (random forest, GBDT, XGBoost and LightGBM) and score each one. Any scoring method is acceptable, for example accuracy or AUC.
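To make the two suggested metrics concrete, here is a minimal sketch of computing accuracy and AUC for one classifier. It runs on synthetic data from make_classification as a stand-in for the loan data (an assumption, as are the variable names and the n_estimators value); it is not the scoring code used later in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (assumption: binary 0/1 target).
X, y = make_classification(n_samples=500, n_features=10, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018
)

clf = RandomForestClassifier(n_estimators=50, random_state=2018)
clf.fit(X_train, y_train)

# Accuracy compares hard class predictions with the true labels.
accuracy = accuracy_score(y_test, clf.predict(X_test))

# AUC needs a ranking score, here the predicted probability of class 1.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("accuracy:", accuracy)
print("auc:", auc)
```

Accuracy depends on the 0.5 decision threshold, while AUC only depends on how well the model ranks positives above negatives, so the two can disagree on imbalanced data.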

1. Installing Dependencies

Tip: if pip is slow or times out during installation, switch to a mirror index:

sudo pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com lightgbm

2. Loading and Standardizing the Data

import warnings

import pandas as pd
import xgboost as xgb
import lightgbm as lgb
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings(action='ignore', category=DeprecationWarning)

# Load the data
data = pd.read_csv("data_all.csv")
x = data.drop(labels='status', axis=1)
y = data['status']
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=2018)
print(len(x))  # 4754

# Standardize the features (the scaler is fitted on the training split only)
scaler = StandardScaler()
scaler.fit(x_train)
x_train_stand = scaler.transform(x_train)
x_test_stand = scaler.transform(x_test)
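As a worked check of what the standardization step computes, the sketch below reproduces StandardScaler by hand: each test feature is shifted by the training mean and divided by the training standard deviation. The random arrays and the names ending in _demo are illustrative assumptions, not the loan data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data only (assumption): three features on an arbitrary scale.
rng = np.random.RandomState(2018)
x_train_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
x_test_demo = rng.normal(loc=50.0, scale=10.0, size=(40, 3))

scaler_demo = StandardScaler().fit(x_train_demo)

# Statistics come from the training split only.
train_mean = x_train_demo.mean(axis=0)
train_std = x_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# z = (x - mean_train) / std_train, applied to the test split.
manual_test = (x_test_demo - train_mean) / train_std
matches = np.allclose(manual_test, scaler_demo.transform(x_test_demo))
print("manual z-score matches scaler.transform:", matches)
```

Reusing the training statistics on the test split keeps information about the test set out of the preprocessing step.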

3. Random Forest

Idea: a random forest combines many trees through bagging; its base learner is the decision tree.
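The bagging idea can be sketched by hand: train each tree on a bootstrap sample of the training rows and combine the trees by majority vote. This is a simplified illustration on synthetic data (the data, the tree count and all names are assumptions); scikit-learn's RandomForestClassifier averages predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data as a stand-in (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

rng = np.random.RandomState(0)
n_trees = 25  # odd, so a 0/1 majority vote cannot tie
trees = []
for _ in range(n_trees):
    # Bootstrap: draw len(X_train) row indices with replacement.
    idx = rng.randint(0, len(X_train), size=len(X_train))
    # max_features="sqrt" adds the per-split feature subsampling of a random forest.
    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=rng.randint(1_000_000)
    )
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Majority vote over the trees (labels are 0/1).
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_accuracy = (bagged_pred == y_test).mean()
print("bagged accuracy:", bagged_accuracy)
```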

rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
rfc_score = rfc.score(x_test, y_test)
print("The score of RF:",rfc_score)

rfc1 = RandomForestClassifier()
rfc1.fit(x_train_stand, y_train)
rfc1_score = rfc1.score(x_test_stand, y_test)
print("The score of RF(with preprocessing):",rfc1_score)

Output

The score of RF: 0.7638402242466713
The score of RF(with preprocessing): 0.7652417659425368
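The two scores differ only in the third decimal place. Tree splits depend on the ordering of feature values, not their scale, and neither forest above fixes random_state, so the gap plausibly comes from random seeding rather than from standardization. The sketch below checks this on synthetic data with a fixed seed (data, seed and names are assumptions).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption).
X, y = make_classification(n_samples=600, n_features=10, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018
)

scaler = StandardScaler().fit(X_train)

# Same seed for both forests, so the only difference is the feature scaling.
rf_raw = RandomForestClassifier(n_estimators=50, random_state=2018)
rf_raw.fit(X_train, y_train)
rf_std = RandomForestClassifier(n_estimators=50, random_state=2018)
rf_std.fit(scaler.transform(X_train), y_train)

acc_raw = rf_raw.score(X_test, y_test)
acc_std = rf_std.score(scaler.transform(X_test), y_test)
print("raw:", acc_raw, "standardized:", acc_std)
```

With the seed fixed, the two accuracies should be essentially the same, which suggests standardization is optional for tree-based models.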

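4. GBDT

GBDT stands for Gradient Boosting Decision Tree, i.e. gradient-boosted trees.
Idea: trees are added one at a time, and each new tree is fitted to the negative gradient of the loss function with respect to the current model's predictions.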
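For squared loss the negative gradient is simply the residual y - F(x), so the idea can be sketched in a few lines: repeatedly fit a small regression tree to the current residuals and add a shrunken copy of its predictions. This is a toy illustration on synthetic data; the learning rate, depth, round count and names are assumptions, and it is not scikit-learn's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression problem (assumption): noisy sine curve.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # F_0: constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    # Squared loss L = 0.5 * (y - F)^2, so the negative gradient w.r.t. F is y - F.
    negative_gradient = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, negative_gradient)
    # Take a small step in the direction the tree learned.
    prediction = prediction + learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

print("training MSE before:", mse_history[0], "after:", mse_history[-1])
```

For other losses, such as log loss in classification, only the negative_gradient line changes; the fit-and-add loop stays the same.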

gbdt = GradientBoostingRegressor()
gbdt.fit(x_train, y_train)
gbdt_score = gbdt.score(x_test, y_test)
print("The score of GBDT:",gbdt_score)

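Output:

The score of GBDT: 0.18118075405980671

Note: gbdt is a GradientBoostingRegressor, so score() returns the coefficient of determination (R²) of its predictions rather than classification accuracy. The value above is therefore not directly comparable with the random forest and XGBoost accuracies.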
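The sketch below contrasts what score() means for the regressor and for GradientBoostingClassifier, the classification counterpart. It uses synthetic data (an assumption, like the variable names), so its numbers say nothing about the loan data set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score, r2_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (assumption: binary 0/1 target).
X, y = make_classification(n_samples=500, n_features=10, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018
)

reg = GradientBoostingRegressor(random_state=2018).fit(X_train, y_train)
clf = GradientBoostingClassifier(random_state=2018).fit(X_train, y_train)

reg_score = reg.score(X_test, y_test)  # R^2 of the continuous predictions
clf_score = clf.score(X_test, y_test)  # mean accuracy of the class predictions

# Confirm which metric each score() call corresponds to.
same_as_r2 = np.isclose(reg_score, r2_score(y_test, reg.predict(X_test)))
same_as_acc = np.isclose(clf_score, accuracy_score(y_test, clf.predict(X_test)))

# AUC from the classifier's predicted probability of class 1.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("regressor score (R^2):", reg_score)
print("classifier score (accuracy):", clf_score)
print("classifier AUC:", auc)
```

Using the classifier keeps the GBDT score on the same accuracy scale as the random forest and XGBoost results.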

5. XGBoost

# Use a separate name so the imported xgb module is not overwritten
xgb_clf = xgb.XGBClassifier()
xgb_clf.fit(x_train, y_train)
xgb_score = xgb_clf.score(x_test, y_test)
print("The score of XGBoost:", xgb_score)

Output

The score of XGBoost: 0.7855641205325858

Problem encountered

DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
  if diff:

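==> Searching online shows that this is NumPy-related: NumPy deprecated truth-value checks on empty arrays, and code that still runs such a check (the `if diff:` line) triggers the warning. The issue has since been fixed upstream.
==> Solution: suppress the warning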

import warnings
warnings.filterwarnings(action ='ignore', category = DeprecationWarning)
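The filter above applies to the whole process. If only one call is noisy, the suppression can be scoped with warnings.catch_warnings(), which restores the previous filters on exit. In this sketch noisy_step is a hypothetical stand-in for the library call that emits the warning.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a DeprecationWarning."""
    warnings.warn("old behaviour", DeprecationWarning)
    return 42


filters_before = list(warnings.filters)

with warnings.catch_warnings():
    # The ignore rule only lives inside this block.
    warnings.filterwarnings(action="ignore", category=DeprecationWarning)
    result = noisy_step()

# On exit the previous filter list is back in place.
filters_restored = list(warnings.filters) == filters_before
print(result, filters_restored)
```

Scoping the filter this way keeps unrelated deprecation warnings visible elsewhere in the script.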

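6. LightGBM

Idea: LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages: faster training, lower memory usage, better accuracy, support for parallel learning, and the ability to handle large-scale data.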

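gbm = lgb.LGBMRegressor()
gbm.fit(x_train, y_train)
gbm_score = gbm.score(x_test, y_test)  # R^2, because this is a regressor
print("The score of LightGBM:", gbm_score)

Output

The run recorded for this post printed gbdt_score instead of gbm_score, so the value below is the GBDT score from section 4 and not a LightGBM result:

The score of LightGBM: 0.18118075405980671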