
Engineering Skills UP | A Practical LightGBM Parameter-Tuning Tutorial and Parallel Optimization

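This is a detailed record of how I tuned an LGB model during a competition. It consists of the following six steps:

1. With a large learning rate, determine the number of estimators ```n_estimators/num_iterations/num_round/num_boost_round```;
2. Determine ```num_leaves``` and ```max_depth```;
3. Determine ```min_data_in_leaf```;
4. Determine ```bagging_fraction+bagging_freq``` and ```feature_fraction```;
5. Determine the L1/L2 regularization terms ```reg_alpha``` and ```reg_lambda```;
6. Lower the learning rate.

[One thing has to be said first: LightGBM has far too many parameter aliases, and many different names mean exactly the same thing. In this article, aliases are separated with "/".]

# 0. Parallel optimization

There are two main modes: feature parallel and data parallel. I do not know the details of either, because I have no multi-CPU machine to play with (too poor).

- feature parallel: every worker holds the full training data but trains on only a subset of the features. The workers then exchange their local best features and split points and compare them to find the global best.
- data parallel: every worker holds a subset of the samples and builds local feature histograms. After exchanging them, the workers obtain the global histograms used for training.

[I do not fully understand the mechanics, but the key takeaway is: **use feature parallel for small data and data parallel for large data.**]

# 1. Number of estimators

Whatever the case, we first fix the learning rate at a fairly high value, here ```learning_rate = 0.1```, and then choose the booster type ```boosting/boost/boosting_type```, which almost always stays at the default ```gbdt```.

**This shows that although LGB and XGB are often compared with GBDT, at their core they are still the boosting idea of GBDT.**

The number of estimators, i.e. the number of boosting iterations, or equivalently the number of residual trees, is set by ```n_estimators/num_iterations/num_round/num_boost_round```. We can set it to a large number first and then read the best iteration count off the cv results, as in the code below.

Before that, the other important parameters need initial values. These initial values do not matter much; they only make it easier to settle the remaining parameters.

The following parameters are dictated by the project:

```python
'boosting_type': 'gbdt',   # alias: 'boosting'
'objective': 'regression',
'metric': 'rmse',
```

The following are the initial values I chose; pick your own according to your situation:

```python
'max_depth': 6,            # depends on the problem; my dataset is not large, so I chose a moderate value (anything from 4 to 10 is fine)
'num_leaves': 50,          # LightGBM grows trees leaf-wise; the official advice is to keep this below 2^max_depth
'subsample': 0.8,          # alias: 'bagging_fraction'; row (data) sampling
'colsample_bytree': 0.8,   # alias: 'feature_fraction'; feature sampling
```

Below I demonstrate with LightGBM's cv function:

```python
import lightgbm as lgb

params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'learning_rate': 0.1,
    'num_leaves': 50,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
}
```

```python
# df_train and y_train are the training features and target, prepared beforehand.
# Note: early_stopping_rounds and verbose_eval are keyword arguments of older LightGBM
# releases; 4.0 and later use callbacks such as lgb.early_stopping and lgb.log_evaluation.
data_train = lgb.Dataset(df_train, y_train, silent=True)
cv_results = lgb.cv(
    params, data_train, num_boost_round=1000, nfold=5, stratified=False, shuffle=True,
    metrics='rmse', early_stopping_rounds=50, verbose_eval=50, show_stdv=True, seed=0)

# With early stopping, the length of the result list is the best number of iterations
print('best n_estimators:', len(cv_results['rmse-mean']))
print('best cv score:', cv_results['rmse-mean'][-1])
```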
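Since this article keeps switching between aliases (```subsample``` in the dict, ```bagging_fraction``` in the text), it can help to rename everything to one spelling before comparing two configurations. Here is a minimal sketch in plain Python covering only the alias groups mentioned in this article. ```ALIAS_GROUPS``` and ```canonicalize``` are hypothetical helper names of my own, not part of LightGBM, and treating the first name of each group as the primary one is an assumption based on how LightGBM's parameter documentation lists them.

```python
# Alias groups mentioned in this article; the first name in each tuple is
# treated as the primary spelling (hypothetical helper, not a LightGBM API).
ALIAS_GROUPS = [
    ("num_iterations", "n_estimators", "num_round", "num_boost_round"),
    ("boosting", "boost", "boosting_type"),
    ("bagging_fraction", "subsample"),
    ("feature_fraction", "colsample_bytree"),
    ("min_data_in_leaf", "min_child_samples"),
    ("min_sum_hessian_in_leaf", "min_child_weight"),
    ("lambda_l1", "reg_alpha"),
    ("lambda_l2", "reg_lambda"),
]

# Map every spelling (including the primary one) to its primary name
PRIMARY = {alias: group[0] for group in ALIAS_GROUPS for alias in group}


def canonicalize(params):
    """Return a copy of params with every known alias renamed to its primary name."""
    result = {}
    for key, value in params.items():
        name = PRIMARY.get(key, key)  # unknown keys pass through unchanged
        if name in result and result[name] != value:
            raise ValueError(
                f"conflicting values for {name!r}: {result[name]!r} vs {value!r}")
        result[name] = value
    return result


params = {"boosting_type": "gbdt", "subsample": 0.8,
          "colsample_bytree": 0.8, "max_depth": 6}
print(canonicalize(params))
```

Raising on conflicting values is a design choice of this sketch: passing both ```subsample=0.8``` and ```bagging_fraction=0.9``` is almost certainly a mistake, so it seems safer to fail loudly than to pick one silently.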
Running the ```lgb.cv``` call gives:

```python
[50] cv_agg's rmse: 1.38497 + 0.0202823
best n_estimators: 43
best cv score: 1.3838664241
```

So at a learning rate of 0.1, 43 estimators give the best result. One parameter is now settled: ```n_estimators/num_iterations/num_round/num_boost_round=43```.

[As long as the hardware allows it, a smaller learning rate is still better.]

# 2. Improving the fit

**These are the most important parameters for improving accuracy.**

- ```max_depth```: sets the tree depth; the deeper the tree, the more likely it is to overfit.
- ```num_leaves```: because LightGBM uses a leaf-wise algorithm, tree complexity is controlled with num_leaves rather than max_depth. The rough conversion is num_leaves = 2^(max_depth), but the value should be set below 2^(max_depth), otherwise it may lead to overfitting.

[The relation between num_leaves and max_depth given here is not strict; staying roughly in that range is enough.]

Next we tune both parameters together with a grid search, using ```GridSearchCV()``` from ```sklearn```. Bayesian search works too; I wrote about it on my personal blog and will repost it when I have time.

This search is very time-consuming and very tiring. For large datasets I recommend Bayesian search, or just a few simple manual adjustments. The room for improvement from this kind of tuning is usually very limited.

```python
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV

# Build the sklearn-style LightGBM model with the values chosen above
# (learning rate, number of estimators)
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=50,
                              learning_rate=0.1, n_estimators=43, max_depth=6,
                              metric='rmse', bagging_fraction=0.8, feature_fraction=0.8)

params_test1 = {
    'max_depth': range(3, 8, 2),
    'num_leaves': range(50, 170, 30)
}
gsearch1 = GridSearchCV(estimator=model_lgb, param_grid=params_test1,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
```

```python
gsearch1.fit(df_train, y_train)
# grid_scores_ only exists in scikit-learn < 0.20; later versions expose cv_results_ instead
gsearch1.grid_scores_, gsearch1.best_params_, gsearch1.best_score_
```

Let's look at the results:

```python
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.0min
[Parallel(n_jobs=4)]: Done 60 out of 60 | elapsed: 3.1min finished
([mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 50},
 mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 80},
 mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 110},
 mean: -1.88629, std: 0.13750, params: {'max_depth': 3, 'num_leaves': 140},
 mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 50},
 mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 80},
 mean: -1.86917, std: 0.12590,
 params: {'max_depth': 5, 'num_leaves': 110},
 mean: -1.86917, std: 0.12590, params: {'max_depth': 5, 'num_leaves': 140},
 mean: -1.89254, std: 0.10904, params: {'max_depth': 7, 'num_leaves': 50},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 80},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 110},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 140}],
 {'max_depth': 7, 'num_leaves': 80},
 -1.8602436718814157)
```

Twelve parameter combinations were run, and the best one is ```max_depth``` = 7 with ```num_leaves``` = 80, scoring -1.860.

A note is needed here: the scoring parameter in sklearn's model evaluation always follows the convention **higher return values are better than lower return values**.

My metric, however, is an error (rmse), where lower is better. sklearn therefore provides ```neg_mean_squared_error```, which returns the negated mean squared error, so that for this error a larger (less negative) score is better.

The best score of -1.860 thus corresponds to an RMSE of np.sqrt(-(-1.860)) = 1.3639, clearly better than the 1.3839 from step 1. (Step 1 reported rmse, the root mean squared error, so the square root is needed here.)

We now carry the best values from this step into step 3. I only did a coarse search here; for a better result, try a few more values of max_depth around 7 and of num_leaves around 80. Do not shy away from the effort, even though it really is tedious.

```python
params_test2 = {
    'max_depth': [6, 7, 8],
    'num_leaves': [68, 74, 80, 86, 92]
}
gsearch2 = GridSearchCV(estimator=model_lgb, param_grid=params_test2,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch2.fit(df_train, y_train)
gsearch2.grid_scores_, gsearch2.best_params_, gsearch2.best_score_
```

```python
Fitting 5 folds for each of 15 candidates, totalling 75 fits
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.8min
[Parallel(n_jobs=4)]: Done 75 out of 75 | elapsed: 5.1min finished
([mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 68},
 mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 74},
 mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 80},
 mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 86},
 mean: -1.87506, std: 0.11369, params: {'max_depth': 6, 'num_leaves': 92},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 68},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7,
 'num_leaves': 74},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 80},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 86},
 mean: -1.86024, std: 0.11364, params: {'max_depth': 7, 'num_leaves': 92},
 mean: -1.88197, std: 0.11295, params: {'max_depth': 8, 'num_leaves': 68},
 mean: -1.89117, std: 0.12686, params: {'max_depth': 8, 'num_leaves': 74},
 mean: -1.86390, std: 0.12259, params: {'max_depth': 8, 'num_leaves': 80},
 mean: -1.86733, std: 0.12159, params: {'max_depth': 8, 'num_leaves': 86},
 mean: -1.86665, std: 0.12174, params: {'max_depth': 8, 'num_leaves': 92}],
 {'max_depth': 7, 'num_leaves': 68},
 -1.8602436718814157)
```

So a maximum depth of 7 holds up. Looking at the details, though, with a maximum depth of 7 the number of leaves has no effect on the score.

# 3. Reducing overfitting

At this point it is time to reduce overfitting.

- ```min_data_in_leaf```: a very important parameter, also called min_child_samples. Its value depends on the number of training samples and on num_leaves. Setting it larger avoids growing an overly deep tree, but may cause underfitting.
- ```min_sum_hessian_in_leaf```: also called min_child_weight, the minimum sum of hessians a leaf needs for a split to be allowed, quite a mouthful (Minimum sum of hessians in one leaf to allow a split. Higher values potentially decrease overfitting).

I do not understand the second parameter very well, because I am not familiar with hessian values and the Hessian matrix. I will find time to study it properly; filling gaps as I find them is how I learn.

We proceed the same way as above:

```python
params_test3 = {
    'min_child_samples': [18, 19, 20, 21, 22],
    'min_child_weight': [0.001, 0.002]
}
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', bagging_fraction=0.8, feature_fraction=0.8)
gsearch3 = GridSearchCV(estimator=model_lgb, param_grid=params_test3,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch3.fit(df_train, y_train)
gsearch3.grid_scores_, gsearch3.best_params_, gsearch3.best_score_
```

The result is:

```python
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.9min
[Parallel(n_jobs=4)]: Done 50 out of 50 | elapsed: 3.3min finished
([mean: -1.88057, std: 0.13948, params: {'min_child_samples': 18, 'min_child_weight': 0.001},
 mean: -1.88057, std: 0.13948,
 params: {'min_child_samples': 18, 'min_child_weight': 0.002},
 mean: -1.88365, std: 0.13650, params: {'min_child_samples': 19, 'min_child_weight': 0.001},
 mean: -1.88365, std: 0.13650, params: {'min_child_samples': 19, 'min_child_weight': 0.002},
 mean: -1.86024, std: 0.11364, params: {'min_child_samples': 20, 'min_child_weight': 0.001},
 mean: -1.86024, std: 0.11364, params: {'min_child_samples': 20, 'min_child_weight': 0.002},
 mean: -1.86980, std: 0.14251, params: {'min_child_samples': 21, 'min_child_weight': 0.001},
 mean: -1.86980, std: 0.14251, params: {'min_child_samples': 21, 'min_child_weight': 0.002},
 mean: -1.86750, std: 0.13898, params: {'min_child_samples': 22, 'min_child_weight': 0.001},
 mean: -1.86750, std: 0.13898, params: {'min_child_samples': 22, 'min_child_weight': 0.002}],
 {'min_child_samples': 20, 'min_child_weight': 0.001},
 -1.8602436718814157)
```

This is the fine search I ran after a coarse one. The best ```min_data_in_leaf``` is 20, while ```min_sum_hessian_in_leaf``` has almost no effect on the final value. After this round of tuning the best result is still -1.86024, so there is no improvement.

# 4. Sampling to reduce overfitting

**Both of these parameters are meant to reduce overfitting.**

```feature_fraction``` subsamples the features. It can be used to prevent overfitting and to speed up training.

```bagging_fraction+bagging_freq``` must be set together. bagging_fraction is equivalent to subsample (row sampling); it makes bagging run faster and also reduces overfitting. bagging_freq defaults to 0 and is the bagging frequency: 0 means bagging is disabled, and k means bagging is performed every k iterations.

Different parameters, same method:

```python
params_test4 = {
    'feature_fraction': [0.5, 0.6, 0.7, 0.8, 0.9],
    'bagging_fraction': [0.6, 0.7, 0.8, 0.9, 1.0]
}
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', bagging_freq=5, min_child_samples=20)
gsearch4 = GridSearchCV(estimator=model_lgb, param_grid=params_test4,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch4.fit(df_train, y_train)
gsearch4.grid_scores_, gsearch4.best_params_, gsearch4.best_score_
```

```python
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.6min
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed: 7.1min finished
([mean: -1.90447, std: 0.15841, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.5},
 mean: -1.90846, std: 0.13925, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.6},
 mean: -1.91695, std: 0.14121, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.7},
 mean: -1.90115, std: 0.12625, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.8},
 mean: -1.92586, std: 0.15220, params: {'bagging_fraction': 0.6, 'feature_fraction': 0.9},
 mean: -1.88031, std: 0.17157, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.5},
 mean: -1.89513, std: 0.13718, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.6},
 mean: -1.88845, std: 0.13864, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.7},
 mean: -1.89297, std: 0.12374, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.8},
 mean: -1.89432, std: 0.14353, params: {'bagging_fraction': 0.7, 'feature_fraction': 0.9},
 mean: -1.88088, std: 0.14247, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.5},
 mean: -1.90080,
 std: 0.13174, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.6},
 mean: -1.88364, std: 0.14732, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.7},
 mean: -1.88987, std: 0.13344, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.8},
 mean: -1.87752, std: 0.14802, params: {'bagging_fraction': 0.8, 'feature_fraction': 0.9},
 mean: -1.88348, std: 0.13925, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.5},
 mean: -1.87472, std: 0.13301, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.6},
 mean: -1.88656, std: 0.12241, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.7},
 mean: -1.89029, std: 0.10776, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.8},
 mean: -1.88719, std: 0.11915, params: {'bagging_fraction': 0.9, 'feature_fraction': 0.9},
 mean: -1.86170, std: 0.12544, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.5},
 mean: -1.87334, std: 0.13099, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.6},
 mean: -1.85412, std: 0.12698, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.7},
 mean: -1.86024, std: 0.11364, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.8},
 mean: -1.87266, std: 0.12271, params: {'bagging_fraction': 1.0, 'feature_fraction': 0.9}],
 {'bagging_fraction': 1.0, 'feature_fraction': 0.7},
 -1.8541224387666373)
```

From this we can see that the ideal values of ```bagging_fraction``` and ```feature_fraction``` are 1.0 and 0.7 respectively. One important reason is that my sample count is fairly small (4000+) while the feature count is large (1000+). So here we take a smaller step and search ```feature_fraction``` more finely.

A bit of fine-tuning:

```python
params_test5 = {
    'feature_fraction': [0.62, 0.65, 0.68, 0.7, 0.72, 0.75, 0.78]
}
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', min_child_samples=20)
gsearch5 = GridSearchCV(estimator=model_lgb, param_grid=params_test5,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch5.fit(df_train, y_train)
gsearch5.grid_scores_, gsearch5.best_params_, gsearch5.best_score_
```

```python
Fitting 5 folds for each of 7 candidates, totalling 35 fits
[Parallel(n_jobs=4)]: Done 35 out of 35 | elapsed: 2.3min finished
([mean: -1.86696, std: 0.12658, params: {'feature_fraction': 0.62},
 mean: -1.88337, std: 0.13215, params: {'feature_fraction': 0.65},
 mean: -1.87282, std: 0.13193, params: {'feature_fraction': 0.68},
 mean: -1.85412, std: 0.12698, params: {'feature_fraction': 0.7},
 mean: -1.88235, std: 0.12682, params: {'feature_fraction': 0.72},
 mean: -1.86329, std: 0.12757, params: {'feature_fraction': 0.75},
 mean: -1.87943, std: 0.12107, params: {'feature_fraction': 0.78}],
 {'feature_fraction': 0.7},
 -1.8541224387666373)
```

All right, feature_fraction stays at 0.7.

# 5. L1 and L2

The regularization parameters lambda_l1 (reg_alpha) and lambda_l2 (reg_lambda) are, without question, there to reduce overfitting; they correspond to L1 and L2 regularization respectively. Let's try these two parameters as well.

```python
params_test6 = {
    'reg_alpha': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5],
    'reg_lambda': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5]
}
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', min_child_samples=20, feature_fraction=0.7)
gsearch6 = GridSearchCV(estimator=model_lgb, param_grid=params_test6,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch6.fit(df_train, y_train)
gsearch6.grid_scores_, gsearch6.best_params_, gsearch6.best_score_
```

```python
Fitting 5 folds for each of 49 candidates, totalling 245 fits
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.8min
[Parallel(n_jobs=4)]: Done 192 tasks | elapsed: 10.6min
[Parallel(n_jobs=4)]: Done 245 out of 245 | elapsed: 13.3min finished
([mean: -1.85412, std: 0.12698, params: {'reg_alpha': 0, 'reg_lambda': 0},
 mean: -1.85990, std: 0.13296, params: {'reg_alpha': 0, 'reg_lambda': 0.001},
 mean: -1.86367, std: 0.13634, params: {'reg_alpha': 0, 'reg_lambda': 0.01},
 mean: -1.86787, std: 0.13881, params: {'reg_alpha': 0, 'reg_lambda': 0.03},
 mean: -1.87099, std: 0.12476, params: {'reg_alpha': 0, 'reg_lambda': 0.08},
 mean: -1.87670, std: 0.11849, params: {'reg_alpha': 0, 'reg_lambda': 0.3},
 mean: -1.88278, std: 0.13064, params: {'reg_alpha': 0, 'reg_lambda': 0.5},
 mean: -1.86190, std: 0.13613, params: {'reg_alpha': 0.001, 'reg_lambda': 0},
 mean: -1.86190, std: 0.13613, params: {'reg_alpha': 0.001, 'reg_lambda': 0.001},
 mean: -1.86515, std: 0.14116, params: {'reg_alpha': 0.001, 'reg_lambda': 0.01},
 mean: -1.86908, std: 0.13668, params: {'reg_alpha': 0.001, 'reg_lambda': 0.03},
 mean: -1.86852, std: 0.12289, params: {'reg_alpha': 0.001, 'reg_lambda': 0.08},
 mean: -1.88076, std: 0.11710, params: {'reg_alpha': 0.001, 'reg_lambda': 0.3},
 mean: -1.88278, std: 0.13064, params: {'reg_alpha': 0.001, 'reg_lambda': 0.5},
 mean: -1.87480, std: 0.13889, params: {'reg_alpha': 0.01, 'reg_lambda': 0},
 mean: -1.87284, std: 0.14138, params: {'reg_alpha': 0.01, 'reg_lambda': 0.001},
 mean: -1.86030, std: 0.13332, params: {'reg_alpha': 0.01, 'reg_lambda': 0.01},
 mean: -1.86695, std: 0.12587, params: {'reg_alpha': 0.01, 'reg_lambda': 0.03},
 mean: -1.87415, std: 0.13100, params: {'reg_alpha': 0.01, 'reg_lambda': 0.08},
 mean: -1.88543, std: 0.13195, params: {'reg_alpha': 0.01, 'reg_lambda': 0.3},
 mean: -1.88076, std: 0.13502, params: {'reg_alpha': 0.01, 'reg_lambda': 0.5},
 mean: -1.87729, std: 0.12533, params: {'reg_alpha': 0.03, 'reg_lambda': 0},
 mean: -1.87435, std: 0.12034, params: {'reg_alpha': 0.03, 'reg_lambda': 0.001},
 mean: -1.87513, std: 0.12579, params: {'reg_alpha': 0.03, 'reg_lambda': 0.01},
 mean: -1.88116, std: 0.12218, params: {'reg_alpha': 0.03, 'reg_lambda': 0.03},
 mean: -1.88052, std: 0.13585, params: {'reg_alpha': 0.03, 'reg_lambda': 0.08},
 mean: -1.87565, std: 0.12200, params: {'reg_alpha': 0.03, 'reg_lambda': 0.3},
 mean: -1.87935, std: 0.13817, params: {'reg_alpha': 0.03, 'reg_lambda': 0.5},
 mean: -1.87774, std: 0.12477, params: {'reg_alpha': 0.08, 'reg_lambda': 0},
 mean: -1.87774, std: 0.12477, params: {'reg_alpha': 0.08, 'reg_lambda': 0.001},
 mean: -1.87911, std: 0.12027, params: {'reg_alpha': 0.08, 'reg_lambda': 0.01},
 mean: -1.86978, std: 0.12478,
 params: {'reg_alpha': 0.08, 'reg_lambda': 0.03},
 mean: -1.87217, std: 0.12159, params: {'reg_alpha': 0.08, 'reg_lambda': 0.08},
 mean: -1.87573, std: 0.14137, params: {'reg_alpha': 0.08, 'reg_lambda': 0.3},
 mean: -1.85969, std: 0.13109, params: {'reg_alpha': 0.08, 'reg_lambda': 0.5},
 mean: -1.87632, std: 0.12398, params: {'reg_alpha': 0.3, 'reg_lambda': 0},
 mean: -1.86995, std: 0.12651, params: {'reg_alpha': 0.3, 'reg_lambda': 0.001},
 mean: -1.86380, std: 0.12793, params: {'reg_alpha': 0.3, 'reg_lambda': 0.01},
 mean: -1.87577, std: 0.13002, params: {'reg_alpha': 0.3, 'reg_lambda': 0.03},
 mean: -1.87402, std: 0.13496, params: {'reg_alpha': 0.3, 'reg_lambda': 0.08},
 mean: -1.87032, std: 0.12504, params: {'reg_alpha': 0.3, 'reg_lambda': 0.3},
 mean: -1.88329, std: 0.13237, params: {'reg_alpha': 0.3, 'reg_lambda': 0.5},
 mean: -1.87196, std: 0.13099, params: {'reg_alpha': 0.5, 'reg_lambda': 0},
 mean: -1.87196, std: 0.13099, params: {'reg_alpha': 0.5, 'reg_lambda': 0.001},
 mean: -1.88222, std: 0.14735, params: {'reg_alpha': 0.5, 'reg_lambda': 0.01},
 mean: -1.86618, std: 0.14006, params: {'reg_alpha': 0.5, 'reg_lambda': 0.03},
 mean: -1.88579, std: 0.12398, params: {'reg_alpha': 0.5, 'reg_lambda': 0.08},
 mean: -1.88297, std: 0.12307, params: {'reg_alpha': 0.5, 'reg_lambda': 0.3},
 mean: -1.88148, std: 0.12622, params: {'reg_alpha': 0.5, 'reg_lambda': 0.5}],
 {'reg_alpha': 0, 'reg_lambda': 0},
 -1.8541224387666373)
```

Haha, it looks like this step was unnecessary: the best result is still the one with both regularization terms at 0.

# 6. Lowering learning_rate

Back to step 1, except that this time we use the parameter values tuned above:

```python
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'learning_rate': 0.005,
    'num_leaves': 80,
    'max_depth': 7,
    'min_data_in_leaf': 20,
    'subsample': 1,
    'colsample_bytree': 0.7,
}

data_train = lgb.Dataset(df_train, y_train, silent=True)
cv_results = lgb.cv(
    params, data_train, num_boost_round=10000, nfold=5, stratified=False, shuffle=True,
    metrics='rmse', early_stopping_rounds=50, verbose_eval=100, show_stdv=True)

print('best n_estimators:', len(cv_results['rmse-mean']))
print('best cv score:', cv_results['rmse-mean'][-1])
```

```python
[100] cv_agg's rmse: 1.52939 + 0.0261756
[200] cv_agg's rmse: 1.43535 + 0.0187243
[300] cv_agg's rmse: 1.39584 + 0.0157521
[400] cv_agg's rmse: 1.37935 + 0.0157429
[500] cv_agg's rmse: 1.37313 + 0.0164503
[600] cv_agg's rmse: 1.37081 + 0.0172752
[700] cv_agg's rmse: 1.36942 + 0.0177888
[800] cv_agg's rmse: 1.36854 + 0.0180575
[900] cv_agg's rmse: 1.36817 + 0.0188776
[1000] cv_agg's rmse: 1.36796 + 0.0190279
[1100] cv_agg's rmse: 1.36783 + 0.0195969
best n_estimators: 1079
best cv score: 1.36772351783
```
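Looking back at step 2, part of the flat score pattern follows from simple arithmetic: a binary tree of depth d has at most 2^d leaves, so any ```num_leaves``` above that cap cannot change the model. The sketch below, in plain Python, groups the coarse step-2 grid by this cap and converts the best ```neg_mean_squared_error``` score back to RMSE. The helper name ```effective_leaves``` is my own, not a LightGBM function, and the cap is only an upper bound, not the leaf count LightGBM actually reaches.

```python
import math
from itertools import product


def effective_leaves(max_depth, num_leaves):
    """Upper bound on leaves per tree: a depth-d binary tree has at most 2**d leaves."""
    return min(num_leaves, 2 ** max_depth)


# Coarse grid from step 2: max_depth in {3, 5, 7}, num_leaves in {50, 80, 110, 140}
grid = list(product(range(3, 8, 2), range(50, 170, 30)))

# Group the 12 combinations by (max_depth, effective leaf cap)
groups = {}
for max_depth, num_leaves in grid:
    key = (max_depth, effective_leaves(max_depth, num_leaves))
    groups.setdefault(key, []).append(num_leaves)

for (max_depth, cap), leaves in sorted(groups.items()):
    print(f"max_depth={max_depth}: num_leaves {leaves} -> at most {cap} leaves")

# neg_mean_squared_error is -MSE, so RMSE = sqrt(-score)
best_score = -1.8602436718814157
rmse = math.sqrt(-best_score)
print(f"RMSE of the best step-2 score: {rmse:.4f}")
```

For ```max_depth``` 3 and 5, all four ```num_leaves``` values collapse into one group, which matches the identical scores in the grid output. For ```max_depth``` = 7 the cap is 128, so the identical scores there presumably come from other limits such as ```min_data_in_leaf```; that is my reading, not something measured here. Note also that the square root of a mean fold MSE is not exactly the mean of the fold RMSEs, so comparing grid-search scores with the ```lgb.cv``` scores is only approximate.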